LU factorization for accelerator-based systems, Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA'11), pp.217-224 ,
URL : https://hal.archives-ouvertes.fr/hal-00654193
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.012037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
Accelerating Linear System Solutions Using Randomization Techniques, ACM Transactions on Mathematical Software, vol.39, issue.2, pp.1-8, 2013. ,
DOI : 10.1145/2427023.2427025
URL : https://hal.archives-ouvertes.fr/inria-00593306
Complex version of high performance computing LINPACK benchmark (HPL), Concurrency and Computation: Practice and Experience, pp.573-587, 2010. ,
The Impact of Multicore on Math Software, Applied Parallel Computing. State of the Art in Scientific Computing, 8th International Workshop, pp.1-10, 2006. ,
DOI : 10.1007/978-3-540-75755-9_1
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008. ,
DOI : 10.1002/cpe.1301
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Scaling LAPACK panel operations using parallel cache assignment, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, 2010. ,
Implementing communication-optimal parallel and sequential QR factorizations, Arxiv preprint, 2008. ,
Error bounds from extra-precise iterative refinement, ACM Transactions on Mathematical Software, vol.32, issue.2, pp.325-351, 2006. ,
DOI : 10.1145/1141885.1141894
Adapting communication-avoiding LU and QR factorizations to multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010. ,
DOI : 10.1109/IPDPS.2010.5470348
Recursive Approach in Sparse Matrix LU Factorization, Scientific Programming, vol.9, issue.1, pp.51-60, 2001. ,
DOI : 10.1155/2001/569670
Exploiting fine-grain parallelism in recursive LU factorization, ParCo 2011 - International Conference on Parallel Computing, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00809755
High performance matrix inversion based on LU factorization for multicore architectures, Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers, MTAGS '11, pp.33-42, 2011. ,
DOI : 10.1145/2132876.2132885
URL : https://hal.archives-ouvertes.fr/hal-00809750
Exploiting fine-grain parallelism in recursive LU factorization, Advances in Parallel Computing, Special Issue (print), pp.429-436, 2012. ,
Large Dense Numerical Linear Algebra in 1993: the Parallel Computing Influence, International Journal of High Performance Computing Applications, vol.7, issue.2, pp.113-128, 1993. ,
DOI : 10.1177/109434209300700203
Gaussian Elimination with Partial Pivoting Can Fail in Practice, SIAM Journal on Matrix Analysis and Applications, vol.15, issue.4, pp.1354-1362, 1994. ,
DOI : 10.1137/S0895479892239755
Mathematicians of Gaussian elimination, Notices of the AMS, vol.58, issue.6, pp.782-792, 2011. ,
Communication Avoiding Gaussian elimination, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, p.29, 2008. ,
DOI : 10.1109/SC.2008.5214287
URL : https://hal.archives-ouvertes.fr/inria-00277901
CALU: A Communication Optimal LU Factorization Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.4, pp.1317-1350, 2011. ,
DOI : 10.1137/100788926
URL : https://hal.archives-ouvertes.fr/hal-00651137
Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-756, 1997. ,
DOI : 10.1147/rd.416.0737
Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion, ACM Transactions on Mathematical Software, vol.38, issue.3, 2012. ,
DOI : 10.1145/2168773.2168775
Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-8, 2011. ,
DOI : 10.1145/2063384.2063394
A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012. ,
DOI : 10.1109/IPDPS.2012.13
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, Concurrency and Computation: Practice and Experience, vol.10, issue.1, 2011. ,
DOI : 10.1002/cpe.1829
Origin and development of the method of moments for field computation, IEEE Antennas and Propagation Magazine, vol.32, issue.3, pp.31-35, 1990. ,
DOI : 10.1109/74.80522
Panel Methods in Computational Fluid Dynamics, Annual Review of Fluid Mechanics, vol.22, issue.1, pp.255-274, 1990. ,
DOI : 10.1146/annurev.fl.22.010190.001351
Calculation of potential flow about arbitrary bodies, Progress in Aeronautical Sciences, pp.1-138, 1967. ,
DOI : 10.1016/0376-0421(67)90003-6
Accuracy and Stability of Numerical Algorithms, SIAM: Society for Industrial and Applied Mathematics. ,
Advances in full-wave modeling of radio frequency heated multidimensional plasmas, Physics of Plasmas, vol.9, issue.5, pp.1873-1881, 2002. ,
Sheared Poloidal Flow Driven by Mode Conversion in Tokamak Plasmas, Physical Review Letters, vol.90, issue.19, p.195001, 2003. ,
DOI : 10.1103/PhysRevLett.90.195001
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2009. ,
DOI : 10.1145/1377612.1377615
Parallel band two-sided matrix bidiagonalization for multicore architectures, IEEE Transactions on Parallel and Distributed Systems, vol.21, issue.4, 2010. ,
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures, ACM Transactions on Mathematical Software, vol.39, issue.3, 2013. ,
DOI : 10.1145/2450153.2450154
Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction, 9th International Conference of Parallel Processing and Applied Mathematics, pp.661-670, 2011. ,
DOI : 10.1007/978-3-642-31464-3_67
Anatomy of a globally recursive embedded LINPACK benchmark, Proceedings of 2012 IEEE High Performance Extreme Computing Conference, Westin Hotel, 2012. ,
Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011. ,
DOI : 10.1109/IPDPS.2011.91
Iterative Refinement in Floating Point, Journal of the ACM, vol.14, issue.2, pp.316-321, 1967. ,
DOI : 10.1145/321386.321394
A randomizing butterfly transformation useful in block matrix computations, 1995. ,
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.1-1426, 2009. ,
DOI : 10.1145/1527286.1527288
Analysis of Pairwise Pivoting in Gaussian Elimination, IEEE Transactions on Computers, vol.34, issue.3, p.34, 1985. ,
DOI : 10.1109/TC.1985.1676570
Introduction to Matrix Computations, 1973. ,
Average-Case Stability of Gaussian Elimination, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.3, pp.335-360, 1990. ,
DOI : 10.1137/0611023
Generalized Moment Methods in Electromagnetics, 1991. ,
The Algebraic Eigenvalue Problem, 1965. ,
QUARK users' guide: QUeueing And Runtime for Kernels, 2011. ,
Probabilistic analysis of Gaussian elimination without pivoting, 1995. ,
FORTRAN Subroutines for Out-of-Core Solutions of Large Complex Linear Systems, 1979. ,