E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou et al., LU factorization for accelerator-based systems, Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA'11), pp.217-224, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00654193

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.012037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

M. Baboulin, J. Dongarra, J. Herrmann, and S. Tomov, Accelerating Linear System Solutions Using Randomization Techniques, ACM Transactions on Mathematical Software, vol.39, issue.2, pp.1-8, 2013.
DOI : 10.1145/2427023.2427025
URL : https://hal.archives-ouvertes.fr/inria-00593306

R. F. Barrett, T. H. Chan, E. F. D'Azevedo, E. F. Jaeger, K. Wong et al., Complex version of high performance computing LINPACK benchmark (HPL), Concurrency and Computation: Practice and Experience, pp.573-587, 2010.

A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek et al., The Impact of Multicore on Math Software, Applied Parallel Computing. State of the Art in Scientific Computing, 8th International Workshop, pp.1-10, 2006.
DOI : 10.1007/978-3-540-75755-9_1

A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301

A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

A. M. Castaldo and R. C. Whaley, Scaling LAPACK panel operations using parallel cache assignment, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, 2010.

J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Implementing communication-optimal parallel and sequential QR factorizations, Arxiv preprint, 2008.

J. Demmel, Y. Hida, W. Kahan, X. S. Li, S. Mukherjee et al., Error bounds from extra-precise iterative refinement, ACM Transactions on Mathematical Software, vol.32, issue.2, pp.325-351, 2006.
DOI : 10.1145/1141885.1141894

S. Donfack, L. Grigori, and A. Gupta, Adapting communication-avoiding LU and QR factorizations to multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470348

J. Dongarra, V. Eijkhout, and P. Luszczek, Recursive Approach in Sparse Matrix LU Factorization, Scientific Programming, vol.9, issue.1, pp.51-60, 2001.
DOI : 10.1155/2001/569670

J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, Exploiting fine-grain parallelism in recursive LU factorization, ParCo 2011 - International Conference on Parallel Computing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00809755

J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, High performance matrix inversion based on LU factorization for multicore architectures, Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers, MTAGS '11, pp.33-42, 2011.
DOI : 10.1145/2132876.2132885
URL : https://hal.archives-ouvertes.fr/hal-00809750

J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, Exploiting fine-grain parallelism in recursive LU factorization, Advances in Parallel Computing, Special Issue (print), pp.429-436, 2012.

A. Edelman, Large Dense Numerical Linear Algebra in 1993: the Parallel Computing Influence, International Journal of High Performance Computing Applications, vol.7, issue.2, pp.113-128, 1993.
DOI : 10.1177/109434209300700203

L. V. Foster, Gaussian Elimination with Partial Pivoting Can Fail in Practice, SIAM Journal on Matrix Analysis and Applications, vol.15, issue.4, pp.1354-1362, 1994.
DOI : 10.1137/S0895479892239755

J. F. Grcar, Mathematicians of Gaussian elimination, Notices of the AMS, vol.58, issue.6, pp.782-792, 2011.

L. Grigori, J. Demmel, and H. Xiang, Communication Avoiding Gaussian elimination, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, p.29, 2008.
DOI : 10.1109/SC.2008.5214287
URL : https://hal.archives-ouvertes.fr/inria-00277901

L. Grigori, J. Demmel, and H. Xiang, CALU: A Communication Optimal LU Factorization Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.4, pp.1317-1350, 2011.
DOI : 10.1137/100788926
URL : https://hal.archives-ouvertes.fr/hal-00651137

F. G. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-756, 1997.
DOI : 10.1147/rd.416.0737

F. G. Gustavson, L. Karlsson, and B. Kågström, Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion, ACM Transactions on Mathematical Software, vol.38, issue.3, 2012.
DOI : 10.1145/2168773.2168775

A. Haidar, H. Ltaief, and J. Dongarra, Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-8, 2011.
DOI : 10.1145/2063384.2063394

A. Haidar, H. Ltaief, P. Luszczek, and J. Dongarra, A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012.
DOI : 10.1109/IPDPS.2012.13

A. Haidar, H. Ltaief, A. Yarkhan, and J. J. Dongarra, Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, Concurrency and Computation: Practice and Experience, vol.10, issue.1, 2011.
DOI : 10.1002/cpe.1829

R. Harrington, Origin and development of the method of moments for field computation, IEEE Antennas and Propagation Magazine, vol.32, issue.3, pp.31-35, 1990.
DOI : 10.1109/74.80522

J. L. Hess, Panel Methods in Computational Fluid Dynamics, Annual Review of Fluid Mechanics, vol.22, issue.1, pp.255-274, 1990.
DOI : 10.1146/annurev.fl.22.010190.001351

J. L. Hess and A. M. O. Smith, Calculation of potential flow about arbitrary bodies, Progress in Aeronautical Sciences, pp.1-138, 1967.
DOI : 10.1016/0376-0421(67)90003-6

N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM: Society for Industrial and Applied Mathematics.

E. Jaeger, L. Berry, E. D'Azevedo et al., Advances in full-wave modeling of radio frequency heated multidimensional plasmas, Physics of Plasmas, vol.9, issue.5, pp.1873-1881, 2002.

E. Jaeger, L. Berry, J. Myra, D. Batchelor, E. D'Azevedo et al., Sheared Poloidal Flow Driven by Mode Conversion in Tokamak Plasmas, Physical Review Letters, vol.90, issue.19, p.195001, 2003.
DOI : 10.1103/PhysRevLett.90.195001

J. Kurzak, H. Ltaief, J. J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2009.
DOI : 10.1145/1377612.1377615

H. Ltaief, J. Kurzak, and J. Dongarra, Parallel band two-sided matrix bidiagonalization for multicore architectures, IEEE Transactions on Parallel and Distributed Systems, vol.21, issue.4, 2010.

H. Ltaief, P. Luszczek, and J. Dongarra, High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures, ACM Transactions on Mathematical Software, vol.39, issue.3, 2013.
DOI : 10.1145/2450153.2450154

H. Ltaief, P. Luszczek, A. Haidar, and J. Dongarra, Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction, 9th International Conference of Parallel Processing and Applied Mathematics, pp.661-670, 2011.
DOI : 10.1007/978-3-642-31464-3_67

P. Luszczek and J. Dongarra, Anatomy of a globally recursive embedded LINPACK benchmark, Proceedings of 2012 IEEE High Performance Extreme Computing Conference, 2012.

P. Luszczek, H. Ltaief, and J. Dongarra, Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.91

C. B. Moler, Iterative Refinement in Floating Point, Journal of the ACM, vol.14, issue.2, pp.316-321, 1967.
DOI : 10.1145/321386.321394

D. S. Parker, A randomizing butterfly transformation useful in block matrix computations, 1995.

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.14:1-14:26, 2009.
DOI : 10.1145/1527286.1527288

D. C. Sorensen, Analysis of Pairwise Pivoting in Gaussian Elimination, IEEE Transactions on Computers, vol.34, issue.3, 1985.
DOI : 10.1109/TC.1985.1676570

G. W. Stewart, Introduction to Matrix Computations, 1973.

L. Trefethen and R. Schreiber, Average-Case Stability of Gaussian Elimination, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.3, pp.335-360, 1990.
DOI : 10.1137/0611023

J. J. Wang, Generalized Moment Methods in Electromagnetics, 1991.

J. H. Wilkinson, The Algebraic Eigenvalue Problem, 1965.

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

M. Yeung and T. F. Chan, Probabilistic analysis of Gaussian elimination without pivoting, 1995.

E. L. Yip, FORTRAN Subroutines for Out-of-Core Solutions of Large Complex Linear Systems, 1979.