E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel et al., LAPACK Users' Guide, 1999.

E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multicore hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.1-12, 2009.

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009.
DOI : 10.1088/1742-6596/180/1/012037

P. Luszczek, H. Ltaief, and J. Dongarra, Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.91

D. Sorensen, Analysis of Pairwise Pivoting in Gaussian Elimination, IEEE Transactions on Computers, vol.34, issue.3, p.34, 1985.
DOI : 10.1109/TC.1985.1676570

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

G. Quintana-Ortí, E. Quintana-Ortí, R. van de Geijn, F. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.14:1-14:26, 2009.
DOI : 10.1145/1527286.1527288

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou et al., LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 2011.
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193

J. Kurzak, H. Ltaief, J. Dongarra, and R. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, pp.15-44, 2010.

H. Ltaief, J. Kurzak, J. Dongarra, and R. Badia, Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures, Scientific Programming, vol.18, issue.1, pp.33-50, 2010.
DOI : 10.1155/2010/574728

Intel, Math Kernel Library (MKL), http://www.intel.com/software/products

E. Elmroth and F. Gustavson, New serial and parallel recursive QR factorization algorithms for SMP systems, Proceedings, 1998.
DOI : 10.1007/BFb0095328

K. Georgiev and J. Wasniewski, Recursive Version of LU Decomposition, Revised Papers from the Second International Conference on Numerical Analysis and Its Applications, pp.325-332, 2001.
DOI : 10.1007/3-540-45262-1_38

J. Dongarra, V. Eijkhout, and P. Luszczek, Recursive approach in sparse matrix LU factorization, Scientific Programming, pp.51-60, 2001.

D. Irony and S. Toledo, Communication-efficient parallel dense LU using a 3-dimensional approach, Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing, 2001.

J. Dongarra, P. Luszczek, and A. Petitet, The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience, vol.38, issue.9, pp.1-18, 2003.
DOI : 10.1002/cpe.728

A. Castaldo and R. Whaley, Scaling LAPACK panel operations using Parallel Cache Assignment, Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.223-232, 2010.

F. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-755, 1997.
DOI : 10.1147/rd.416.0737

E. Chan, R. van de Geijn, and A. Chapman, Managing the complexity of lookahead for LU factorization with pivoting, Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, pp.200-208, 2010.
DOI : 10.1145/1810479.1810520

E. Anderson and J. Dongarra, Implementation guide for LAPACK, 1990.

G. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967.
DOI : 10.1145/1465482.1465560

J. Gustafson, Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988.
DOI : 10.1145/42411.42415

H. Sundell, Efficient and practical non-blocking data structures, Department of Computer Science, 2004.

Q. Yi, K. Kennedy, H. You, K. Seymour, and J. Dongarra, Automatic blocking of QR and LU factorizations for locality, Proceedings of the 2004 workshop on Memory system performance, MSP '04, 2004.
DOI : 10.1145/1065895.1065898

A. Haidar, H. Ltaief, A. Yarkhan, and J. Dongarra, Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, Concurrency and Computation: Practice and Experience, vol.10, issue.1, 2010.
DOI : 10.1002/cpe.1829

E. Dijkstra, On the role of scientific thought, Selected Writings on Computing: A Personal Perspective, pp.60-66, 1982.

C. Reade, Elements of Functional Programming, 1989.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301

J. Perez, R. Badia, and J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, IEEE International Conference on Cluster Computing, pp.142-151, 2008.

J. Dongarra, M. Faverge, Y. Ishikawa, R. Namyst, F. Rue et al., EZTrace: a generic framework for performance analysis, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00587216