E. Anderson and J. Dongarra, Evaluating block algorithm variants in LAPACK, LAPACK Working Note #19, 1990.

T. E. Anderson, E. D. Lazowska, and H. M. Levy, The performance implications of thread management alternatives for shared-memory multiprocessors, IEEE Transactions on Computers, vol.38, issue.12, pp.1631-1644, 1989.
DOI : 10.1109/12.40843

M. Baboulin, J. Demmel, J. Dongarra, S. Tomov, and V. Volkov, Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, 2008.

M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy et al., A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, International Conference on Computational Science (ICCS), Procedia Computer Science, pp.17-26, 2012.
DOI : 10.1016/j.procs.2012.04.003
URL : https://hal.archives-ouvertes.fr/hal-00656457

M. Baboulin, J. Dongarra, J. Herrmann, and S. Tomov, Accelerating Linear System Solutions Using Randomization Techniques, ACM Transactions on Mathematical Software, vol.39, issue.2, 2013.
DOI : 10.1145/2427023.2427025
URL : https://hal.archives-ouvertes.fr/inria-00593306

M. Baboulin, J. Dongarra, and S. Tomov, Some issues in dense linear algebra for multicore and special purpose architectures, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), volume 6126-6127 of Lecture Notes in Computer Science, 2008.

F. Bellosa and M. Steckermeier, The Performance Implications of Locality Information Usage in Shared-Memory Multiprocessors, Journal of Parallel and Distributed Computing, vol.37, issue.1, pp.113-121, 1996.
DOI : 10.1006/jpdc.1996.0112

S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, A Portable Programming Interface for Performance Evaluation on Modern Processors, International Journal of High Performance Computing Applications, vol.14, issue.3, pp.189-204, 2000.
DOI : 10.1177/109434200001400303

S. Donfack, J. Dongarra, M. Faverge, M. Gates, J. Kurzak et al., On algorithmic variants of parallel Gaussian elimination: Comparison of implementations in terms of performance and numerical properties.
URL : https://hal.archives-ouvertes.fr/hal-00867837

S. Donfack, L. Grigori, and A. K. Gupta, Adapting communication-avoiding LU and QR factorizations to multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470348

J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst, Numerical Linear Algebra for High-Performance Computers, 1998.
DOI : 10.1137/1.9780898719611

J. González-Domínguez, G. L. Taboada, B. B. Fraguela, M. J. Martín, and J. Touriño, Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite, Computers & Electrical Engineering, vol.38, issue.2, pp.258-269, 2012.
DOI : 10.1016/j.compeleceng.2011.12.007

L. Grigori, J. W. Demmel, and H. Xiang, Communication Avoiding Gaussian Elimination, SC 2008, International Conference for High Performance Computing, Networking, Storage and Analysis, 2008.
DOI : 10.1109/SC.2008.5214287
URL : https://hal.archives-ouvertes.fr/inria-00277901

G. Hager and G. Wellein, Introduction to High Performance Computing for Scientists and Engineers, 2011.
DOI : 10.1201/EBK1439811924

Hewlett-Packard Corporation, Intel Corporation, Microsoft Corporation, Phoenix Technologies Ltd., and Toshiba Corporation, Advanced Configuration and Power Interface Specification, vol.4, 2010.

Intel, Math Kernel Library (MKL), http://www.intel.com/software/products

R. Iyer, H. Wang, and L. N. Bhuyan, Design and analysis of static memory management policies for CC-NUMA multiprocessors, Journal of Systems Architecture, vol.48, issue.1-3, pp.59-80, 2002.
DOI : 10.1016/S1383-7621(02)00066-8

A. Kleen, A NUMA API for Linux, Novell Inc., 2004.

J. Kurzak and J. Dongarra, Implementing Linear Algebra Routines on Multi-core Processors with Pipelining and a Look Ahead, LAPACK Working Note #178, 2006.
DOI : 10.1007/978-3-540-75755-9_18

C. Lameter, Local and remote memory: Memory in a Linux/NUMA system, Linux Symposium (OLS2006), 2006.

A. Srinivasa and M. Sosonkina, Nonuniform memory affinity strategy in multithreaded sparse matrix computations, Proceedings of the 2012 Symposium on High Performance Computing, pp.1-9, 2012.

A. Srinivasa, M. Sosonkina, P. Maris, and J. P. Vary, Efficient shared-array accesses in ab initio nuclear structure calculations on multicore architectures, Procedia Computer Science, vol.9, pp.256-265, 2012.

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010.

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, 2010.
DOI : 10.1109/ICPPW.2010.38