Evaluating block algorithm variants in LAPACK, LAPACK Working Note 19, 1990.
The performance implications of thread management alternatives for shared-memory multiprocessors, IEEE Transactions on Computers, vol.38, issue.12, pp.1631-1644, 1989.
DOI : 10.1109/12.40843
Enhancing the performance of dense linear algebra solvers on GPUs in the MAGMA project, 2008.
A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, International Conference on Computational Science, Procedia Computer Science, pp.17-26, 2012.
DOI : 10.1016/j.procs.2012.04.003
URL : https://hal.archives-ouvertes.fr/hal-00656457
Accelerating Linear System Solutions Using Randomization Techniques, ACM Transactions on Mathematical Software, vol.39, issue.2, 2013.
DOI : 10.1145/2427023.2427025
URL : https://hal.archives-ouvertes.fr/inria-00593306
Some issues in dense linear algebra for multicore and special purpose architectures, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), volume 6126-6127 of Lecture Notes in Computer Science, 2008.
The Performance Implications of Locality Information Usage in Shared-Memory Multiprocessors, Journal of Parallel and Distributed Computing, vol.37, issue.1, pp.113-121, 1996.
DOI : 10.1006/jpdc.1996.0112
A Portable Programming Interface for Performance Evaluation on Modern Processors, International Journal of High Performance Computing Applications, vol.14, issue.3, pp.189-204, 2000.
DOI : 10.1177/109434200001400303
On algorithmic variants of parallel Gaussian elimination: Comparison of implementations in terms of performance and numerical properties
URL : https://hal.archives-ouvertes.fr/hal-00867837
Adapting communication-avoiding LU and QR factorizations to multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470348
Numerical Linear Algebra for High-Performance Computers, 1998.
DOI : 10.1137/1.9780898719611
Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite, Computers & Electrical Engineering, vol.38, issue.2, pp.258-269, 2012.
DOI : 10.1016/j.compeleceng.2011.12.007
Communication Avoiding Gaussian elimination, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, 2008.
DOI : 10.1109/SC.2008.5214287
URL : https://hal.archives-ouvertes.fr/inria-00277901
Introduction to High Performance Computing for Scientists and Engineers, 2011.
DOI : 10.1201/EBK1439811924
Intel Corporation, Microsoft Corporation, Phoenix Technologies Ltd., Toshiba Corporation, Advanced Configuration and Power Interface Specification, vol.4, 2010.
Intel Math Kernel Library (MKL)
URL : http://www.intel.com/software/products
Design and analysis of static memory management policies for CC-NUMA multiprocessors, Journal of Systems Architecture, vol.48, issue.1-3, pp.59-80, 2002.
DOI : 10.1016/S1383-7621(02)00066-8
A NUMA API for Linux, Novell Inc., 2004.
Implementing Linear Algebra Routines on Multi-core Processors with Pipelining and a Look Ahead, LAPACK Working Note, vol.178, 2006.
DOI : 10.1007/978-3-540-75755-9_18
Local and remote memory: Memory in a Linux/NUMA system, Linux Symposium (OLS2006), 2006.
Nonuniform memory affinity strategy in multithreaded sparse matrix computations, Proceedings of the 2012 Symposium on High Performance Computing, pp.1-9, 2012.
Efficient shared-array accesses in ab initio nuclear structure calculations on multicore architectures, Procedia Computer Science, vol.9, pp.256-265, 2012.
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, issue.5&6, 2010.
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, 2010.
DOI : 10.1109/ICPPW.2010.38