LAPACK Users' Guide, 1999. ,
Comparative study of one-sided factorizations with multiple software packages on multicore hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.1-12, 2009. ,
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011. ,
DOI : 10.1109/IPDPS.2011.91
Analysis of Pairwise Pivoting in Gaussian Elimination, IEEE Transactions on Computers, vol.34, issue.3, pp.274-278, 1985. ,
DOI : 10.1109/TC.1985.1676570
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.14:1-14:26, 2009. ,
DOI : 10.1145/1527286.1527288
LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 2011. ,
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, pp.15-44, 2010. ,
Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures, Scientific Programming, vol.18, issue.1, pp.33-50, 2010. ,
DOI : 10.1155/2010/574728
Math Kernel Library (MKL), Intel. ,
URL : http://www.intel.com/software/products
New serial and parallel recursive QR factorization algorithms for SMP systems, Proceedings, 1998. ,
DOI : 10.1007/BFb0095328
Recursive Version of LU Decomposition, Revised Papers from the Second International Conference on Numerical Analysis and Its Applications, pp.325-332, 2001. ,
DOI : 10.1007/3-540-45262-1_38
Recursive approach in sparse matrix LU factorization, Scientific Programming, pp.51-60, 2001. ,
Communication-efficient parallel dense LU using a 3-dimensional approach, Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing, 2001. ,
The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience, vol.15, issue.9, pp.803-820, 2003. ,
DOI : 10.1002/cpe.728
Scaling LAPACK panel operations using Parallel Cache Assignment, Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.223-232, 2010. ,
Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-755, 1997. ,
DOI : 10.1147/rd.416.0737
Managing the complexity of lookahead for LU factorization with pivoting, Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, pp.200-208, 2010. ,
DOI : 10.1145/1810479.1810520
Implementation Guide for LAPACK, 1990. ,
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, AFIPS '67 (Spring), pp.483-485, 1967. ,
DOI : 10.1145/1465482.1465560
Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988. ,
DOI : 10.1145/42411.42415
Efficient and practical non-blocking data structures, Department of Computer Science, 2004. ,
Automatic blocking of QR and LU factorizations for locality, Proceedings of the 2004 Workshop on Memory System Performance, MSP '04, 2004. ,
DOI : 10.1145/1065895.1065898
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, Concurrency and Computation: Practice and Experience, vol.10, issue.1, 2010. ,
DOI : 10.1002/cpe.1829
On the role of scientific thought, in Dijkstra EW, Selected Writings on Computing: A Personal Perspective, pp.60-66, 1982. ,
Elements of Functional Programming, 1989. ,
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008. ,
DOI : 10.1002/cpe.1301
A dependency-aware task-based programming environment for multi-core architectures, IEEE International Conference on Cluster Computing, pp.142-151, 2008. ,
EZTrace: a generic framework for performance analysis, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00587216