A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures, 2011.
DOI : 10.1007/978-3-642-23397-5_19
URL : https://hal.archives-ouvertes.fr/hal-00726654
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-12, 2009.
DOI : 10.1145/1654059.1654080
Installation guide for LAPACK, 1992.
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301
URL : http://arxiv.org/abs/0707.3548
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002
Parallel QR decomposition of a rectangular matrix, Numerische Mathematik, vol.1, issue.1, pp.239-249, 1986.
DOI : 10.1007/BF01389871
URL : https://hal.archives-ouvertes.fr/hal-00857127
Complexity of parallel QR factorization, Journal of the ACM, vol.33, issue.4, pp.712-723, 1986.
DOI : 10.1145/6490.214102
URL : https://hal.archives-ouvertes.fr/hal-00857125
Minimizing communication in sparse matrix solvers, SC'09, the 2009 ACM/IEEE conference on Supercomputing, pp.1-12, 2009.
Communication-avoiding parallel and sequential QR and LU factorizations: theory and practice, 2008.
DOI : 10.1137/080731992
URL : http://arxiv.org/abs/0808.2664
Enhancing parallelism of tile QR factorization for multicore architectures, 2009.
Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470443
URL : https://hal.archives-ouvertes.fr/inria-00548899
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, pp.15-44, 2010.
DOI : 10.1002/cpe.1467
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294
An alternative Givens ordering, Numerische Mathematik, vol.25, issue.1, pp.83-90, 1984.
DOI : 10.1007/BF01389639
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288
On Stable Parallel Linear System Solvers, Journal of the ACM, vol.25, issue.1, pp.81-91, 1978.
DOI : 10.1145/322047.322054
Achieving accurate and context-sensitive timing for code optimization, Software: Practice and Experience, vol.93, issue.2, pp.1621-1642, 2008.
DOI : 10.1002/spe.884
Roofline, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785