E. Agullo, J. Dongarra, R. Nath, and S. Tomov, A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures, 2011.
DOI : 10.1007/978-3-642-23397-5_19

URL : https://hal.archives-ouvertes.fr/hal-00726654

E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-12, 2009.
DOI : 10.1145/1654059.1654080

S. Blackford and J. J. Dongarra, Installation guide for LAPACK, 1992.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.20, issue.13, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301

URL : http://arxiv.org/abs/0707.3548

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

M. Cosnard, J. Muller, and Y. Robert, Parallel QR decomposition of a rectangular matrix, Numerische Mathematik, vol.48, issue.2, pp.239-249, 1986.
DOI : 10.1007/BF01389871

URL : https://hal.archives-ouvertes.fr/hal-00857127

M. Cosnard and Y. Robert, Complexity of parallel QR factorization, Journal of the ACM, vol.33, issue.4, pp.712-723, 1986.
DOI : 10.1145/6490.214102

URL : https://hal.archives-ouvertes.fr/hal-00857125

J. Demmel, M. Hoemmen, M. Mohiyuddin, and K. Yelick, Minimizing communication in sparse matrix solvers, SC'09, the 2009 ACM/IEEE conference on Supercomputing, pp.1-12, 2009.

J. W. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-avoiding parallel and sequential QR and LU factorizations: theory and practice, 2008.
DOI : 10.1137/080731992

URL : http://arxiv.org/abs/0808.2664

B. Hadri, H. Ltaief, E. Agullo, and J. Dongarra, Enhancing parallelism of tile QR factorization for multicore architectures, 2009.

B. Hadri, H. Ltaief, E. Agullo, and J. Dongarra, Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470443

URL : https://hal.archives-ouvertes.fr/inria-00548899

J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, pp.15-44, 2010.
DOI : 10.1002/cpe.1467

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

J. Modi and M. Clarke, An alternative Givens ordering, Numerische Mathematik, vol.43, issue.1, pp.83-90, 1984.
DOI : 10.1007/BF01389639

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288

A. Sameh and D. Kuck, On Stable Parallel Linear System Solvers, Journal of the ACM, vol.25, issue.1, pp.81-91, 1978.
DOI : 10.1145/322047.322054

R. C. Whaley and A. M. Castaldo, Achieving accurate and context-sensitive timing for code optimization, Software: Practice and Experience, vol.38, issue.15, pp.1621-1642, 2008.
DOI : 10.1002/spe.884

S. Williams, A. Waterman, and D. Patterson, Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785