N. R. Adiga, G. Almási, G. S. Almasi, Y. Aridor, R. Barik et al., An Overview of the BlueGene/L Supercomputer, ACM/IEEE SC 2002 Conference (SC'02), 2002.
DOI : 10.1109/SC.2002.10017

A. Bhatele and L. V. Kale, Application-specific topology-aware mapping for three dimensional topologies, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536348

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

J. Dongarra, M. Faverge, T. Herault, M. Jacquelin, J. Langou et al., Hierarchical QR factorization algorithms for multi-core clusters, Parallel Computing, vol.39, issue.4-5, 2013.
DOI : 10.1016/j.parco.2013.01.003
URL : https://hal.archives-ouvertes.fr/hal-00809770

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.20, issue.13, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288

A. Sameh and D. Kuck, On Stable Parallel Linear System Solvers, Journal of the ACM, vol.25, issue.1, pp.81-91, 1978.
DOI : 10.1145/322047.322054

J. Modi and M. Clarke, An alternative Givens ordering, Numerische Mathematik, vol.43, issue.1, pp.83-90, 1984.
DOI : 10.1007/BF01389639

A. Pothen and P. Raghavan, Distributed Orthogonal Factorization: Givens and Householder Algorithms, SIAM Journal on Scientific and Statistical Computing, vol.10, issue.6, pp.1113-1134, 1989.
DOI : 10.1137/0910067

R. da Cunha, D. Becker, and J. Patterson, New Parallel (Rank-Revealing) QR Factorization Algorithms, Euro-Par 2002, 2002.
DOI : 10.1007/3-540-45706-2_94

J. W. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-avoiding parallel and sequential QR and LU factorizations: theory and practice, 2008.

J. Langou, Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce, arXiv preprint, 2010.

B. Hadri, H. Ltaief, E. Agullo, and J. Dongarra, Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470443
URL : https://hal.archives-ouvertes.fr/inria-00548899

H. Bouwmeester, M. Jacquelin, J. Langou, and Y. Robert, Tiled QR factorization algorithms, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11), 2011.
DOI : 10.1145/2063384.2063393
URL : https://hal.archives-ouvertes.fr/inria-00585721

M. Cosnard and Y. Robert, Complexity of parallel QR factorization, Journal of the ACM, vol.33, issue.4, pp.712-723, 1986.
DOI : 10.1145/6490.214102
URL : https://hal.archives-ouvertes.fr/hal-00857125

E. Agullo, C. Coti, J. Dongarra, T. Herault, and J. Langou, QR factorization of tall and skinny matrices in a grid computing environment, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470475
URL : https://hal.archives-ouvertes.fr/inria-00548900

F. Song, H. Ltaief, B. Hadri, and J. Dongarra, Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.48

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, HIPS 2011, 2011.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, 2011.
DOI : 10.1109/IPDPS.2011.299

J. Kurzak, P. Luszczek, M. Gates, I. Yamazaki, and J. Dongarra, Virtual Systolic Array for QR Decomposition, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013.
DOI : 10.1109/IPDPS.2013.119

J. J. Dongarra, P. Luszczek, and A. Petitet, The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience, vol.15, issue.9, pp.803-820, 2003.
DOI : 10.1002/cpe.728