Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008. ,
DOI : 10.1002/cpe.1301
URL : http://arxiv.org/abs/0707.3548
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.1-26, 2009. ,
DOI : 10.1145/1527286.1527288
Stability of blockLU factorization, Numerical Linear Algebra with Applications, vol.31, issue.2, pp.173-190, 1995. ,
DOI : 10.1002/nla.1680020208
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 2011. ,
DOI : 10.1109/IPDPS.2011.299
Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.4-5, 2013. ,
DOI : 10.1109/IPDPS.2012.62
URL : https://hal.archives-ouvertes.fr/hal-00764022
Efficient algorithms for all-to-all communications in multiport message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.8, issue.11, pp.1143-1156, 1997. ,
DOI : 10.1109/71.642949
Accuracy and Stability of Numerical Algorithms, 2002. ,
DOI : 10.1137/1.9780898718027
Strategies for Scaling and Pivoting for Sparse Symmetric Indefinite Problems, SIAM Journal on Matrix Analysis and Applications, vol.27, issue.2, pp.313-340, 2005. ,
DOI : 10.1137/04061043X
Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting, Concurrency and Computation: Practice and Experience, 2013. ,
DOI : 10.1002/cpe.3110
CALU: A Communication Optimal LU Factorization Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.4, pp.1317-1350, 2011. ,
DOI : 10.1137/100788926
URL : https://hal.archives-ouvertes.fr/hal-00651137
The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience, pp.803-820, 2003. ,
DOI : 10.1002/cpe.728
ScaLAPACK: a portable linear algebra library for distributed memory computers ??? design issues and performance, Computer Physics Communications, vol.97, issue.1-2, pp.1-15, 1996. ,
DOI : 10.1016/0010-4655(96)00017-3
Tiled QR factorization algorithms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '11 ,
DOI : 10.1145/2063384.2063393
URL : https://hal.archives-ouvertes.fr/inria-00585721
Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, pp.206-239, 2012. ,
DOI : 10.1137/080731992
URL : https://hal.archives-ouvertes.fr/hal-00870930
Accelerating Linear System Solutions Using Randomization Techniques, ACM Transactions on Mathematical Software, vol.39, issue.2, pp.1-8, 2013. ,
DOI : 10.1145/2427023.2427025
URL : https://hal.archives-ouvertes.fr/inria-00593306
Communication Avoiding Gaussian elimination, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, 2008. ,
DOI : 10.1109/SC.2008.5214287
URL : https://hal.archives-ouvertes.fr/inria-00277901
On algorithmic variants of parallel Gaussian elimination: Comparison of implementations in terms of performance and numerical properties, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00867837