DAGuE: A generic distributed DAG engine for high performance computing, 2010.
Matrix algorithms, Society for Industrial and Applied Mathematics, 2001.
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002
Updating an LU Factorization with Pivoting, ACM Transactions on Mathematical Software, vol.35, issue.2, 2008.
DOI : 10.1145/1377612.1377615
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, IEEE Transactions on Parallel and Distributed Systems, vol.19, issue.9, pp.1175-1186, 2008.
DOI : 10.1109/TPDS.2007.70813
QR factorization for the CELL processor, Scientific Programming, pp.31-42, 2009.
Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development, vol.41, issue.6, pp.737-756, 1997.
DOI : 10.1147/rd.416.0737
New Generalized Matrix Data Structures Lead to a Variety of High-Performance Algorithms, Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, pp.211-234, 2000.
DOI : 10.1007/978-0-387-35407-1_13
Minimal Data Copy for Dense Linear Algebra Factorization, Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, pp.540-549, 2006.
DOI : 10.1007/978-3-540-75755-9_66
Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software, SIAM Review, vol.46, issue.1, pp.3-45, 2004.
DOI : 10.1137/S0036144503428693
Automatic task graph generation techniques, HICSS '95: Proceedings of the 28th Hawaii International Conference on System Sciences, p.113, 1995.
Compact DAG Representation and Its Dynamic Scheduling, Journal of Parallel and Distributed Computing, vol.58, issue.3, pp.487-514, 1999.
DOI : 10.1006/jpdc.1999.1566
URL : https://hal.archives-ouvertes.fr/inria-00098841
Compact DAG representation and its symbolic scheduling, Journal of Parallel and Distributed Computing, vol.64, issue.8, pp.921-935, 2004.
DOI : 10.1016/j.jpdc.2004.05.001
URL : https://hal.archives-ouvertes.fr/inria-00099958
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.1-11, 2009.
Static and dynamic scheduling for effective use of multicore systems, ScaLAPACK Users' Guide. SIAM, 1997.
Basic Linear Algebra Subprograms for Fortran Usage, ACM Transactions on Mathematical Software, vol.5, issue.3, pp.308-323, 1979.
DOI : 10.1145/355841.355847
A set of level 3 basic linear algebra subprograms, ACM Transactions on Mathematical Software, vol.16, issue.1, pp.1-17, 1990.
DOI : 10.1145/77626.79170
The WY Representation for Products of Householder Matrices, SIAM Journal on Scientific and Statistical Computing, vol.8, issue.1, pp.2-13, 1987.
DOI : 10.1137/0908009
A Storage-Efficient $WY$ Representation for Products of Householder Transformations, SIAM Journal on Scientific and Statistical Computing, vol.10, issue.1, pp.53-57, 1991.
DOI : 10.1137/0910005
Parallel out-of-core computation and updating of the QR factorization, ACM Transactions on Mathematical Software, vol.31, issue.1, pp.60-78, 2005.
DOI : 10.1145/1055531.1055534
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397
Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, 2008.
DOI : 10.1137/080731992
URL : https://hal.archives-ouvertes.fr/hal-00870930
Numerical Linear Algebra for High-Performance Computers, 1998.
DOI : 10.1137/1.9780898719611
Self-adapting software for numerical linear algebra and LAPACK for clusters, Parallel Computing, vol.29, issue.11-12, pp.1723-1743, 2003.
DOI : 10.1016/j.parco.2003.05.014
Distributed SBP Cholesky factorization algorithms with near-optimal scheduling, ACM Transactions on Mathematical Software, vol.36, issue.2, pp.1-25, 2009.
DOI : 10.1145/1499096.1499100