Accuracy and Stability of Numerical Algorithms, 2002. ,
DOI : 10.1137/1.9780898718027
The WY Representation for Products of Householder Matrices, SIAM Journal on Scientific and Statistical Computing, vol.8, issue.1, pp.2-13, 1987. ,
DOI : 10.1137/0908009
DAGuE: A generic distributed DAG engine for high performance computing, 2010. ,
Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010. ,
DOI : 10.1109/IPDPS.2010.5470443
URL : https://hal.archives-ouvertes.fr/inria-00548899
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009. ,
DOI : 10.1145/1654059.1654080
Parallel out-of-core computation and updating of the QR factorization, ACM Transactions on Mathematical Software, vol.31, issue.1, pp.60-78, 2005. ,
DOI : 10.1145/1055531.1055534
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, pp.1573-1590, 2008. ,
Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008. ,
DOI : 10.1109/PDP.2008.37
Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, 2008. ,
DOI : 10.1137/080731992
URL : https://hal.archives-ouvertes.fr/hal-00870930
CULA: hybrid GPU accelerated linear algebra routines, Modeling and Simulation for Defense Systems and Applications V, 2010. ,
DOI : 10.1117/12.850538
Communication-avoiding QR decomposition for GPU, GPU Technology Conference, Research Poster A01, 2010. ,
Retargeting PLAPACK to clusters with hardware accelerators, FLAME Working Note #42, 2010. ,
A unified HPC environment for hybrid manycore/GPU distributed systems, LAPACK Working Note, 2010. ,
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
Harmony, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.197-200, 2008. ,
DOI : 10.1145/1383422.1383447
Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006. ,
DOI : 10.1109/SC.2006.55
Scaling Hierarchical N-body Simulations on GPU Clusters, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010. ,
DOI : 10.1109/SC.2010.49
Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010. ,
DOI : 10.1109/IPDPSW.2010.5470941
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing, Parallel Computing, vol.36, issue.12, 2010. ,
DOI : 10.1016/j.parco.2010.06.001
An Improved MAGMA GEMM for Fermi GPUs, 2010. ,
Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, issue.1-2, pp.3-35, 2001. ,
DOI : 10.1016/S0167-8191(00)00087-9
OSKI: A library of automatically tuned sparse matrix kernels, Proc. of SciDAC'05, ser. Journal of Physics: Conference Series, 2005. ,
DOI : 10.1088/1742-6596/16/1/071
QR factorization for the CELL processor, Scientific Programming, Special Issue: High Performance Computing with the Cell Broadband Engine, vol.17, issue.12, pp.31-42, 2009. ,
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2009. ,
DOI : 10.1145/1377612.1377615
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010. ,
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937