A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, issue.2, pp.473-484, 2011. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016. ,
DOI : 10.1109/IPDPS.2016.90
URL : https://hal.archives-ouvertes.fr/hal-01223573
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011. ,
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
Dense linear algebra on distributed heterogeneous hardware with a symbolic dag approach, Scalable Computing and Communications: Theory and Practice, 2013. ,
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Multi-gpu and multicpu parallelization for interactive physics simulations, Euro-Par 2010 -Parallel Processing, pp.235-246, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00502448
Composing multiple starpu applications over heterogeneous machines: A supervised approach, IJHPCA, vol.28, issue.3, pp.285-300, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00824514
Dense matrix computation on a heterogenous architecture: A block synchronous approach, 2012. ,
Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011. ,
DOI : 10.1155/2011/525717
Composing parallel software efficiently with lithe. SIGPLAN Not, pp.376-387, 2010. ,
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009. ,
DOI : 10.1145/1527286.1527288
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.365-376, 2012. ,
DOI : 10.1145/2304576.2304625
Performance-effective and low-complexity task scheduling for heterogeneous computing. Parallel and Distributed Systems, IEEE Transactions on, vol.13, issue.3, pp.260-274, 2002. ,
Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015. ,
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359