StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures Concurrency and Computation: Practice and Experience, Special Issue: Euro- Par, pp.187-198, 2009. ,
Guide: QUeueing And Runtime for Kernels, 2011. ,
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013. ,
DOI : 10.1109/MCSE.2013.98
SuperMatrix, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming , PPoPP '08, pp.123-132, 2008. ,
DOI : 10.1145/1345206.1345227
Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009. ,
DOI : 10.1177/1094342009106195
URL : http://hdl.handle.net/2117/28379
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007. ,
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474
Computers and Intractability, a Guide to the Theory of NP-Completeness, 1979. ,
Bounds for Certain Multiprocessing Anomalies, Bell System Technical Journal, vol.45, issue.9, pp.1563-1581, 1966. ,
DOI : 10.1002/j.1538-7305.1966.tb01709.x
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
High-performance linear algebra algorithms using new generalized data structures for matrices, IBM Journal of Research and Development, vol.47, issue.1, pp.31-55, 2003. ,
DOI : 10.1147/rd.471.0031
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Programming matrix algorithms-byblocks for thread-level parallelism, ACM Transactions on Mathematical Software (TOMS), vol.36, issue.3, p.14, 2009. ,
Lapack working note 95 scalapack: A portable linear algebra library for distributed memory computers design issues and performance, 1995. ,
Lapack working note 191: A class of parallel tiled linear algebra algorithms for multicore architectures, 2007. ,
Dynamic scheduling of real-time tasks under precedence constraints, Real-Time Systems, vol.21, issue.10, pp.181-194, 1990. ,
DOI : 10.1007/BF00365326
An efficient dynamic scheduling algorithm for multiprocessor real-time systems Parallel and Distributed Systems, IEEE Transactions on, vol.9, issue.3, pp.312-319, 1998. ,
DOI : 10.1109/71.674322
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3939
Partitioning and scheduling parallel programs for multiprocessing, ser. Research monographs in parallel and distributed computing, 1989. ,
SimGrid: A Generic Framework for Large-Scale Distributed Experiments, Tenth International Conference on Computer Modeling and Simulation (uksim 2008), 2008. ,
DOI : 10.1109/UKSIM.2008.28
URL : https://hal.archives-ouvertes.fr/inria-00260697
Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, Euro-par -20th International Conference on Parallel Processing, 2014. ,
DOI : 10.1007/978-3-319-09873-9_5
URL : https://hal.archives-ouvertes.fr/hal-01011633
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012. ,
DOI : 10.1016/j.jpdc.2011.10.014
Distributed dense numerical linear algebra algorithms on massively parallel architectures, 2010. ,
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Lecture Notes in Computer Science, vol.6043, pp.56-65, 2009. ,
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333
Faster, Cheaper, Better ? a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, W. mei W. Hwu, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00547847
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614