C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.

A. YarKhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

E. Chan, F. G. Van Zee, P. Bientinesi, E. S. Quintana-Ortí, G. Quintana-Ortí et al., SuperMatrix, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '08, pp.123-132, 2008.
DOI : 10.1145/1345206.1345227

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

URL : http://hdl.handle.net/2117/28379

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-00647474

M. R. Garey and D. S. Johnson, Computers and Intractability, a Guide to the Theory of NP-Completeness, 1979.

R. L. Graham, Bounds for Certain Multiprocessing Anomalies, Bell System Technical Journal, vol.45, issue.9, pp.1563-1581, 1966.
DOI : 10.1002/j.1538-7305.1966.tb01709.x

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

F. Gustavson, High-performance linear algebra algorithms using new generalized data structures for matrices, IBM Journal of Research and Development, vol.47, issue.1, pp.31-55, 2003.
DOI : 10.1147/rd.471.0031

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software (TOMS), vol.36, issue.3, p.14, 2009.

C. D. Dhillon, J. Choi, J. Demmel, I. Dhillon, J. Dongarra et al., LAPACK Working Note 95: ScaLAPACK: A portable linear algebra library for distributed memory computers, design issues and performance, 1995.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, LAPACK Working Note 191: A class of parallel tiled linear algebra algorithms for multicore architectures, 2007.

H. Chetto, M. Silly, and T. Bouchentouf, Dynamic scheduling of real-time tasks under precedence constraints, Real-Time Systems, vol.2, issue.3, pp.181-194, 1990.
DOI : 10.1007/BF00365326

G. Manimaran and C. S. Murthy, An efficient dynamic scheduling algorithm for multiprocessor real-time systems, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.3, pp.312-319, 1998.
DOI : 10.1109/71.674322

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

V. Sarkar, Partitioning and scheduling parallel programs for multiprocessing, Research Monographs in Parallel and Distributed Computing, 1989.

H. Casanova, A. Legrand, and M. Quinson, SimGrid: A Generic Framework for Large-Scale Distributed Experiments, Tenth International Conference on Computer Modeling and Simulation (UKSim 2008), 2008.
DOI : 10.1109/UKSIM.2008.28

URL : https://hal.archives-ouvertes.fr/inria-00260697

L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J. Méhaut, Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, Euro-Par: 20th International Conference on Parallel Processing, 2014.
DOI : 10.1007/978-3-319-09873-9_5

URL : https://hal.archives-ouvertes.fr/hal-01011633

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, R. A. van de Geijn et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.
DOI : 10.1016/j.jpdc.2011.10.014

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Distributed dense numerical linear algebra algorithms on massively parallel architectures, 2010.

C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Lecture Notes in Computer Science, vol.6043, pp.56-65, 2009.
DOI : 10.1007/978-3-642-14122-5_9

URL : https://hal.archives-ouvertes.fr/inria-00421333

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., Faster, Cheaper, Better: a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, Wen-mei W. Hwu (ed.), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547847

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011.
DOI : 10.1109/IPDPS.2011.90

URL : https://hal.archives-ouvertes.fr/inria-00547614