M. Frigo, C. Leiserson, and K. Randall, The implementation of the Cilk-5 multithreaded language, SIGPLAN Not., pp.212-223, 1998.

J. Reinders, Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism, 2007.

T. D. Hartley, E. Saule, and Ü. V. Çatalyürek, Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, pp.289-309, 2012.
DOI : 10.1016/j.parco.2012.03.005

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011.
DOI : 10.1155/2011/525717

E. Hermann, B. Raffin, F. Faure, T. Gautier, and J. Allard, Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par 2010 -Parallel Processing, pp.235-246, 2010.
DOI : 10.1007/978-3-642-15291-7_23

URL : https://hal.archives-ouvertes.fr/inria-00502448

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.66, 2012.
DOI : 10.1109/SC.2012.71

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631

URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. Badia, F. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par 2009 Parallel Processing, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

M. Frigo and S. G. Johnson, The Design and Implementation of FFTW3, Proceedings of the IEEE, pp.216-231, 2005.
DOI : 10.1109/JPROC.2004.840301

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3
DOI : 10.1145/1527286.1527288

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1
DOI : 10.1088/1742-6596/180/1/012037

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.56

URL : https://hal.archives-ouvertes.fr/hal-01078359

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, Composing multiple StarPU applications over heterogeneous machines: A supervised approach, IJHPCA, vol.28, issue.3, pp.285-300, 2014.
DOI : 10.1109/ipdpsw.2013.217

URL : https://hal.archives-ouvertes.fr/hal-00824514

F. Song, S. Tomov, and J. Dongarra, Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.365-376, 2012.
DOI : 10.1145/2304576.2304625

K. Kim, V. Eijkhout, and R. A. van de Geijn, Dense matrix computation on a heterogenous architecture: A block synchronous approach, 2012.

H. Pan, B. Hindman, and K. Asanović, Composing parallel software efficiently with Lithe, SIGPLAN Not., pp.376-387, 2010.
DOI : 10.1145/1806596.1806639

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A hybridization methodology for high-performance linear algebra software for GPUs, GPU Computing Gems, pp.473-484, 2011.
DOI : 10.1016/b978-0-12-385963-1.00034-4

O. Beaumont, T. Cojean, L. Eyraud-Dubois, A. Guermouche, and S. Kumar, Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 2016.
DOI : 10.1109/HiPC.2016.045

URL : https://hal.archives-ouvertes.fr/hal-01361992

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2
DOI : 10.1145/2898348

URL : https://hal.archives-ouvertes.fr/hal-01333645

A. Haidar, S. Tomov, K. Arturov, M. Guney, S. Story et al., Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi, IEEE High Performance Extreme Computing Conference (HPEC'16), 2016.

E. Agullo, O. Beaumont, L. Eyraud-Dubois, and S. Kumar, Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.90

URL : https://hal.archives-ouvertes.fr/hal-01223573