A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, J. Kurzak et al., Scalable Dense Linear Algebra on Heterogeneous Hardware, HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, pp.65-103, 2013.

J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, FLAME: Formal Linear Algebra Methods Environment, ACM Transactions on Mathematical Software, vol.27, issue.4, pp.422-455, 2001.
DOI : 10.1145/504210.504213
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.7096

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, Jade Edition, pp.473-484, 2011.
DOI : 10.1016/B978-0-12-385963-1.00034-4

X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014.
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01333645

E. Agullo, L. Giraud, A. Guermouche, S. Nakov, and J. Roman, Task-Based Conjugate Gradient: From Multi-GPU Towards Heterogeneous Architectures, Inria, Tech. Rep, vol.44, issue.4, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01334734

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11, 1203.
DOI : 10.1002/cpe.3132

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

B. Lizé, Résolution directe rapide pour les éléments finis de frontière en électromagnétisme et acoustique: H-matrices. Parallélisme et applications industrielles, 2014.

T. Heller, H. Kaiser, K. Iglberger, L. Boillot, G. Bosilca et al., Application of the ParalleX execution model to stencil-based problems, 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), pp.253-261, 2013.

E. Tejedor, M. Farreras, D. Grove, R. M. Badia, G. Almasi et al., A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, pp.2421-2448, 2012.
DOI : 10.1145/2020373.2020377
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.2467

J. Bueno, X. Martorell, R. M. Badia, E. Ayguadé, and J. Labarta, Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th International ACM Conference on Supercomputing, ICS '13, pp.359-368, 2013.
DOI : 10.1145/2464996.2465017

A. Yarkhan, J. Kurzak, and J. Dongarra, Quark users' guide: Queueing and runtime for kernels, 2011.

A. Zafari, M. Tillenius, and E. Larsson, Programming Models Based on Data Versioning for Dependency-aware Task-based Parallelisation, 2012 IEEE 15th International Conference on Computational Science and Engineering, pp.275-280, 2012.
DOI : 10.1109/ICCSE.2012.45

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.66:1-66:11, 2012.
DOI : 10.1109/SC.2012.71
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: a portable linear algebra library for high-performance computers, The 1990 ACM/IEEE conference on Supercomputing, pp.2-11, 1990.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363

C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.291-298, 2010.
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937

M. Cosnard and M. Loi, Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and PhD Forum (IPDPSW'11), PDSEC 2011, pp.1432-1441, 2011.
DOI : 10.1109/ipdps.2011.299
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.4744

A. J. Bernstein, Analysis of Programs for Parallel Processing, IEEE Transactions on Electronic Computers, vol.15, issue.5, pp.757-763, 1966.
DOI : 10.1109/PGEC.1966.264565

A. Yarkhan, Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.1874

M. Tillenius, E. Larsson, E. Lehto, and N. Flyer, A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, Swedish Workshop on Multi-Core Computing, 2013.

E. Slaughter, W. Lee, S. Treichler, M. Bauer, and A. Aiken, Regent, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pp.81:1-81:12, 2015.

F. Song and J. Dongarra, A scalable framework for heterogeneous GPU-based clusters, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.91-100, 2012.
DOI : 10.1145/2312005.2312025
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.466.9268

T. D. Hartley, E. Saule, and Ü. V. Çatalyürek, Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, pp.6-7, 2012.
DOI : 10.1016/j.parco.2012.03.005

C. Luk, S. Hong, and H. Kim, Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009.
DOI : 10.1145/1669112.1669121

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890

D. Kunzman, Runtime support for object-based message-driven parallel applications on heterogeneous clusters, 2012.

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, 2011.
DOI : 10.1155/2011/525717

Y. Zheng, C. Iancu, P. H. Hargrove, S. Min, and K. Yelick, Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing (SIAMPP), 2010.

J. Lee, M. T. Tran, T. Odajima, T. Boku, and M. Sato, An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters, HeteroPar, 2011.

C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the International Euro-Par Workshops, pp.56-65, 2009.
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.122

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 29th IEEE International Parallel & Distributed Processing Symposium, 2015.
DOI : 10.1109/ipdps.2015.56

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.54:1-54:23, 2013.
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677

M. Sergent, D. Goudin, S. Thibault, and O. Aumage, Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016.
DOI : 10.1109/IPDPSW.2016.105
URL : https://hal.archives-ouvertes.fr/hal-01380126