A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, FLAME: Formal Linear Algebra Methods Environment, ACM Transactions on Mathematical Software, vol.27, issue.4, pp.422-455, 2001.
DOI : 10.1145/504210.504213

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, Jade Edition, pp.473-484, 2011.
DOI : 10.1016/B978-0-12-385963-1.00034-4

X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014.
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, 2014.
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645

E. Agullo, L. Giraud, A. Guermouche, S. Nakov, and J. Roman, Task-based conjugate gradient: from multi-GPU towards heterogeneous architectures, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01334734

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11, 2014.
DOI : 10.1002/cpe.3132

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

B. Lizé, Résolution directe rapide pour les éléments finis de frontière en électromagnétisme et acoustique: H-matrices. Parallélisme et applications industrielles, 2014.

T. Heller, H. Kaiser, and K. Iglberger, Application of the ParalleX execution model to stencil-based problems, International Supercomputing Conference, 2012.
DOI : 10.1007/s00450-012-0217-1

L. Boillot, G. Bosilca, E. Agullo, and H. Calandra, Task-based programming for Seismic Imaging: Preliminary Results, 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), pp.1259-1266, 2014.

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: a portable linear algebra library for high-performance computers, The 1990 ACM/IEEE Conference on Supercomputing, 1990.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

A. Yarkhan, J. Kurzak, and J. Dongarra, Quark users' guide: Queueing and runtime for kernels, 2011.

C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010.
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937

M. Cosnard and M. Loi, Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and PhD Forum (IPDPSW'11), PDSEC 2011, pp.1432-1441, 2011.

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002.

E. Tejedor, M. Farreras, D. Grove, R. M. Badia, G. Almasi et al., A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, pp.2421-2448, 2012.
DOI : 10.1002/cpe.2831

A. Yarkhan, Dynamic task execution on shared and distributed memory architectures, 2012.

J. Bueno, X. Martorell, R. M. Badia, E. Ayguadé, and J. Labarta, Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th International ACM Conference on Supercomputing, ICS '13, 2013.
DOI : 10.1145/2464996.2465017

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

M. Tillenius, E. Larsson, E. Lehto, and N. Flyer, A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, Swedish Workshop on Multi-Core Computing, 2013.

A. Zafari, M. Tillenius, and E. Larsson, Programming models based on data versioning for dependency-aware task-based parallelisation, International Conference on Computational Science and Engineering, 2012.

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
DOI : 10.1109/SC.2012.71

E. Slaughter, W. Lee, S. Treichler, M. Bauer, and A. Aiken, Regent, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.
DOI : 10.1145/2807591.2807629

F. Song and J. Dongarra, A scalable framework for heterogeneous GPU-based clusters, Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pp.91-100, 2012.
DOI : 10.1145/2312005.2312025

T. D. Hartley, E. Saule, and Ü. V. Çatalyürek, Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, pp.289-309, 2012.
DOI : 10.1016/j.parco.2012.03.005

C. Luk, S. Hong, and H. Kim, Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, 2009.
DOI : 10.1145/1669112.1669121

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, Locality-aware work stealing on multi-CPU and multi-GPU architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890

D. Kunzman, Runtime support for object-based message-driven parallel applications on heterogeneous clusters, 2012.

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, 2011.
DOI : 10.1155/2011/525717

Y. Zheng, C. Iancu, P. H. Hargrove, S. Min, and K. Yelick, Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing (SIAMPP), 2010.

J. Lee, M. T. Tran, T. Odajima, T. Boku, and M. Sato, An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters, HeteroPar, 2011.
DOI : 10.1007/978-3-642-29737-3_48

C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the International, pp.56-65, 2009.
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 29th IEEE International Parallel & Distributed Processing Symposium, 2015.

M. Sergent, D. Goudin, S. Thibault, and O. Aumage, Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2016.