H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

E. Schulte, D. Davison, T. Dye, and C. Dominik, A Multi-Language Computing Environment for Literate Programming and Reproducible Research, Journal of Statistical Software, vol.46, issue.3, 2012.
DOI : 10.18637/jss.v046.i03

E. Agullo, G. Bosilca, B. Bramas, C. Castagnede, O. Coulaud et al., Poster: Matrices over Runtime Systems at Exascale, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp.1332-1332, 2012.
DOI : 10.1109/SC.Companion.2012.168

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Conc. and Comp.: Pract. and Exp, vol.23, issue.2, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, 2009.
DOI : 10.1088/1742-6596/180/1/012037

A. Duran, E. Ayguadé, R. M. Badia, J. Labarta, L. Martinell et al., OmpSs: A Proposal for Programming Heterogeneous Multi-Core Architectures, Parallel Processing Letters, vol.21, issue.02, 2011.
DOI : 10.1142/S0129626411000151

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, vol.38, issue.1, 2012.

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault, StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Proc. European Conf. Recent Advances in the Message Passing Interface (EuroMPI), pp.298-299, 2012.
DOI : 10.1007/978-3-642-33518-1_40
URL : https://hal.archives-ouvertes.fr/hal-00725477

S. Ohshima, S. Katagiri, K. Nakajima, S. Thibault, R. Namyst et al., Implementation of FEM Application on GPU with StarPU, SIAM Conference on Computational Science and Engineering, 2013.

Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system, Intl. Symp. on Comp. Arch. and High Perf. Comp. (SBAC-PAD).

E. Agullo, L. Giraud, A. Guermouche, S. Nakov, and J. Roman, Task-based Conjugate Gradient: from multi-GPU towards heterogeneous architectures, Inria Bordeaux Research Report, vol.8912, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01334734

X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014.
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017

R. L. Graham, Bounds for Certain Multiprocessing Anomalies, Bell System Technical Journal, vol.45, issue.9, pp.1563-1581, 1966.
DOI : 10.1002/j.1538-7305.1966.tb01709.x

K. Coulomb, M. Faverge, J. Jazeix, O. Lagrasse, J. Marcoueille et al., Visual trace explorer ViTE

L. M. Schnorr, M. Faverge, F. Trahay, B. O. Stein, and J. Chassin de Kergommeaux, The Paje trace file format, UFRGS, Tech. Rep., 2016.

V. Pillet, J. Labarta, T. Cortes, and S. Girona, Paraver: A tool to visualize and analyze parallel code, Proceedings of WoTUG-18: Transputer and occam Developments, pp.17-31, 1995.

A. Knüpfer, H. Brunst, J. Doleschal, M. Jurenz, M. Lieber et al., The Vampir performance analysis toolset, Tools for High Perf. Comp, pp.139-155, 2008.

A. Huynh, D. Thain, M. Pericàs, and K. Taura, DAGViz, Proceedings of the 2nd Workshop on Visual Performance Analysis, VPA '15, pp.1-3, 2015.

R. Keller, S. Brinkmann, J. Gracia, and C. Niethammer, Temanejo: Debugging of Thread-Based Task-Parallel Programs in StarSS, pp.131-137, 2012.
DOI : 10.1007/978-3-642-31476-6_11

B. Haugen, S. Richmond, J. Kurzak, C. A. Steed, and J. Dongarra, Visualizing execution traces with task dependencies, Proceedings of the 2nd Workshop on Visual Performance Analysis, VPA '15, pp.1-2, 2015.

L. M. Schnorr and A. Legrand, Visualizing More Performance Data Than What Fits on Your Screen, pp.149-162, 2013.
DOI : 10.1007/978-3-642-37349-7_10
URL : https://hal.archives-ouvertes.fr/hal-00737651

G. Pagano and V. Marangozova-Martin, FrameSoC Workbench: Facilitating Trace Analysis through a Consistent User Interface, Inria, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00977887

V. Danjean, R. Namyst, and P. Wacrenier, An Efficient Multi-level Trace Toolkit for Multi-threaded Applications, European Conference on Parallel Processing, pp.166-175, 2005.
DOI : 10.1007/11549468_21
URL : https://hal.archives-ouvertes.fr/hal-00360309

E. Agullo, O. Beaumont, L. Eyraud-Dubois, J. Herrmann, S. Kumar et al., Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015.
DOI : 10.1109/IPDPSW.2015.35
URL : https://hal.archives-ouvertes.fr/hal-01120507

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, European Conf. on Parallel Processing, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611