J. J. Dongarra, H. W. Meuer, and E. Strohmaier, Top500 supercomputer sites, Supercomputer, vol.13, pp.89-111, 1997.

Y. Inadomi, T. Patki, K. Inoue, M. Aoyagi, B. Rountree et al., Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '15, 2015.

O. Beaumont, A. Legrand, F. Rastello, and Y. Robert, Static LU decomposition on heterogeneous platforms, Int. Journal of High Performance Computing Applications, vol.15, pp.310-323, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00856641

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK Users' Guide. USA: Society for Industrial and Applied Mathematics, 1997.

L. Grigori, J. W. Demmel, and H. Xiang, Communication avoiding Gaussian elimination, SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pp.1-12, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00277901

A. Kalinov and A. Lastovetsky, Heterogeneous distribution of computations solving linear algebra problems on networks of heterogeneous computers, J. of Par. and Distr. Comp, vol.61, issue.4, p.520, 2001.

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a square into rectangles: NP-completeness and approximation algorithms, Algorithmica, vol.34, pp.217-239, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00807407

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Matrix multiplication on heterogeneous platforms, IEEE Trans. Parallel Distributed Systems, vol.12, issue.10, pp.1033-1051, 2001.

H. Nagamochi and Y. Abe, An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007.

O. Beaumont, V. Boudet, A. Petitet, F. Rastello, and Y. Robert, A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers), IEEE Trans. Computers, vol.50, issue.10, pp.1052-1070, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00808287

J. Dongarra, S. Tomov, P. Luszczek, J. Kurzak, M. Gates et al., With extreme computing, the rules have changed, Comp. in Sci. Eng, vol.19, issue.3, p.52, 2017.

S. Thibault, On runtime systems for task-based programming on heterogeneous platforms, Hab. à diriger des rech., U. Bordeaux, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01959127

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011.

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs, GPU Computing Gems, vol.2, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547847

L. Eyraud-Dubois and T. Lambert, Using static allocation algorithms for matrix matrix multiplication on multicores and GPUs, ICPP 2018 - 47th International Conference on Parallel Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01670678

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, SI:EuroPar'09, vol.23, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J. Méhaut, Faithful performance prediction of a dynamic task-based runtime system for heterogeneous multi-core architectures, Concurrency and Computation: Practice and Experience, p.16, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01147997

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing multifrontal sparse solvers for multicore architectures with sequential task flow runtime systems, ACM Tr. Math. Softw, vol.43, issue.2, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01333645

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault, StarPU-MPI: Task programming over clusters of machines enhanced with accelerators, Proceedings of the 19th European Conference on Recent Advances in the Message Passing Interface, ser. EuroMPI'12, pp.298-299, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00725477

O. Aumage, E. Brunet, N. Furmento, and R. Namyst, NewMadeleine: A fast communication scheduling engine for high performance networks, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, concept, and design of a next generation MPI implementation, European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, pp.97-104, 2004.

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01017319

V. G. Pinto, L. M. Schnorr, L. Stanisic, A. Legrand, S. Thibault et al., A visual performance analysis framework for task based parallel applications running on hybrid clusters, Concurrency and Computation: Practice and Experience, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01616632

L. Nesi, S. Thibault, L. Stanisic, and L. Schnorr, Visual performance analysis of memory behavior in a task-based runtime on hybrid platforms, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02275363

Y. Pei, G. Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, Evaluation of programming models to address load imbalance on distributed multicore CPUs: A case study with block low-rank factorization, 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM), pp.25-36, 2019.

A. Denis, E. Jeannot, P. Swartvagher, and S. Thibault, Using dynamic broadcasts to improve task-based runtime performances, Euro-Par 2020: 26th International European Conference on Parallel and Distributed Computing, 2020.