C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
DOI : 10.1007/978-3-642-03869-3_80

URL : https://hal.archives-ouvertes.fr/inria-00384363

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, The International Journal of High Performance Computing Applications, vol.17, issue.1, pp.284-299, 2009.
DOI : 10.1109/5.476078

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

A. Kalinov and A. Lastovetsky, Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001.
DOI : 10.1006/jpdc.2000.1686

URL : http://hcl.ucd.ie/system/files/Papers/SolvinLinearAlgebra_2001.pdf

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
DOI : 10.1007/s00453-002-0962-9

URL : https://hal.archives-ouvertes.fr/hal-00807407

O. Beaumont, L. Eyraud-dubois, and T. Lambert, A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.474-483, 2016.
DOI : 10.1109/IPDPS.2016.32

URL : https://hal.archives-ouvertes.fr/hal-01216245

V. Strassen, Gaussian Elimination is Not Optimal, Numerische Mathematik, pp.354-356, 1969.

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz, Communication-optimal Parallel Algorithm for Strassen's Matrix Multiplication, Symposium on Parallelism in Algorithms and Architectures (SPAA), pp.193-204, 2012.

D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation, vol.9, issue.3, pp.251-280, 1990.
DOI : 10.1016/S0747-7171(08)80013-2

J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov et al., ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers: Design Issues and Performance, Computer Physics Communications, pp.1-15, 1996.
DOI : 10.1007/3-540-60902-4_12

URL : http://www.netlib.org/utk/papers/scalapack/paper.ps

E. Solomonik and J. Demmel, Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, International European Conference on Parallel and Distributed Computing, pp.90-109, 2011.

B. Becker and A. Lastovetsky, Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), pp.39-39, 2007.
DOI : 10.1109/ISPDC.2007.56

URL : http://researchrepository.ucd.ie/bitstream/10197/7328/1/Becker_Lastovetsky_2007.pdf

A. Deflumere, A. Lastovetsky, and B. Becker, Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors, International European Conference on Parallel and Distributed Computing, pp.201-214, 2014.
DOI : 10.1007/978-3-319-14325-5_18

H. Nagamochi and Y. Abe, An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007.
DOI : 10.1016/j.dam.2006.08.005

URL : https://doi.org/10.1016/j.dam.2006.08.005

O. Beaumont, L. Eyraud-dubois, and T. Lambert, Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms, International European Conference on Parallel and Distributed Computing, pp.171-182, 2016.
DOI : 10.1016/j.disc.2008.07.028

URL : https://hal.archives-ouvertes.fr/hal-01269881

O. Beaumont, L. Eyraud-dubois, A. Guermouche, and T. Lambert, Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.170-177, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163936

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

T. Cojean, A. Guermouche, A. Hugo, R. Namyst, and P. Wacrenier, Resource Aggregation for Task-Based Cholesky Factorization on Top of Heterogeneous Machines, International European Conference on Parallel and Distributed Computing, 2016.
DOI : 10.1109/IPDPS.2015.56

URL : https://hal.archives-ouvertes.fr/hal-01181135

L. Eyraud-dubois and T. Lambert, Matrix Matrix Multiplication using Static Algorithms on Multicores and GPUs. Available: https://gitlab.inria.fr/ordo-bdx/nrrp-with-starpu. Available: https, Plateforme Fédérative pour la Recherche en Informatique et Mathématiques, 2009.

T. Lambert, On the Effect of Replication of Input Files on the Efficiency and the Robustness of a Set of Computations, 2017.
URL : https://hal.archives-ouvertes.fr/tel-01661588