C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, The International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

A. Kalinov and A. Lastovetsky, Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001.
DOI : 10.1006/jpdc.2000.1686

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
DOI : 10.1007/s00453-002-0962-9
URL : https://hal.archives-ouvertes.fr/hal-00807407

O. Beaumont, L. Eyraud-Dubois, and T. Lambert, A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.474-483, 2016.
DOI : 10.1109/IPDPS.2016.32
URL : https://hal.archives-ouvertes.fr/hal-01216245

V. Strassen, Gaussian Elimination is Not Optimal, Numerische Mathematik, pp.354-356, 1969.
DOI : 10.1007/bf02165411

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz, Communication-optimal parallel algorithm for Strassen's matrix multiplication, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.193-204, 2012.
DOI : 10.1145/2312005.2312044

D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation, vol.9, issue.3, pp.251-280, 1990.
DOI : 10.1016/S0747-7171(08)80013-2

J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov et al., ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers: Design Issues and Performance, Computer Physics Communications, pp.1-15, 1996.
DOI : 10.1016/0010-4655(96)00017-3

E. Solomonik and J. Demmel, Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, International European Conference on Parallel and Distributed Computing, pp.90-109, 2011.

B. Becker and A. Lastovetsky, Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), pp.39-39, 2007.
DOI : 10.1109/ISPDC.2007.56
URL : http://hcl.ucd.ie/system/files/Papers/1183978301367.pdf

A. Deflumere, A. Lastovetsky, and B. Becker, Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors, International European Conference on Parallel and Distributed Computing, pp.201-214, 2014.
DOI : 10.1007/978-3-319-14325-5_18

H. Nagamochi and Y. Abe, An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007.
DOI : 10.1016/j.dam.2006.08.005

O. Beaumont, L. Eyraud-Dubois, and T. Lambert, Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms, International European Conference on Parallel and Distributed Computing, pp.171-182, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01269881

O. Beaumont, L. Eyraud-Dubois, A. Guermouche, and T. Lambert, Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.170-177, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163936

A. YarKhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

T. Cojean, A. Guermouche, A. Hugo, R. Namyst, and P. Wacrenier, Resource Aggregation for Task-Based Cholesky Factorization on Top of Heterogeneous Machines, International European Conference on Parallel and Distributed Computing, 2016.
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01181135

L. Eyraud-Dubois and T. Lambert, Matrix Matrix Multiplication using Static Algorithms on Multicores and GPUs, Available: https://gitlab.inria.fr/ordo-bdx/nrrp-with-starpu

Plateforme Fédérative pour la Recherche en Informatique et Mathématiques, Available: https, 2009.