B. A. Becker and A. Lastovetsky, Matrix Multiplication on Two Interconnected Processors, 2006 IEEE International Conference on Cluster Computing, pp.1-9, 2006.
DOI : 10.1109/CLUSTR.2006.311901

A. DeFlumere, A. Lastovetsky, and B. A. Becker, Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, pp.125-139, 2012.
DOI : 10.1109/IPDPSW.2012.12

A. DeFlumere, Optimal partitioning for parallel matrix computation on a small number of abstract heterogeneous processors, p.9, 2014.

O. Beaumont, L. Eyraud-Dubois, and T. Lambert, A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.32
URL : https://hal.archives-ouvertes.fr/hal-01216245

H. Nagamochi and Y. Abe, An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007.
DOI : 10.1016/j.dam.2006.08.005

A. Fügenschuh, K. Junosza-Szaniawski, and Z. Lonc, Exact and approximation algorithms for a soft rectangle packing problem, Optimization, vol.63, issue.11, pp.1637-1663, 2014.

A. Kalinov and A. Lastovetsky, Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001.
DOI : 10.1006/jpdc.2000.1686

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
DOI : 10.1007/s00453-002-0962-9
URL : https://hal.archives-ouvertes.fr/hal-00807407

B. A. Becker, High-level data partitioning for parallel computing on heterogeneous hierarchical computational platforms, 2010.

B. A. Becker and A. Lastovetsky, Max-Plus Algebra and Discrete Event Simulation on Parallel Hierarchical Heterogeneous Platforms, European Conference on Parallel Processing, pp.63-70, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00690368

A. DeFlumere and A. Lastovetsky, Searching for the Optimal Data Partitioning Shape for Parallel Matrix-Matrix Multiplication on 3 Heterogeneous Processors, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.17-28, 2014.
DOI : 10.1109/IPDPSW.2014.8

O. Beaumont, A. Legrand, F. Rastello, and Y. Robert, Static LU Decomposition on Heterogeneous Platforms, The International Journal of High Performance Computing Applications, vol.15, issue.3, pp.310-323, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00856641

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK Users' Guide, SIAM, 1997.
DOI : 10.1137/1.9780898719642

D. Clarke, A. Ilic, A. Lastovetsky, and L. Sousa, Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters, European Conference on Parallel Processing, pp.489-501, 2012.
DOI : 10.1007/978-3-642-32820-6_49

R. Shams and P. Sadeghi, On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters, Journal of Parallel and Distributed Computing, vol.71, issue.4, pp.584-593, 2011.
DOI : 10.1016/j.jpdc.2010.10.011

N. Mohamed, J. , and H. Jiang, DDOps: dual-direction operations for load balancing on non-dedicated heterogeneous distributed systems, Cluster Computing, vol.16, issue.1, pp.503-528, 2014.
DOI : 10.1007/s10586-011-0171-x

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00550877

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, The International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

O. Beaumont, L. Eyraud-Dubois, A. Guermouche, and T. Lambert, Comparison of static and dynamic resource allocation strategies for matrix multiplication, 26th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163936

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, pp.866-901, 2011.
DOI : 10.1137/090769156
URL : http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-15.pdf