M. Anderson, G. Ballard, J. Demmel, and K. Keutzer, Communication-Avoiding QR Decomposition for GPUs, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.48-58, 2011.
DOI : 10.1109/IPDPS.2011.15

C. Augonnet, S. Thibault, R. Namyst, and P. A. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz, Communication-optimal Parallel Algorithm for Strassen's Matrix Multiplication, Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pp.193-204, 2012.

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, pp.866-901, 2011.
DOI : 10.1137/090769156

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Matrix multiplication on heterogeneous platforms, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.10, pp.1033-1051, 2001.
DOI : 10.1109/71.963416
URL : https://hal.archives-ouvertes.fr/hal-00808288

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
DOI : 10.1007/s00453-002-0962-9
URL : https://hal.archives-ouvertes.fr/hal-00807407

O. Beaumont, L. Eyraud-Dubois, A. Guermouche, and T. Lambert, Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication, Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.1-10, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163936

O. Beaumont, L. Eyraud-Dubois, and T. Lambert, A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.32
URL : https://hal.archives-ouvertes.fr/hal-01216245

B. Becker and A. Lastovetsky, Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), pp.39-39, 2007.
DOI : 10.1109/ISPDC.2007.56
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov et al., ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers: Design Issues and Performance, In: APCC in Physics Chemistry and Engineering Science, pp.95-106, 1995.
DOI : 10.1007/3-540-60902-4_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

D. Clarke, A. Ilic, A. Lastovetsky, and L. Sousa, Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters, Euro-Par 2012 Parallel Processing, pp.489-501, 2012.
DOI : 10.1007/978-3-642-32820-6_49

A. Deflumere, A. Lastovetsky, and B. Becker, Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors, Euro-Par 2014: Parallel Processing Workshops, pp.201-214, 2014.
DOI : 10.1007/978-3-319-14325-5_18

A. Fügenschuh, K. Junosza-Szaniawski, and Z. Lonc, Exact and approximation algorithms for a soft rectangle packing problem, Optimization, vol.63, issue.11, pp.1637-1663, 2014.
DOI : 10.1109/43.920707

M. Hoemmen, Communication-avoiding Krylov Subspace Methods, 2010.

A. Kalinov and A. Lastovetsky, Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001.
DOI : 10.1006/jpdc.2000.1686

N. Mohamed, J. Al-Jaroodi, and H. Jiang, DDOps: dual-direction operations for load balancing on non-dedicated heterogeneous distributed systems, Cluster Computing, vol.16, issue.1, pp.503-528, 2014.
DOI : 10.1007/s10586-013-0294-3

H. Nagamochi and Y. Abe, An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007.
DOI : 10.1016/j.dam.2006.08.005
URL : http://doi.org/10.1016/j.dam.2006.08.005

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195
URL : http://hdl.handle.net/2117/28379

R. Shams and P. Sadeghi, On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters, Journal of Parallel and Distributed Computing, vol.71, issue.4, pp.584-593, 2011.
DOI : 10.1016/j.jpdc.2010.10.011

E. Solomonik and J. Demmel, Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, Euro-Par 2011 Parallel Processing, pp.90-109, 2011.

M. Walters, Rectangles as sums of squares, Discrete Mathematics, vol.309, issue.9, pp.2913-2921, 2009.
DOI : 10.1016/j.disc.2008.07.028