StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363
Hierarchical Task-Based Programming With StarSs, The International Journal of High Performance Computing Applications, vol.83, issue.12, pp.284-299, 2009. ,
DOI : 10.1109/5.476078
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013. ,
DOI : 10.1109/MCSE.2013.98
Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001. ,
DOI : 10.1006/jpdc.2000.1686
Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002. ,
DOI : 10.1007/s00453-002-0962-9
URL : https://hal.archives-ouvertes.fr/hal-00807407
A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.474-483, 2016. ,
DOI : 10.1109/IPDPS.2016.32
URL : https://hal.archives-ouvertes.fr/hal-01216245
Gaussian Elimination is Not Optimal, Numerische Mathematik, pp.354-356, 1969. ,
DOI : 10.1007/bf02165411
Communication-optimal parallel algorithm for Strassen's matrix multiplication, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.193-204, 2012. ,
DOI : 10.1145/2312005.2312044
Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation, vol.9, issue.3, pp.251-280, 1990. ,
DOI : 10.1016/S0747-7171(08)80013-2
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers: Design Issues and Performance, Computer Physics Communications, pp.1-15, 1996. ,
DOI : 10.1016/0010-4655(96)00017-3
Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, International European Conference on Parallel and Distributed Computing, pp.90-109, 2011. ,
Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), pp.39-39, 2007. ,
DOI : 10.1109/ISPDC.2007.56
URL : http://hcl.ucd.ie/system/files/Papers/1183978301367.pdf
Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors, International European Conference on Parallel and Distributed Computing, pp.201-214, 2014. ,
DOI : 10.1007/978-3-319-14325-5_18
An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007. ,
DOI : 10.1016/j.dam.2006.08.005
Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms, International European Conference on Parallel and Distributed Computing, pp.171-182, 2016. ,
DOI : 10.1016/j.disc.2008.07.028
URL : https://hal.archives-ouvertes.fr/hal-01269881
Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.170-177, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01163936
Guide: QUeueing And Runtime for Kernels, 2011. ,
Resource Aggregation for Task-Based Cholesky Factorization on Top of Heterogeneous Machines, International European Conference on Parallel and Distributed Computing, 2016. ,
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01181135
Matrix Matrix Multiplication using Static Algorithms on Multicores and GPUs. Available: https://gitlab.inria.fr/ordo-bdx/nrrp-with-starpu
Plateforme Fédérative pour la Recherche en Informatique et Mathématiques, 2009. Available: https ,