A high-performance matrixmultiplication algorithm on a distributed-memory parallel computer, using overlapped communication, IBM Journal of Research and Development, vol.38, pp.673-682, 1994. ,
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Dongarra. PaRSEC: Exploiting Heterogeneity to Enhance Scalability, IEEE Computing in Science Engineering, vol.15, issue.6, pp.36-45, 2013. ,
,
, The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Scientific Programming, vol.5, pp.173-184, 1996.
A proposal for a set of parallel basic linear algebra subprograms, Applied Parallel Computing Computations in Physics, pp.107-114, 1996. ,
, Distributed Parallel Linear Algebra Software for Multicore Architectures
, Elemental: C++ library for distributed-memory linear algebra and optimization
SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library, SC'2019, the IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis, 2019. ,
Anatomy of High-performance Matrix Multiplication, ACM Trans. Math. Software, vol.34, issue.3, 2008. ,
I/O complexity: the red-blue pebble game, STOC '81: Proceedings of the 13th ACM symposium on Theory of Computing, pp.326-333, 1981. ,
Communication lower bounds for distributed-memory matrix multiplication, J. Parallel Distributed Computing, vol.64, issue.9, pp.1017-1026, 2004. ,
Linear systems solvers for distributed-memory machines with gpu accelerators, Parallel Processing, pp.495-506, 2019. ,
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication, 2019. ,
, Parallel Linear Algebra PACKage
Matrix product on heterogeneous master-worker platforms, PPoPP'2008, the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.53-62, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00803487
, Scalable Linear Algebra PACKage
Parallel matrix multiplication: A systematic journey, SIAM J. Scientific Computing, vol.38, issue.6, pp.748-781, 2016. ,
, Task-Based Environment for Scientific Simulation at Extreme Scale
A survey of out-of-core algorithms in numerical linear algebra, External Memory Algorithms and Visualization, pp.161-180, 1999. ,
Top 500 Supercomputer Sites, 2019. ,
SUMMA: Scalable Universal Matrix Multiplication Algorithm, 1995. ,
Roofline: an insightful visual performance model for multicore architectures, Comm. ACM, vol.52, pp.65-76, 2009. ,