Tensorflow: A system for large-scale machine learning, 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16), pp.265-283, 2016. ,
Harnessing Supercomputers with a Sequential Task-based Runtime System, 2014. ,
Fast and Flexible Coupled Cluster Implementation, J. Chem. Theory Comput, vol.9, issue.8, pp.3385-3392, 2013. ,
StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Conc. Comp. Pract. Exper, vol.23, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Legion: Expressing locality and independence with logical regions, International Conference for High Performance Computing, Networking, Storage and Analysis, 2012. ,
Porting of the DBCSR library for sparse matrix-matrix multiplications to intel xeon phi systems, Advances in Parallel Computing, pp.47-56, 2018. ,
Sparse matrix multiplication: The distributed block-compressed sparse row library, Parallel Computing, vol.40, issue.5-6, pp.47-58, 2014. ,
Exploiting Parallelism on GPUs and FPGAs with OmpSs, Proceedings of the 1st Workshop on AutotuniNg and ADaptivity AppRoaches for Energy Efficient HPC Systems, ANDARE '17, 2017. ,
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, IEEE Computing in Science Engineering, vol.15, issue.6, pp.36-45, 2013. ,
Scalable task-based algorithm for multiplication of block-rank-sparse matrices, IA3 '15, pp.1-8, 2015. ,
A dense linear algebra software for heterogeneous architectures, 2020. ,
, Open source molecular dynamics, 2020.
PTG: an abstraction for unhindered parallelism, Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, WOLFHPC '14, pp.21-30, 2014. ,
SuiteSparse : a suite of sparse matrix software, 2020. ,
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication, 2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2013. ,
, Distributed Parallel Linear Algebra Software for Multicore Architectures
A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, Intl. Journal of Parallel Programming, vol.37, issue.3, pp.292-305, 2009. ,
, Elemental: C++ library for distributed-memory linear algebra and optimization
SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library, SC'2019, 2019. ,
Anatomy of High-performance Matrix Multiplication, ACM Trans. Math. Software, vol.34, issue.3, 2008. ,
, Gpu kernels for block-sparse weights, p.3, 2017.
Generic matrix multiplication for multi-GPU accelerated distributed-memory platforms over PaRSEC, 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp.33-41, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02282529
Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure: software artifact, 2020. ,
I/O complexity: the red-blue pebble game, STOC '81: Proceedings of the 13th ACM symposium on Theory of Computing, pp.326-333, 1981. ,
Communication lower bounds for distributed-memory matrix multiplication, J. Parallel Distributed Computing, vol.64, issue.9, pp.1017-1026, 2004. ,
A direct coupled cluster algorithm for massively parallel computers, Chem. Phys. Lett, vol.265, issue.1-2, pp.1-11, 1997. ,
Linear systems solvers for distributed-memory machines with gpu accelerators, Euro-Par, pp.495-506, 2019. ,
Red-blue pebbling revisited: near optimal parallel matrixmatrix multiplication, 2019. ,
Clustered Low-Rank Tensor Format: Introduction and Application to Fast Construction of Hartree-Fock Exchange, J. Chem. Theory Comput, vol.12, issue.12, pp.5868-5880, 2016. ,
Optimizing tensor contraction expressions for hybrid CPU-GPU execution, Clust. Comput, vol.16, issue.1, pp.131-155, 2013. ,
, , 2013.
, Parallel Linear Algebra PACKage
Coupled-Cluster Singles, Doubles and Perturbative Triples with Density Fitting Approximation for Massively Parallel Heterogeneous Platforms, Int. J. Quant. Chem, vol.12, issue.119, p.25894, 2019. ,
Massively Parallel Implementation of Explicitly Correlated Coupled-Cluster Singles and Doubles Using TiledArray Framework, J. Phys. Chem. A, vol.120, issue.51, pp.10231-10244, 2016. ,
, The Massively Parallel Quantum Chemistry Program (MPQC), 2018.
Matrix product on heterogeneous master-worker platforms, ACM SIGPLAN PPoPP, pp.53-62, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00803487
Sparse maps-A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory, J Chem Phys, vol.144, issue.2, 2016. ,
Locality-aware parallel block-sparse matrix-matrix multiplication using the chunks and tasks programming model, Parallel Computing, vol.57, pp.87-106, 2016. ,
A hierarchic sparse matrix data structure for large-scale hartree-fock/kohn-sham calculations, J. Computational Chemistry, vol.28, issue.16, pp.2531-2537, 2007. ,
, Scalable Linear Algebra PACKage
Parallel matrix multiplication: A systematic journey, SIAM J. Scientific Computing, vol.38, issue.6, pp.748-781, 2016. ,
GPU-Accelerated Sparse Matrix-Matrix Multiplication for Linear Scaling Density Functional Theory, pp.173-190, 2016. ,
, Many-Body Methods in Chemistry and Physics: MBPT and Coupled-Cluster Theory. Cambridge Molecular Science, 2009.
DBCSR: A blocked sparse tensor algebra library, Parallel Computing: Technology Trends, Proceedings of the International Conference on Parallel Computing, vol.36, pp.331-340, 2019. ,
A massively parallel tensor contraction framework for coupled-cluster computations, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3176-3190, 2014. ,
A survey of out-of-core algorithms in numerical linear algebra, External Memory Algorithms and Visualization, pp.161-180, 1999. ,
Top 500 Supercomputer Sites, 2019. ,
Realm: Performance Portability through Composable Asynchrony, 2014. ,
SUMMA: scalable universal matrix multiplication algorithm, Concurrency: Practice and Experience, vol.9, issue.4, pp.255-274, 1997. ,