Exploiting Kepler architecture in sparse direct solver with runtime systems

Abstract: Solving sparse linear systems with a direct solver is a time-consuming operation required by many scientific applications that simulate physical problems. Because of this significant overall cost, many studies have tried to optimize the time to solution of such solvers on multi-core and distributed architectures. More recently, many works have addressed heterogeneous architectures to exploit accelerators such as GPUs or the Intel Xeon Phi, with interesting speedups. Despite research towards generic solutions that exploit those accelerators efficiently, their hardware evolution requires continual adaptation of the kernels running on them. Recent Nvidia architectures, such as Kepler, offer a larger number of parallel units and thus require more data to feed every computational unit. One solution considered to supply enough computation has been to study problems made up of a large number of small computations. The batched BLAS libraries proposed by Intel, Nvidia, or the University of Tennessee are examples of this solution. In this talk, we discuss the use of the variable-size batched matrix-matrix multiply to improve the performance of the PaStiX sparse direct solver. Indeed, this kernel suits the supernodal method of the solver and the multiple updates of variable sizes that occur during the numerical factorization. Performance results on a spectrum of matrices with different properties will be presented.
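
To make the variable-size batched matrix-matrix multiply mentioned in the abstract concrete, the following pure-Python sketch shows only its reference semantics: each entry i of the batch carries its own dimensions and receives C_i <- alpha * A_i * B_i + beta * C_i. The function names `gemm` and `gemm_vbatched`, the list-of-lists storage, and the sample data are assumptions made for illustration; this is not the interface of PaStiX or of any of the batched BLAS libraries cited. On a GPU, the loop over the batch would typically be handled by a single kernel launch rather than one call per small product.

```python
# Reference semantics of a variable-size batched GEMM (illustrative only;
# hypothetical names, not the PaStiX or any vendor batched BLAS interface).

def gemm(alpha, A, B, beta, C):
    """In place: C <- alpha * A * B + beta * C for dense row-major lists."""
    m, k, n = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "A must be m x k"
    assert len(C) == m and all(len(row) == n for row in C), "C must be m x n"
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]


def gemm_vbatched(alpha, As, Bs, beta, Cs):
    """Apply gemm to every (A_i, B_i, C_i); each entry may have its own sizes."""
    assert len(As) == len(Bs) == len(Cs), "batch arrays must have equal length"
    for A, B, C in zip(As, Bs, Cs):
        gemm(alpha, A, B, beta, C)


# Hypothetical batch of two updates with different shapes.
As = [[[1.0, 2.0]], [[1.0], [2.0], [3.0]]]               # 1x2 and 3x1
Bs = [[[3.0], [4.0]], [[1.0, 1.0]]]                      # 2x1 and 1x2
Cs = [[[10.0]], [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]    # 1x1 and 3x2

# Update form C_i <- C_i - A_i * B_i, applied to the whole batch at once.
gemm_vbatched(-1.0, As, Bs, 1.0, Cs)
```

The choice alpha = -1, beta = 1 mirrors the subtractive form that update operations usually take in a factorization; the shapes and values here are arbitrary and do not come from the talk.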
Document type:
Conference paper
9th International Workshop on Parallel Matrix Algorithms and Applications (PMAA'2016), Jul 2016, Bordeaux, France. 〈https://pmaa16.sciencesconf.org/〉

https://hal.inria.fr/hal-01421372
Contributor: Pierre Ramet
Submitted on: Thursday, December 22, 2016 - 10:27:21
Last modified on: Thursday, January 11, 2018 - 06:22:35

Identifiers

  • HAL Id: hal-01421372, version 1

Citation

Mathieu Faverge, Grégoire Pichon, Pierre Ramet. Exploiting Kepler architecture in sparse direct solver with runtime systems. 9th International Workshop on Parallel Matrix Algorithms and Applications (PMAA'2016), Jul 2016, Bordeaux, France. 〈https://pmaa16.sciencesconf.org/〉. 〈hal-01421372〉
