Locality Optimization on a NUMA Architecture for Hybrid LU Factorization

Abstract: We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies and present performance results for a hybrid multicore/GPU LU algorithm as it is implemented in the public domain library MAGMA.
Document type: Book chapter
Parallel Computing: Accelerating Computational Science and Engineering, 25, pp.153-162, 2014, Advances in Parallel Computing, 〈10.3233/978-1-61499-381-0-153〉

https://hal.inria.fr/hal-00987284
Contributor: Marc Baboulin
Submitted on: Monday, May 5, 2014 - 19:17:53
Last modified on: Thursday, April 5, 2018 - 12:30:24

Citation

Adrien Rémy, Marc Baboulin, Masha Sosonkina, Brigitte Rozoy. Locality Optimization on a NUMA Architecture for Hybrid LU Factorization. Parallel Computing: Accelerating Computational Science and Engineering, 25, pp.153-162, 2014, Advances in Parallel Computing, 〈10.3233/978-1-61499-381-0-153〉. 〈hal-00987284〉
