Concurrent number cruncher - A GPU implementation of a general sparse linear solver

Luc Buatois 1 Guillaume Caumon 2 Bruno Lévy 1
1 ALICE - Geometry and Lighting
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: A wide class of numerical methods needs to solve a linear system in which the pattern of non-zero matrix coefficients can be arbitrary. These problems can benefit greatly from the highly multithreaded computational power and large memory bandwidth available on GPUs, especially since dedicated general-purpose APIs such as CTM (AMD-ATI) and CUDA (NVIDIA) have appeared. CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage, register blocking and vectorization) to implement a general-purpose sparse linear solver. Our implementation of the Jacobi-preconditioned Conjugate Gradient algorithm outperforms leading-edge CPU counterparts by up to a factor of 6.0, making it attractive for applications that are content with single precision.
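
To make the algorithm named in the abstract concrete, here is a minimal CPU sketch of a Jacobi-preconditioned Conjugate Gradient solver over a sparse matrix. It is not the paper's GPU implementation: it assumes plain compressed row storage (CSR) rather than block compressed row storage, runs in double precision, and omits register blocking and vectorization. The names `csr_matvec`, `dot` and `jacobi_pcg`, as well as the `tol` and `max_iter` parameters, are illustrative assumptions. The matrix is assumed to be symmetric positive definite with a non-zero diagonal.

```python
import math


def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A x for a matrix A stored in compressed row storage (CSR)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Non-zeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi-preconditioned Conjugate Gradient.

    Assumes A is symmetric positive definite with a non-zero diagonal.
    """
    n = len(b)

    # Jacobi preconditioner: M^-1 = diag(A)^-1, stored as a vector.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]

    x = [0.0] * n                        # initial guess x0 = 0
    r = list(b)                          # residual r0 = b - A x0 = b
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)                          # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for _ in range(max_iter):
        # Stop once the relative residual norm is small enough.
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        ap = csr_matvec(row_ptr, col_idx, values, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

In this sketch every iteration costs one sparse matrix-vector product, two inner products and a few vector updates, all of which are data-parallel operations. The CSR arrays for a small tridiagonal test matrix would be `row_ptr = [0, 2, 5, 7]`, `col_idx = [0, 1, 0, 1, 2, 1, 2]` and `values = [4, -1, -1, 4, -1, -1, 4]`.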
Document type:
Journal article
International Journal of Parallel, Emergent and Distributed Systems, Taylor & Francis, 2008, 24 (3), pp.205-223. 〈10.1080/17445760802337010〉

Cited literature: 30 references

https://hal.inria.fr/inria-00331906
Contributor: Nicolas Ray
Submitted on: Monday, 20 October 2008 - 09:35:10
Last modified on: Thursday, 11 January 2018 - 06:26:03
Document(s) archived on: Monday, 7 June 2010 - 18:27:49

File

Buatois_et_al_CNC.pdf
Files produced by the author(s)

Citation

Luc Buatois, Guillaume Caumon, Bruno Lévy. Concurrent number cruncher - A GPU implementation of a general sparse linear solver. International Journal of Parallel, Emergent and Distributed Systems, Taylor & Francis, 2008, 24 (3), pp.205-223. 〈10.1080/17445760802337010〉. 〈inria-00331906〉

Metrics

Record views

354

File downloads

502