LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

Abstract: LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance Linpack (HPL) benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The optimizations include lookahead, dynamic task scheduling, fine-grain parallelism for memory-bound operations, autotuning, and a data layout geared towards complex memory hierarchies. Performance in excess of one Tflop/s is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
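The abstract names the algorithm but not its structure. The sketch below is a rough, single-threaded illustration in plain Python of a textbook blocked right-looking LU factorization with partial pivoting. It is not the authors' code, which is tiled, dynamically scheduled, and spread across CPUs and GPUs. Each block step factors a tall panel, solves a small unit-triangular system for the block row of U, and applies a matrix-multiply update to the trailing matrix. Lookahead, as usually described, overlaps the factorization of the next panel with that trailing update. The function names `lu_blocked`, `split_lu` and `matmul`, the block size `nb`, the row-permutation convention and the test matrix are all assumptions made for this illustration.

```python
"""Blocked right-looking LU factorization with partial pivoting (sketch).

Hypothetical illustration only: plain Python, single-threaded, with no
tiling, lookahead, task scheduling or GPU offload.
"""


def lu_blocked(a, nb=2):
    """Return (lu, perm) such that A[perm] == L @ U.

    lu holds U on and above the diagonal and the multipliers of the unit
    lower-triangular L below it; perm[i] is the original index of row i.
    nb is the (hypothetical) panel width; nb=1 is the unblocked algorithm.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1) Panel factorization: columns k0..k1-1 over rows k0..n-1.
        for k in range(k0, k1):
            # Partial pivoting: largest |entry| on or below the diagonal.
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            if lu[p][k] == 0.0:
                raise ZeroDivisionError("matrix is singular")
            if p != k:
                # Swap whole rows (L part, panel and trailing columns alike).
                lu[k], lu[p] = lu[p], lu[k]
                perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]  # multiplier l_ik, stored in place
                for j in range(k + 1, k1):  # update stays inside the panel
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # 2) Block row of U: U12 = inv(L11) @ A12, L11 unit lower triangular.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # 3) Trailing update A22 -= L21 @ U12: the matrix multiply that
        #    carries most of the floating-point work for large n.
        for i in range(k1, n):
            for k in range(k0, k1):
                lik = lu[i][k]
                for j in range(k1, n):
                    lu[i][j] -= lik * lu[k][j]
    return lu, perm


def split_lu(lu):
    """Unpack the combined factor into explicit L (unit lower) and U."""
    n = len(lu)
    L = [[lu[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(x, y):
    """Naive dense matrix product of two lists of rows."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


if __name__ == "__main__":
    A = [[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]
    lu, perm = lu_blocked(A, nb=2)
    L, U = split_lu(lu)
    PA = [A[i] for i in perm]
    LU = matmul(L, U)
    err = max(abs(LU[i][j] - PA[i][j]) for i in range(4) for j in range(4))
    print("perm:", perm, "max residual:", err)
```

Rows are swapped in full, as in LAPACK's `getrf`, so the stored multipliers stay consistent with the final permutation. The panel in step 1 is narrow and memory-bound, while step 3 is a compute-bound matrix multiply. That contrast is the usual reason hybrid implementations treat the two differently and overlap them through lookahead.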
Document type:
Conference paper
VECPAR 2012 - 10th International Meeting on High-Performance Computing for Computational Science, Jul 2012, Kobe, Japan.

Cited literature: 6 references

https://hal.inria.fr/hal-00809654
Contributor: Mathieu Faverge
Submitted on: Tuesday, 9 April 2013, 16:16:04
Last modified on: Wednesday, 10 April 2013, 09:16:20
Document(s) archived on: Monday, 3 April 2017, 02:47:56

File

lawn266.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00809654, version 1

Citation

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System. VECPAR 2012 - 10th International Meeting on High-Performance Computing for Computational Science, Jul 2012, Kobe, Japan. 2012. 〈hal-00809654〉
