Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2012

LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

Abstract

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance Linpack (HPL) benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system that combines standard CPU cores with GPU accelerators. The optimizations include lookahead, dynamic task scheduling, fine-grained parallelism for memory-bound operations, autotuning, and a data layout designed for complex memory hierarchies. Performance exceeding one Tflop/s is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
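To make the procedure named in the abstract concrete, here is a minimal sequential sketch of blocked, right-looking LU factorization with partial pivoting in plain Python. It is not the authors' implementation: it has no GPUs, task scheduling, lookahead, or autotuning, and the block size `nb` and the functions `lu_partial_pivot`, `permute_rows`, and `multiply_lu` are hypothetical names chosen for illustration. It only shows the three steps a blocked LU repeats for each panel: factor a narrow panel with row swaps, solve a triangular system for the block row of U, and update the trailing matrix.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_partial_pivot(a: Matrix, nb: int = 2) -> Tuple[Matrix, List[int]]:
    """Blocked right-looking LU with partial pivoting on a copy of `a`.

    Returns (lu, piv): `lu` holds unit-lower L below the diagonal and U on
    and above it; piv[k] is the row swapped with row k at step k (0-based,
    getrf-style), so that P*A = L*U.  `nb` is a hypothetical block size.
    """
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # 1. Panel factorization of columns j0..j1-1.
        for k in range(j0, j1):
            # Partial pivoting: largest |entry| in column k, rows k..n-1.
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            piv[k] = p
            if lu[p][k] == 0.0:
                raise ZeroDivisionError("matrix is singular")
            # Swap entire rows, so earlier L columns and the trailing
            # matrix are permuted as well.
            lu[k], lu[p] = lu[p], lu[k]
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]  # multiplier, |value| <= 1
                for j in range(k + 1, j1):  # update rest of the panel only
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # 2. Triangular solve: U12 = inv(L11) * A12, with L11 unit lower.
        for k in range(j0, j1):
            for i in range(k + 1, j1):
                for j in range(j1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # 3. Trailing update: A22 -= L21 * U12 (a matrix multiply).
        for i in range(j1, n):
            for j in range(j1, n):
                lu[i][j] -= sum(lu[i][k] * lu[k][j] for k in range(j0, j1))
    return lu, piv


def permute_rows(a: Matrix, piv: List[int]) -> Matrix:
    """Apply the recorded swaps in order to form P*A."""
    pa = [row[:] for row in a]
    for k, p in enumerate(piv):
        pa[k], pa[p] = pa[p], pa[k]
    return pa


def multiply_lu(lu: Matrix) -> Matrix:
    """Form L*U from the packed factors (L has an implicit unit diagonal)."""
    n = len(lu)
    return [[sum((1.0 if k == i else lu[i][k]) * lu[k][j]
                 for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0, 0.0],
         [4.0, 3.0, 3.0, 1.0],
         [8.0, 7.0, 9.0, 5.0],
         [6.0, 7.0, 9.0, 8.0]]
    lu, piv = lu_partial_pivot(A, nb=2)
    residual = max(abs(x - y)
                   for r1, r2 in zip(permute_rows(A, piv), multiply_lu(lu))
                   for x, y in zip(r1, r2))
    print(piv, residual)
```

When `nb` is much smaller than n, nearly all of the O(n³) work lands in step 3, which is why hybrid codes typically hand the trailing update to accelerators as a matrix multiply, while the panel in step 1 is memory-bound and latency-sensitive. Lookahead, as the abstract names it, generally means factoring the next panel as soon as its columns have been updated instead of waiting for the whole trailing update; this sketch runs the steps strictly in order.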
Main file
lawn266.pdf (414.33 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00809654, version 1 (09-04-2013)

Identifiers

  • HAL Id: hal-00809654, version 1

Cite

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System. VECPAR 2012 - 10th International Meeting on High-Performance Computing for Computational Science, Jul 2012, Kobe, Japan. ⟨hal-00809654⟩
