Conference papers

LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

Abstract : LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance Linpack benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The optimizations include lookahead, dynamic task scheduling, fine-grained parallelism for memory-bound operations, autotuning, and a data layout geared towards complex memory hierarchies. Performance in excess of one Tflop/s is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
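
For readers unfamiliar with the procedure named in the abstract, the following is a minimal sketch of textbook LU factorization with partial pivoting in plain Python. It is not the paper's implementation: it is unblocked and single-threaded, it includes none of the optimizations listed above (lookahead, task scheduling, autotuning, GPU offload), and the function name `lu_partial_pivot` is a hypothetical one chosen for illustration.

```python
def lu_partial_pivot(a):
    """Factor a square matrix (list of rows) so that P*A = L*U.

    Returns (perm, lu). lu stores L strictly below the diagonal (its unit
    diagonal is implicit) and U on and above the diagonal. perm[i] is the
    index of the original row of A that ended up in row i.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest-magnitude entry in column k
        # among rows k..n-1 and swap that row into position k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]           # multiplier, stored in place of L
            for j in range(k + 1, n):      # update the trailing submatrix
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu
```

Calling `lu_partial_pivot([[2, 1, 1], [4, 3, 3], [8, 7, 9]])` returns the row permutation together with the packed factors; multiplying the unpacked L and U reproduces the rows of the input in permuted order.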


https://hal.inria.fr/hal-00809654
Contributor : Mathieu Faverge
Submitted on : Tuesday, April 9, 2013 - 4:16:04 PM
Last modification on : Thursday, December 13, 2018 - 6:48:07 PM
Document(s) archived on : Monday, April 3, 2017 - 2:47:56 AM

File

lawn266.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00809654, version 1

Citation

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System. VECPAR 2012 - 10th International Meeting on High-Performance Computing for Computational Science, Jul 2012, Kobe, Japan. ⟨hal-00809654⟩
