Conference paper, 2012

LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra

Abstract

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance Linpack benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The optimizations include lookahead, dynamic task scheduling, fine-grained parallelism for memory-bound operations, autotuning, and a data layout geared towards complex memory hierarchies. Performance in excess of 1 Tflop/s is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
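
For readers unfamiliar with the underlying procedure, the following is a minimal sketch of textbook, unblocked LU factorization with partial pivoting in plain Python. It is not the implementation described in the paper: it has no tiling, lookahead, task scheduling, or GPU offload, and the function names `lu_partial_pivot` and `reconstruct` are hypothetical, chosen only for this illustration.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P*A = L*U, using row swaps.

    Returns (lu, perm). lu stores U on and above the diagonal and the
    multipliers of the unit lower triangular L below it. perm[i] is the
    index of the original row that ended up in position i.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest-magnitude entry in
        # column k, searching only rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate the entries below the pivot and update the
        # trailing submatrix.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm


def reconstruct(lu):
    """Multiply L by U (both packed in lu) to recover the permuted input."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out
```

Calling `lu_partial_pivot(a)` and then `reconstruct(lu)` should reproduce the rows of `a` in the order given by `perm`, up to rounding error. The pivot search and the row-by-row elimination are the memory-bound parts of the algorithm, while the trailing-submatrix update is the part that blocked implementations recast as matrix multiplication.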
Main file
lawn266.pdf (414.33 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00809654, version 1 (09-04-2013)

Identifiers

  • HAL Id: hal-00809654, version 1

Cite

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System. VECPAR 2012 - 10th International Meeting on High-Performance Computing for Computational Science, Jul 2012, Kobe, Japan. ⟨hal-00809654⟩