Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators.

Although hardware has changed dramatically in the last few years, nodes of multicore chips augmented by Graphics Processing Units (GPUs) seem to be a trend of major importance. Previous approaches for scheduling dense linear algebra operations on such a complex node achieved high performance, but at the double cost of not exploiting the full potential of all the cores and of producing static, non-generic code. In this extended abstract, we present a new approach for scheduling dense linear algebra operations on multicore architectures with GPU accelerators, using a dynamic scheduler capable of exploiting the full potential of the node [1]. We underline the benefits in terms of both programmability and performance. We illustrate our approach with a Cholesky factorization relying on cutting-edge GPU and CPU kernels [2], [3], achieving roughly 900 Gflop/s on an eight-core node accelerated with three NVIDIA Tesla GPUs.
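The abstract does not spell out how the factorization is decomposed or scheduled, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a right-looking tiled Cholesky whose tasks (POTRF, TRSM, SYRK, GEMM) are ordered by dependencies inferred from the tiles each task reads and writes, and a scheduler that hands any ready task to any idle worker. To stay within the Python standard library, every "tile" is a 1x1 block (a scalar); the worker names and all function names are hypothetical.

```python
import math
import random


def cholesky_tasks(n):
    """Tasks of a right-looking tiled Cholesky on an n x n tile grid.

    Each task is (kernel, tiles_read, tile_updated); the updated tile is
    read and written in place.
    """
    tasks = []
    for k in range(n):
        tasks.append(("POTRF", [], (k, k)))
        for i in range(k + 1, n):
            tasks.append(("TRSM", [(k, k)], (i, k)))
        for i in range(k + 1, n):
            tasks.append(("SYRK", [(i, k)], (i, i)))
            for j in range(k + 1, i):
                tasks.append(("GEMM", [(i, k), (j, k)], (i, j)))
    return tasks


def build_dependencies(tasks):
    """Infer task dependencies from data accesses, in submission order."""
    last_writer, readers = {}, {}
    deps = [set() for _ in tasks]
    for t, (_, reads, out) in enumerate(tasks):
        for tile in reads:
            if tile in last_writer:
                deps[t].add(last_writer[tile])      # read after write
            readers.setdefault(tile, []).append(t)
        if out in last_writer:
            deps[t].add(last_writer[out])           # write after write
        deps[t].update(readers.get(out, []))        # write after read
        last_writer[out] = t
        readers[out] = []
    return deps


def run_task(task, a):
    """Apply one kernel to scalar (1x1) tiles stored in the dict a."""
    kernel, reads, out = task
    if kernel == "POTRF":
        a[out] = math.sqrt(a[out])
    elif kernel == "TRSM":
        a[out] /= a[reads[0]]
    elif kernel == "SYRK":
        a[out] -= a[reads[0]] ** 2
    else:  # GEMM
        a[out] -= a[reads[0]] * a[reads[1]]


def dynamic_cholesky(matrix, workers=("cpu0", "cpu1", "gpu0"), seed=0):
    """Factor a symmetric positive definite matrix; return (L, trace)."""
    n = len(matrix)
    a = {(i, j): float(matrix[i][j]) for i in range(n) for j in range(i + 1)}
    tasks = cholesky_tasks(n)
    deps = build_dependencies(tasks)
    rng = random.Random(seed)
    done, pending, trace = set(), set(range(len(tasks))), []
    while pending:
        # A task is ready once every task it depends on has completed.
        ready = sorted(t for t in pending if deps[t] <= done)
        rng.shuffle(ready)               # no fixed, precomputed order
        batch = ready[:len(workers)]     # one ready task per idle worker
        for worker, t in zip(workers, batch):
            run_task(tasks[t], a)
            trace.append((worker, t))
        done.update(batch)
        pending.difference_update(batch)
    return a, trace, tasks, deps
```

In this sketch the execution order is decided at run time from the dependency sets rather than fixed in advance, which is the property the abstract contrasts with static scheduling. A real runtime would also weigh per-device kernel speed and data transfers, which are left out here.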

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

saahpc.pdf (64.47 KB)

Origin: Files produced by the author(s)

Contributor: Samuel Thibault

https://inria.hal.science/inria-00547616

Submitted on: Thursday, December 16, 2010, 7:09:19 PM

Last modification on: Wednesday, March 20, 2024, 5:52:16 PM

Long-term archiving on: Monday, November 5, 2012, 2:30:26 PM

Dates and versions

inria-00547616 , version 1 (16-12-2010)

Identifiers

HAL Id : inria-00547616 , version 1

Cite

Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, et al. Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators. Symposium on Application Accelerators in High Performance Computing (SAAHPC), Jul 2010, Knoxville, United States. ⟨inria-00547616⟩


Collections

CNRS INRIA LABRI INRIA2
