Pipelining the CG Solver Over a Runtime System

Whereas most of today's parallel High Performance Computing (HPC) software is written as highly tuned code that takes care of low-level details, the advent of the manycore era forces the community to consider modular programming paradigms and to delegate part of the work to third-party software. The latter approach has been shown to be very productive and efficient for regular algorithms, such as dense linear algebra solvers. In this paper we show that such a model can also be applied efficiently to a much more irregular and less compute-intensive algorithm. We illustrate our discussion with the standard unpreconditioned Conjugate Gradient (CG) method, which we carefully express as a task-based algorithm. We use the StarPU runtime system to assess the efficiency of the approach on a computational platform consisting of three NVIDIA Fermi GPUs. We show that near-optimal speedup (up to 2.89 relative to a single-GPU execution) can be reached when processing large matrices, and that the performance is portable when the low-level memory transfer mechanism is changed.
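For readers unfamiliar with the method, the following is a minimal sequential sketch of unpreconditioned CG in plain Python. It is not the authors' implementation and does not use StarPU; the function names (`matvec`, `dot`, `axpy`, `conjugate_gradient`) and the tolerance are illustrative assumptions. The comments mark the kernels that a task-based formulation could plausibly submit as separate tasks, with the runtime tracking the data dependencies between them.

```python
# Minimal sequential sketch of unpreconditioned Conjugate Gradient (CG).
# Assumption: A is a symmetric positive definite matrix stored as a list of rows.
# All names are illustrative; this is not StarPU code.

def matvec(A, x):
    """Dense matrix-vector product (a candidate task: q = A p)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    """Dot product (a candidate task that reduces to a scalar)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha * x + y (a candidate task for each vector update)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = [0.0] * len(b)      # initial guess x0 = 0
    r = list(b)             # residual r0 = b - A x0 = b
    p = list(r)             # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:          # converged: residual norm is small
            break
        q = matvec(A, p)                 # q = A p
        alpha = rs_old / dot(p, q)       # step length
        x = axpy(alpha, p, x)            # x = x + alpha p
        r = axpy(-alpha, q, r)           # r = r - alpha q
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)  # p = r + beta p
        rs_old = rs_new
    return x

# Usage on a small symmetric positive definite system.
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

Each iteration chains one matrix-vector product, two dot products, and three vector updates. The scalar results of the dot products are needed before the following updates can start, which is one reason CG is harder to pipeline than dense linear algebra kernels.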

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Contributor: Stojce Nakov

https://inria.hal.science/hal-00934948

Submitted on: Wednesday, January 22, 2014, 18:51:22

Last modified on: Wednesday, March 20, 2024, 17:52:16

Dates and versions

hal-00934948, version 1 (22-01-2014)

Identifiers

HAL Id: hal-00934948, version 1

Cite

Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Stojce Nakov, Jean Roman. Pipelining the CG Solver Over a Runtime System. GPU Technology Conference, NVIDIA, Mar 2013, San Jose, United States. ⟨hal-00934948⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC PLAFRIM UNIV-RENNES UR1-MATH-NUM

265 views

6 downloads