Adaptive Runtime Selection for GPU

Jean-François Dollinger; Vincent Loechner

doi:10.1109/ICPP.2013.16

Conference paper, Year: 2013


Jean-François Dollinger

Role: Author
PersonId : 1250212
IdHAL : jf-dollinger
ORCID : 0000-0002-6688-2320

Compilation pour les Architectures MUlti-coeurS

ICPS

Vincent Loechner

Role: Author
PersonId : 739606
IdHAL : vincent-loechner
ORCID : 0000-0003-3481-4881
IdRef : 114245908

Compilation pour les Architectures MUlti-coeurS

Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie

Abstract

It is often hard to predict the performance of statically generated code. Hardware availability, hardware specifications, and problem size may change from one execution context to another. The main contribution of this work is an entirely automatic method that predicts the execution times of semantically equivalent versions of affine loop nests on GPUs, then runs the best-performing one on GPU or CPU. To make accurate predictions, our framework relies on three consecutive stages: static code generation, offline profiling, and online prediction. Different versions are statically generated by PPCG, a source-to-source polyhedral compiler that generates CUDA code from static control loops written in C. The code versions differ in their block sizes, tiling, and parallel schedule. The profiling code carries out the required measurements on the target machine: throughput between host and device memory, and execution time of the kernels with various parameters. At runtime, we rely on those results to calculate a predicted execution time on GPU. This is followed by a "fastest wins" algorithm that runs instances of the target code concurrently on CPU and GPU; the first to complete kills the other one. We validate this proposal on the polyhedral benchmark suite, showing that the predictions are accurate and that the runtime selection is effective on two different architectures.
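
The two runtime steps described in the abstract (predicting a GPU execution time from profiled data, then racing CPU and GPU instances) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost model (transfer time plus a per-iteration kernel time) is a simplifying assumption, and the names `VersionProfile`, `predict_gpu_time`, `select_version`, `fastest_wins`, and `make_worker` are hypothetical. Because Python threads cannot be killed, the losing instance is stopped cooperatively through a shared flag rather than terminated outright.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from dataclasses import dataclass


@dataclass
class VersionProfile:
    """Offline measurements for one code version (hypothetical layout)."""
    name: str
    seconds_per_iteration: float  # profiled kernel time per loop iteration


def predict_gpu_time(profile, iterations, bytes_moved, bandwidth_bytes_per_s):
    """Assumed cost model: host/device transfer time plus kernel time."""
    transfer = bytes_moved / bandwidth_bytes_per_s
    kernel = profile.seconds_per_iteration * iterations
    return transfer + kernel


def select_version(profiles, iterations, bytes_moved, bandwidth_bytes_per_s):
    """Return the version with the smallest predicted GPU time."""
    return min(
        profiles,
        key=lambda p: predict_gpu_time(
            p, iterations, bytes_moved, bandwidth_bytes_per_s
        ),
    )


def fastest_wins(tasks):
    """Run all tasks concurrently; the first to finish cancels the rest.

    Each task is a (label, callable) pair. The callable receives a
    threading.Event and must poll it to stop early.
    """
    cancel = threading.Event()
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {pool.submit(fn, cancel): label for label, fn in tasks}
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        cancel.set()  # ask the slower instances to stop
        winner = next(iter(done))
        return futures[winner], winner.result()


def make_worker(step_seconds, steps):
    """Build a stand-in workload that polls the cancel flag between steps."""
    def run(cancel):
        for _ in range(steps):
            if cancel.is_set():
                return None  # lost the race, stop early
            time.sleep(step_seconds)
        return steps
    return run
```

In this sketch, `select_version` would be called with the profiled versions and the runtime problem size, and `fastest_wins` would then be given one callable for the selected GPU version and one for the CPU version. Polling a flag keeps the example portable; a real runtime would need a mechanism that can actually interrupt the losing computation.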

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]


https://inria.hal.science/hal-00869652

Submitted on: Thursday, October 3, 2013, 18:21:01

Last modified on: Thursday, April 11, 2024, 13:08:14

Dates and versions

hal-00869652 , version 1 (03-10-2013)

Identifiers

HAL Id : hal-00869652 , version 1
DOI : 10.1109/ICPP.2013.16

Cite

Jean-François Dollinger, Vincent Loechner. Adaptive Runtime Selection for GPU. 42nd International Conference on Parallel Processing, 2013, Lyon, France. pp.70-79, ⟨10.1109/ICPP.2013.16⟩. ⟨hal-00869652⟩


Collections

CNRS INRIA ENGEES INSA-STRASBOURG INRIA2 INC-CNRS SITE-ALSACE INSA-GROUPE

