Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance

Albert Cohen 1, Tobias Grosser 1, Paul H. J. Kelly 2, J. Ramanujam 3, P. Sadayappan 4, Sven Verdoolaege 1
1 Parkas - Parallélisme de Kahn Synchrone
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR 8548
Abstract: Tiling is a key technology to increase data reuse in computation kernels. For computations structured as one sequential outer "time" loop enclosing a set of parallel inner loops, tiling only the parallel inner loops is generally not profitable because it does not enable enough data reuse. To combine parallelism and locality, several tiling algorithms tile the time loop together with one or more of the parallel inner loops. However, all these algorithms have limitations: they are restricted to special computation patterns, require the redundant execution of certain iterations (overlapped tiling), or require wavefront parallelism, which makes the parallel workload unbalanced. One approach to tiling that addresses most of these issues is split tiling, where tiles are subdivided into a sequence of trapezoidal computation steps. In this paper, we develop an approach to generate split-tiled code for GPUs in the PPCG polyhedral code generator. We propose a generic algorithm to calculate an affine schedule and index-set splitting that enable us to perform tiling for locality and synchronization avoidance, while simultaneously maintaining parallelism, without the need for skewing or redundant computations. Our algorithm performs split tiling for an arbitrary number of dimensions and without the need to construct any large integer linear programming problem. The method and its implementation are evaluated on standard stencil kernels and compared with a state-of-the-art polyhedral compiler and with a domain-specific stencil compiler, both targeting CUDA GPUs.
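The idea of subdividing tiles into trapezoidal computation steps can be illustrated on a one-dimensional three-point stencil. The following is a minimal sketch in plain Python, not the PPCG algorithm and not GPU code: it assumes a Jacobi-style update with fixed boundary values, stores the full space-time grid for clarity, and uses hypothetical names (`split_tiled`, `reference`, `width`, `height`). Within each band of `height` time steps, phase 1 runs shrinking trapezoids that are mutually independent, and phase 2 runs growing trapezoids centred on the tile boundaries that fill the remaining gaps. Tiles are visited in reverse order within each phase to suggest that their order does not matter.

```python
import math


def stencil(a, b, c):
    # Three-point average; any function of the three neighbours would do.
    return (a + b + c) / 3.0


def new_grid(init, steps):
    # Full space-time grid; NaN marks points that are not computed yet.
    n = len(init)
    grid = [[math.nan] * n for _ in range(steps + 1)]
    grid[0] = list(init)
    for t in range(1, steps + 1):
        grid[t][0] = init[0]        # fixed left boundary
        grid[t][n - 1] = init[-1]   # fixed right boundary
    return grid


def update(grid, t, lo, hi):
    # Compute time level t+1 for i in [lo, hi), clamped to interior points.
    n = len(grid[0])
    for i in range(max(lo, 1), min(hi, n - 1)):
        grid[t + 1][i] = stencil(grid[t][i - 1], grid[t][i], grid[t][i + 1])


def reference(init, steps):
    # Untiled schedule: one full sweep per time step.
    grid = new_grid(init, steps)
    for t in range(steps):
        update(grid, t, 1, len(init) - 1)
    return grid


def split_tiled(init, steps, width, height):
    # Assumption of this sketch: tiles are wide enough that the shrinking
    # trapezoids never become empty within a band.
    assert width >= 2 * height
    n = len(init)
    grid = new_grid(init, steps)
    tiles = -(-n // width)  # ceiling division
    for t0 in range(0, steps, height):
        h = min(height, steps - t0)
        # Phase 1: shrinking trapezoids, one per tile, mutually independent.
        for k in reversed(range(tiles)):
            for s in range(h):
                update(grid, t0 + s, k * width + s, (k + 1) * width - s)
        # Phase 2: growing trapezoids centred on each tile boundary.
        for k in reversed(range(tiles + 1)):
            for s in range(1, h):
                update(grid, t0 + s, k * width - s, k * width + s)
    return grid
```

In this sketch no point is computed twice and no skewing is applied; the only synchronization points are between the two phases of a band and between bands. Initialising the grid with NaN means that a schedule reading a value before it is produced would yield a result that differs from `reference`.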
Document type: Conference paper
GPGPU 6 - Sixth Workshop on General Purpose Processing Using GPUs, Mar 2013, Houston, United States. 2013


https://hal.inria.fr/hal-00786812
Contributor: Albert Cohen
Submitted on: Saturday, April 20, 2013 - 23:40:07
Last modified on: Thursday, September 29, 2016 - 01:22:08
Document(s) archived on: Saturday, April 1, 2017 - 20:56:13

File

paper.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-00786812, version 1

Citation

Albert Cohen, Tobias Grosser, Paul H. J. Kelly, J. Ramanujam, P. Sadayappan, et al. Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance. GPGPU 6 - Sixth Workshop on General Purpose Processing Using GPUs, Mar 2013, Houston, United States. 2013. <hal-00786812>

Metrics

Record views: 544

Document downloads: 1015