Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance

Albert Cohen; Tobias Grosser; Paul H. J. Kelly; J. Ramanujam; Ponnuswamy Sadayappan; Sven Verdoolaege

Conference paper, Year: 2013

Albert Cohen

Role: Author

Parallélisme de Kahn Synchrone

Tobias Grosser

Role: Author

Parallélisme de Kahn Synchrone

Paul H. J. Kelly

Role: Author

Department of Computing [London]

J. Ramanujam

Role: Author

Department of Electrical and Computer Engineering - Louisiana State University

Ponnuswamy Sadayappan

Role: Author

Department of Computer Science and Engineering [Columbus]

Sven Verdoolaege

Role: Author

Parallélisme de Kahn Synchrone

Abstract

Tiling is a key technology to increase data reuse in computation kernels. For computations structured as one sequential outer "time" loop enclosing a set of parallel inner loops, the option of tiling only the parallel inner loops is generally not profitable because it does not enable enough data reuse. To combine parallelism and locality, several tiling algorithms propose to tile the time loop together with one or more of the parallel inner loops. However, all these algorithms have some limitations: they are either limited to special computation patterns, require the redundant execution of certain iterations (overlapped tiling), or require the use of wavefront parallelism which makes the parallel workload unbalanced. One approach to tiling that addresses most of these issues is split tiling, where tiles are subdivided into a sequence of trapezoidal computation steps. In this paper, we develop an approach to generate split tiled code for GPUs in the PPCG polyhedral code generator. We propose a generic algorithm to calculate an affine schedule and index-set splitting that enable us to perform tiling for locality and synchronization avoidance, while simultaneously maintaining parallelism, without the need for skewing or redundant computations. Our algorithm performs split tiling for an arbitrary number of dimensions and without the need to construct any large integer linear programming problem. The method and its implementation are evaluated on standard stencil kernels and compared with a state-of-the-art polyhedral compiler and with a domain-specific stencil compiler, both targeting CUDA GPUs.
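The two-phase structure that split tiling relies on can be illustrated with a minimal sequential sketch for a 1D three-point Jacobi stencil. This is not the PPCG algorithm or the code it generates; the function names (`jacobi_split_tiled`, `jacobi_reference`), the averaging stencil, and the tile parameters `width` and `height` are assumptions made for illustration. Each band of `height` time steps is executed as upright trapezoids first, then as inverted trapezoids that fill the gaps between them. Tiles within a phase do not depend on each other, so each phase could in principle be run in parallel, with no skewing and no redundant computation.

```python
import math


def stencil(left, mid, right):
    """Three-point averaging stencil (an illustrative choice)."""
    return (left + mid + right) / 3.0


def make_grid(initial, steps):
    """Space-time array; points not yet computed hold NaN, boundaries are fixed."""
    n = len(initial)
    grid = [list(initial)]
    for _ in range(steps):
        row = [math.nan] * n
        row[0], row[-1] = initial[0], initial[-1]
        grid.append(row)
    return grid


def update(grid, t, lo, hi):
    """Compute time t + 1 from time t for interior points i in [lo, hi)."""
    n = len(grid[0])
    for i in range(max(lo, 1), min(hi, n - 1)):
        grid[t + 1][i] = stencil(grid[t][i - 1], grid[t][i], grid[t][i + 1])


def jacobi_reference(initial, steps):
    """Untiled schedule: one full sweep over space per time step."""
    grid = make_grid(initial, steps)
    for t in range(steps):
        update(grid, t, 0, len(initial))
    return grid[steps]


def jacobi_split_tiled(initial, steps, width, height):
    """Split-tiled schedule: upright trapezoids, then inverted ones."""
    if height < 1 or width < 1 or width < 2 * (height - 1):
        raise ValueError("need height >= 1 and width >= max(1, 2 * (height - 1))")
    n = len(initial)
    grid = make_grid(initial, steps)
    num_tiles = -(-n // width)  # ceiling division
    for t0 in range(0, steps, height):
        h = min(height, steps - t0)  # the last band may be shorter
        # Phase 1: upright trapezoids shrink by one point per side per step.
        # Every tile reads only its own earlier results or the previous band.
        for k in range(num_tiles):
            for s in range(h):
                update(grid, t0 + s, k * width + s, (k + 1) * width - s)
        # Phase 2: inverted trapezoids grow around each tile boundary b and
        # read only values produced in phase 1 or earlier in the same tile.
        for k in range(num_tiles + 1):
            b = k * width
            for s in range(1, h):
                update(grid, t0 + s, b - s, b + s)
    return grid[steps]
```

Because uncomputed points are initialised to NaN, any read that violated a dependence would propagate a NaN into the result. Comparing the output of `jacobi_split_tiled` with `jacobi_reference` on the same input therefore checks both that the schedule is legal and that every point is computed.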

Domains

Programming Languages [cs.PL]

Main file

paper.pdf (928 KB)

Origin: Publisher files allowed on an open archive

https://inria.hal.science/hal-00786812

Submitted on: Saturday, April 20, 2013, 23:40:07

Last modified on: Friday, April 19, 2024, 16:18:57

Long-term archiving on: Saturday, April 1, 2017, 20:56:13

Dates and versions

hal-00786812, version 1 (20-04-2013)

Identifiers

HAL Id: hal-00786812, version 1

Cite

Albert Cohen, Tobias Grosser, Paul H. J. Kelly, J. Ramanujam, Ponnuswamy Sadayappan, et al. Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance. GPGPU 6 - Sixth Workshop on General Purpose Processing Using GPUs, Mar 2013, Houston, United States. ⟨hal-00786812⟩

Collections

ENS-PARIS CNRS INRIA INRIA2 PSL

636 views

1388 downloads