Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2013

Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance

Résumé

Tiling is a key technology to increase data reuse in computation kernels. For computations structured as one sequential outer "time" loop enclosing a set of parallel inner loops, the option of tiling only the parallel inner loops is generally not profitable because it does not enable enough data reuse. To combine parallelism and locality, several tiling algorithms propose to tile the time loop together with one or more of the parallel inner loops. However, all these algorithms have some limitations: they are either limited to special computation patterns, require the redundant execution of certain iterations (overlapped tiling), or require the use of wavefront parallelism which makes the parallel workload unbalanced. One approach to tiling that addresses most of these issues is split tiling, where tiles are subdivided into a sequence of trapezoidal computation steps. In this paper, we develop an approach to generate split tiled code for GPUs in the PPCG polyhedral code generator. We propose a generic algorithm to calculate an affine schedule and index-set splitting that enable us to perform tiling for locality and synchronization avoidance, while simultaneously maintaining parallelism, without the need for skewing or redundant computations. Our algorithm performs split tiling for an arbitrary number of dimensions and without the need to construct any large integer linear programming problem. The method and its implementation are evaluated on standard stencil kernels and compared with a state-of-the-art polyhedral compiler and with a domain-specific stencil compiler, both targeting CUDA GPUs.
Fichier principal
Vignette du fichier
paper.pdf (928 Ko) Télécharger le fichier
Origine : Fichiers éditeurs autorisés sur une archive ouverte
Loading...

Dates et versions

hal-00786812 , version 1 (20-04-2013)

Identifiants

  • HAL Id : hal-00786812 , version 1

Citer

Albert Cohen, Tobias Grosser, Paul H. J. Kelly, J. Ramanujam, Ponnuswamy Sadayappan, et al.. Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles to Reconcile Parallelism and Locality, avoiding Divergence and Load Imbalance. GPGPU 6 - Sixth Workshop on General Purpose Processing Using GPUs, Mar 2013, Houston, United States. ⟨hal-00786812⟩
636 Consultations
1388 Téléchargements

Partager

Gmail Facebook X LinkedIn More