Register Optimizations for Stencils on GPUs

Abstract : The recent advent of compute-intensive GPU architecture has allowed application developers to explore high-order 3D stencils for better computational accuracy. A common optimization strategy for such stencils is to expose sufficient data reuse by means such as loop unrolling, with the expectation of register-level reuse. However, the resulting code is often highly constrained by register pressure. While current state-of-the-art register allocators are satisfactory for most applications, they are unable to effectively manage register pressure for such complex high-order stencils, resulting in sub-optimal code with a large number of register spills. In this paper, we develop a statement reordering framework that models stencil computations as a DAG of trees with shared leaves, and adapts an optimal scheduling algorithm for minimizing register usage for expression trees. The effectiveness of the approach is demonstrated through experimental results on a range of stencils extracted from application codes.
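The abstract does not name the optimal scheduling algorithm it adapts. The classical result of this kind for expression trees is Sethi-Ullman labeling, so the sketch below assumes that is the intended starting point. It is a minimal illustration on a single binary tree, not the paper's framework, which handles a DAG of trees with shared leaves. The names `Node` and `peak_registers` are hypothetical, and the cost model (every leaf is loaded into a register, and a result reuses one of its operand registers) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Binary expression tree node (hypothetical helper type)."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def peak_registers(node: Node, heavy_first: bool) -> int:
    """Peak number of live registers needed to evaluate `node` without spills.

    heavy_first=False evaluates the left operand first, as written.
    heavy_first=True evaluates the more register-hungry operand first,
    which is the Sethi-Ullman ordering rule.
    """
    # Leaf: one register holds the loaded value.
    # The sketch assumes strictly binary trees.
    if node.left is None or node.right is None:
        return 1
    first = peak_registers(node.left, heavy_first)
    second = peak_registers(node.right, heavy_first)
    if heavy_first and second > first:
        first, second = second, first
    # While the second operand is evaluated, one extra register
    # holds the result of the first operand.
    return max(first, second + 1)


# a + (b + (c + d)): a right-leaning chain of additions.
chain = Node("+", Node("a"),
             Node("+", Node("b"),
                  Node("+", Node("c"), Node("d"))))

print(peak_registers(chain, heavy_first=False))  # 4
print(peak_registers(chain, heavy_first=True))   # 2
```

Reordering alone halves the register demand of this small chain. Unrolled high-order stencils produce many such accumulation chains at once, which is where the register pressure described in the abstract comes from.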
Document type :
Conference papers

Cited literature [16 references]

https://hal.inria.fr/hal-01955542
Contributor : Fabrice Rastello
Submitted on : Friday, December 14, 2018 - 2:15:07 PM
Last modification on : Thursday, February 7, 2019 - 3:38:40 PM
Long-term archiving on : Friday, March 15, 2019 - 3:52:19 PM

File

ppopp18-hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01955542, version 1

Citation

Prashant Singh, Aravind Sukumaran-Rajam, Atanas Rountev, Fabrice Rastello, Louis-Noël Pouchet, et al. Register Optimizations for Stencils on GPUs. PPoPP 2018 - 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb 2018, Vienna, Austria. pp. 1-15. ⟨hal-01955542⟩
