
Polyhedral Compilation for Domain Specific Languages

Chandan Reddy
Parkas - Parallélisme de Kahn Synchrone (Inria de Paris, DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique)
Abstract: In recent years, the complexity of optimizing compilers has increased significantly due to the growing diversity of programming languages and the heterogeneity of target architectures. Even though general-purpose compilers have made considerable progress, they are still unable to reach the peak performance delivered by specialized libraries. To bridge this performance gap, compilers for domain-specific languages (DSLs) have been proposed: by restricting their input to a specialized domain, they can apply the more aggressive transformations needed to reach peak performance while remaining more flexible than standard libraries. One of the major optimizations needed to obtain high performance on modern heterogeneous architectures is loop transformation, for exploiting locality and for automatic parallelization. The polyhedral model has evolved into an efficient, reusable, and generic framework for loop optimization, especially for regular, static-control affine programs. In this thesis we explore the suitability of the polyhedral loop transformation framework for compiling image processing and deep learning pipelines. We study the challenges of adapting a generic polyhedral scheduler to DSLs and propose several extensions that model hardware and application characteristics in order to find optimal schedules.

We first present a method for handling reductions in the polyhedral model. State-of-the-art polyhedral compilers had no support for reductions: a reduction loop was treated as a serial loop, which can be a major bottleneck for several applications, especially on GPUs. We propose language extensions to PENCIL for expressing arbitrary user-defined reductions, encode this information in the polyhedral model as reduction dependences, and show how the polyhedral scheduler can use these dependences to parallelize reduction loops. We also propose template-based code generation for emitting highly efficient reduction code on GPUs, and we validate the approach experimentally by comparing the automatically generated code against a highly tuned library.

Exploiting locality is a key factor in achieving high performance on processors with complex memory and computation hierarchies. The cost function used in the Pluto algorithm optimizes only temporal locality, yet spatial locality is just as important and has direct implications for vectorization and coalesced memory accesses. We propose a new unified algorithm for optimizing parallelism and locality in loop nests, capable of modeling the temporal and spatial effects of multiprocessors and accelerators with deep memory hierarchies and multiple levels of parallelism. It orchestrates a collection of parametrizable optimization problems for locality and parallelism objectives over a polyhedral space of semantics-preserving transformations. We discuss the rationale for this unified algorithm and validate it on a collection of representative computational kernels and benchmarks.

We then study the challenges of applying polyhedral compilation techniques to a complex, real-world, end-to-end image processing application, SLAMBench. SLAMBench contains several non-affine kernels that are not readily amenable to polyhedral compilation. We show how summary functions can be used to compile such non-affine parts of the program, extending the reach of polyhedral compilation, and we present the prl runtime library used to avoid redundant data transfers between device and host.
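As an illustration of the summary-function idea, the following sketch (plain C99 with invented names; it does not reproduce the exact PENCIL syntax used in the thesis) shows an affine stub whose memory accesses over-approximate those of a data-dependent kernel, so that a polyhedral compiler can reason about the call conservatively.

/* Hypothetical example (names invented for illustration): a non-affine
 * kernel whose array subscripts depend on runtime data, so a polyhedral
 * compiler cannot analyze its accesses exactly. */
void gather(int n, const int idx[n], const float in[n], float out[n])
{
    for (int i = 0; i < n; i++)
        out[i] = in[idx[i]];              /* non-affine read: in[idx[i]] */
}

/* A summary function: an affine stub whose accesses over-approximate those
 * of gather(). It reads every element of idx and in and writes every element
 * of out, a safe superset of what the real kernel can touch. The compiler
 * analyzes this stub in place of the opaque kernel body. */
void gather_summary(int n, const int idx[n], const float in[n], float out[n])
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] + (float)idx[i];   /* affine reads and writes only */
}

The stub serves analysis only: it never has to be executed, it merely has to touch a superset of the memory locations the real kernel may touch, using affine subscripts that the polyhedral model can handle.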
We validate our high-level compilation approach through experiments comparing the performance of the generated code with a highly optimized manual version of SLAMBench.

We also study the applicability of polyhedral compilation to optimizing deep learning pipelines. Most operations in deep learning pipelines are affine and hence well suited to polyhedral compilation. Our framework is built on TVM, an end-to-end deep learning compiler that supports multiple front ends, such as MXNet and TensorFlow, and targets a variety of architectures. We extract a polyhedral representation from the TVM IR and use the polyhedral scheduler, together with performance-model-based autotuning, to automatically find schedules for TVM operators. In this context we extend the polyhedral scheduler to find optimal schedules for different tensor sizes and shapes. We model the amount of data reuse for the case in which all parameter values are known, and formulate ILP constraints that maximize data reuse (a small sketch of this idea follows the abstract). We also present a performance-model-based autotuning technique that cuts tuning time from hours to minutes, and we conduct experiments on common deep learning benchmarks to validate the effectiveness and general applicability of our technique in providing portable performance.

Finally, we summarize our work and present concluding remarks as well as future research directions. We believe the improvements proposed in this dissertation increase the effectiveness of the polyhedral framework as a loop transformation framework for compiling DSLs.
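As background for the data-reuse constraints mentioned in the abstract, the following sketch (our own notation, not the exact formulation from the thesis) illustrates why fixing the parameter values strengthens the scheduling ILP. In the classical Pluto formulation, every dependence from a source iteration s to a target iteration t must satisfy, for each schedule hyperplane \phi,

    \phi(t) - \phi(s) \ge 0                 (validity)
    \phi(t) - \phi(s) \le u \cdot n + w     (parametric distance bound)

and the ILP lexicographically minimizes (u, w), which captures temporal locality only. When the tensor sizes n are fixed to known constants N, as in a deep learning graph with static shapes, the bound u \cdot N + w collapses to a single integer per dependence; reuse-carrying dependences can then be given explicit distance bounds of the form \phi(t) - \phi(s) \le d_e, with the d_e added to the objective so that iterations reusing the same data are scheduled close together.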
Document type: Theses

Cited literature: 92 references

https://hal.inria.fr/tel-02385670
Contributor: Timothy Bourke
Submitted on: Thursday, November 28, 2019 - 11:59:15 PM
Last modification on: Thursday, July 1, 2021 - 5:58:09 PM
Long-term archiving on: Saturday, February 29, 2020 - 8:26:12 PM

File

reddy.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-02385670, version 1


Citation

Chandan Reddy. Polyhedral Compilation for Domain Specific Languages. Computer Science [cs]. École normale supérieure, 2019. English. ⟨tel-02385670⟩
