Journal article, ACM Transactions on Architecture and Code Optimization, 2019

The Next 700 Accelerated Layers

Abstract

Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a Just-In-Time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing the fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds the performance of, highly tuned libraries.
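To give a sense of the tensor notation mentioned in the abstract, the following is a minimal sketch of how a custom layer might be written and JIT-compiled from PyTorch. The definition style (index inference, the "+=!" reduction) follows the Tensor Comprehensions examples published with the project; the Python entry points (the tensor_comprehensions module, tc.define, the call convention) are assumptions based on the project's public tutorials and may differ between releases.

    import torch
    import tensor_comprehensions as tc  # assumed module name from the public release

    # Tensor-notation definition of a matrix multiplication: loop indices are
    # inferred from their use, and "+=!" denotes a reduction initialized to zero.
    lang = """
    def matmul(float(M, K) A, float(K, N) B) -> (C) {
        C(m, n) +=! A(m, k) * B(k, n)
    }
    """

    # tc.define hands the definition to the JIT compiler, which synthesizes a
    # CUDA kernel; the keyword arguments here are assumptions from early tutorials.
    matmul = tc.define(lang, name="matmul")

    A = torch.randn(128, 256).cuda()
    B = torch.randn(256, 64).cuda()
    C = matmul(A, B)  # compiled (and optionally autotuned) on first call for these shapes

In this flow the user never writes CUDA by hand: the polyhedral compiler and the autotuner described in the paper are responsible for turning the one-line comprehension into a tuned GPU kernel.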

Dates and versions

hal-02458550, version 1 (28-01-2020)

Identifiers

DOI: 10.1145/3355606
HAL Id: hal-02458550

Cite

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, et al. The Next 700 Accelerated Layers. ACM Transactions on Architecture and Code Optimization, 2019, 16 (4), pp. 1-26. ⟨10.1145/3355606⟩. ⟨hal-02458550⟩