Journal articles

The Next 700 Accelerated Layers

Abstract : Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a Just-In-Time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing the fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds the performance of, highly tuned libraries.
Contributor : Albert Cohen
Submitted on : Tuesday, January 28, 2020 - 6:00:14 PM
Last modification on : Thursday, May 27, 2021 - 1:54:06 PM





Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary Devito, et al.. The Next 700 Accelerated Layers. ACM Transactions on Architecture and Code Optimization, Association for Computing Machinery, 2019, 16 (4), pp.1-26. ⟨10.1145/3355606⟩. ⟨hal-02458550⟩


