Journal article, ACM Transactions on Architecture and Code Optimization, 2019

The Next 700 Accelerated Layers

Abstract

Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a Just-In-Time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing the fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds the performance of, highly tuned libraries.
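To give a sense of the tensor notation mentioned in the abstract, the following is a minimal sketch of how a custom layer might be written and JIT-compiled from PyTorch. The definition style (index inference, the "+=!" reduction) follows the Tensor Comprehensions examples published with the project; the Python entry points (the tensor_comprehensions module, tc.define, the call convention) are assumptions based on the project's public tutorials and may differ between releases.

    import torch
    import tensor_comprehensions as tc  # assumed module name from the public release

    # Tensor-notation definition of a matrix multiplication: loop indices are
    # inferred from their use, and "+=!" denotes a reduction initialized to zero.
    lang = """
    def matmul(float(M, K) A, float(K, N) B) -> (C) {
        C(m, n) +=! A(m, k) * B(k, n)
    }
    """

    # tc.define hands the definition to the JIT compiler, which synthesizes a
    # CUDA kernel; the keyword arguments here are assumptions from early tutorials.
    matmul = tc.define(lang, name="matmul")

    A = torch.randn(128, 256).cuda()
    B = torch.randn(256, 64).cuda()
    C = matmul(A, B)  # compiled (and optionally autotuned) on first call for these shapes

In this flow the user never writes CUDA by hand: the polyhedral compiler and the autotuner described in the paper are responsible for turning the one-line comprehension into a tuned GPU kernel.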

Dates and versions

hal-02458550, version 1 (28-01-2020)

Identifiers

DOI: 10.1145/3355606
HAL Id: hal-02458550

Cite

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, et al. The Next 700 Accelerated Layers. ACM Transactions on Architecture and Code Optimization, 2019, 16 (4), pp. 1-26. ⟨10.1145/3355606⟩. ⟨hal-02458550⟩