Conference Paper, Year: 2021

Efficient Combination of Rematerialization and Offloading for Training DNNs

Abstract

Rematerialization and offloading are two well-known strategies for saving memory during the training phase of deep neural networks, allowing data scientists to consider larger models, batch sizes, or higher-resolution data. Rematerialization trades memory for computation time, whereas offloading trades memory for data movements. Since these two resources are independent, it is appealing to combine both strategies simultaneously to save even more memory. We precisely model the costs and constraints corresponding to deep learning frameworks such as PyTorch or TensorFlow, propose optimal algorithms to find a valid sequence of memory-constrained operations, and finally evaluate the performance of the proposed algorithms on realistic networks and computation platforms. Our experiments show that the possibility to offload can remove one third of the overhead of rematerialization, and that together the two strategies can reduce the memory used for activations by a factor of 4 to 6, with an overhead below 20%.
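Both strategies map onto mechanisms that PyTorch already exposes, so a minimal sketch can make the trade-offs concrete. The sketch below is an illustration only, not the schedule computed by the paper's algorithms: rematerialization appears as gradient checkpointing via torch.utils.checkpoint, and offloading as torch.autograd.graph.save_on_cpu, which spills activations saved for the backward pass to host memory. The model_head and model_tail stages and all tensor sizes are hypothetical placeholders.

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    # Two illustrative stages of a network (placeholder sizes).
    model_head = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
    model_tail = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
    x = torch.randn(64, 1024, requires_grad=True)

    # Rematerialization: activations inside model_head are not stored;
    # they are recomputed during the backward pass (memory traded for compute).
    h = checkpoint(model_head, x, use_reentrant=False)

    # Offloading: activations saved for backward inside this context are
    # kept in CPU memory and fetched back on demand (memory traded for
    # data movements); pin_memory=True would speed up GPU<->CPU copies.
    with torch.autograd.graph.save_on_cpu():
        y = model_tail(h)

    y.sum().backward()

In a real training run these tensors would live on the GPU, and the question the paper addresses is precisely which activations to recompute and which to offload so that the memory budget is respected at minimal overhead.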
Main file: offchkpt.pdf (563.2 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03359793, version 1 (30-09-2021)

Identifiers

  • HAL Id: hal-03359793, version 1

Cite

Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova. Efficient Combination of Rematerialization and Offloading for Training DNNs. NeurIPS 2021 - Thirty-fifth Conference on Neural Information Processing Systems, Dec 2021, virtual-only conference. ⟨hal-03359793⟩

Collections

CNRS, INRIA, INRIA2
