Conference papers

Efficient Combination of Rematerialization and Offloading for Training DNNs

Olivier Beaumont (1), Lionel Eyraud-Dubois (1), Alena Shilova (1)
(1) HiePACS - High-End Parallel Algorithms for Challenging Numerical Simulations
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract : Rematerialization and offloading are two well-known strategies for saving memory during the training phase of deep neural networks, allowing data scientists to consider larger models, larger batch sizes, or higher-resolution data. Rematerialization trades memory for computation time, whereas offloading trades memory for data movement. As these two resources are independent, it is appealing to combine both strategies to save even more memory. We precisely model the costs and constraints corresponding to deep learning frameworks such as PyTorch or TensorFlow, propose optimal algorithms to find a valid sequence of memory-constrained operations, and evaluate the performance of the proposed algorithms on realistic networks and computation platforms. Our experiments show that the possibility to offload can remove one third of the overhead of rematerialization, and that together they can reduce the memory used for activations by a factor of 4 to 6, with an overhead below 20%.
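
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the model or the algorithm from the paper: the function name `eviction_overhead`, the greedy ordering, the assumption that transfers are fully hidden behind forward computation, and all numeric values are hypothetical and chosen only for illustration. It compares the extra time paid when evicted activations must all be recomputed with the extra time paid when some of them can instead be offloaded over a link that runs in parallel with computation.

```python
def eviction_overhead(sizes, fwd_times, evicted, bandwidth=None):
    """Extra time (s) paid for not keeping `evicted` activations on the device.

    Toy model (an assumption, not the paper's): an evicted activation is
    either recomputed, costing its forward time again, or offloaded over a
    link of the given bandwidth that works in parallel with computation.
    """
    if bandwidth is None:
        # Rematerialization only: every evicted activation is recomputed.
        return sum(fwd_times[i] for i in evicted)

    # Transfers are hidden as long as their total duration fits within
    # the total forward time.
    link_budget = sum(fwd_times)
    overhead = 0.0
    # Greedy heuristic: offload first what is costly to recompute per byte moved.
    for i in sorted(evicted, key=lambda i: fwd_times[i] / sizes[i], reverse=True):
        transfer = sizes[i] / bandwidth
        if transfer <= link_budget:
            link_budget -= transfer      # hidden behind computation
        else:
            overhead += fwd_times[i]     # falls back to recomputation
    return overhead


sizes = [4, 4, 2, 2]              # activation sizes in GB (made-up values)
fwd_times = [1.0, 1.0, 2.0, 2.0]  # forward times in s (made-up values)
evicted = [1, 2, 3]               # frees 8 of the 12 GB

print(eviction_overhead(sizes, fwd_times, evicted))                 # 5.0
print(eviction_overhead(sizes, fwd_times, evicted, bandwidth=1.0))  # 1.0
```

In this made-up instance the same 8 GB are freed in both cases, but using the otherwise idle link lowers the time overhead from 5 s to 1 s, which is the intuition behind treating computation and data movement as independent resources.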

https://hal.inria.fr/hal-03359793
Contributor : Olivier Beaumont
Submitted on : Thursday, September 30, 2021 - 2:53:42 PM
Last modification on : Sunday, June 26, 2022 - 3:14:36 AM
Long-term archiving on : Friday, December 31, 2021 - 8:31:53 PM

File

offchkpt.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03359793, version 1

Citation

Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova. Efficient Combination of Rematerialization and Offloading for Training DNNs. NeurIPS 2021 - Thirty-fifth Conference on Neural Information Processing Systems, Dec 2021, Virtual-only Conference, France. ⟨hal-03359793⟩
