Optimal Memory-aware Backpropagation of Deep Join Networks

Olivier Beaumont (1), Julien Herrmann (2, 1), Guillaume Pallez (2), Alena Shilova (1)
1 HiePACS - High-End Parallel Algorithms for Challenging Numerical Simulations
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
2 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: In the context of Deep Learning training, the memory needed to store activations can prevent the user from considering large models and large batch sizes. A possible solution is to rely on model parallelism to distribute the weights of the model and the activations over distributed memory nodes. In this paper, we consider another, purely sequential approach to saving memory based on checkpointing techniques. Checkpointing techniques were introduced in the context of Automatic Differentiation. They consist in storing some, but not all, activations during the feed-forward phase of network training, and then recomputing the missing values during the backward phase. Using this approach, it is possible, at the price of recomputations, to use a minimal amount of memory. The case of a single homogeneous chain, i.e. a network in which all stages are identical and form a chain, is well understood, and optimal solutions based on dynamic programming have been proved in the Automatic Differentiation literature. The networks encountered in practice in Deep Learning are much more diverse, both in shape and in heterogeneity. The present paper can be seen as an attempt to extend the class of graphs that can be solved optimally. Indeed, we provide an optimal algorithm, based on dynamic programming, for the case of several chains that gather when computing the loss function. This model typically corresponds to Siamese or Cross-Modal Networks.
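To illustrate the basic memory/recomputation trade-off the abstract describes, here is a minimal sketch (not the paper's dynamic-programming algorithm) of checkpointing on a single homogeneous chain: during the forward pass only every k-th activation is stored, and during the backward phase missing activations are recomputed from the nearest stored checkpoint. The stage functions and the checkpoint period `k` below are toy assumptions chosen for illustration.

```python
def forward(x, stages, k):
    """Run the chain, storing only the input and every k-th activation."""
    checkpoints = {0: x}
    a = x
    for i, f in enumerate(stages, start=1):
        a = f(a)
        if i % k == 0:
            checkpoints[i] = a
    return a, checkpoints

def activation_at(i, stages, checkpoints, counter):
    """Recompute the activation after stage i from the nearest checkpoint,
    counting how many stage re-evaluations this costs."""
    j = max(c for c in checkpoints if c <= i)
    a = checkpoints[j]
    for s in range(j, i):
        a = stages[s](a)
        counter[0] += 1  # one recomputed stage evaluation
    return a

# Toy 8-stage chain: stage c adds the constant c to its input.
stages = [lambda x, c=c: x + c for c in range(1, 9)]
out, cps = forward(0, stages, k=4)

# The backward phase visits activations in reverse order; here we only
# materialize them to show which values are recomputed and at what cost.
counter = [0]
acts = [activation_at(i, stages, cps, counter) for i in range(len(stages), -1, -1)]
```

With k=4, only 3 of the 9 activations are kept in memory, at the price of 12 extra stage evaluations during the backward phase; choosing k trades memory for recomputation, and the paper's contribution is to make this choice optimally for several chains joining at the loss.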

https://hal.inria.fr/hal-02131552
Contributor: Guillaume Pallez (aupy)
Submitted on : Thursday, May 16, 2019 - 1:58:50 PM
Last modification on : Monday, May 27, 2019 - 10:06:44 AM

File

research_report.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02131552, version 1

Citation

Olivier Beaumont, Julien Herrmann, Guillaume Pallez, Alena Shilova. Optimal Memory-aware Backpropagation of Deep Join Networks. [Research Report] RR-9273, Inria. 2019. ⟨hal-02131552⟩
