Optimal Memory-aware Backpropagation of Deep Join Networks

In the context of Deep Learning training, the memory needed to store activations can prevent the user from considering large models and large batch sizes. A possible solution is to rely on model parallelism to distribute the weights of the model and the activations over distributed memory nodes. In this paper, we consider another, purely sequential, approach to saving memory using checkpointing techniques. Checkpointing techniques were introduced in the context of Automatic Differentiation. They consist of storing some, but not all, activations during the feed-forward phase of network training, and then recomputing the missing values during the backward phase. Using this approach, it is possible, at the price of recomputations, to use a minimal amount of memory. The case of a single homogeneous chain, i.e., a network whose stages are all identical and form a chain, is well understood, and optimal solutions based on dynamic programming have been proved in the Automatic Differentiation literature. The networks encountered in practice in Deep Learning are much more diverse, both in shape and in heterogeneity. The present paper can be seen as an attempt to extend the class of graphs that can be solved optimally. Indeed, we provide an optimal algorithm, based on dynamic programming, for the case of several chains that join when computing the loss function. This model typically corresponds to Siamese or Cross Modal Networks.
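To make the single homogeneous chain case concrete, here is a minimal sketch of the classical dynamic program from the Automatic Differentiation literature (the recurrence behind binomial checkpointing, as in Griewank's Revolve). It assumes a simplified cost model: each forward step costs one unit, the backward steps have a fixed total cost and are not counted, and one checkpoint slot holds one activation. It is not the report's algorithm for joins, and the function name `min_forward_steps` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_forward_steps(length: int, slots: int) -> int:
    """Minimal number of forward steps needed to backpropagate through a
    homogeneous chain of `length` stages when at most `slots` activations
    (including the chain input) may be checkpointed at any time.

    Assumed cost model: each forward step costs 1; backward steps have a
    fixed total cost and are therefore not counted.
    """
    if length < 1 or slots < 1:
        raise ValueError("need length >= 1 and at least one slot for the input")
    if length == 1:
        # The single backward step reads the stored input directly.
        return 0
    if slots == 1:
        # Only the input is kept: before each backward step, replay the chain
        # from the input, i.e. (length-1) + (length-2) + ... + 1 steps.
        return length * (length - 1) // 2
    # Advance j steps from the stored input and checkpoint there; reverse the
    # remaining length-j stages with one slot fewer, then free that
    # checkpoint and reverse the first j stages with all slots available.
    return min(
        j + min_forward_steps(length - j, slots - 1) + min_forward_steps(j, slots)
        for j in range(1, length)
    )
```

For instance, `min_forward_steps(6, 2)` evaluates to 8, while with enough slots (`min_forward_steps(6, 6)`) the count drops to 5, the length of a plain forward sweep in this model; the difference is the recomputation paid for keeping only two activations.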

The memory required to train deep networks can prevent the user from considering large models. In this work, we discuss how memory-constrained scheduling techniques from Automatic Differentiation (AD) can be used to execute backpropagation graphs under memory constraints. The case of a single homogeneous chain is well understood, and the AD literature offers many techniques and optimal solutions under various constraints. In Deep Learning, the networks encountered are often more structured and more diverse (in shape and heterogeneity). In this work, we define the class of backpropagation graphs and extend the set of graphs for which an optimal solution (in terms of execution time) under memory constraints can be computed in polynomial time. In particular, we consider the case of joins, which correspond to models such as Siamese or cross-modal networks.
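A join can be pictured as a dependency graph between forward and backward operations. The sketch below is one possible encoding, assuming that each chain is a sequence of forward stages whose final outputs all feed a shared loss, and that the backward step of a stage reads the gradient from the stage above (or from the loss) plus the activation entering that stage. The node encoding and both function names are hypothetical and do not reproduce the report's formal definition of backpropagation graphs. `execution_order` returns a valid order but ignores memory limits, which is precisely the constraint the scheduling problem adds (requires Python 3.9+ for `graphlib`).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical node encoding: (kind, chain index, stage index), where kind is
# "F" (forward stage), "B" (backward stage) or "L" (the shared loss).
Node = Tuple[str, int, int]
LOSS: Node = ("L", 0, 0)


def join_backprop_graph(chain_lengths: List[int]) -> Dict[Node, List[Node]]:
    """Map each operation to the operations whose outputs it reads."""
    deps: Dict[Node, List[Node]] = {LOSS: []}
    for c, length in enumerate(chain_lengths):
        for i in range(1, length + 1):
            # Forward stage i reads the output of stage i - 1 (stage 1 reads
            # the chain input, assumed always available).
            deps[("F", c, i)] = [("F", c, i - 1)] if i > 1 else []
            # Backward stage i reads the gradient from above (the loss for
            # the last stage) and the activation entering forward stage i.
            upstream: Node = ("B", c, i + 1) if i < length else LOSS
            deps[("B", c, i)] = [upstream] + ([("F", c, i - 1)] if i > 1 else [])
        # The loss gathers the output of every chain: this is the join.
        deps[LOSS].append(("F", c, length))
    return deps


def execution_order(graph: Dict[Node, List[Node]]) -> List[Node]:
    """One valid, memory-oblivious order of all operations."""
    return list(TopologicalSorter(graph).static_order())
```

For a Siamese network with two branches of three stages each, `join_backprop_graph([3, 3])` yields 13 operations: 6 forward steps, 6 backward steps and the shared loss.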

Keywords

Backpropagation, Memory, Pebble game

Domains

Artificial Intelligence [cs.AI]; Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

research_report.pdf (903.53 KB)

Origin: Files produced by the author(s)

Contributor: Guillaume Pallez (Aupy)

https://inria.hal.science/hal-02131552

Submitted on: Thursday, 16 May 2019, 13:58:50

Last modified on: Wednesday, 20 March 2024, 17:52:16

Dates and versions

hal-02131552 , version 1 (16-05-2019)

Identifiers

HAL Id : hal-02131552 , version 1

Cite

Olivier Beaumont, Julien Herrmann, Guillaume Pallez, Alena Shilova. Optimal Memory-aware Backpropagation of Deep Join Networks. [Research Report] RR-9273, Inria. 2019. ⟨hal-02131552⟩
