Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
Abstract

We propose Rockmate to control the memory requirements when training
PyTorch DNN models. Rockmate is an automatic tool that starts from
the model code and generates an equivalent model, using a predefined
amount of memory for activations, at the cost of a few re-computations.
Rockmate automatically detects the structure of computational
and data dependencies and rewrites the initial model as a sequence of
complex blocks. We show that such a structure is widespread: it can be
found in many models in the literature (Transformer-based models, ResNet,
RegNets, ...). This structure allows us to solve the problem quickly
and efficiently, by combining an adaptation of Checkmate (general, but too
slow on the whole model) at the level of individual blocks with an
adaptation of Rotor (fast, but limited to sequential models) at the level
of the sequence itself. Experiments on many models show
that Rockmate is as fast as Rotor and as efficient as Checkmate,
and that in many cases it achieves significantly
lower memory consumption for activations (by a factor of 2 to 5)
at a negligible overhead (on the order of 10% to 20%).
Rockmate is open source and available at \url{https://github.com/topal-team/rockmate}.
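The memory/recompute trade-off behind re-materialization can be sketched on toy blocks: keep only a subset of activations during the forward pass, and re-derive the discarded ones from the nearest kept checkpoint when they are needed again. This is a minimal, hypothetical illustration of the idea, not Rockmate's actual algorithm (which optimizes a schedule per block and over the block sequence):

```python
# Toy illustration of re-materialization on a sequence of "blocks".
# NOT Rockmate's algorithm: just the memory-vs-recompute trade-off it
# exploits, using plain Python functions standing in for network layers.

def run_with_checkpoints(blocks, x, every):
    """Forward pass keeping only every `every`-th activation."""
    saved = {0: x}              # checkpointed activations, by block index
    for i, f in enumerate(blocks):
        x = f(x)
        if (i + 1) % every == 0:
            saved[i + 1] = x
    return x, saved

def recompute_segment(blocks, saved, i, every):
    """Re-materialize the input of block i from the nearest checkpoint."""
    start = (i // every) * every
    x = saved[start]
    for f in blocks[start:i]:   # recompute the discarded activations
        x = f(x)
    return x

blocks = [lambda v, k=k: v + k for k in range(8)]   # 8 toy "layers"
out, saved = run_with_checkpoints(blocks, 0, every=4)
assert out == sum(range(8))     # same result as a plain forward pass
assert len(saved) == 3          # only 3 activations kept instead of 8
assert recompute_segment(blocks, saved, 6, 4) == sum(range(6))
```

With checkpoints every 4 blocks, peak activation storage shrinks from 8 values to 3, at the cost of re-running at most 3 blocks per recomputation; Rockmate makes this trade-off automatically under a user-given memory budget.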
Domains
Artificial intelligence [cs.AI]