Conference Paper, Year: 2023

Optimizing the Training of Co-Located Deep Learning Models Using Cache-Aware Staggering

Abstract

Despite significant advances, training deep learning models remains a time-consuming and resource-intensive task. One of the key challenges in this context is the ingestion of the training data, which involves non-trivial overheads: reading the training data from a remote repository, applying augmentations and transformations, shuffling the training samples, and assembling them into mini-batches. Despite the introduction of abstractions such as data pipelines, which aim to hide these overheads by performing them asynchronously, data ingestion is often slower than the training itself, causing a delay at each training iteration. This problem is further amplified when training multiple deep learning models simultaneously on powerful compute nodes that feature multiple GPUs. In this case, the training data is often reused across different training instances (e.g., in multi-model or ensemble training) or even within the same training instance (e.g., data-parallel training). However, transparent caching solutions (e.g., OS-level POSIX caching) are not suitable to directly mitigate the competition between training instances that reuse the same training data. In this paper, we study the problem of minimizing the makespan of two training instances that reuse the same training data. The makespan is subject to a trade-off: if the training instances start at the same time, competition for I/O bandwidth slows down the data pipelines and increases the makespan; if one training instance is staggered, competition is reduced, but the delayed start can again increase the makespan. We optimize this trade-off by proposing a performance model capable of predicting the makespan as a function of the staggering between the training instances, which can be used to find the optimal staggering that triggers just enough competition to make optimal use of transparent caching and thereby minimize the makespan. Experiments with different combinations of learning models that share the same training data demonstrate that (1) staggering is important to minimize the makespan, and (2) our performance model is accurate and can predict the optimal staggering in advance, at the cost of a calibration overhead.
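To make the staggering trade-off concrete, below is a minimal, illustrative Python sketch, not the performance model from the paper: it assumes each training instance can be summarized by a calibrated cache-cold epoch time, a cache-warm epoch time, and a multiplicative slowdown under I/O contention. All parameter names, values, and the coarse two-regime structure are hypothetical simplifications.

```python
def predicted_makespan(stagger, epochs, cold_epoch, warm_epoch, slowdown):
    """Toy makespan predictor for two identical training instances that reuse
    the same training data (a coarse illustration, not the paper's model).

    cold_epoch: epoch time when samples are fetched from remote storage (s)
    warm_epoch: epoch time when samples are served from the OS page cache (s)
    slowdown:   multiplicative penalty while both instances compete for I/O
    """
    if stagger < cold_epoch:
        # Cold phases overlap: both instances share remote I/O bandwidth,
        # so both first epochs are stretched by the contention penalty.
        a_finish = slowdown * cold_epoch + (epochs - 1) * warm_epoch
        b_finish = stagger + slowdown * cold_epoch + (epochs - 1) * warm_epoch
    else:
        # The second instance starts after the dataset has been staged into
        # the page cache by the first one: every one of its epochs is warm.
        a_finish = cold_epoch + (epochs - 1) * warm_epoch
        b_finish = stagger + epochs * warm_epoch
    return max(a_finish, b_finish)


# Hypothetical calibration values: 100 s cache-cold epoch, 40 s cache-warm
# epoch, 1.5x slowdown under contention, 5 epochs per training instance.
staggers = [float(s) for s in range(0, 201)]
best = min(staggers, key=lambda s: predicted_makespan(s, 5, 100.0, 40.0, 1.5))
print(best, predicted_makespan(best, 5, 100.0, 40.0, 1.5))  # 100.0 300.0
```

With these hypothetical numbers, the minimum lands at the boundary between the two regimes: the second instance starts right after the dataset has been staged into the page cache, mirroring the intuition above that the optimal staggering triggers just enough competition to make the most of transparent caching. The paper's actual model is calibrated from measured pipeline and training behavior rather than assumed constants.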
Main file: HiPC23_DL_cache_aware_scheduling_for_optimized_DL_data_IO.pdf (523.33 KB)
Origin: Files produced by the author(s)
License: CC BY - Attribution

Dates and versions

hal-04343672, version 1 (14-12-2023)

License

Attribution (CC BY)

Identifiers

  • HAL Id: hal-04343672, version 1

Cite

Kevin Assogba, M. Mustafa Rafique, Bogdan Nicolae. Optimizing the Training of Co-Located Deep Learning Models Using Cache-Aware Staggering. HiPC'23: 30th IEEE International Conference on High Performance Computing, Data, and Analytics, Dec 2023, Goa, India. ⟨hal-04343672⟩