Scheduling the I/O of HPC applications under congestion

Ana Gainaru (1), Anne Benoit (2, 3), Guillaume Aupy (2, 3), Franck Cappello (4, 5, 6, 7), Yves Robert (2, 3), Marc Snir (1, 4)
3 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
5 GRAND-LARGE - Global parallel and distributed computing
LRI - Laboratoire de Recherche en Informatique, LIFL - Laboratoire d'Informatique Fondamentale de Lille, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: A significant percentage of the computing capacity of large-scale platforms is wasted because of interference among multiple applications that access a shared parallel file system concurrently. One solution for handling I/O bursts in large-scale HPC systems is to absorb them at an intermediate storage layer consisting of burst buffers. However, our analysis of Argonne's Mira system shows that burst buffers cannot prevent congestion at all times. As a consequence, I/O performance is dramatically degraded, in some cases decreasing I/O throughput by 67%. In this paper, we analyze the effects of interference on application I/O bandwidth and propose several scheduling techniques to mitigate congestion. We show through extensive experiments that our global I/O scheduler reduces the effects of congestion, even on systems where burst buffers are used, and can increase overall system throughput by up to 56%. We also show that it outperforms the current Mira I/O schedulers.
Contributor: Equipe Roma
Submitted on: Friday, April 25, 2014 - 5:25:54 PM
Last modification on: Thursday, August 1, 2019 - 2:12:06 PM
Long-term archiving on: Friday, July 25, 2014 - 12:25:29 PM

HAL Id: hal-00983789, version 1



Ana Gainaru, Anne Benoit, Guillaume Aupy, Franck Cappello, Yves Robert, et al. Scheduling the I/O of HPC applications under congestion. [Research Report] RR-8519, 2014, pp. 25. ⟨hal-00983789v1⟩