Conference papers

Large-scale experiment for topology-aware resource management

Yiannis Georgiou 1, Guillaume Mercier 2, Adèle Villiermet 2
2 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract : A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments, and its main intelligence lies in the resource selection techniques that find the most suitable resources on which to schedule users' jobs. In [8], we introduced a new topology-aware resource selection algorithm that determines the best choice among the available nodes of the platform based on their position in the network and on application behaviour (expressed as a communication matrix). We integrated this algorithm as a plugin in Slurm and validated it with several optimization schemes by comparing it with the default Slurm algorithm. This paper presents further experiments on this selection process.
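
To make the selection criterion in the abstract concrete, the following is a minimal sketch and not the Slurm plugin from [8]. It assumes a hypothetical hop-distance matrix between nodes and a symmetric communication matrix between processes, scores a placement as traffic volume times distance, and picks free nodes greedily. The function names, the greedy strategy, and the data are illustrative assumptions.

```python
from itertools import combinations

def placement_cost(comm, dist, placement):
    """Sum of traffic volume times hop distance over all process pairs."""
    return sum(
        comm[i][j] * dist[placement[i]][placement[j]]
        for i, j in combinations(range(len(placement)), 2)
    )

def greedy_select(comm, dist, free_nodes):
    """Place processes one by one on the free node that adds the least cost."""
    n = len(comm)
    if n > len(free_nodes):
        raise ValueError("not enough free nodes")
    remaining = list(free_nodes)
    placement = [remaining.pop(0)]  # arbitrary seed: first free node
    for p in range(1, n):
        best = min(
            remaining,
            key=lambda node: sum(
                comm[p][q] * dist[node][placement[q]] for q in range(p)
            ),
        )
        remaining.remove(best)
        placement.append(best)
    return placement

# Hypothetical platform: nodes 0,1 share one switch, nodes 2,3 share another.
# Distance is 2 hops within a switch and 4 hops across switches.
dist = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
# Hypothetical job: two processes exchanging 10 units of data.
comm = [
    [0, 10],
    [10, 0],
]

topology_aware = greedy_select(comm, dist, free_nodes=[0, 2, 1])
first_fit = [0, 2]  # what taking free nodes in list order would give
```

With this toy data, the greedy choice keeps both processes under the same switch, whereas taking free nodes in list order spreads them across switches; `placement_cost` can be used to compare the two.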
Contributor : Emmanuel Jeannot
Submitted on : Tuesday, December 19, 2017 - 11:57:20 AM
Last modification on : Saturday, June 25, 2022 - 10:37:57 AM




  • HAL Id : hal-01667350, version 1



Yiannis Georgiou, Guillaume Mercier, Adèle Villiermet. Large-scale experiment for topology-aware resource management. Open workshop on data locality, Aug 2017, Santiago de Compostela, Spain. ⟨hal-01667350⟩


