Large-scale experiment for topology-aware resource management

Yiannis Georgiou 1 Guillaume Mercier 2 Adèle Villiermet 2
2 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: A Resource and Job Management System (RJMS) is a crucial part of the system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments, and its main intelligence lies in resource selection techniques that find the most suitable resources on which to schedule users' jobs. In [8], we introduced a new topology-aware resource selection algorithm that determines the best choice among the platform's available nodes based on their position in the network and on the application's behaviour (expressed as a communication matrix). We integrated this algorithm as a plugin in Slurm and validated it under several optimization schemes by comparing it with Slurm's default algorithm. This paper presents further experiments on this selection process.
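The abstract does not spell out the algorithm from [8]. The Python below is only a hypothetical illustration of the kind of objective such a selection targets. It scores a candidate placement by summing communication volume times hop distance on an assumed two-level tree, then picks the cheapest node subset and process mapping by exhaustive search. The topology, the hop-count metric, and the names `hop_distance`, `placement_cost` and `select_nodes` are assumptions made for this sketch. It is not the Slurm plugin's algorithm, and brute force is only practical for tiny instances.

```python
from itertools import combinations, permutations

# Hypothetical two-level tree: every compute node hangs off one leaf switch,
# and all leaf switches connect to a single top-level switch.
def hop_distance(switch_of, a, b):
    """Number of links between nodes a and b in the assumed two-level tree."""
    if a == b:
        return 0
    if switch_of[a] == switch_of[b]:
        return 2  # node -> leaf switch -> node
    return 4      # node -> leaf -> top -> leaf -> node

def placement_cost(comm, placement, switch_of):
    """Sum of comm[i][j] * hop distance over all ordered process pairs.

    placement[i] is the node hosting process i (one process per node).
    """
    n = len(placement)
    return sum(
        comm[i][j] * hop_distance(switch_of, placement[i], placement[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )

def select_nodes(comm, free_nodes, switch_of):
    """Exhaustively return (cost, placement) with the lowest cost.

    Illustrates the objective only; the search is exponential in job size.
    """
    n = len(comm)
    best = None
    for subset in combinations(free_nodes, n):
        for placement in permutations(subset):
            cost = placement_cost(comm, placement, switch_of)
            if best is None or cost < best[0]:
                best = (cost, placement)
    return best

if __name__ == "__main__":
    # Hypothetical job: processes 0-1 and 2-3 exchange most of the data.
    comm = [[0, 10, 1, 1],
            [10, 0, 1, 1],
            [1, 1, 0, 10],
            [1, 1, 10, 0]]
    switch_of = {"n0": "s0", "n1": "s0", "n2": "s1", "n3": "s1", "n4": "s2"}
    free = ["n0", "n1", "n2", "n3", "n4"]
    print(select_nodes(comm, free, switch_of))
    # -> (112, ('n0', 'n1', 'n2', 'n3'))
```

In this toy case the search keeps each heavily communicating pair under one leaf switch (cost 112), whereas interleaving the pairs across switches, e.g. placement `("n0", "n2", "n1", "n3")`, costs 184.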


https://hal.inria.fr/hal-01667350
Contributor: Emmanuel Jeannot
Submitted on: Tuesday, December 19, 2017 - 11:57:20 AM
Last modification on: Tuesday, August 13, 2019 - 3:20:09 PM

Identifiers

  • HAL Id: hal-01667350, version 1

Citation

Yiannis Georgiou, Guillaume Mercier, Adèle Villiermet. Large-scale experiment for topology-aware resource management. Open workshop on data locality, Aug 2017, Santiago de Compostela, Spain. ⟨hal-01667350⟩
