Large-scale experiment for topology-aware resource management

Yiannis Georgiou 1, Guillaume Mercier 2, Adèle Villiermet 2
2 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments, and its main intelligence lies in the resource selection techniques it uses to find the most suitable resources on which to schedule users' jobs. In [8], we introduced a new topology-aware resource selection algorithm that determines the best choice among the available nodes of the platform based on their position in the network and on application behaviour (expressed as a communication matrix). We integrated this algorithm as a plugin in Slurm and validated it with several optimization schemes by comparing it with the default Slurm algorithm. This paper presents further experiments with this selection process.
Document type: Conference paper
Open workshop on data locality, Aug 2017, Santiago de Compostela, Spain. 2018
Cited literature [14 references]

https://hal.inria.fr/hal-01667350
Contributor: Emmanuel Jeannot
Submitted on: Tuesday, December 19, 2017 - 11:57:20
Last modified on: Wednesday, June 6, 2018 - 09:47:57

File

article_89.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01667350, version 1

Citation

Yiannis Georgiou, Guillaume Mercier, Adèle Villiermet. Large-scale experiment for topology-aware resource management. Open workshop on data locality, Aug 2017, Santiago de Compostela, Spain. 2018. 〈hal-01667350〉

Metrics

Record views: 203

File downloads: 48