Topology-Aware Job Mapping

Abstract: A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments. Its main intelligence lies in the resource selection techniques it uses to find the most suitable resources on which to schedule users' jobs. This paper introduces a new method that takes into account both the network topology of the machine and the application's communication pattern to determine the best choice among the available nodes of the platform. To validate our approach, we integrate this algorithm as a plugin for Slurm, a well-known and widespread RJMS. We assess our plugin with different optimization schemes by comparing it with the default topology-aware Slurm algorithm, using both emulation and simulation of a large-scale platform and by carrying out experiments on a real cluster. We show that transparently taking into account a job's communication pattern and the topology allows for relevant performance gains.
Document type:
Journal article
International Journal of High Performance Computing Applications, SAGE Publications, 2017, pp.63. 〈10.1109/SC.2006.63〉

Cited literature: 25 references

https://hal.inria.fr/hal-01621325
Contributor: Emmanuel Jeannot
Submitted on: Monday, October 23, 2017 - 11:34:51
Last modified on: Thursday, January 11, 2018 - 06:27:21
Document(s) archived on: Wednesday, January 24, 2018 - 13:28:54

File

Jeannot_CCDSC_revised.pdf
Files produced by the author(s)

Citation

Yiannis Georgiou, Emmanuel Jeannot, Guillaume Mercier, Adèle Villiermet. Topology-Aware Job Mapping. International Journal of High Performance Computing Applications, SAGE Publications, 2017, pp.63. 〈10.1109/SC.2006.63〉. 〈hal-01621325〉

Metrics

Record views

145

File downloads

69