On the Overhead of Topology Discovery for Locality-aware Scheduling in HPC

Brice Goglin 1
1 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: The increasing complexity of parallel computing platforms requires a deep knowledge of the hardware and of the application needs. Locality is a key criterion for performance optimization. Exploiting it requires software tools that expose information about the hardware topology to high-performance runtime libraries. We show that the overhead of gathering such information from the operating system is significant on large computing nodes that run Linux. This overhead also increases more than linearly with the number of processes that perform the discovery simultaneously. We then study the actual needs of the HPC software ecosystem in terms of topology information. We propose some ways to avoid multiple expensive topology discoveries and to share topology information between components such as the resource manager or the runtime libraries.
Document type:
Conference paper
PDP2017 - 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Mar 2017, St Petersburg, Russia. IEEE Computer Society, pp.9, 2017, Proceedings of the 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2017). 〈http://pdp2017.org/〉. 〈10.1109/PDP.2017.35〉

https://hal.inria.fr/hal-01402755
Contributor: Brice Goglin
Submitted on: Thursday, July 13, 2017 - 19:49:43
Last modified on: Monday, September 18, 2017 - 09:52:08

File

article.pdf
Files produced by the author(s)

Citation

Brice Goglin. On the Overhead of Topology Discovery for Locality-aware Scheduling in HPC. PDP2017 - 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Mar 2017, St Petersburg, Russia. IEEE Computer Society, pp.9, 2017, Proceedings of the 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2017). 〈http://pdp2017.org/〉. 〈10.1109/PDP.2017.35〉. 〈hal-01402755v3〉
