Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor

Alexandre Denis; Julien Jaeger; Emmanuel Jeannot; Marc Pérache; Hugo Taboada

doi:10.1177/1094342019860184

Journal article. International Journal of High Performance Computing Applications. Year: 2019

Alexandre Denis

Role: Author
PersonId: 103
IdHAL: adenis
ORCID: 0000-0001-8606-4344
IdRef: 225733218

Topology-Aware System-Scale Data Management for High-Performance Computing

Laboratoire Bordelais de Recherche en Informatique

Julien Jaeger

Role: Author

Centre d'Études de Limeil-Valenton

DAM Île-de-France

Emmanuel Jeannot

Role: Author
PersonId: 15678
IdHAL: emmanuel-jeannot
ORCID: 0000-0002-3956-2997
IdRef: 084595108

Topology-Aware System-Scale Data Management for High-Performance Computing

Laboratoire Bordelais de Recherche en Informatique

Marc Pérache

Role: Author

DAM Île-de-France

Hugo Taboada

Role: Author
PersonId: 987793

DAM Île-de-France

Abstract

To amortize the cost of MPI collective operations, nonblocking collectives have been proposed to allow communications to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores leads to no overlap. In this paper, we propose placement algorithms for progress threads that run on cores dedicated to communications, so as to obtain communication/computation overlap without degrading performance. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to progress in the background on a dedicated core. Then, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores, and the long but narrow part of the tree on one or several communication cores, so as to reach a trade-off between overlap and absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented both algorithms in the MPC framework, which is a thread-based MPI implementation. We ran benchmarks on manycore processors such as KNL and Skylake and obtained good results for both performance and overlap.
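The idea of splitting a tree between two groups of cores can be illustrated with a toy model. The sketch below is not the algorithm from the paper: the abstract does not specify the tree shape or the split criterion, so the binary reduction tree, the function name `split_reduction_tree`, and the `width_threshold` parameter are all assumptions made for illustration. It assigns the wide levels of the tree (many pairwise steps) to application cores and the remaining narrow levels to communication cores.

```python
def split_reduction_tree(num_leaves, width_threshold):
    """Assign each level of a binary reduction tree to one of two core groups.

    Hypothetical toy model (an assumption, not the paper's algorithm):
    a level with more than `width_threshold` pairwise combine steps is
    treated as "heavy" and assigned to application cores; the remaining
    narrow levels are assigned to communication cores.

    Returns two lists of (level_index, number_of_pairwise_steps).
    """
    if num_leaves < 1:
        raise ValueError("num_leaves must be at least 1")

    app_levels, comm_levels = [], []
    width, level = num_leaves, 0
    while width > 1:
        pairs = width // 2          # pairwise combine steps at this level
        if pairs > width_threshold:
            app_levels.append((level, pairs))
        else:
            comm_levels.append((level, pairs))
        width -= pairs              # each pair collapses into one node
        level += 1
    return app_levels, comm_levels


if __name__ == "__main__":
    app, comm = split_reduction_tree(64, 4)
    print("application cores:", app)
    print("communication cores:", comm)
```

In this toy model, raising `width_threshold` moves more levels to the communication cores, which would favor overlap, while lowering it keeps more work on the application cores. The paper's own model for tuning its parameters is not reproduced here.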

Keywords

MPI; Placement; Non-blocking Collectives; Communication/Computation Overlap

Domains

Networks and telecommunications [cs.NI]

Main file

version-finale.pdf (1.12 MB)

Origin: Files produced by the author(s)

Contributor: Alexandre Denis

https://inria.hal.science/hal-02400422

Submitted on: Monday, December 9, 2019, 14:49:01

Last modified on: Wednesday, April 3, 2024, 11:24:09

Long-term archiving on: Tuesday, March 10, 2020, 22:09:21

Dates and versions

hal-02400422, version 1 (09-12-2019)

Identifiers

HAL Id: hal-02400422, version 1
DOI: 10.1177/1094342019860184

Cite

Alexandre Denis, Julien Jaeger, Emmanuel Jeannot, Marc Pérache, Hugo Taboada. Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor. International Journal of High Performance Computing Applications, 2019, 33 (6), pp.1240-1254. ⟨10.1177/1094342019860184⟩. ⟨hal-02400422⟩

Collections

CEA UNIV-RENNES1 CNRS INRIA IRISA DAM INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC PLAFRIM CEA-DRF UNIV-RENNES UR1-MATH-NUM

111 views

269 downloads
