
Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor

Abstract: To amortize the cost of MPI collective operations, nonblocking collectives have been proposed to allow communication to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores leads to no overlap. In this paper, we propose placement algorithms for progress threads that do not degrade performance when running on cores dedicated to communications, so as to obtain communication/computation overlap. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to progress in the background on a dedicated core. Then, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores and the long but narrow part of the tree on one or several communication cores, yielding a trade-off between overlap and absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented both algorithms in the MPC framework, a thread-based MPI implementation. We ran benchmarks on manycore processors such as the KNL and Skylake and obtained good results for both performance and overlap.
Document type: Journal articles

Cited literature: 22 references
Contributor: Alexandre Denis
Submitted on: Monday, December 9, 2019 - 2:49:01 PM
Last modification on: Tuesday, January 25, 2022 - 3:44:50 AM
Long-term archiving on: Tuesday, March 10, 2020 - 10:09:21 PM






Alexandre Denis, Julien Jaeger, Emmanuel Jeannot, Marc Pérache, Hugo Taboada. Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor. International Journal of High Performance Computing Applications, SAGE Publications, 2019, 33 (6), pp.1240-1254. ⟨10.1177/1094342019860184⟩. ⟨hal-02400422⟩


