Conference papers

Modeling Memory Contention between Communications and Computations in Distributed HPC Systems

Alexandre Denis (1), Emmanuel Jeannot (1), Philippe Swartvagher (1)
(1) TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: To amortize the cost of MPI communications, distributed parallel HPC applications can overlap network communications with computations, in the hope of improving overall application performance. With this technique, computations and communications run at the same time. However, computations usually also move data. Since data for computations and data for communications go through the same memory system, memory contention may occur when computations are memory-bound and large messages are transmitted over the network at the same time. In this paper, we propose a model that predicts the memory bandwidth available to computations and to communications when they execute side by side, according to data locality and taking contention into account. Developing the model helped us better understand where bottlenecks are located in the memory system and which strategies the memory system applies under contention. The model was evaluated on many platforms with different characteristics and showed an average prediction error below 4 %.
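As a rough illustration of the contention condition described in the abstract, the sketch below shares a fixed memory bandwidth between a computation stream and a communication stream. It is not the model proposed in the paper: the proportional-sharing rule, the function name `share_bandwidth`, and all numbers are assumptions made only for this example.

```python
def share_bandwidth(capacity, comp_demand, comm_demand):
    """Toy proportional-sharing rule (an assumption, not the paper's model).

    All values use the same unit, e.g. GB/s.
    Returns (computation_bandwidth, communication_bandwidth).
    """
    total = comp_demand + comm_demand
    if total <= capacity:
        # No contention: each stream gets the bandwidth it asks for.
        return comp_demand, comm_demand
    # Contention: scale both streams down by the same factor.
    scale = capacity / total
    return comp_demand * scale, comm_demand * scale


# Hypothetical numbers chosen only for illustration.
comp_bw, comm_bw = share_bandwidth(capacity=100.0, comp_demand=90.0, comm_demand=30.0)
print(f"computation: {comp_bw:.1f} GB/s, communication: {comm_bw:.1f} GB/s")
```

In this toy setting, contention appears only once the combined demand exceeds the capacity, which matches the situation the abstract describes: memory-bound computations running alongside large messages. Real memory systems may favor one stream over the other, which is among the behaviors the paper's model aims to capture.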

https://hal.inria.fr/hal-03682199
Contributor: Philippe SWARTVAGHER
Submitted on : Monday, May 30, 2022 - 7:02:07 PM
Last modification on : Wednesday, June 15, 2022 - 3:25:25 AM

File

apdcm.pdf
Files produced by the author(s)

Citation

Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher. Modeling Memory Contention between Communications and Computations in Distributed HPC Systems. IPDPS - 2022 - IEEE International Parallel and Distributed Processing Symposium Workshops, May 2022, Lyon / Virtual, France. pp.10, ⟨10.1109/IPDPSW55747.2022.00086⟩. ⟨hal-03682199⟩
