Reports

Modeling Memory Contention between Communications and Computations in Distributed HPC Systems (Extended Version)

Alexandre Denis 1 Emmanuel Jeannot 1 Philippe Swartvagher 1
1 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract : To amortize the cost of MPI communications, distributed parallel HPC applications can overlap network communications with computations in the hope of improving overall application performance. With this technique, computations and communications run at the same time. However, computations usually also move data. Since data for computations and for communications go through the same memory system, memory contention may occur when computations are memory-bound and large messages are transmitted through the network at the same time. In this paper we propose a model to predict the memory bandwidth available to computations and to communications when they are executed side by side, according to data locality and taking contention into account. Building the model gave a better understanding of where bottlenecks are located in the memory system and of the strategies the memory system applies under contention. The model was evaluated on many platforms with different characteristics and showed an average prediction error below 4%.

https://hal.inria.fr/hal-03564751
Contributor : Philippe Swartvagher
Submitted on : Thursday, February 10, 2022 - 3:41:54 PM
Last modification on : Wednesday, May 4, 2022 - 4:42:48 PM
Long-term archiving on : Wednesday, May 11, 2022 - 6:56:46 PM

File

RR-9451.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03564751, version 1

Citation

Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher. Modeling Memory Contention between Communications and Computations in Distributed HPC Systems (Extended Version). [Research Report] RR-9451, INRIA Bordeaux, équipe TADAAM. 2022, pp.34. ⟨hal-03564751⟩

Metrics

Record views : 67
File downloads : 91