
Distributed Termination Detection for HPC Task-Based Environments

Abstract : This paper revisits distributed termination detection algorithms in the context of high-performance computing applications in task systems. We first outline the need to efficiently detect termination in workflows for which the total number of tasks is data dependent and therefore not known statically but only revealed dynamically during execution. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). On the theoretical side, we analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages. On the practical side, we provide a highly tuned implementation of each termination detection algorithm within PaRSEC and compare their performance for a variety of benchmarks, extracted from scientific applications that exhibit dynamic behaviors.

Contributor : Equipe Roma
Submitted on : Sunday, June 10, 2018 - 7:08:03 PM
Last modification on : Monday, May 16, 2022 - 4:46:02 PM
Long-term archiving on : Tuesday, September 11, 2018 - 12:22:26 PM


Files produced by the author(s)


  • HAL Id : hal-01811823, version 1



George Bosilca, Aurelien Bouteiller, Thomas Hérault, Valentin Le Fèvre, Yves Robert, et al. Distributed Termination Detection for HPC Task-Based Environments. [Research Report] RR-9181, Inria - Research Centre Grenoble – Rhône-Alpes. 2018, pp.1-28. ⟨hal-01811823⟩


