Distributed Termination Detection for HPC Task-Based Environments

Abstract : This paper revisits distributed termination detection algorithms in the context of high-performance computing applications in task systems. We first outline the need to efficiently detect termination in workflows for which the total number of tasks is data dependent and therefore not known statically but only revealed dynamically during execution. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). On the theoretical side, we analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages. On the practical side, we provide a highly tuned implementation of each termination detection algorithm within PaRSEC and compare their performance for a variety of benchmarks, extracted from scientific applications that exhibit dynamic behaviors.
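
To illustrate the problem the abstract describes, the following is a minimal sequential sketch of the general credit-recovery idea behind credit-distribution schemes. It is not the report's CDA variant nor PaRSEC's implementation. The function name `run_until_terminated` and the `children_of` workload description are hypothetical, introduced only for this example. The initial task holds a total credit of 1, each spawned task receives half of its parent's remaining credit, and a task returns what it still holds when it completes. Termination is declared once the recovered credit is exactly 1, without knowing the number of tasks in advance.

```python
from collections import deque
from fractions import Fraction


def run_until_terminated(children_of, root=0):
    """Simulate credit-based termination detection on one process.

    children_of is a hypothetical workload description: it maps a task id
    to the list of task ids that task spawns. The detector never looks at
    its size; it only watches the recovered credit.
    """
    recovered = Fraction(0)                 # credit returned to the detector
    pending = deque([(root, Fraction(1))])  # (task id, credit held by the task)
    executed = 0

    # Invariant: recovered + sum of pending credits == 1, and every credit
    # is strictly positive, so recovered == 1 implies no task is pending.
    while recovered != 1:
        task, credit = pending.popleft()
        executed += 1
        for child in children_of.get(task, []):
            credit /= 2                     # keep half, hand half to the child
            pending.append((child, credit))
        recovered += credit                 # task done: return what is left

    return executed, len(pending)


if __name__ == "__main__":
    # Task 0 spawns tasks 1 and 2; task 1 spawns task 3.
    workload = {0: [1, 2], 1: [3]}
    print(run_until_terminated(workload))
```

Exact rational arithmetic (`Fraction`) is used so that repeated halving never loses credit to rounding. A distributed implementation would carry the credit inside messages and would need a bounded encoding instead.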
Document type : Reports
Cited literature [27 references]

https://hal.inria.fr/hal-01811823
Contributor : Equipe Roma
Submitted on : Sunday, June 10, 2018 - 7:08:03 PM
Last modification on : Monday, November 16, 2020 - 9:56:04 AM
Long-term archiving on : Tuesday, September 11, 2018 - 12:22:26 PM

File

rr9181.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01811823, version 1

Citation

George Bosilca, Aurelien Bouteiller, Thomas Hérault, Valentin Le Fèvre, Yves Robert, et al. Distributed Termination Detection for HPC Task-Based Environments. [Research Report] RR-9181, Inria - Research Centre Grenoble – Rhône-Alpes. 2018, pp.1-28. ⟨hal-01811823⟩
