
Checkpointing Workflows à la Young/Daly Is Not Good Enough

Abstract : This paper revisits checkpointing strategies when workflows composed of multiple tasks execute on a parallel platform. The objective is to minimize the expectation of the total execution time. For a single task, the Young/Daly formula provides the optimal checkpointing period. However, when many tasks execute simultaneously, the risk that one of them is severely delayed increases with the number of tasks. To mitigate this risk, a possibility is to checkpoint each task more often than with the Young/Daly strategy. But is it worth slowing each task down with extra checkpoints? Does the extra checkpointing make a difference globally? This paper answers these questions. On the theoretical side, we prove several negative results for keeping the Young/Daly period when many tasks execute concurrently, and we design novel checkpointing strategies that guarantee an efficient execution with high probability. On the practical side, we report comprehensive experiments that demonstrate the need to go beyond the Young/Daly period and to checkpoint more often, for a wide range of application/platform settings.
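As a rough illustration of the two quantities the abstract contrasts, the sketch below computes the classical first-order Young/Daly period, sqrt(2 · MTBF · C), for a single task, and shows how the chance that at least one task is struck by a failure grows with the number of concurrent tasks. The function names, the sample values, and the assumption of independent, exponentially distributed failures per task are illustrative choices made here; they are not taken from the report, and the strategies the report actually proposes are not reproduced.

```python
import math


def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    """First-order Young/Daly checkpointing period: sqrt(2 * MTBF * C).

    mtbf and checkpoint_cost must use the same time unit (e.g. seconds).
    """
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def prob_any_task_hit(n_tasks: int, task_length: float, mtbf: float) -> float:
    """Probability that at least one of n_tasks sees a failure.

    Assumption (not from the report): each task fails independently with
    exponentially distributed inter-arrival times of mean mtbf, and all
    tasks run for task_length time units.
    """
    if n_tasks < 0 or task_length < 0 or mtbf <= 0:
        raise ValueError("invalid parameters")
    return 1.0 - math.exp(-n_tasks * task_length / mtbf)


if __name__ == "__main__":
    mtbf = 24 * 3600.0        # hypothetical per-task MTBF: one day, in seconds
    checkpoint_cost = 60.0    # hypothetical checkpoint cost: one minute
    print(f"Young/Daly period: {young_daly_period(mtbf, checkpoint_cost):.1f} s")

    # The chance that some task is hit rises quickly with the task count.
    for n in (1, 10, 100, 1000):
        p = prob_any_task_hit(n, task_length=3600.0, mtbf=mtbf)
        print(f"{n:5d} tasks -> P(at least one failure) = {p:.3f}")
```

Under these assumptions the single-task period does not depend on the number of tasks, while the probability that some task is delayed approaches 1 as the task count grows. This is the tension the abstract describes between keeping the Young/Daly period and checkpointing more often.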
Contributor : Equipe Roma
Submitted on : Thursday, June 17, 2021 - 6:40:00 PM
Last modification on : Monday, May 16, 2022 - 4:46:02 PM
Long-term archiving on : Saturday, September 18, 2021 - 6:59:36 PM


Files produced by the author(s)


  • HAL Id : hal-03264047, version 1



Anne Benoit, Lucas Perotin, Yves Robert, Hongyang Sun. Checkpointing Workflows à la Young/Daly Is Not Good Enough. [Research Report] RR-9413, Inria - Research Centre Grenoble – Rhône-Alpes. 2021, pp.54. ⟨hal-03264047⟩


