
Combining Checkpointing and Replication for Reliable Execution of Linear Workflows

Abstract : This report combines checkpointing and replication for the reliable execution of linear workflows. While both methods have been studied separately, their combination has not yet been investigated despite its promising potential to minimize the execution time of linear workflows in failure-prone environments. The combination raises new problems: for each task, we have to decide whether to checkpoint and/or replicate it. We provide an optimal dynamic programming algorithm of quadratic complexity to solve both problems. This dynamic programming algorithm has been validated through extensive simulations that reveal the conditions in which checkpointing only, replication only, or the combination of both techniques leads to improved performance.
Document type : Research report

Cited literature [37 references]
Contributor : Equipe Roma
Submitted on : Thursday, February 22, 2018 - 10:30:35 AM
Last modification on : Friday, September 30, 2022 - 4:12:18 AM
Long-term archiving on : Wednesday, May 23, 2018 - 12:38:26 PM


Files produced by the author(s)


  • HAL Id : hal-01714978, version 1



Anne Benoit, Aurélien Cavelan, Florina Ciorba, Valentin Le Fèvre, Yves Robert. Combining Checkpointing and Replication for Reliable Execution of Linear Workflows. [Research Report] RR-9152, Inria - Research Centre Grenoble – Rhône-Alpes. 2018, pp.1-36. ⟨hal-01714978⟩


