Journal articles

Towards Optimal Multi-Level Checkpointing

Abstract: We provide a framework to analyze multi-level checkpointing protocols by formally defining a $k$-level checkpointing pattern. We provide a first-order approximation to the optimal checkpointing period, and show that the corresponding overhead is of the order of $\sum_{\ell=1}^{k}\sqrt{2\lambda_\ell C_\ell}$, where $\lambda_\ell$ is the error rate at level $\ell$ and $C_\ell$ is the checkpointing cost at level $\ell$. This nicely extends the classical Young/Daly formula for single-level checkpointing. Furthermore, we fully characterize the shape of the optimal pattern (number and positions of checkpoints), and we provide a dynamic programming algorithm to determine the optimal subset of levels to be used. Finally, we perform simulations to check the accuracy of the theoretical study and to confirm the optimality of the subset of levels returned by the dynamic programming algorithm. The results corroborate the theoretical study and demonstrate the usefulness of multi-level checkpointing with the optimal subset of levels.
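As a rough illustration of the first-order result, the Python sketch below evaluates $\sum_{\ell=1}^{k}\sqrt{2\lambda_\ell C_\ell}$ and its single-level special case, the Young/Daly overhead $\sqrt{2\lambda C}$. It also includes a hypothetical dynamic program over subsets of levels. That part rests on simplifying assumptions that are not taken from the paper's model: the top level $k$ is always kept, the error rate of each skipped level is added to the next kept level above it, and the overhead of a subset is the same first-order sum evaluated on these merged rates. The function names (`young_daly_overhead`, `multilevel_overhead`, `best_subset`) are illustrative only.

```python
import math


def young_daly_overhead(rate: float, cost: float) -> float:
    """Single-level first-order overhead sqrt(2 * lambda * C) (Young/Daly)."""
    return math.sqrt(2.0 * rate * cost)


def multilevel_overhead(rates: list[float], costs: list[float]) -> float:
    """First-order overhead sum_l sqrt(2 * lambda_l * C_l) when all k levels are used."""
    if len(rates) != len(costs):
        raise ValueError("rates and costs must have the same length")
    return sum(young_daly_overhead(lam, c) for lam, c in zip(rates, costs))


def best_subset(rates: list[float], costs: list[float]) -> tuple[float, list[int]]:
    """Hypothetical O(k^2) dynamic program choosing which levels to keep.

    Simplifying assumptions (ours, not the paper's): level k is always kept,
    the errors of a skipped level are recovered by the next kept level above
    it (so its rate is merged into that level), and a subset's overhead is the
    first-order sum evaluated on the merged rates. Levels are numbered 1..k.
    """
    k = len(rates)
    if k == 0 or len(costs) != k:
        raise ValueError("need k >= 1 rates and k costs")
    prefix = [0.0]  # prefix[j] = lambda_1 + ... + lambda_j
    for lam in rates:
        prefix.append(prefix[-1] + lam)
    best = [0.0] + [math.inf] * k  # best[j]: levels 1..j covered, level j kept
    prev = [0] * (k + 1)            # previous kept level (0 = none)
    for j in range(1, k + 1):
        for i in range(j):
            merged_rate = prefix[j] - prefix[i]  # levels i+1..j handled by level j
            cand = best[i] + young_daly_overhead(merged_rate, costs[j - 1])
            if cand < best[j]:
                best[j], prev[j] = cand, i
    levels, j = [], k
    while j > 0:  # walk back through the kept levels
        levels.append(j)
        j = prev[j]
    return best[k], levels[::-1]
```

For example, with rates `[1e-6, 1e-4]` and costs `[50, 60]`, `best_subset` keeps only level 2: under this simplified model, folding the rare level-1 errors into level 2 (overhead about 0.110) is cheaper than paying for separate level-1 checkpoints (about 0.120).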

Cited literature: 18 references

https://hal.inria.fr/hal-02082416
Contributor: Equipe Roma
Submitted on: Thursday, March 28, 2019 - 11:46:05 AM
Last modification on: Wednesday, February 26, 2020 - 11:14:08 AM
Long-term archiving on: Saturday, June 29, 2019 - 1:42:10 PM

File

multilevel_failstop.pdf
Files produced by the author(s)


Citation

Anne Benoit, Aurélien Cavelan, Valentin Le Fèvre, Yves Robert, Hongyang Sun. Towards Optimal Multi-Level Checkpointing. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2017, 66 (7), pp.1212-1226. ⟨10.1109/TC.2016.2643660⟩. ⟨hal-02082416⟩


Metrics: 84 record views, 330 file downloads