
Checkpointing algorithms and fault prediction

Abstract: This paper studies the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly to the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide optimal algorithms to decide whether and when to take predictions into account, and we derive the optimal value of the checkpointing period. These results allow us to analytically assess the key parameters that impact the performance of fault predictors at very large scale.
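As a reference point for the first-order analysis the abstract builds on, the sketch below computes the classical Young/Daly period T = sqrt(2 * mu * C), where mu is the platform MTBF and C the checkpoint cost, together with the first-order waste C/T + T/(2 * mu) that this period minimizes. The recall adjustment is a simplifying assumption made only for illustration: it supposes every predicted fault is fully mitigated by a proactive action, so only the fraction (1 - recall) of faults drives periodic checkpointing. It is not the formula derived in the paper, which also accounts for precision and the cost of acting on predictions. All function names are hypothetical.

```python
import math


def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    """Classical first-order optimal checkpointing period T = sqrt(2 * mu * C)."""
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def first_order_waste(period: float, mtbf: float, checkpoint_cost: float) -> float:
    """First-order waste: checkpoint overhead C/T plus expected lost work T/(2 * mu)."""
    return checkpoint_cost / period + period / (2.0 * mtbf)


def period_with_recall(mtbf: float, checkpoint_cost: float, recall: float) -> float:
    """Hypothetical adjustment, NOT the paper's result.

    Assumes predicted faults (fraction `recall`) are fully handled proactively,
    so only unpredicted faults, with effective MTBF mu / (1 - recall),
    are covered by periodic checkpoints. Precision is ignored here.
    """
    if not 0.0 <= recall < 1.0:
        raise ValueError("recall must be in [0, 1)")
    return young_daly_period(mtbf / (1.0 - recall), checkpoint_cost)


if __name__ == "__main__":
    mu, c = 100.0, 2.0  # arbitrary illustrative values, same time unit
    t = young_daly_period(mu, c)
    print(t, first_order_waste(t, mu, c), period_with_recall(mu, c, 0.75))
```

With these illustrative values the classical period is 20 time units with a first-order waste of 0.2, and under the stated assumption a recall of 0.75 stretches the period to 40, since fewer unpredicted faults make frequent checkpoints less worthwhile.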

Cited literature: 26 references

https://hal.inria.fr/hal-00908446
Contributor: Equipe Roma
Submitted on: Saturday, November 23, 2013 - 2:19:20 AM
Last modification on: Tuesday, November 19, 2019 - 2:39:17 AM
Long-term archiving on: Monday, February 24, 2014 - 2:30:18 AM

File: main.pdf (produced by the author(s))

Citation

Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni. Checkpointing algorithms and fault prediction. Journal of Parallel and Distributed Computing, Elsevier, 2013, 74 (2), pp.2048-2064. ⟨10.1016/j.jpdc.2013.10.010⟩. ⟨hal-00908446⟩
