
Checkpointing algorithms and fault prediction

Guillaume Aupy 1, 2, * Yves Robert 2, 1 Frédéric Vivien 2, 1 Dounia Zaidouni 2, 1
* Corresponding author
1 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : This paper studies the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly to the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide an optimal algorithm to decide when to take predictions into account, and we derive the optimal value of the checkpointing period. These results make it possible to assess analytically the key parameters that impact the performance of fault predictors at very large scale.
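To make the quantities named in the abstract concrete, here is a minimal Python sketch. It shows the standard definitions of a predictor's recall r and precision p, and Young's classical first-order period T = sqrt(2·μ·C), where μ is the platform MTBF and C the checkpoint cost. The function `period_with_predictor` is a hypothetical, simplified illustration and not the period derived in this report. It assumes that every predicted fault is handled by a proactive action at no cost, so only the unpredicted fraction (1 - r) of faults hits periodic checkpointing, giving an effective MTBF of μ / (1 - r). It ignores precision (false predictions), proactive checkpoint cost, recovery and downtime, all of which the report's analysis takes into account.

```python
import math


def recall_precision(true_pos: int, false_neg: int, false_pos: int) -> tuple[float, float]:
    """Standard definitions for a fault predictor.

    recall    r = TP / (TP + FN): fraction of actual faults that were predicted.
    precision p = TP / (TP + FP): fraction of predictions that were actual faults.
    """
    recall = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return recall, precision


def young_period(mtbf: float, ckpt_cost: float) -> float:
    """Young's first-order optimal checkpointing period, T = sqrt(2 * mu * C)."""
    if mtbf <= 0 or ckpt_cost <= 0:
        raise ValueError("mtbf and ckpt_cost must be positive")
    return math.sqrt(2.0 * mtbf * ckpt_cost)


def period_with_predictor(mtbf: float, ckpt_cost: float, recall: float) -> float:
    """Hypothetical simplification (not the report's formula).

    Assumes predicted faults are absorbed by free proactive actions, so only
    the unpredicted fraction (1 - r) of faults strikes periodic checkpointing.
    The effective MTBF is then mu / (1 - r), plugged into Young's formula.
    Precision, proactive checkpoint cost and recovery are ignored here.
    """
    if not 0.0 <= recall < 1.0:
        raise ValueError("recall must be in [0, 1)")
    return young_period(mtbf / (1.0 - recall), ckpt_cost)


if __name__ == "__main__":
    r, p = recall_precision(true_pos=75, false_neg=25, false_pos=25)
    print(r, p)                                          # 0.75 0.75
    print(young_period(5000.0, 100.0))                   # 1000.0
    print(period_with_predictor(5000.0, 100.0, r))       # 2000.0
```

In this toy setting, a predictor with recall 0.75 quadruples the effective MTBF and therefore doubles the checkpointing period. The report's full model is needed to see how precision and the cost of acting on predictions change this picture.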

https://hal.inria.fr/hal-00788313
Contributor : Guillaume Pallez (aupy)
Submitted on : Thursday, February 14, 2013 - 11:12:57 AM
Last modification on : Monday, October 21, 2019 - 4:11:00 PM
Long-term archiving on : Saturday, April 1, 2017 - 11:56:26 PM

File

RR-8237.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00788313, version 1


Citation

Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni. Checkpointing algorithms and fault prediction. [Research Report] RR-8237, 2013, pp.8237. ⟨hal-00788313v1⟩

