# Towards Optimal Multi-Level Checkpointing

1 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : We provide a framework to analyze multi-level checkpointing protocols, by formally defining a $k$-level checkpointing pattern. We provide a first-order approximation to the optimal checkpointing pattern, and show that the corresponding overhead is of the order of $\sum_{\ell=1}^{k}\sqrt{2\lambda_\ell C_\ell}$, where $\lambda_\ell$ is the error rate at level~$\ell$, and $C_\ell$ the checkpointing cost at level~$\ell$. This nicely extends the classical Young/Daly formula. Furthermore, we are able to fully characterize the shape of the optimal pattern (number and positions of checkpoints), and we provide a dynamic programming algorithm to determine which levels should be used. Finally, we perform simulations to check the accuracy of the theoretical study and to confirm the optimality of the subset of levels returned by the dynamic programming algorithm. The results nicely corroborate the theoretical study, and demonstrate the usefulness of multi-level checkpointing with the optimal subset of levels.
Keywords :
Document type :
Reports
Domain :

Cited literature [17 references]

https://hal.inria.fr/hal-01339788
Contributor : Equipe Roma Connect in order to contact the contributor
Submitted on : Monday, December 19, 2016 - 12:40:36 PM
Last modification on : Friday, September 30, 2022 - 4:12:14 AM

### File

RR-8930.pdf
Files produced by the author(s)

### Identifiers

• HAL Id : hal-01339788, version 3

### Citation

Anne Benoit, Aurélien Cavelan, Valentin Le Fèvre, Yves Robert, Hongyang Sun. Towards Optimal Multi-Level Checkpointing. [Research Report] RR-8930, INRIA Grenoble - Rhone-Alpes. 2016. ⟨hal-01339788v3⟩

Record views