Conference papers

Robustness of the Young/Daly formula for stochastic iterative applications

Yishu Du (1, 2), Loris Marchal (1, 2), Yves Robert (1, 2), Guillaume Pallez (3)
1 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
3 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract : The Young/Daly formula for periodic checkpointing is known to hold for a divisible load application where one can checkpoint at any time-step. In a nutshell, the optimal period is $P_{YD} = \sqrt{2 \mu_f C}$, where $\mu_f$ is the Mean Time Between Failures (MTBF) and $C$ is the checkpoint time. This paper assesses the accuracy of the formula for applications decomposed into computational iterations where: (i) the duration of an iteration is stochastic, i.e., obeys a probability distribution law $D$ of mean $\mu_D$; and (ii) one can checkpoint only at the end of an iteration. We first consider static strategies where checkpoints are taken after a given number of iterations $k$, and provide a closed-form, asymptotically optimal formula for $k$, valid for any distribution $D$. We then show that using the Young/Daly formula to compute $k$ (as $k \cdot \mu_D = P_{YD}$) is a first-order approximation of this formula. We also consider dynamic strategies where one decides to checkpoint at the end of an iteration only if the total amount of work since the last checkpoint exceeds a threshold $W_{th}$, and otherwise proceeds to the next iteration. Similarly, we provide a closed-form formula for this threshold and show that $P_{YD}$ is a first-order approximation of $W_{th}$. Finally, we provide an extensive set of simulations where $D$ is either Uniform, Gamma or truncated Normal, which shows the global accuracy of the Young/Daly formula, even when the distribution $D$ has a large standard deviation (and when one cannot use a first-order approximation). Hence we establish that the relevance of the formula goes well beyond its original framework.
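The first-order quantities named in the abstract can be illustrated with a minimal Python sketch. It uses only the relations stated above: $P_{YD} = \sqrt{2 \mu_f C}$, $k \cdot \mu_D = P_{YD}$ for the static strategy, and $P_{YD}$ as a stand-in for $W_{th}$ in the dynamic strategy. The paper's own closed-form formulas for $k$ and $W_{th}$ are not reproduced here. All function and parameter names are hypothetical, and rounding $k$ to the nearest integer (at least 1) is an assumption of this sketch.

```python
import math


def young_daly_period(mtbf: float, checkpoint_time: float) -> float:
    """Young/Daly period P_YD = sqrt(2 * mu_f * C)."""
    return math.sqrt(2.0 * mtbf * checkpoint_time)


def static_iteration_count(mtbf: float, checkpoint_time: float,
                           mean_iteration: float) -> int:
    """First-order static strategy: choose k such that k * mu_D ~= P_YD.

    Rounding to the nearest integer, with a minimum of 1, is an
    assumption of this sketch, not a rule taken from the paper.
    """
    period = young_daly_period(mtbf, checkpoint_time)
    return max(1, round(period / mean_iteration))


def dynamic_checkpoints(iteration_durations, threshold: float):
    """Dynamic strategy: checkpoint at the end of an iteration only if
    the work accumulated since the last checkpoint exceeds the threshold.

    Returns the 0-based indices of iterations followed by a checkpoint.
    Failures are not simulated here.
    """
    checkpoints = []
    work_since_last = 0.0
    for index, duration in enumerate(iteration_durations):
        work_since_last += duration
        if work_since_last > threshold:
            checkpoints.append(index)
            work_since_last = 0.0
    return checkpoints


if __name__ == "__main__":
    # Illustrative values only (same time unit throughout).
    mtbf, ckpt, mean_iter = 200.0, 1.0, 4.0
    period = young_daly_period(mtbf, ckpt)
    print("P_YD:", period)
    print("static k:", static_iteration_count(mtbf, ckpt, mean_iter))
    # Use P_YD as a first-order stand-in for the threshold W_th.
    print("dynamic checkpoints:",
          dynamic_checkpoints([8, 7, 9, 3, 12], period))
```

The static rule fixes the checkpoint count in advance from the mean iteration length, while the dynamic rule reacts to the durations actually observed, which is the distinction the abstract draws between the two families of strategies.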
Document type : Conference papers

https://hal.inria.fr/hal-03024618
Contributor : Equipe Roma
Submitted on : Monday, November 30, 2020 - 4:40:15 PM
Last modification on : Thursday, December 3, 2020 - 1:52:14 PM

File

icpp20-170.pdf
Files produced by the author(s)

Citation

Yishu Du, Loris Marchal, Yves Robert, Guillaume Pallez. Robustness of the Young/Daly formula for stochastic iterative applications. ICPP 2020 - 49th International Conference on Parallel Processing, Aug 2020, Edmonton / Virtual, Canada. pp.1-11, ⟨10.1145/3404397.3404419⟩. ⟨hal-03024618⟩
