Skip to Main content Skip to Navigation
Journal articles

Complexity Analysis of Checkpoint Scheduling with Variable Costs

Mohamed Slim Bouguerra 1 Denis Trystram 2, 3 Frédéric Wagner 2, 3
1 MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble [2007-2015]
3 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble [2007-2015]
Abstract : The parallel computing platforms available today are increasingly larger and thus, more and more subject to failures. Consequently it is necessary to develop efficient strategies providing safe and reliable completion for HPC parallel applications. Checkpointing is one of the most popular and efficient technique for developing fault-tolerant applications on such context. However, checkpoint operations are costly in terms of time, computation and network communication. This will certainly affect the global performance of the application. In this work, we propose a performance model that expresses formally the checkpoint scheduling problem. This model exhibits the tradeoff between the impact of the checkpoints operations and the lost computation due to failures. Based on this model we study the computational complexity of the problem of scheduling checkpoints with variable costs for general failure distributions. More precisely, we provide a new computational complexity analysis which explicits in depth the relations between the probabilistic failure model, the checkpoint cost and the computational model. In particular, we prove that the checkpoint scheduling problem is NP-hard even in the simple case of uniform failure distribution. We also present a dynamic programming scheme for determining the optimal checkpointing times in all the variants of the problem.
Complete list of metadatas
Contributor : Arnaud Legrand <>
Submitted on : Wednesday, February 13, 2013 - 5:05:47 PM
Last modification on : Thursday, July 9, 2020 - 9:44:36 AM




Mohamed Slim Bouguerra, Denis Trystram, Frédéric Wagner. Complexity Analysis of Checkpoint Scheduling with Variable Costs. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2012, 99, ⟨10.1109/TC.2012.57⟩. ⟨hal-00788101⟩



Record views