Consistent Checkpointing in Message Passing Distributed Systems

Roberto Baldoni (1), Jean-Michel Hélary (1), Achour Mostefaoui (1), Michel Raynal (1)
(1) ADP - Distributed Algorithms and Protocols
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract: A global checkpoint of a distributed computation is a set of local checkpoints (local states), one per process. Determining consistent global checkpoints is an important problem for many distributed applications (e.g., fault tolerance, distributed debugging, property detection). This paper concentrates on such determinations. A precedence relation on checkpoint intervals (the sets of events produced by a process between two successive local checkpoints) is introduced and analyzed. It is shown that a local checkpoint is useless (i.e., it cannot participate in any consistent global checkpoint) iff some pattern appears in this precedence relation. Then an adaptive checkpointing algorithm is introduced. This algorithm, assuming processes take local checkpoints independently, requires them to take additional checkpoints (as few as possible) so that none of the previously taken checkpoints is useless. It is based on the prevention of the previously mentioned pattern. In some sense, this algorithm combines the advantages of both coordinated and uncoordinated checkpointing algorithms without inheriting their drawbacks.
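The abstract does not spell out the consistency condition, so the following is only a minimal sketch under the usual assumption: a global checkpoint is consistent when it contains no orphan message, that is, no message recorded as received by one local checkpoint but not yet sent according to the sender's local checkpoint. The names `Message`, `is_consistent`, and the representation of a local checkpoint as a count of local events are hypothetical choices made for illustration; they are not taken from the report, and the sketch does not implement the report's precedence relation or its adaptive algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # All fields are illustrative assumptions, not notation from the report.
    sender: int    # id of the sending process
    send_pos: int  # index of the send event in the sender's local history
    receiver: int  # id of the receiving process
    recv_pos: int  # index of the receive event in the receiver's local history


def is_consistent(cut, messages):
    """Return True if the global checkpoint `cut` has no orphan message.

    cut[p] is the number of local events of process p that precede
    its chosen local checkpoint (one entry per process).
    """
    for m in messages:
        sent_before = m.send_pos < cut[m.sender]
        received_before = m.recv_pos < cut[m.receiver]
        # Orphan: the receive is recorded but the matching send is not.
        if received_before and not sent_before:
            return False
    return True


# Process 0 sends at its event 2; process 1 receives at its event 1.
msgs = [Message(sender=0, send_pos=2, receiver=1, recv_pos=1)]

orphan_cut = {0: 2, 1: 2}     # receive recorded, send not recorded
safe_cut = {0: 3, 1: 2}       # both send and receive recorded
in_transit_cut = {0: 3, 1: 1} # send recorded, receive not yet recorded
```

Under these assumptions, `is_consistent(orphan_cut, msgs)` is false, while the other two cuts are accepted; a message that is merely in transit does not violate consistency.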
Document type:
[Research Report] RR-2564, INRIA. 1995
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, 24 May 2006 - 14:33:23
Last modified on: Wednesday, 16 May 2018 - 11:23:02
Document(s) archived on: Monday, 5 April 2010 - 00:05:27



  • HAL Id : inria-00074117, version 1


Roberto Baldoni, Jean-Michel Hélary, Achour Mostefaoui, Michel Raynal. Consistent Checkpointing in Message Passing Distributed Systems. [Research Report] RR-2564, INRIA. 1995. 〈inria-00074117〉


