
Consistent Checkpointing in Message Passing Distributed Systems

Roberto Baldoni 1 Jean-Michel Hélary 1 Achour Mostefaoui 1 Michel Raynal 1
1 ADP - Distributed Algorithms and Protocols
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : A global checkpoint of a distributed computation is a set of local checkpoints (local states), one per process. Determining consistent global checkpoints is an important problem for many distributed applications (e.g., fault tolerance, distributed debugging, property detection). This paper concentrates on such determinations. A precedence relation on checkpoint intervals (an interval is the set of events produced by a process between two successive local checkpoints) is introduced and analyzed. It is shown that a local checkpoint is useless (i.e., it cannot participate in any consistent global checkpoint) iff a particular pattern appears in this precedence relation. An adaptive checkpointing algorithm is then introduced. Assuming that processes take local checkpoints independently, this algorithm requires them to take as few additional checkpoints as possible so that none of the previously taken checkpoints is useless. It works by preventing the pattern mentioned above. In some sense, this algorithm combines the advantages of both coordinated and uncoordinated checkpointing algorithms without inheriting their drawbacks.
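The notions of consistent global checkpoint and useless local checkpoint can be illustrated with a minimal Python sketch. It assumes the standard orphan-message definition of consistency (no message is recorded as received without being recorded as sent), which the abstract does not spell out. The names `Message`, `is_consistent`, `is_useless`, `MESSAGES` and `COUNTS` are hypothetical, and the brute-force search over all combinations is only an illustration of the definition, not the pattern-based characterization or the adaptive algorithm of the report.

```python
from itertools import product
from typing import NamedTuple, Sequence


class Message(NamedTuple):
    sender: int          # index of the sending process
    sent_after: int      # last checkpoint the sender took before sending
    receiver: int        # index of the receiving process
    received_after: int  # last checkpoint the receiver took before delivery


def is_consistent(cut: Sequence[int], messages: Sequence[Message]) -> bool:
    """cut[i] is the index of the local checkpoint chosen for process i.

    The cut is inconsistent if some message is an orphan: its receipt is
    recorded in the receiver's checkpoint, but its sending is not recorded
    in the sender's checkpoint.
    """
    for m in messages:
        receive_recorded = m.received_after < cut[m.receiver]
        send_recorded = m.sent_after < cut[m.sender]
        if receive_recorded and not send_recorded:
            return False
    return True


def is_useless(proc: int, idx: int, counts: Sequence[int],
               messages: Sequence[Message]) -> bool:
    """Brute force: checkpoint idx of process proc is useless if no choice
    of checkpoints on the other processes yields a consistent cut.
    counts[i] is the number of checkpoints taken by process i (0 = initial).
    """
    ranges = [[idx] if p == proc else range(n) for p, n in enumerate(counts)]
    return not any(is_consistent(cut, messages) for cut in product(*ranges))


# Hypothetical two-process run. P0 takes checkpoints 0 and 1; P1 takes 0, 1, 2.
# P0 sends a message before receiving one from P1, all within its interval 0.
MESSAGES = [
    Message(sender=0, sent_after=0, receiver=1, received_after=0),
    Message(sender=1, sent_after=1, receiver=0, received_after=0),
]
COUNTS = [2, 3]

if __name__ == "__main__":
    print(is_consistent((1, 2), MESSAGES))      # cut (P0:1, P1:2)
    print(is_useless(1, 1, COUNTS, MESSAGES))   # checkpoint 1 of P1
```

In this run, checkpoint 1 of P1 cannot be paired with either checkpoint of P0 without creating an orphan message, so it is useless in the sense of the abstract, whereas checkpoint 1 of P0 is still usable together with checkpoint 2 of P1.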

Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 2:33:23 PM
Last modification on : Friday, February 4, 2022 - 3:25:18 AM
Long-term archiving on : Monday, April 5, 2010 - 12:05:27 AM


  • HAL Id : inria-00074117, version 1


Roberto Baldoni, Jean-Michel Hélary, Achour Mostefaoui, Michel Raynal. Consistent Checkpointing in Message Passing Distributed Systems. [Research Report] RR-2564, INRIA. 1995. ⟨inria-00074117⟩


