
On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems

Roberto Baldoni 1 Jean-Michel Hélary 1 Achour Mostefaoui 1 Michel Raynal 1
1 ADP - Distributed Algorithms and Protocols
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : Backward error recovery is one of the most widely used schemes for ensuring fault tolerance in distributed systems. Upon the occurrence of a failure, it consists in restoring a distributed computation to an error-free global state from which it can be resumed to produce correct behavior. Checkpointing is one of the techniques used to implement backward error recovery. In this paper, we present a general framework whose semantics takes missing and orphan messages into account. The notions of missing and orphan messages are revisited by considering the additional underlying mechanisms available on channels and the semantics of messages. This framework allows us, first, to state and prove a theorem that determines whether an arbitrary set of checkpoints is consistent and, second, to formally define the domino effect. Further, we show how previously published uncoordinated checkpointing algorithms can be described in our context, and we give some examples of uncoordinated checkpointing algorithms that ensure domino-free rollback recovery.
Document type : Reports

https://hal.inria.fr/inria-00074112
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 2:32:55 PM
Last modification on : Thursday, February 11, 2021 - 2:48:03 PM
Long-term archiving on : Monday, April 5, 2010 - 12:05:17 AM

Identifiers

  • HAL Id : inria-00074112, version 1

Citation

Roberto Baldoni, Jean-Michel Hélary, Achour Mostefaoui, Michel Raynal. On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems. [Research Report] RR-2569, INRIA. 1995. ⟨inria-00074112⟩
