Report (Research Report) Year: 1995

On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems

Abstract

Backward error recovery is one of the most widely used schemes for ensuring fault tolerance in distributed systems. Upon the occurrence of a failure, it consists in restoring a distributed computation to an error-free global state from which it can be resumed to produce correct behavior. Checkpointing is one of the techniques used to implement backward error recovery. In this paper, we present a general framework that takes into account a semantics including missing and orphan messages. The notions of missing and orphan messages are revisited by considering the additional underlying mechanisms available on channels and the semantics of messages. This framework makes it possible, first, to state and prove a theorem that determines whether an arbitrary set of checkpoints is consistent and, second, to formally define the domino effect. Further, we show how previously published uncoordinated checkpointing algorithms can be described in our context, and we give some examples of uncoordinated checkpointing algorithms that ensure domino-free rollback recovery.
Main file
RR-2569.pdf (299.68 KB)

Dates and versions

inria-00074112, version 1 (24-05-2006)

Identifiers

  • HAL Id: inria-00074112, version 1

Cite

Roberto Baldoni, Jean-Michel Hélary, Achour Mostefaoui, Michel Raynal. On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems. [Research Report] RR-2569, INRIA. 1995. ⟨inria-00074112⟩
226 Views
3047 Downloads
