On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems

Roberto Baldoni 1, Jean-Michel Hélary 1, Achour Mostefaoui 1, Michel Raynal 1
1 ADP - Distributed Algorithms and Protocols
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract: Backward error recovery is one of the most widely used schemes for ensuring fault tolerance in distributed systems. When a failure occurs, it restores the distributed computation to an error-free global state from which execution can resume and produce correct behavior. Checkpointing is one of the techniques used to implement backward error recovery. In this paper, we present a general framework that takes into account a semantics including missing and orphan messages. The notions of missing and orphan messages are revisited by considering additional underlying mechanisms available on channels and the semantics of messages. This framework allows us, first, to state and prove a theorem that determines whether an arbitrary set of checkpoints is consistent and, second, to formally define the domino effect. Further, we show how previously published uncoordinated checkpointing algorithms can be described in our context, and we give some examples of uncoordinated checkpointing algorithms that ensure domino-free rollback recovery.
Document type:
[Research Report] RR-2569, INRIA. 1995

Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, May 24, 2006 - 14:32:55
Last modified on: Friday, November 16, 2018 - 01:24:05
Document(s) archived on: Monday, April 5, 2010 - 00:05:17



  • HAL Id: inria-00074112, version 1


Roberto Baldoni, Jean-Michel Hélary, Achour Mostefaoui, Michel Raynal. On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems. [Research Report] RR-2569, INRIA. 1995. 〈inria-00074112〉


