Communicating processes and fault tolerance : a shared memory multiprocessor experience

Michel Banâtre (1), Maurice Jégado (1), Philippe Joubert (1), Christine Morin (1)
(1) LSP - Langages et Systèmes Parallèles
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires
Abstract: The concept of backward recovery is now well established as a means of restoring a consistent state of a fault-tolerant system when faults occur. In this paper, we consider a system of communicating processes mapped onto a multilevel execution support, running on a shared memory multiprocessor machine. Our interest is in tolerating the hardware faults that may occur during the execution of a concurrent computation. The machine provides a hardware backward recovery protocol based on a specialized memory device that tracks dependencies between the processors accessing shared data residing in memory. We discuss the transparency provided by this protocol by successively considering the models of computation at each level of abstraction of the execution support.
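The abstract only names the mechanism: a memory device that tracks dependencies between processors accessing shared data, so that a fault can be undone by backward recovery. The Python sketch below illustrates that general idea and is not the protocol described in the report. Everything in it is a hypothetical assumption: the class name `RecoverableSharedMemory`, a single global checkpoint that commits all processors at once, and a conservative rule that any read or overwrite of data written by another processor since the checkpoint makes the accessor depend on that writer. Rolling back a faulty processor then also rolls back every processor that transitively depends on it.

```python
from collections import defaultdict


class RecoverableSharedMemory:
    """Toy model of dependency tracking for backward recovery (hypothetical).

    Assumptions, not taken from the report: one global checkpoint commits
    every processor, and touching data written by another processor since
    that checkpoint makes the accessor depend on the writer.
    """

    def __init__(self):
        self.memory = {}
        self.undo_log = []                  # (processor, address, previous value), oldest first
        self.last_writer = {}               # address -> latest uncommitted writer
        self.dependents = defaultdict(set)  # writer -> processors depending on it

    def _track(self, proc, addr):
        # Record that `proc` observed or overwrote another processor's uncommitted value.
        writer = self.last_writer.get(addr)
        if writer is not None and writer != proc:
            self.dependents[writer].add(proc)

    def read(self, proc, addr):
        self._track(proc, addr)
        return self.memory.get(addr, 0)

    def write(self, proc, addr, value):
        # Overwrites also count as dependencies, which keeps reverse-order undo consistent.
        self._track(proc, addr)
        self.undo_log.append((proc, addr, self.memory.get(addr, 0)))
        self.memory[addr] = value
        self.last_writer[addr] = proc

    def checkpoint(self):
        # Commit the current state of all processors: nothing before this can be undone.
        self.undo_log.clear()
        self.last_writer.clear()
        self.dependents.clear()

    def rollback(self, faulty):
        # Transitive closure of processors that depend on the faulty one.
        to_roll = {faulty}
        stack = [faulty]
        while stack:
            for dep in self.dependents[stack.pop()]:
                if dep not in to_roll:
                    to_roll.add(dep)
                    stack.append(dep)
        # Undo their writes newest first; keep the log entries of unaffected processors.
        kept = []
        for proc, addr, old in reversed(self.undo_log):
            if proc in to_roll:
                self.memory[addr] = old
            else:
                kept.append((proc, addr, old))
        self.undo_log = list(reversed(kept))
        # Rebuild bookkeeping from what survives.
        self.last_writer = {}
        for proc, addr, _ in self.undo_log:
            self.last_writer[addr] = proc
        for writer in list(self.dependents):
            if writer in to_roll:
                del self.dependents[writer]
            else:
                self.dependents[writer] -= to_roll
        return sorted(to_roll)


m = RecoverableSharedMemory()
m.write("P0", "x", 1)
m.checkpoint()
m.write("P0", "x", 2)
m.read("P1", "x")        # P1 now depends on P0
m.write("P2", "y", 7)    # P2 is independent
print(m.rollback("P0"))  # ['P0', 'P1']
print(m.memory)          # {'x': 1, 'y': 7}
```

In this example only P0 and P1 are rolled back, while P2's independent write to `y` survives. That illustrates why tracking dependencies can limit how many processors a fault forces back to their checkpoint. The global checkpoint is a simplification chosen for brevity; a real protocol may commit processors independently.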
Document type :
Reports

https://hal.inria.fr/inria-00074911
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 4:52:58 PM
Last modification on : Friday, November 16, 2018 - 1:28:31 AM
Long-term archiving on : Tuesday, April 12, 2011 - 8:04:11 PM

Identifiers

  • HAL Id : inria-00074911, version 1

Citation

Michel Banâtre, Maurice Jégado, Philippe Joubert, Christine Morin. Communicating processes and fault tolerance : a shared memory multiprocessor experience. [Research Report] RR-1649, INRIA. 1992. ⟨inria-00074911⟩
