
An architecture for tolerating processor failures in shared-memory multiprocessors

Abstract : In this paper, we focus on the problem of recovering from processor failures in shared-memory multiprocessors. We propose an architecture designed to tolerate processor failures transparently. The recoverable shared memory (RSM) is the main component of this architecture and provides a hardware-supported backward error recovery mechanism. This technique works with standard caches and cache coherence protocols and avoids rollback propagation. The performance of the architecture during normal execution is evaluated and compared with that of existing fault-tolerant shared-memory multiprocessors. The performance study was conducted by simulation using address traces collected from real parallel applications.
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 4:06:56 PM
Last modification on : Tuesday, June 15, 2021 - 4:26:33 PM
Long-term archiving on : Monday, April 5, 2010 - 12:13:47 AM


  • HAL Id : inria-00074708, version 1


Michel Banâtre, Alain Gefflaut, Philippe Joubert, Peter Lee, Christine Morin. An architecture for tolerating processor failures in shared-memory multiprocessors. [Research Report] RR-1965, INRIA. 1993. ⟨inria-00074708⟩


