Reports (Research report)

An architecture for tolerating processor failures in shared-memory multiprocessors

Michel Banâtre 1 Alain Gefflaut 1 Philippe Joubert 1 Peter Lee 2 Christine Morin 1 
1 LSP - Langages et Systèmes Parallèles
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : In this paper, we focus on the problem of recovering from processor failures in shared-memory multiprocessors. We propose an architecture designed to tolerate processor failures transparently. The recoverable shared memory (RSM) is the main component of this architecture; it provides a hardware-supported backward error recovery mechanism. This technique works with standard caches and cache coherence protocols and avoids rollback propagation. The performance of the architecture during normal execution is evaluated and compared with that of existing fault-tolerant shared-memory multiprocessors. The performance study was conducted by simulation, using address traces collected from real parallel applications.

https://hal.inria.fr/inria-00074708
Contributor : Rapport De Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 4:06:56 PM
Last modification on : Thursday, October 27, 2022 - 3:45:41 AM
Long-term archiving on : Monday, April 5, 2010 - 12:13:47 AM

Identifiers

  • HAL Id : inria-00074708, version 1

Citation

Michel Banâtre, Alain Gefflaut, Philippe Joubert, Peter Lee, Christine Morin. An architecture for tolerating processor failures in shared-memory multiprocessors. [Research Report] RR-1965, INRIA. 1993. ⟨inria-00074708⟩
