Tolerating Node Failures in Cache Only Memory Architectures

Michel Banâtre (1), Alain Gefflaut (1), Christine Morin (1)
(1) SOLIDOR - Design of Distributed Operating Systems, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : COMAs (Cache Only Memory Architectures) are an interesting class of large-scale shared-memory multiprocessors. They extend the concepts of cache memories and shared virtual memory by using the local memories of the nodes as large caches for a single shared address space. Because of their large number of components, these architectures are particularly susceptible to hardware failures, so fault tolerance mechanisms have to be introduced to ensure high availability. In this paper, we propose an implementation of backward error recovery in a COMA that minimizes performance degradation and requires few hardware modifications. This implementation uses the features of a COMA to implement a stable storage abstraction using the standard memories of the architecture. Recovery data are replicated and mixed with current data in node memories, and both are managed transparently by an extended coherence protocol.
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 3:06:04 PM
Last modification on : Friday, February 4, 2022 - 3:15:25 AM
Long-term archiving on : Monday, April 5, 2010 - 12:08:31 AM


  • HAL Id : inria-00074341, version 1


Michel Banâtre, Alain Gefflaut, Christine Morin. Tolerating Node Failures in Cache Only Memory Architectures. [Research Report] RR-2335, INRIA. 1994. ⟨inria-00074341⟩


