Tolerating Node Failures in Cache Only Memory Architectures

Michel Banâtre (1), Alain Gefflaut (1), Christine Morin (1)
(1) SOLIDOR - Design of Distributed Operating Systems
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract: COMAs (Cache Only Memory Architectures) are an interesting class of large-scale shared-memory multiprocessors. They extend the concepts of cache memories and shared virtual memory by using the local memories of the nodes as large caches for a single shared address space. Because of their large number of components, these architectures are particularly susceptible to hardware failures, so fault tolerance mechanisms have to be introduced to ensure high availability. In this paper, we propose an implementation of backward error recovery in a COMA that minimizes performance degradation and requires few hardware modifications. This implementation uses the features of a COMA to implement a stable storage abstraction using the standard memories of the architecture. Recovery data are replicated and mixed with current data in node memories, and both are managed transparently by an extended coherence protocol.
Document type:
[Research Report] RR-2335, INRIA. 1994
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, May 24, 2006 - 15:06:04
Last modified on: Friday, November 16, 2018 - 01:23:24
Document(s) archived on: Monday, April 5, 2010 - 00:08:31



  • HAL Id : inria-00074341, version 1


Michel Banâtre, Alain Gefflaut, Christine Morin. Tolerating Node Failures in Cache Only Memory Architectures. [Research Report] RR-2335, INRIA. 1994. 〈inria-00074341〉


