An Efficient and Scalable Approach for Implementing Fault Tolerant DSM Architectures

Christine Morin (1), Anne-Marie Kermarrec (1), Michel Banâtre (1)
(1) SOLIDOR - Design of Distributed Operating Systems
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract: Distributed Shared Memory (DSM) architectures are attractive for executing high-performance parallel applications. Because they are made up of a large number of components, however, these architectures have a high probability of failure. We propose a protocol to tolerate node failures in two classes of DSM architectures: Cache Only Memory Architectures (COMA) and Distributed Virtual Shared Memory (SVM) systems. The proposed solution is based on backward error recovery and extends the existing coherence protocols to manage both the data used by processors for computation and the recovery data used for fault tolerance. The implementation of the protocol in a COMA architecture has been evaluated by simulation. The protocol has also been implemented in an SVM system on a network of workstations. Both simulation results and measurements show that our solution is efficient and scalable.
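As an illustration of the backward error recovery idea mentioned in the abstract, the following is a minimal sketch, not the protocol from the report. It assumes a toy shared memory in which each node holds current data and recovery data, and in which a recovery point keeps two copies of every item on distinct nodes so that a single node failure can be tolerated. The names ToyDSM, Node, checkpoint, fail and rollback are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Memory of one node (hypothetical layout, not taken from the report)."""
    current: dict = field(default_factory=dict)   # data used for computation
    recovery: dict = field(default_factory=dict)  # data kept for rollback


class ToyDSM:
    """Toy shared memory with checkpoint/rollback; an illustrative assumption."""

    def __init__(self, n_nodes):
        if n_nodes < 2:
            raise ValueError("two nodes are needed to hold two recovery copies")
        self.nodes = {i: Node() for i in range(n_nodes)}

    def write(self, node_id, key, value):
        # Single-owner simplification: invalidate every other current copy.
        for node in self.nodes.values():
            node.current.pop(key, None)
        self.nodes[node_id].current[key] = value

    def read(self, key):
        for node in self.nodes.values():
            if key in node.current:
                return node.current[key]
        raise KeyError(key)

    def checkpoint(self):
        # Establish a recovery point: save each item on its owner and on
        # the next node, so the two recovery copies sit on distinct nodes.
        ids = sorted(self.nodes)
        owners = {k: i for i in ids for k in self.nodes[i].current}
        for node in self.nodes.values():
            node.recovery.clear()
        for key, owner in owners.items():
            value = self.nodes[owner].current[key]
            backup = ids[(ids.index(owner) + 1) % len(ids)]
            self.nodes[owner].recovery[key] = value
            self.nodes[backup].recovery[key] = value

    def fail(self, node_id):
        # A node failure loses both its current data and its recovery data.
        del self.nodes[node_id]

    def rollback(self):
        # Backward error recovery: restore the last recovery point from the
        # surviving recovery copies, then re-create the second copies.
        restored = {}
        for node in self.nodes.values():
            restored.update(node.recovery)
        for node in self.nodes.values():
            node.current.clear()
        self.nodes[min(self.nodes)].current.update(restored)
        self.checkpoint()


dsm = ToyDSM(3)
dsm.write(0, "x", 1)
dsm.write(1, "y", 2)
dsm.checkpoint()          # recovery point holds x=1, y=2
dsm.write(0, "x", 99)     # modification made after the recovery point
dsm.fail(0)               # node 0 is lost with its current and recovery data
dsm.rollback()            # the state of the recovery point is restored
```

In this sketch the update made after the recovery point is discarded by the rollback, which is the defining property of backward error recovery. Placing the second recovery copy on the neighbouring node is an arbitrary choice made for brevity.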
Document type: Report
[Research Report] RR-3103, INRIA. 1997

https://hal.inria.fr/inria-00073588
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, May 24, 2006 - 13:16:10
Last modified on: Thursday, January 11, 2018 - 06:20:10
Document(s) archived on: Sunday, April 4, 2010 - 23:51:06

Identifiers

  • HAL Id: inria-00073588, version 1

Citation

Christine Morin, Anne-Marie Kermarrec, Michel Banâtre. An Efficient and Scalable Approach for Implementing Fault Tolerant DSM Architectures. [Research Report] RR-3103, INRIA. 1997. 〈inria-00073588〉

Metrics

Record views: 506

File downloads: 64