A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability

Anne-Marie Kermarrec 1, Gilbert Cabillic 1, Alain Gefflaut 1, Christine Morin 1, Isabelle Puaut 1
1 SOLIDOR - Design of Distributed Operating Systems
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract: Large-scale distributed systems are attractive for executing parallel applications that require huge computing power. However, their high probability of site failure is unacceptable, especially for long-running applications. In this paper, we address this problem and propose a checkpointing mechanism that relies on a recoverable distributed shared memory (DSM). Whereas most recoverable DSMs require specific hardware to store recovery data, our scheme uses standard memories to store both current and recovery data. Moreover, the management of recovery data is merged with the management of current data by extending the DSM's coherence protocol. This approach limits hardware development and takes advantage of the data replication provided by a DSM to limit the number of pages transferred during checkpointing.
Document type:
Report
[Research Report] RR-2481, INRIA. 1995

https://hal.inria.fr/inria-00074193
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, May 24, 2006 - 14:43:16
Last modified on: Wednesday, May 16, 2018 - 11:23:05
Document(s) archived on: Sunday, April 4, 2010 - 22:13:11

Identifiers

  • HAL Id: inria-00074193, version 1

Citation

Anne-Marie Kermarrec, Gilbert Cabillic, Alain Gefflaut, Christine Morin, Isabelle Puaut. A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability. [Research Report] RR-2481, INRIA. 1995. 〈inria-00074193〉
