Reports (Research report)

Checkpointing and Recovery of Shared Memory Parallel Applications in a Cluster

Ramamurthy Badrinath 1, Christine Morin 1, Geoffroy Vallée 1
1 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM cluster system. Our design targets a DSM that supports the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols. They are designed to support common performance optimizations suggested in the literature while staying lightweight during fault-free execution. We also present preliminary performance results of the current implementation.
Contributor : Rapport De Recherche Inria
Submitted on : Tuesday, May 23, 2006 - 6:46:03 PM
Last modification on : Thursday, October 27, 2022 - 3:45:16 AM
Long-term archiving on : Sunday, April 4, 2010 - 10:37:39 PM


  • HAL Id : inria-00071780, version 1


Ramamurthy Badrinath, Christine Morin, Geoffroy Vallée. Checkpointing and Recovery of Shared Memory Parallel Applications in a Cluster. [Research Report] RR-4806, INRIA. 2003. ⟨inria-00071780⟩


