Checkpointing and Recovery of Shared Memory Parallel Applications in a Cluster

Ramamurthy Badrinath 1 Christine Morin 1 Geoffroy Vallée 1
1 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM (distributed shared memory) cluster system. Our design targets a DSM that supports the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols. They are designed to support common performance optimizations suggested in the literature while remaining lightweight during fault-free execution. We also present preliminary performance results for the current implementation.
Contributor : Rapport de Recherche Inria
Submitted on : Tuesday, May 23, 2006 - 6:46:03 PM
Last modification on : Tuesday, June 15, 2021 - 4:06:14 PM
Long-term archiving on : Sunday, April 4, 2010 - 10:37:39 PM


  • HAL Id : inria-00071780, version 1


Ramamurthy Badrinath, Christine Morin, Geoffroy Vallée. Checkpointing and Recovery of Shared Memory Parallel Applications in a Cluster. [Research Report] RR-4806, INRIA. 2003. ⟨inria-00071780⟩


