Locks and Barriers in Checkpointing and Recovery

Ramamurthy Badrinath (1), Christine Morin (1)
(1) PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract: Dependency tracking between communicating tasks is an important concept in backward error recovery for parallel applications. The traditional dependency tracking model for message-passing systems can be extended to shared-memory applications by tracking dependencies between shared memory and task-private states. The objective of this paper is to analyze the issues raised by locks and barriers in parallel applications so that tasks can be checkpointed at any time, even while holding or waiting for a lock or a barrier. In particular, we attempt to extend earlier dependency tracking mechanisms to locks and barriers. We address both coordinated and uncoordinated checkpointing schemes.
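To make the idea of extending message-passing dependency tracking to locks and barriers concrete, here is a minimal sketch under stated assumptions. It is not the mechanism from the report. It assumes a generic uncoordinated scheme in which each task's execution is split into checkpoint intervals, a lock handoff is treated like a message from the last releaser to the next acquirer, and leaving a barrier makes each task depend on every other participant's state at arrival. All names (`Task`, `Lock`, `barrier`, `rollback_set`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Dep = Tuple[str, int]  # (task name, checkpoint interval of that task)


@dataclass
class Task:
    name: str
    # Interval i is the execution since checkpoint i; interval 0 starts at the initial state.
    interval: int = 0
    # own interval -> set of (other task, other interval) that it depends on
    deps: Dict[int, Set[Dep]] = field(default_factory=dict)

    def checkpoint(self) -> None:
        # Saving a checkpoint starts a new interval.
        self.interval += 1

    def add_dep(self, other: str, other_interval: int) -> None:
        self.deps.setdefault(self.interval, set()).add((other, other_interval))


@dataclass
class Lock:
    last_release: Optional[Dep] = None  # who released the lock last, and in which interval

    def acquire(self, task: Task) -> None:
        # Hypothetical rule: a lock handoff behaves like a message from releaser to acquirer.
        if self.last_release is not None and self.last_release[0] != task.name:
            task.add_dep(*self.last_release)

    def release(self, task: Task) -> None:
        self.last_release = (task.name, task.interval)


def barrier(tasks: List[Task]) -> None:
    # Leaving a barrier makes each task depend on every other task's state at arrival.
    arrivals = [(t.name, t.interval) for t in tasks]
    for t in tasks:
        for name, iv in arrivals:
            if name != t.name:
                t.add_dep(name, iv)


def rollback_set(tasks: List[Task], failed: str, restart: int) -> Dict[str, int]:
    """Map each task that must roll back to the checkpoint it restarts from.

    Rolling back to checkpoint r discards intervals >= r; any interval of another
    task that depends on discarded work is orphaned and must be discarded too.
    """
    target = {failed: restart}
    changed = True
    while changed:  # iterate to a fixed point (targets only ever decrease)
        changed = False
        for t in tasks:
            for own_iv, ds in t.deps.items():
                for other, other_iv in ds:
                    lost = other in target and other_iv >= target[other]
                    if lost and own_iv < target.get(t.name, own_iv + 1):
                        target[t.name] = own_iv
                        changed = True
    return target


a, b = Task("A"), Task("B")
lock = Lock()
lock.acquire(a); lock.release(a)  # A uses the lock in interval 0
a.checkpoint()                     # A moves to interval 1
lock.acquire(b); lock.release(b)  # B now depends on (A, 0)
b.checkpoint()
print(rollback_set([a, b], "A", 1))  # {'A': 1}
```

In the usage lines, A fails after its checkpoint, so the work B depends on survives and only A restarts. If A had failed before checkpointing (`restart=0`), B's interval 0 would be orphaned and B would roll back as well, which is the kind of propagation a coordinated scheme avoids by taking consistent checkpoints.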
Document type: Reports

https://hal.inria.fr/inria-00071563
Contributor: Rapport de Recherche Inria
Submitted on: Tuesday, May 23, 2006 - 5:57:26 PM
Last modification on: Friday, November 16, 2018 - 1:24:04 AM
Long-term archiving on: Sunday, April 4, 2010 - 8:35:44 PM

Identifiers

  • HAL Id: inria-00071563, version 1

Citation

Ramamurthy Badrinath, Christine Morin. Locks and Barriers in Checkpointing and Recovery. [Research Report] RR-5021, INRIA. 2003. ⟨inria-00071563⟩
