A Two-Level Checkpoint Algorithm in a Highly-Available Parallel Single Level Store System

Christine Morin 1, Renaud Lottiaux 1, Anne-Marie Kermarrec 2
1 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : Parallel Single Level Store (PSLS) systems integrate a shared virtual memory and a parallel file system. By managing data globally, they provide programmers of scientific applications with the attractive shared memory programming model combined with a large and efficient file system in a cluster. In this paper, we present a cheap and efficient two-level checkpointing approach that enables a PSLS to tolerate failures. The first-level checkpointing algorithm is very efficient and saves data in memory, but it requires a large amount of memory space. When memories are saturated, an alternative algorithm that saves the checkpoint on disk is used instead. Performance results show the impact of different variants of the checkpointing algorithms.
Document type :
Reports

Cited literature : 17 references

https://hal.inria.fr/inria-00072547
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 10:15:06 AM
Last modification on : Friday, November 16, 2018 - 1:24:26 AM
Long-term archiving on : Sunday, April 4, 2010 - 11:12:52 PM

Identifiers

  • HAL Id : inria-00072547, version 1

Citation

Christine Morin, Renaud Lottiaux, Anne-Marie Kermarrec. A Two-Level Checkpoint Algorithm in a Highly-Available Parallel Single Level Store System. [Research Report] RR-4086, INRIA. 2000. ⟨inria-00072547⟩
