
Smooth and Efficient Integration of High-Availability in a Parallel Single Level Store System

Anne-Marie Kermarrec 1 Christine Morin 2
2 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : A parallel single level store (PSLS) system integrates a shared virtual memory and a parallel file system, thus providing programmers with a global address space that includes both memory and file data. Implemented in a cluster, a PSLS system is an attractive support for long-running parallel applications, combining the natural shared memory programming model with a large and efficient file system. However, the need to tolerate failures in such a system increases with the size of applications. In this paper, we present the smooth integration of backward error recovery high-availability support into a PSLS system. Our system tolerates multiple transient failures, a single permanent failure, and power cuts affecting the whole cluster, without requiring any specific hardware. To this end, it relies on a high degree of integration (and reuse) of high-availability and standard supports. We focus on parallel file system management at checkpointing and recovery time, and especially on mirror management. A prototype integrating our high-availability support has been implemented, and we present some performance results in the paper.

Cited literature [18 references]
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 10:11:59 AM
Last modification on : Friday, July 10, 2020 - 4:23:25 PM
Long-term archiving on : Sunday, April 4, 2010 - 11:12:20 PM


  • HAL Id : inria-00072532, version 1


Anne-Marie Kermarrec, Christine Morin. Smooth and Efficient Integration of High-Availability in a Parallel Single Level Store System. [Research Report] RR-4099, INRIA. 2001. ⟨inria-00072532⟩


