Smooth and Efficient Integration of High-Availability in a Parallel Single Level Store System

Anne-Marie Kermarrec 1 Christine Morin 2
2 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : A parallel single level store (PSLS) system integrates a shared virtual memory and a parallel file system, thus providing programmers with a global address space that includes both memory and file data. Implemented in a cluster, parallel single level store systems are an attractive support for long-running parallel applications, combining the natural shared memory programming model with a large and efficient file system. However, the need to tolerate failures in such a system increases with the size of applications. In this paper, we present the smooth integration of backward error recovery high-availability support into a parallel single level store system. Our system is able to tolerate multiple transient failures, a single permanent one, and power cut failures affecting the whole cluster, without requiring any specific hardware. For this purpose, our highly-available parallel single level store system relies on a high degree of integration (and reusability) between the high-availability support and the standard supports. We focus on parallel file system management at checkpointing and recovery time, and especially on mirror management. A prototype integrating our high-availability support has been implemented, and we show some performance results in the paper.
Document type :
Reports

https://hal.inria.fr/inria-00072532
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 10:11:59 AM
Last modification on : Friday, November 16, 2018 - 1:24:40 AM
Long-term archiving on : Sunday, April 4, 2010 - 11:12:20 PM

Identifiers

  • HAL Id : inria-00072532, version 1

Citation

Anne-Marie Kermarrec, Christine Morin. Smooth and Efficient Integration of High-Availability in a Parallel Single Level Store System. [Research Report] RR-4099, INRIA. 2001. ⟨inria-00072532⟩
