How to bring together fault tolerance and data consistency to enable grid data sharing

Gabriel Antoniu (1), Jean-François Deverge (1), Sébastien Monnet (1)
(1) PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract: This paper addresses the challenge of transparent data sharing within computing grids built as cluster federations. On such platforms, the availability of storage resources may change dynamically, often because of hardware failures. We focus on the problem of maintaining the consistency of replicated data in the presence of failures. We propose a software architecture that decouples consistency management from fault tolerance management. We illustrate this architecture with a case study showing how to design a consistency protocol using fault-tolerant building blocks. As a proof of concept, we describe a prototype implementation of this protocol within JuxMem, an experimental software platform for grid data sharing, and we report on a preliminary experimental evaluation of the proposed approach.

https://hal.inria.fr/inria-00000987
Contributor : Sébastien Monnet
Submitted on : Wednesday, January 11, 2006 - 9:38:55 AM
Last modification on : Friday, November 16, 2018 - 1:23:24 AM
Long-term archiving on : Monday, September 20, 2010 - 2:00:06 PM

Identifiers

  • HAL Id : inria-00000987, version 2

Citation

Gabriel Antoniu, Jean-François Deverge, Sébastien Monnet. How to bring together fault tolerance and data consistency to enable grid data sharing. Concurrency and Computation: Practice and Experience, Wiley, 2006, Concurrency and Computation: Practice and Experience, pp.1-19. ⟨inria-00000987v2⟩
