Conference papers

Using failure injection mechanisms to experiment and evaluate a grid failure detector

Sébastien Monnet 1, Marin Bertier 1
1 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract : Computing grids are large-scale, highly distributed, often hierarchical platforms. At such scales, failures are no longer exceptions but part of normal behavior, so developers designing software for grids have to take them into account. It is crucial to run experiments at a large scale, under various volatility conditions, in order to measure the impact of failures on the whole system. This paper presents an experimental tool that allows the user to inject failures during a practical evaluation of fault-tolerant systems. We illustrate the usefulness of our tool through an evaluation of a hierarchical grid failure detector.

https://hal.inria.fr/inria-00001193
Contributor : Sébastien Monnet
Submitted on : Tuesday, June 27, 2006 - 12:28:18 AM
Last modification on : Monday, February 15, 2021 - 10:42:54 AM
Long-term archiving on : Saturday, April 3, 2010 - 8:56:32 PM

Identifiers

  • HAL Id : inria-00001193, version 1

Citation

Sébastien Monnet, Marin Bertier. Using failure injection mechanisms to experiment and evaluate a grid failure detector. Workshop on Computational Grids and Clusters (WCGC 2006), Jul 2006, Rio de Janeiro, Brazil. ⟨inria-00001193⟩