FAIL-MPI: How fault-tolerant is fault-tolerant MPI?

Thomas Herault 1, 2 William Hoarau 1, 2 Pierre Lemarinier 1, 2 Eric Rodriguez 1 Sébastien Tixeuil 1, 2
1 GRAND-LARGE - Global parallel and distributed computing
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LIFL - Laboratoire d'Informatique Fondamentale de Lille, LRI - Laboratoire de Recherche en Informatique
Abstract : The impact of faults is a topic of paramount importance in the development of Cluster and Grid middleware, because the probability that faults occur in a Grid infrastructure or a large-scale distributed system is very high. MPI (Message Passing Interface) is a popular abstraction for programming distributed computation applications. FAIL is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.

https://hal.inria.fr/inria-00078183
Contributor : Sébastien Tixeuil
Submitted on : Saturday, June 3, 2006 - 9:00:16 PM
Last modification on : Friday, February 5, 2021 - 6:32:01 PM
Long-term archiving on : Tuesday, September 18, 2012 - 2:30:49 PM

Identifiers

  • HAL Id : inria-00078183, version 1

Citation

Thomas Herault, William Hoarau, Pierre Lemarinier, Eric Rodriguez, Sébastien Tixeuil. FAIL-MPI: How fault-tolerant is fault-tolerant MPI? [Research Report] 1450, 2006, pp. 26. ⟨inria-00078183⟩
