
FAIL-MPI: How fault-tolerant is fault-tolerant MPI ?

Thomas Herault 1, 2 William Hoarau 1, 2 Pierre Lemarinier 1, 2 Eric Rodriguez 1 Sébastien Tixeuil 1, 2
1 GRAND-LARGE - Global parallel and distributed computing
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LIFL - Laboratoire d'Informatique Fondamentale de Lille, LRI - Laboratoire de Recherche en Informatique
Abstract : One of the topics of paramount importance in the development of Cluster and Grid middleware is the impact of faults, since their probability of occurrence in Grid infrastructures and large-scale distributed systems is very high. MPI (Message Passing Interface) is a popular abstraction for programming distributed computation applications. FAIL is an abstract language for describing fault occurrences, capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults into a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault testing as well as stress testing.

Contributor : Sébastien Tixeuil
Submitted on : Saturday, June 3, 2006 - 9:00:16 PM
Last modification on : Friday, February 4, 2022 - 3:31:02 AM
Long-term archiving on : Tuesday, September 18, 2012 - 2:30:49 PM


  • HAL Id : inria-00078183, version 1


Thomas Herault, William Hoarau, Pierre Lemarinier, Eric Rodriguez, Sébastien Tixeuil. FAIL-MPI: How fault-tolerant is fault-tolerant MPI ?. [Research Report] 1450, 2006, pp.26. ⟨inria-00078183⟩


