FAIL-MPI: How fault-tolerant is fault-tolerant MPI?

Thomas Hérault (1, 2), William Hoarau (1, 2), Pierre Lemarinier (1, 2), Eric Rodriguez (1), Sébastien Tixeuil (1, 2)
1 GRAND-LARGE - Global parallel and distributed computing
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LIFL - Laboratoire d'Informatique Fondamentale de Lille, LRI - Laboratoire de Recherche en Informatique
Abstract: The impact of faults is a topic of paramount importance in the development of Cluster and Grid middleware, since the probability of their occurrence in a Grid infrastructure and in large-scale distributed systems is very high. MPI (Message Passing Interface) is a popular abstraction for programming distributed computing applications. FAIL is an abstract language for describing fault occurrences, capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults into a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.
https://hal.inria.fr/inria-00078183
Contributor: Sébastien Tixeuil
Submitted on: Saturday, June 3, 2006 - 21:00:16
Last modified on: Thursday, April 5, 2018 - 12:30:12
Document(s) archived on: Tuesday, September 18, 2012 - 14:30:49

Identifiers

  • HAL Id: inria-00078183, version 1

Citation

Thomas Hérault, William Hoarau, Pierre Lemarinier, Eric Rodriguez, Sébastien Tixeuil. FAIL-MPI: How fault-tolerant is fault-tolerant MPI? [Research Report] 1450, 2006, pp.26. 〈inria-00078183〉
