Conference paper, Year: 2007

Fault management in P2P-MPI

Abstract

We present in this paper the recent developments in P2P-MPI, a grid middleware, concerning fault management, which covers fault tolerance for applications and fault detection. P2P-MPI provides a transparent fault-tolerance facility based on replication of computations. Applications are monitored by a distributed set of external modules called failure detectors. The contribution of this paper is an analysis of the advantages and drawbacks of such detectors for a real implementation, and their integration in P2P-MPI. We pay particular attention to the reliability of the failure detection service and to the failure detection speed. We propose a variant of the binary round-robin protocol, which is more reliable than the application execution in any case. Experiments on applications of up to 256 processes, carried out on Grid'5000, show that the real detection times closely match the predictions.
Main file
icps-2007-185.pdf (181.98 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00529974, version 1 (27-10-2010)

Identifiers

  • HAL Id: inria-00529974, version 1

Cite

Stéphane Genaud, Choopan Rattanapoka. Fault management in P2P-MPI. In proceedings of International Conference on Grid and Pervasive Computing, GPC'07, May 2007, Paris, France. ⟨inria-00529974⟩