Evaluation of Replication and Fault Detection in P2P-MPI - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, 2009

Evaluation of Replication and Fault Detection in P2P-MPI

Abstract

We present in this paper an evaluation of fault management in the grid middleware P2P-MPI. One of P2P-MPI's objectives is to support environments built from commodity hardware. On such hardware, program execution is failure-prone, so particular attention must be paid to fault management. Fault management covers two issues: fault tolerance and fault detection. P2P-MPI provides a transparent fault-tolerance facility based on replication of computations. Fault detection concerns the monitoring of program execution by the system, which is performed by a distributed set of modules called failure detectors. In this paper, we report results from several experiments that show the overhead of replication and the cost of fault detection.
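The abstract combines two ideas: replicas of each computation, and failure detectors that monitor execution. The sketch below shows how they fit together. It is a minimal illustration only and does not describe P2P-MPI's actual detection protocol or API. It assumes a simple timeout-based heartbeat detector, with time passed in explicitly so the example is deterministic. `HeartbeatFailureDetector`, `alive_ranks`, and the rank-to-hosts layout are all hypothetical names. A logical rank keeps running as long as at least one of its replicas is not suspected.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatFailureDetector:
    """Hypothetical timeout-based failure detector (illustrative sketch).

    Each monitored peer is assumed to send heartbeats periodically. A peer
    is suspected once no heartbeat has been seen for more than `timeout`
    time units. This is NOT P2P-MPI's protocol, only a generic example.
    """
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, peer: str, now: float) -> None:
        # Record the most recent time a heartbeat arrived from `peer`.
        self.last_seen[peer] = now

    def suspects(self, now: float) -> set:
        # A peer is suspected if its last heartbeat is older than the timeout.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


def alive_ranks(replicas: dict, detector: HeartbeatFailureDetector, now: float) -> dict:
    """For each logical rank, keep only the replicas not currently suspected.

    `replicas` maps a rank to the hosts running copies of it (a hypothetical
    layout). A rank survives as long as at least one copy survives.
    """
    suspected = detector.suspects(now)
    return {rank: [h for h in hosts if h not in suspected]
            for rank, hosts in replicas.items()}


if __name__ == "__main__":
    fd = HeartbeatFailureDetector(timeout=3.0)
    replicas = {0: ["hostA", "hostB"], 1: ["hostC", "hostD"]}
    for host in ["hostA", "hostB", "hostC", "hostD"]:
        fd.heartbeat(host, now=0.0)
    # hostB stops sending heartbeats; the other hosts keep going.
    for host in ["hostA", "hostC", "hostD"]:
        fd.heartbeat(host, now=4.0)
    state = alive_ranks(replicas, fd, now=5.0)
    print(sorted(fd.suspects(now=5.0)))  # ['hostB']
    print(state)  # {0: ['hostA'], 1: ['hostC', 'hostD']}
    print(all(state.values()))  # True: every rank still has a live replica
```

The run also shows the trade-off the paper measures. Replication keeps rank 0 alive after hostB fails, but every extra replica consumes resources. The detector's timeout also sets how quickly a failure is noticed, which is balanced against how often peers must send heartbeats.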

Dates and versions

inria-00425519, version 1 (21-10-2009)

Citation

Stéphane Genaud, Choopan Rattanapoka. Evaluation of Replication and Fault Detection in P2P-MPI. 6th High Performance Grid Computing International Workshop, in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS 2009), May 2009, Rome, Italy, pp. 1-8. DOI: 10.1109/IPDPS.2009.5160969. HAL: inria-00425519.