Conference papers

Evaluation of Replication and Fault Detection in P2P-MPI

Abstract : In this paper, we present an evaluation of fault management in the grid middleware P2P-MPI. One of P2P-MPI's objectives is to support environments built on commodity hardware. Program execution in such environments is failure-prone, so particular attention must be paid to fault management. Fault management covers two issues: fault tolerance and fault detection. P2P-MPI provides a transparent fault-tolerance facility based on replication of computations. Fault detection concerns the monitoring of program execution by the system, which is done through a distributed set of modules called failure detectors. We report results from several experiments that show the overhead of replication and the cost of fault detection.
Contributor : Stéphane Genaud
Submitted on : Wednesday, October 21, 2009 - 9:18:41 PM
Last modification on : Friday, February 26, 2021 - 3:28:02 PM




Stéphane Genaud, Choopan Rattanapoka. Evaluation of Replication and Fault Detection in P2P-MPI. 6th High Performance Grid Computing International Workshop in conjunction with International Parallel and Distributed Processing Symposium - IPDPS 2009, May 2009, Rome, Italy. pp.1-8, ⟨10.1109/IPDPS.2009.5160969⟩. ⟨inria-00425519⟩


