Evaluation of Replication and Fault Detection in P2P-MPI

Abstract: In this paper we present an evaluation of fault management in the grid middleware P2P-MPI. One of P2P-MPI's objectives is to support environments built from commodity hardware. Program execution in such environments is failure-prone, so particular attention must be paid to fault management. Fault management covers two issues: fault tolerance and fault detection. P2P-MPI provides a transparent fault-tolerance facility based on replication of computations. Fault detection concerns the monitoring of program execution by the system; this monitoring is done through a distributed set of modules called failure detectors. We report results from several experiments that show the overhead of replication and the cost of fault detection.
Document type:
Conference paper
6th High Performance Grid Computing International Workshop in conjunction with International Parallel and Distributed Processing Symposium - IPDPS 2009, May 2009, Rome, Italy. IEEE CS, pp. 1-8, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing. IPDPS 2009. 〈10.1109/IPDPS.2009.5160969〉

https://hal.inria.fr/inria-00425519
Contributor: Stéphane Genaud
Submitted on: Wednesday, 21 October 2009 - 21:18:41
Last modified on: Friday, 12 January 2018 - 01:08:53

Citation

Stéphane Genaud, Choopan Rattanapoka. Evaluation of Replication and Fault Detection in P2P-MPI. 6th High Performance Grid Computing International Workshop in conjunction with International Parallel and Distributed Processing Symposium - IPDPS 2009, May 2009, Rome, Italy. IEEE CS, pp. 1-8, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing. IPDPS 2009. 〈10.1109/IPDPS.2009.5160969〉. 〈inria-00425519〉
