L. Alvisi and K. Marzullo, Message logging: pessimistic, optimistic, and causal, Proceedings of 15th International Conference on Distributed Computing Systems, pp.229-236, 1995.
DOI : 10.1109/ICDCS.1995.500024

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.9997

D. P. Anderson, BOINC: A system for public-resource computing and storage, 5th International Workshop on Grid Computing, pp.4-10, 2004.

G. Antoniu, L. Cudennec, M. Duigou, and M. Jan, Performance scalability of the JXTA P2P framework, 2007 IEEE International Parallel and Distributed Processing Symposium, 2007.
DOI : 10.1109/IPDPS.2007.370299

URL : https://hal.archives-ouvertes.fr/inria-00119916

D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter et al., The NAS Parallel Benchmarks, 1994.

M. Baker, B. Carpenter, and A. Shafi, MPJ Express: Towards Thread Safe Java HPC, 2006 IEEE International Conference on Cluster Computing, pp.1-10, 2006.
DOI : 10.1109/CLUSTR.2006.311890

R. Batchu, Y. S. Dandass, A. Skjellum, and M. Beddhu, MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware, Cluster Computing, vol.7, issue.4, pp.303-315, 2004.
DOI : 10.1023/B:CLUS.0000039491.64560.8a

A. Blansché and P. Gançarski, MACLAW: A modular approach for clustering with local attribute weighting, Pattern Recognition Letters, vol.27, issue.11, pp.1299-1306, 2006.
DOI : 10.1016/j.patrec.2005.07.027

M. Bornemann, R. V. van Nieuwpoort, and T. Kielmann, MPJ/Ibis: A Flexible and Efficient Message Passing Platform for Java, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.217-224, 2005.
DOI : 10.1007/11557265_30

G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak et al., MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, ACM/IEEE SC 2002 Conference (SC'02), pp.1-18, 2002.
DOI : 10.1109/SC.2002.10048

URL : https://hal.archives-ouvertes.fr/in2p3-00457138

A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier, and F. Magniette, MPICH-V2: a fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging, SuperComputing 2003, pp.242-250, 2003.

J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, and R. A. Davey, A benchmark suite for high performance Java, Concurrency: Practice and Experience, pp.375-388, 2000.

F. Cappello, Grid'5000: a large scale and highly reconfigurable grid experimental testbed, The 6th IEEE/ACM International Workshop on Grid Computing, pp.99-106, 2005.
DOI : 10.1109/GRID.2005.1542730

URL : https://hal.archives-ouvertes.fr/hal-00684943

D. Caromel, A. Di Costanzo, and C. Mathieu, Peer-to-peer for computational grids: mixing clusters and desktop machines, Parallel Computing, vol.33, issue.4-5, pp.275-288, 2007.
DOI : 10.1016/j.parco.2007.02.011

URL : https://hal.archives-ouvertes.fr/hal-00125041

B. Carpenter, V. Getov, G. Judd, T. Skjellum, and G. Fox, MPJ: MPI-like message passing for Java, Concurrency: Practice and Experience, pp.1019-1038, 2000.

T. D. Chandra and S. Toueg, Unreliable failure detectors for reliable distributed systems, J. ACM, vol.43, issue.2, pp.225-267, 1996.

D. Dewolfs, J. Broeckhove, V. S. Sunderam, and G. E. Fagg, FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services: A Case Study, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User's Group Meeting, pp.133-140, 2006.
DOI : 10.1007/11846802_24

N. Drost, R. V. van Nieuwpoort, and H. Bal, Simple locality-aware co-allocation in peer-to-peer supercomputing, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp.14-21, 2006.
DOI : 10.1109/CCGRID.2006.1630909

D. W. Erwin and D. F. Snelling, UNICORE: A grid computing environment, Euro-Par, pp.825-834, 2001.

L. Eyraud-Dubois, A. Legrand, M. Quinson, and F. Vivien, A First Step Towards Automatically Building Network Representations, Euro-Par, pp.160-169, 2007.
DOI : 10.1007/978-3-540-74466-5_18

URL : https://hal.archives-ouvertes.fr/inria-00130734

G. Fagg and J. Dongarra, FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World, EuroPVM/MPI User's Group Meeting, pp.346-353, 2000.
DOI : 10.1007/3-540-45255-9_47

I. Foster and C. Kesselman, Globus: a Metacomputing Infrastructure Toolkit, International Journal of High Performance Computing Applications, vol.11, issue.2, pp.115-128, 1997.
DOI : 10.1177/109434209701100205

I. T. Foster and A. Iamnitchi, On death, taxes, and the convergence of peer-to-peer and grid computing, Peer-to-Peer Systems II, Second International Workshop, IPTPS, pp.118-128, 2003.

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, concept, and design of a next generation MPI implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.

S. Genaud, E. Jeannot, and C. Rattanapoka, Fault management in P2P-MPI, International Journal of Parallel Programming, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00529974

S. Genaud, P. Gançarski, G. Latu, A. Blansché, C. Rattanapoka et al., Exploitation of a parallel clustering algorithm on commodity hardware with P2P-MPI, The Journal of Supercomputing, vol.16, issue.3, pp.21-41, 2008.
DOI : 10.1007/s11227-007-0136-2

URL : https://hal.archives-ouvertes.fr/inria-00503998

S. Genaud and C. Rattanapoka, Fault management in P2P-MPI, Proceedings of the International Conference on Grid and Pervasive Computing, GPC'07, pp.64-77, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00529974

S. Genaud and C. Rattanapoka, Large-scale experiment of co-allocation strategies for Peer-to-Peer supercomputing in P2P-MPI, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536212

URL : https://hal.archives-ouvertes.fr/inria-00214137

W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, Scientific and Engineering Computation Series, 1999.

V. Hadzilacos and S. Toueg, A modular approach to fault-tolerant broadcasts and related problems, 1994.

E. Huedo, R. S. Montero, and I. M. Llorente, A framework for adaptive execution in grids, Software: Practice and Experience, pp.631-651, 2004.

E. Jeanvoine, C. Morin, and D. Leprince, Vigne: Executing Easily and Efficiently a Wide Range of Distributed Applications in Grids, Proceedings of Euro-Par 2007, pp.394-403, 2007.
DOI : 10.1007/978-3-540-74466-5_43

URL : https://hal.archives-ouvertes.fr/hal-00689008

S. Louca, N. Neophytou, A. Lachanas, and P. Evripidou, MPI-FT: Portable fault tolerance scheme for MPI, Parallel Processing Letters, pp.371-382, 2000.

D. Nurmi, J. Brevik, and R. Wolski, Modeling Machine Availability in Enterprise and Wide-Area Distributed Computing Environments, Euro-Par, pp.432-441, 2005.
DOI : 10.1007/11549468_50

R. Rabenseifner and J. Träff, More efficient reduction algorithms for non-power-of-two number of processors in message-passing parallel systems, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, pp.36-46, 2004.

S. Ranganathan, A. D. George, R. W. Todd, and M. C. Chidester, Gossip-style failure detection and distributed consensus for scalable heterogeneous clusters, Cluster Computing, vol.4, issue.3, pp.197-209, 2001.
DOI : 10.1023/A:1011494323443

C. Rattanapoka, P2P-MPI: A Fault-tolerant Message Passing Interface Implementation for Grids, 2008.

S. C. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling churn in a DHT, ATEC'04: Proceedings of the USENIX Annual Technical Conference 2004, pp.127-140, 2004.

S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell et al., The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, International Journal of High Performance Computing Applications, vol.19, issue.4, pp.479-493, 2005.
DOI : 10.1177/1094342005056139

F. B. Schneider, Replication Management Using the State Machine Approach, pp.169-195, 1993.

K. Shudo, Y. Tanaka, and S. Sekiguchi, P3: P2P-based middleware enabling transfer and aggregation of computational resources, CCGrid 2005: IEEE International Symposium on Cluster Computing and the Grid, pp.259-266, 2005.
DOI : 10.1109/CCGRID.2005.1558563

M. Snir, S. W. Otto, D. W. Walker, J. Dongarra, and S. Huss-Lederman, MPI: The Complete Reference, 1995.

G. Stellner, CoCheck: Checkpointing and Process Migration for MPI, Proceedings of the 10th International Parallel Processing Symposium (IPPS '96), pp.526-531, 1996.

D. Thain, T. Tannenbaum, and M. Livny, Distributed computing in practice: the Condor experience, Concurrency and Computation: Practice and Experience, pp.323-356, 2005.

R. van Nieuwpoort, J. Maassen, R. F. Hofman, T. Kielmann, and H. E. Bal, Ibis: an efficient Java-based grid programming environment, Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande, JGI '02, pp.18-27, 2002.
DOI : 10.1145/583810.583813

R. van Renesse, Y. Minsky, and M. Hayden, A Gossip-Style Failure Detection Service, Middleware '98, p.55, 1998.
DOI : 10.1007/978-1-4471-1283-9_4