Starfish: fault-tolerant dynamic MPI programs on clusters of workstations, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469), pp.167-176, 1999. ,
DOI : 10.1109/HPDC.1999.805295
Common mechanisms for supporting fault tolerance in DSM and message passing systems, 2003. ,
URL : https://hal.archives-ouvertes.fr/hal-01272454
MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, ACM/IEEE SC 2002 Conference (SC'02), pp.29-47, 2002. ,
DOI : 10.1109/SC.2002.10048
URL : https://hal.archives-ouvertes.fr/in2p3-00457138
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985. ,
DOI : 10.1145/214451.214456
Lightweight Logging for Lazy Release Consistent Distributed Shared Memory, Operating Systems Design and Implementation, pp.59-73, 1996. ,
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002. ,
DOI : 10.1145/568522.568525
Conception et évaluation d'un protocole de reprise d'applications parallèles dans une fédération de grappes de calculateurs, 2003. ,
An efficient and scalable approach for implementing fault-tolerant DSM architectures, IEEE Transactions on Computers, vol.49, issue.5, pp.414-430, 2000. ,
DOI : 10.1109/12.859537
URL : https://hal.archives-ouvertes.fr/inria-00073588
Integrating Fault-Tolerance Techniques in Grid Applications, 2000. ,
Hierarchical Coordinated Checkpointing Protocol, International Conference on Parallel and Distributed Computing Systems, pp.240-245, 2002. ,
Exploiting operating system services to efficiently checkpoint parallel applications in GENESIS, Fifth International Conference on Algorithms and Architectures for Parallel Processing, 2002. Proceedings., 2002. ,
DOI : 10.1109/ICAPP.2002.1173584
Campus scientifique, 615 rue du Jardin Botanique Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex Unité de recherche INRIA Rhône-Alpes, p.78153, 2004. ,