A. Agbaria and R. Friedman, Starfish: fault-tolerant dynamic MPI programs on clusters of workstations, Proceedings of the Eighth International Symposium on High Performance Distributed Computing, pp.167-176, 1999.
DOI : 10.1109/HPDC.1999.805295

R. Badrinath and C. Morin, Common mechanisms for supporting fault tolerance in DSM and message passing systems, 2003.
URL : https://hal.archives-ouvertes.fr/hal-01272454

G. Bosilca, A. Bouteiller, F. Cappello, S. Djailali, G. Fedak et al., MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, ACM/IEEE SC 2002 Conference (SC'02), pp.29-47, 2002.
DOI : 10.1109/SC.2002.10048
URL : https://hal.archives-ouvertes.fr/in2p3-00457138

M. Costa, P. Guedes, M. Sequeira, N. Neves, and M. Castro, Lightweight Logging for Lazy Release Consistent Distributed Shared Memory, Operating Systems Design and Implementation, pp.59-73, 1996.

M. Elnozahy, L. Alvisi, Y. Wang, and D. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525

S. Monnet, C. Morin, and R. Badrinath, A hierarchical checkpointing protocol for parallel applications in cluster federations, Proceedings of the 18th International Parallel and Distributed Processing Symposium, 2004.
DOI : 10.1109/IPDPS.2004.1303242
URL : https://hal.archives-ouvertes.fr/inria-00000990

C. Morin, A. Kermarrec, M. Banâtre, and A. Gefflaut, An efficient and scalable approach for implementing fault-tolerant DSM architectures, IEEE Transactions on Computers, vol.49, issue.5, pp.414-430, 2000.
DOI : 10.1109/12.859537
URL : https://hal.archives-ouvertes.fr/inria-00073588

A. Nguyen-Tuong, Integrating Fault-Tolerance Techniques in Grid Applications, 2000.

H. Paul, A. Gupta, and R. Badrinath, Hierarchical Coordinated Checkpointing Protocol, International Conference on Parallel and Distributed Computing Systems, pp.240-245, 2002.

J. Rough and A. Goscinski, Exploiting operating system services to efficiently checkpoint parallel applications in GENESIS, Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing, 2002.
DOI : 10.1109/ICAPP.2002.1173584