S. Rao, L. Alvisi, and H. M. Vin, Egida: An extensible toolkit for low-overhead fault-tolerance, International Symposium on Fault-Tolerant Computing, p.48, 1999.

A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, and F. Cappello, MPICH-V project: A multiprotocol automatic fault-tolerant MPI, International Journal of High Performance Computing Applications, vol.20, issue.3, pp.319-333, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00688637

Q. Gao, W. Yu, W. Huang, and D. K. Panda, Application-transparent checkpoint/restart for MPI programs over InfiniBand, ICPP '06: Proceedings of the 2006 International Conference on Parallel Processing, pp.471-478, 2006.

J. Hursey, J. M. Squyres, T. I. Mattox, and A. Lumsdaine, The design and implementation of checkpoint/restart process fault tolerance for Open MPI, Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2007.

E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.

K. M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985.

R. Koo and S. Toueg, Checkpointing and rollback-recovery for distributed systems, FJCC, pp.1150-1158, 1986.

L. Alvisi and K. Marzullo, Message logging: Pessimistic, optimistic, causal, and optimal, IEEE Transactions on Software Engineering, vol.24, issue.2, pp.149-159, 1998.

D. B. Johnson and W. Zwaenepoel, Sender-based message logging, Proceedings of the Seventeenth International Symposium on Fault-Tolerant Computing, 1987.

A. Bouteiller, T. Ropars, G. Bosilca, C. Morin, and J. Dongarra, Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure recovery, IEEE International Conference on Cluster Computing, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00424017

F. Quaglia, R. Baldoni, and B. Ciciani, On the no-Z-cycle property in distributed executions, Journal of Computer and System Sciences, vol.61, issue.3, pp.400-427, 2000.

R. E. Strom and S. Yemini, Optimistic recovery in distributed systems, ACM Transactions on Computer Systems, vol.3, issue.3, pp.204-226, 1985.

F. Cappello, A. Guermouche, T. Herault, and M. Snir, Revisiting fault-tolerant protocols for HPC applications, INRIA-Illinois Joint Laboratory on Petascale Computing, 2009.

K. Antypas, J. Shalf, and H. Wasserman, NERSC-6 workload analysis and benchmark selection process, 2008.

L. Carrington, D. Komatitsch, M. Laurenzano, M. Tikir, D. Michéa et al., High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62 thousand processor cores, Proceedings of the ACM/IEEE Supercomputing SC'2008 conference, pp.1-11, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00721217

M. Grunberg, S. Genaud, and C. Mongenet, Seismic ray-tracing and earth mesh modeling on various parallel architectures, J. Supercomput., vol.29, issue.1, pp.27-44, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00504155

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS parallel benchmarks: summary and preliminary results, Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, pp.158-165, 1991.