Egida: An extensible toolkit for low-overhead fault-tolerance, International Symposium on Fault-Tolerant Computing, p.48, 1999.
MPICH-V project: A multiprotocol automatic fault-tolerant MPI, International Journal of High Performance Computing Applications, vol.20, issue.3, pp.319-333, 2006.
URL: https://hal.archives-ouvertes.fr/hal-00688637
Application-transparent checkpoint/restart for MPI programs over InfiniBand, ICPP '06: Proceedings of the 2006 International Conference on Parallel Processing, pp.471-478, 2006.
The design and implementation of checkpoint/restart process fault tolerance for Open MPI, Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2007.
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985.
Checkpointing and rollback-recovery for distributed systems, FJCC, pp.1150-1158, 1986.
Message logging: Pessimistic, optimistic, causal, and optimal, IEEE Transactions on Software Engineering, vol.24, issue.2, pp.149-159, 1998.
Sender-based message logging, Proceedings of the Seventeenth International Symposium on Fault-Tolerant Computing, 1987.
Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure recovery, IEEE International Conference on Cluster Computing, 2009.
URL: https://hal.archives-ouvertes.fr/inria-00424017
On the no-Z-cycle property in distributed executions, Journal of Computer and System Sciences, vol.61, issue.3, pp.400-427, 2000.
Optimistic recovery in distributed systems, ACM Transactions on Computer Systems, vol.3, issue.3, pp.204-226, 1985.
Revisiting fault tolerant protocols for HPC applications, of the INRIA-Illinois Joint Laboratory on Petascale Computing, 2009.
NERSC-6 workload analysis and benchmark selection process, 2008.
High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62 thousand processor cores, Proceedings of the ACM/IEEE Supercomputing SC'2008 conference, pp.1-11, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00721217
Seismic ray-tracing and earth mesh modeling on various parallel architectures, J. Supercomput., vol.29, issue.1, pp.27-44, 2004.
URL: https://hal.archives-ouvertes.fr/inria-00504155
The NAS parallel benchmarks: summary and preliminary results, Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pp.158-165, 1991.