Message logging: pessimistic, optimistic, causal, and optimal, IEEE Transactions on Software Engineering, vol.24, issue.2, pp.149-159, 1998. ,
DOI : 10.1109/32.666828
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.78
An analysis of communication induced checkpointing, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352), p.242, 1999. ,
DOI : 10.1109/FTCS.1999.781058
A communication-induced checkpointing protocol that ensures rollback-dependency trackability, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing, p.68, 1997. ,
DOI : 10.1109/FTCS.1997.614079
Independent checkpointing and concurrent rollback for recovery in distributed systems-an optimistic approach, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems, pp.3-12, 1988. ,
DOI : 10.1109/RELDIS.1988.25775
MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI, International Journal of High Performance Computing Applications, vol.20, issue.3, pp.319-333, 2006. ,
DOI : 10.1177/1094342006067469
URL : https://hal.archives-ouvertes.fr/hal-00688637
Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009. ,
DOI : 10.1109/CLUSTR.2009.5289157
URL : https://hal.archives-ouvertes.fr/inria-00424017
Toward Exascale Resilience, International Journal of High Performance Computing Applications, vol.23, issue.4, pp.374-388, 2009. ,
DOI : 10.1177/1094342009347767
On Communication Determinism in Parallel HPC Applications, 2010 Proceedings of 19th International Conference on Computer Communications and Networks, 2010. ,
DOI : 10.1109/ICCCN.2010.5560143
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985. ,
DOI : 10.1145/214451.214456
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002. ,
DOI : 10.1145/568522.568525
Sender-Based Message Logging, Digest of Papers: The 17th Annual International Symposium on Fault-Tolerant Computing, pp.14-19, 1987. ,
Checkpointing and Rollback-Recovery for Distributed Systems, Proceedings of 1986 ACM Fall joint computer conference, ACM '86, pp.1150-1158, 1986. ,
DOI : 10.1109/TSE.1987.232562
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.1616
Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, vol.21, issue.7, pp.558-565, 1978. ,
DOI : 10.1145/359545.359563
Using time to improve the performance of coordinated checkpointing, Proceedings of IEEE International Computer Performance and Dependability Symposium, pp.282-291, 1996. ,
DOI : 10.1109/IPDS.1996.540229
Modeling the Impact of Checkpoints on Next-Generation Systems, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007), pp.30-46, 2007. ,
DOI : 10.1109/MSST.2007.4367962
Communication Patterns, Workshop on Communication Architecture for Clusters CAC'06, 2006. ,
Active Optimistic Message Logging for Reliable Execution of MPI Applications, 15th International Euro-Par Conference, pp.615-626, 2009. ,
DOI : 10.1145/3959.3962
URL : https://hal.archives-ouvertes.fr/inria-00424002
Understanding failures in petascale computers, Journal of Physics: Conference Series, vol.78, issue.11pp, p.12022, 2007. ,
DOI : 10.1088/1742-6596/78/1/012022
NetPIPE: A Network Protocol Independent Performance Evaluator, IASTED International Conference on Intelligent Information Management and Systems, 1996. ,