The NAS Parallel Benchmarks 2.0, 1995. ,
DOI : 10.1177/109434209100500306
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.3829
A communication-induced checkpointing protocol that ensures rollback-dependency trackability, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing, p.68, 1997. ,
DOI : 10.1109/FTCS.1997.614079
Independent checkpointing and concurrent rollback for recovery in distributed systems-an optimistic approach, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems, pp.3-12, 1988. ,
DOI : 10.1109/RELDIS.1988.25775
On Communication Determinism in Parallel HPC Applications, 2010 Proceedings of 19th International Conference on Computer Communications and Networks, pp.1-8, 2010. ,
DOI : 10.1109/ICCCN.2010.5560143
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985. ,
DOI : 10.1145/214451.214456
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002. ,
DOI : 10.1145/568522.568525
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS2011), 2011. ,
Scalable group-based checkpoint/restart for large-scale message-passing systems, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008. ,
DOI : 10.1109/IPDPS.2008.4536302
Sender-Based Message Logging, Digest of Papers : 17 Annual International Symposium on Fault-Tolerant Computing, pp.14-19, 1987. ,
Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, vol.21, issue.7, pp.558-565, 1978. ,
DOI : 10.1145/359545.359563
Team-Based Message Logging: Preliminary Results, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010. ,
DOI : 10.1109/CCGRID.2010.110
URL : http://charm.cs.illinois.edu/newPapers/10-02/paper.pdf
Hybrid checkpointing for parallel applications in cluster federations, IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004., pp.773-782, 2004. ,
DOI : 10.1109/CCGrid.2004.1336712
URL : https://hal.archives-ouvertes.fr/inria-00000991
On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications, 2011. ,
DOI : 10.1002/cpe.1364
URL : https://hal.archives-ouvertes.fr/hal-00786558
Trading off logging overhead and coordinating overhead to achieve efficient rollback recovery, Concurrency and Computation : Practice and Experience, pp.819-853, 2009. ,
DOI : 10.1002/cpe.1364