File I/O for MPI Applications in Redundant Execution Scenarios, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.112-119, 2012. ,
DOI : 10.1109/PDP.2012.22
Redesigning the message logging model for high performance. Concurrency and Computation: Practice and Experience, pp.2196-2211, 2010. ,
Transparent Redundant Computing with MPI, Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface, EuroMPI'10, pp.208-218, 2010. ,
DOI : 10.1007/978-3-642-15646-5_22
A Benchmark Simulation for Moist Nonhydrostatic Numerical Models, Monthly Weather Review, vol.130, issue.12, 2002. ,
DOI : 10.1175/1520-0493(2002)130<2917:ABSFMN>2.0.CO;2
On Communication Determinism in Parallel HPC Applications, 2010 Proceedings of 19th International Conference on Computer Communications and Networks, 2010. ,
DOI : 10.1109/ICCCN.2010.5560143
Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp.615-626, 2012. ,
DOI : 10.1109/ICDCS.2012.56
System Resilience at Extreme Scale, 2008. ,
Redundant Execution of HPC Applications with MR-MPI, Parallel and Distributed Computing and Networks / 720: Software Engineering, 2011. ,
DOI : 10.2316/P.2011.719-031
Evaluating the viability of process replication reliability for exascale systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '11, pp.441-4412, 2011. ,
DOI : 10.1145/2063384.2063443
Detection and correction of silent data corruption for large-scale high-performance computing, IEEE/ACM SuperComputing 2012, pp.1-7812, 2012. ,
FTI: high performance Fault Tolerance Interface for hybrid systems, In IEEE/ACM SuperComputing, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00721216
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS2011), 2011. ,
HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012. ,
DOI : 10.1109/IPDPS.2012.111
URL : https://hal.archives-ouvertes.fr/hal-01121941
Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, vol.21, issue.7, pp.558-565, 1978. ,
DOI : 10.1145/359545.359563
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010. ,
DOI : 10.1109/SC.2010.18
Modeling the Impact of Checkpoints on Next-Generation Systems, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007), pp.30-46, 2007. ,
DOI : 10.1109/MSST.2007.4367962
Alleviating scalability issues of checkpointing protocols, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-1811, 2012. ,
DOI : 10.1109/SC.2012.18
Understanding replication in databases and distributed systems, Proceedings 20th IEEE International Conference on Distributed Computing Systems, p.464, 2000. ,
DOI : 10.1109/ICDCS.2000.840959