S. Böhm and C. Engelmann, File I/O for MPI Applications in Redundant Execution Scenarios, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.112-119, 2012.
DOI : 10.1109/PDP.2012.22

A. Bouteiller, G. Bosilca, and J. Dongarra, Redesigning the message logging model for high performance, Concurrency and Computation: Practice and Experience, pp.2196-2211, 2010.

R. Brightwell, K. Ferreira, and R. Riesen, Transparent Redundant Computing with MPI, Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface, EuroMPI'10, pp.208-218, 2010.
DOI : 10.1007/978-3-642-15646-5_22

G. H. Bryan and J. M. Fritsch, A Benchmark Simulation for Moist Nonhydrostatic Numerical Models, Monthly Weather Review, vol.130, issue.12, 2002.
DOI : 10.1175/1520-0493(2002)130<2917:ABSFMN>2.0.CO;2

F. Cappello, A. Guermouche, and M. Snir, On Communication Determinism in Parallel HPC Applications, 2010 Proceedings of 19th International Conference on Computer Communications and Networks, 2010.
DOI : 10.1109/ICCCN.2010.5560143

J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira et al., Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp.615-626, 2012.
DOI : 10.1109/ICDCS.2012.56

E. N. Elnozahy, System Resilience at Extreme Scale, 2008.

C. Engelmann and S. Böhm, Redundant Execution of HPC Applications with MR-MPI, Parallel and Distributed Computing and Networks, 2011.
DOI : 10.2316/P.2011.719-031

K. Ferreira, J. Stearley, J. H. Laros III, R. Oldfield et al., Evaluating the viability of process replication reliability for exascale systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.44:1-44:12, 2011.
DOI : 10.1145/2063384.2063443

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, IEEE/ACM SuperComputing 2012, pp.78:1-78:12, 2012.

L. B. Gomez, N. Maruyama, D. Komatitsch, S. Tsuboi, F. Cappello et al., FTI: high performance Fault Tolerance Interface for hybrid systems, IEEE/ACM SuperComputing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00721216

A. Guermouche, T. Ropars, E. Brunet, M. Snir, and F. Cappello, Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS2011), 2011.

A. Guermouche, T. Ropars, M. Snir, and F. Cappello, HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012.
DOI : 10.1109/IPDPS.2012.111

URL : https://hal.archives-ouvertes.fr/hal-01121941

L. Lamport, Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, vol.21, issue.7, pp.558-565, 1978.
DOI : 10.1145/359545.359563

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.18

R. A. Oldfield, S. Arunagiri, P. J. Teller, S. Seelam, M. R. Varela et al., Modeling the Impact of Checkpoints on Next-Generation Systems, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007), pp.30-46, 2007.
DOI : 10.1109/MSST.2007.4367962

R. Riesen, K. Ferreira, D. Da Silva, P. Lemarinier, D. Arnold et al., Alleviating scalability issues of checkpointing protocols, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.18:1-18:11, 2012.
DOI : 10.1109/SC.2012.18

M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso, Understanding replication in databases and distributed systems, Proceedings 20th IEEE International Conference on Distributed Computing Systems, p.464, 2000.
DOI : 10.1109/ICDCS.2000.840959