S. Agarwal, R. Garg, M. S. Gupta, and J. E. Moreira, Adaptive incremental checkpointing for massively parallel systems, Proceedings of the 18th annual international conference on Supercomputing, ICS '04, pp.277-286, 2004.
DOI : 10.1145/1006209.1006248

L. Bautista-Gomez, S. Tsuboi, D. Komatitsch, F. Cappello, N. Maruyama et al., FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-3232, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216

R. Brightwell, K. Ferreira, and R. Riesen, Transparent redundant computing with MPI, EuroMPI'10: Proceedings of the 17th European MPI Users' Group Meeting Conference on Recent Advances in the Message Passing Interface, pp.208-218, 2010.

G. H. Bryan and R. Rotunno, The Maximum Intensity of Tropical Cyclones in Axisymmetric Numerical Model Simulations, Monthly Weather Review, vol.137, issue.6, pp.1770-1789, 2009.
DOI : 10.1175/2008MWR2709.1

P. H. Carns, W. B. Ligon, R. B. Ross, and R. Thakur, PVFS: A parallel file system for Linux clusters, Proceedings of the 4th Annual Linux Showcase and Conference, pp.317-327, 2000.

C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul et al., Live migration of virtual machines, NSDI'05: Proceedings of the 2nd Symposium on Networked Systems Design & Implementation, pp.273-286, 2005.

P. J. Denning, Working Sets Past and Present, IEEE Transactions on Software Engineering, vol.6, issue.1, pp.64-84, 1980.
DOI : 10.1109/TSE.1980.230464

X. Dong, Y. Xie, N. Muralimanohar, and N. P. Jouppi, Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, ACM Transactions on Architecture and Code Optimization, vol.8, issue.2, pp.1-629, 2011.
DOI : 10.1145/1970386.1970387

M. Dorier, G. Antoniu, F. Cappello, M. Snir, and L. Orf, Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, 2012 IEEE International Conference on Cluster Computing, 2012.
DOI : 10.1109/CLUSTER.2012.26
URL : https://hal.archives-ouvertes.fr/hal-00715252

E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525

J. Evans, A scalable concurrent malloc(3) implementation for FreeBSD, Proceedings of BSDCan 2006, 2006.

K. B. Ferreira, R. Riesen, R. Brightwell, P. Bridges, and D. Arnold, libhashckpt: Hash-Based Incremental Checkpointing Using GPU's, EuroMPI'11: Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface, pp.272-281, 2011.
DOI : 10.1007/978-3-642-24449-0_31

R. Gioiosa, J. C. Sancho, S. Jiang, F. Petrini, and K. Davis, Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, ACM/IEEE SC 2005 Conference (SC'05), pp.1-9, 2005.
DOI : 10.1109/SC.2005.76

L. B. Gomez, B. Nicolae, N. Maruyama, F. Cappello, and S. Matsuoka, Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds, Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, pp.313-324, 2012.
DOI : 10.1007/978-3-642-32820-6_32
URL : https://hal.archives-ouvertes.fr/hal-00703119

T. Hoefler, T. Schneider, and A. Lumsdaine, Characterizing the Influence of System Noise on Large-Scale Applications by Simulation, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.
DOI : 10.1109/SC.2010.12

K. Z. Ibrahim, S. Hofmeyr, C. Iancu, and E. Roman, Optimized pre-copy live migration for memory intensive applications, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-40, 2011.
DOI : 10.1145/2063384.2063437

W. M. Jones, J. T. Daly, and N. DeBardeleben, Application monitoring and checkpointing in HPC, Proceedings of the 50th Annual Southeast Regional Conference, ACM-SE '12, pp.262-267, 2012.
DOI : 10.1145/2184512.2184574

D. Manivannan, Q. Jiang, J. Yang, and M. Singhal, A quasi-synchronous checkpointing algorithm that prevents contention for stable storage, Information Sciences, vol.178, issue.15, pp.3109-3116, 2008.
DOI : 10.1016/j.ins.2008.04.001

P. McGrath and B. Tangney, Scrabble - a distributed application with an emphasis on continuity, Software Engineering Journal, vol.5, issue.3, pp.160-164, 1990.
DOI : 10.1049/sej.1990.0018

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, modeling, and evaluation of a scalable multi-level checkpointing system, SC '10: Proceedings of the 23rd International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.

B. Nicolae, On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage, Transactions on Large-Scale Data- and Knowledge-Centered Systems, pp.167-184, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613583

B. Nicolae, Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1-10, 2013.
DOI : 10.1109/IPDPS.2013.14
URL : https://hal.archives-ouvertes.fr/hal-00781532

B. Nicolae and F. Cappello, BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-3412, 2011.
DOI : 10.1145/2063384.2063429
URL : https://hal.archives-ouvertes.fr/inria-00601865

B. Nicolae and F. Cappello, A hybrid local storage transfer scheme for live migration of I/O intensive workloads, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.85-96, 2012.
DOI : 10.1145/2287076.2287088
URL : https://hal.archives-ouvertes.fr/hal-00686654

S. Rajagopalan, B. Cully, R. O'Connor, and A. Warfield, SecondSite: Disaster Tolerance as a Service, VEE '12: Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments, pp.97-108, 2012.

M. Vasavada, F. Mueller, P. H. Hargrove, and E. Roman, Comparing different approaches for incremental checkpointing: The showdown, Linux'11: The 13th Annual Linux Symposium, pp.69-79, 2011.

C. Wang, F. Mueller, C. Engelmann, and S. L. Scott, Hybrid Checkpointing for MPI Jobs in HPC Environments, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.524-533, 2010.
DOI : 10.1109/ICPADS.2010.48