L. Alvisi and K. Marzullo, Message logging: pessimistic, optimistic, causal, and optimal, IEEE Transactions on Software Engineering, vol.24, issue.2, pp.149-159, 1998.
DOI : 10.1109/32.666828

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.78

L. Bautista-Gomez, S. Tsuboi, D. Komatitsch, F. Cappello, N. Maruyama et al., FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.32:1-32:32, 2011.
DOI : 10.1145/2063384.2063427

URL : https://hal.archives-ouvertes.fr/hal-00721216

J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski et al., PLFS, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-12, 2009.
DOI : 10.1145/1654059.1654081

R. Bolze, F. Cappello, E. Caron, M. Daydé, F. Desprez et al., Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed, International Journal of High Performance Computing Applications, vol.20, issue.4, pp.481-494, 2006.
DOI : 10.1177/1094342006070078

URL : https://hal.archives-ouvertes.fr/hal-00684943

G. H. Bryan and R. Rotunno, The maximum intensity of tropical cyclones in axisymmetric numerical model simulations, Monthly Weather Review, vol.137, pp.1770-1789, 2009.

B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold et al., Windows Azure Storage, Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pp.143-157, 2011.
DOI : 10.1145/2043556.2043571

C. Coti, T. Herault, P. Lemarinier, L. Pilard, A. Rezmerita et al., Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.15

URL : https://hal.archives-ouvertes.fr/hal-00688644

B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson et al., Remus: high availability via asynchronous virtual machine replication, NSDI'08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, pp.161-174, 2008.

X. Dong, Y. Xie, N. Muralimanohar, and N. P. Jouppi, Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, ACM Transactions on Architecture and Code Optimization, vol.8, issue.2, pp.6:1-6:29, 2011.
DOI : 10.1145/1970386.1970387

J. Duell, P. Hargrove, and E. Roman, The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart, Future Technologies Group, 2002.

E. N. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Comput. Surv., vol.34, pp.375-408, 2002.

K. Ferreira, J. Stearley, J. H. Laros III, R. Oldfield et al., Evaluating the viability of process replication reliability for exascale systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.44:1-44:12, 2011.
DOI : 10.1145/2063384.2063443

M. Gagné, Cooking with Linux: still searching for the ultimate Linux distro?, Linux J., issue.161, p.9, 2007.

L. Bautista-Gomez, B. Nicolae, N. Maruyama, F. Cappello, and S. Matsuoka, Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds, Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00703119

W. Gropp, E. Lusk, and A. Skjellum, Using MPI (2nd ed.): portable parallel programming with the message-passing interface, 1999.

A. Guermouche, T. Ropars, E. Brunet, M. Snir, and F. Cappello, Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.989-1000, 2011.
DOI : 10.1109/IPDPS.2011.95

URL : https://hal.archives-ouvertes.fr/hal-01121937

A. Gupta, L. V. Kalé, D. S. Milojicic, P. Faraboschi, R. Kaufmann et al., Exploring the performance and mapping of HPC applications to platforms in the cloud, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.121-122, 2012.
DOI : 10.1145/2287076.2287093

J. G. Hansen and E. Jul, Scalable virtual machine storage using local disks, SIGOPS Oper. Syst. Rev., vol.44, pp.71-79, 2010.

Q. He, S. Zhou, B. Kobler, D. Duffy, and T. McGlynn, Case study for running HPC applications in public clouds, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.395-401, 2010.
DOI : 10.1145/1851476.1851535

A. Lakshman and P. Malik, Cassandra, ACM SIGOPS Operating Systems Review, vol.44, issue.2, pp.35-40, 2010.
DOI : 10.1145/1773912.1773922

D. Manivannan, Q. Jiang, J. Yang, and M. Singhal, A quasi-synchronous checkpointing algorithm that prevents contention for stable storage, Information Sciences, vol.178, issue.15, pp.3109-3116, 2008.
DOI : 10.1016/j.ins.2008.04.001

P. Mehrotra, J. Djomehri, S. Heistand, R. Hood, H. Jin et al., Performance evaluation of Amazon EC2 for NASA HPC applications, Proceedings of the 3rd workshop on Scientific Cloud Computing, ScienceCloud '12, pp.41-50, 2012.
DOI : 10.1145/2287036.2287045

D. T. Meyer, G. Aggarwal, B. Cully, G. Lefebvre, M. J. Feeley et al., Parallax: Virtual disks for virtual machines, SIGOPS Oper. Syst. Rev., vol.42, issue.4, pp.41-54, 2008.

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.18

B. Nicolae, BlobSeer, Proceedings of the 2009 EDBT/ICDT Workshops on, EDBT/ICDT '09, 2010.
DOI : 10.1145/1698790.1698796

URL : https://hal.archives-ouvertes.fr/hal-00803430

B. Nicolae, On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage, Transactions on Large-Scale Data- and Knowledge-Centered Systems, pp.167-184, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613583

B. Nicolae, G. Antoniu, L. Bougé, D. Moise, and A. Carpen-Amarie, BlobSeer: Next-generation data management for large scale infrastructures, Journal of Parallel and Distributed Computing, vol.71, issue.2, pp.169-184, 2011.
DOI : 10.1016/j.jpdc.2010.08.004

URL : https://hal.archives-ouvertes.fr/inria-00511414

B. Nicolae, J. Bresnahan, K. Keahey, and G. Antoniu, Going back and forth, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, pp.147-158, 2011.
DOI : 10.1145/1996130.1996152

URL : https://hal.archives-ouvertes.fr/inria-00570682

B. Nicolae and F. Cappello, BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.34:1-34:12, 2011.
DOI : 10.1145/2063384.2063429

URL : https://hal.archives-ouvertes.fr/inria-00601865

B. Nicolae and F. Cappello, A hybrid local storage transfer scheme for live migration of I/O intensive workloads, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.85-96, 2012.
DOI : 10.1145/2287076.2287088

URL : https://hal.archives-ouvertes.fr/hal-00686654

B. Nicolae, F. Cappello, and G. Antoniu, Optimizing multideployment on clouds by means of self-adaptive prefetching, Euro-Par '11: 17th International Euro-Par Conference on Parallel Processing, pp.503-513, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00594406

X. Ouyang, K. Gopalakrishnan, T. Gangadharappa, and D. K. Panda, Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture, 2009 International Conference on High Performance Computing (HiPC), pp.99-108, 2009.
DOI : 10.1109/HIPC.2009.5433218

X. Ouyang, R. Rajachandrasekar, X. Besseron, H. Wang, J. Huang et al., CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart, 2011 International Conference on Parallel Processing, pp.375-384, 2011.
DOI : 10.1109/ICPP.2011.85

T. Vu-pham, H. Jamjoom, K. Jordan, and Z. Shae, A service composition framework for market-oriented high performance computing cloud, HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp.284-287, 2010.

L. Ramakrishnan, P. T. Zbiegel, S. Campbell, R. Bradshaw, R. S. Canon et al., Magellan, Proceedings of the 2nd international workshop on Scientific cloud computing, ScienceCloud '11, pp.49-58, 2011.
DOI : 10.1145/1996109.1996119

D. Reimer, A. Thomas, G. Ammons, T. Mummert, B. Alpern et al., Opening black boxes, Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, VEE '08, pp.111-120, 2008.
DOI : 10.1145/1346256.1346272

M. Rosenblum and J. K. Ousterhout, The design and implementation of a log-structured file system, ACM Transactions on Computer Systems, vol.10, issue.1, pp.26-52, 1992.
DOI : 10.1145/146941.146943

G. Vallée, T. Naughton, H. Ong, and S. L. Scott, Checkpoint/restart of virtual machines based on Xen, HAPCW '06: Proceedings of the High Availability and Performance Workshop, 2006.

O. Villa, S. Krishnamoorthy, J. Nieplocha, and D. M. Brown Jr., Scalable transparent checkpoint-restart of global address space applications on virtual machines over infiniband, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.197-206, 2009.
DOI : 10.1145/1531743.1531776

K. V. Vishwanath and N. Nagappan, Characterizing cloud computing hardware reliability, SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing, pp.193-204, 2010.

C. Wang, F. Mueller, C. Engelmann, and S. L. Scott, Hybrid Checkpointing for MPI Jobs in HPC Environments, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.524-533, 2010.
DOI : 10.1109/ICPADS.2010.48

M. Zhang, H. Jin, X. Shi, and S. Wu, VirtCFT: A Transparent VM-Level Fault-Tolerant System for Virtual Clusters, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.147-154, 2010.
DOI : 10.1109/ICPADS.2010.125