Message logging: pessimistic, optimistic, causal, and optimal, IEEE Transactions on Software Engineering, vol.24, issue.2, pp.149-159, 1998.
DOI : 10.1109/32.666828
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.78
FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.32:1-32:32, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216
PLFS, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-12, 2009.
DOI : 10.1145/1654059.1654081
Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed, International Journal of High Performance Computing Applications, vol.20, issue.4, pp.481-494, 2006.
DOI : 10.1177/1094342006070078
URL : https://hal.archives-ouvertes.fr/hal-00684943
The maximum intensity of tropical cyclones in axisymmetric numerical model simulations, Monthly Weather Review, vol.137, pp.1770-1789, 2009.
Windows Azure Storage, Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pp.143-157, 2011.
DOI : 10.1145/2043556.2043571
Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.15
URL : https://hal.archives-ouvertes.fr/hal-00688644
Remus: high availability via asynchronous virtual machine replication, NSDI'08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, pp.161-174, 2008.
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, ACM Transactions on Architecture and Code Optimization, vol.8, issue.2, pp.6:1-6:29, 2011.
DOI : 10.1145/1970386.1970387
The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart, Future Technologies Group, 2002.
A survey of rollback-recovery protocols in message-passing systems, ACM Comput. Surv., vol.34, pp.375-408, 2002.
Evaluating the viability of process replication reliability for exascale systems, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.44:1-44:12, 2011.
DOI : 10.1145/2063384.2063443
Cooking with Linux: still searching for the ultimate Linux distro?, Linux J., issue.161, p.9, 2007.
Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds, Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00703119
Using MPI (2nd ed.): portable parallel programming with the message-passing interface, 1999.
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.989-1000, 2011.
DOI : 10.1109/IPDPS.2011.95
URL : https://hal.archives-ouvertes.fr/hal-01121937
Exploring the performance and mapping of HPC applications to platforms in the cloud, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.121-122, 2012.
DOI : 10.1145/2287076.2287093
Scalable virtual machine storage using local disks, SIGOPS Oper. Syst. Rev., vol.44, pp.71-79, 2010.
Case study for running HPC applications in public clouds, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.395-401, 2010.
DOI : 10.1145/1851476.1851535
Cassandra, ACM SIGOPS Operating Systems Review, vol.44, issue.2, pp.35-40, 2010.
DOI : 10.1145/1773912.1773922
A quasi-synchronous checkpointing algorithm that prevents contention for stable storage, Information Sciences, vol.178, issue.15, pp.3109-3116, 2008.
DOI : 10.1016/j.ins.2008.04.001
Performance evaluation of Amazon EC2 for NASA HPC applications, Proceedings of the 3rd workshop on Scientific Cloud Computing, ScienceCloud '12, pp.41-50, 2012.
DOI : 10.1145/2287036.2287045
Parallax: Virtual disks for virtual machines, SIGOPS Oper. Syst. Rev., vol.42, issue.4, pp.41-54, 2008.
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
DOI : 10.1109/SC.2010.18
BlobSeer, Proceedings of the 2009 EDBT/ICDT Workshops, EDBT/ICDT '09, 2010.
DOI : 10.1145/1698790.1698796
URL : https://hal.archives-ouvertes.fr/hal-00803430
On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage, Transactions on Large-Scale Data- and Knowledge-Centered Systems, pp.167-184, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613583
BlobSeer: Next-generation data management for large scale infrastructures, Journal of Parallel and Distributed Computing, vol.71, issue.2, pp.169-184, 2011.
DOI : 10.1016/j.jpdc.2010.08.004
URL : https://hal.archives-ouvertes.fr/inria-00511414
Going back and forth, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, pp.147-158, 2011.
DOI : 10.1145/1996130.1996152
URL : https://hal.archives-ouvertes.fr/inria-00570682
BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.34:1-34:12, 2011.
DOI : 10.1145/2063384.2063429
URL : https://hal.archives-ouvertes.fr/inria-00601865
A hybrid local storage transfer scheme for live migration of I/O intensive workloads, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.85-96, 2012.
DOI : 10.1145/2287076.2287088
URL : https://hal.archives-ouvertes.fr/hal-00686654
Optimizing multideployment on clouds by means of self-adaptive prefetching, Euro-Par '11: 17th International Euro-Par Conference on Parallel Processing, pp.503-513, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00594406
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture, 2009 International Conference on High Performance Computing (HiPC), pp.99-108, 2009.
DOI : 10.1109/HIPC.2009.5433218
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart, 2011 International Conference on Parallel Processing, pp.375-384, 2011.
DOI : 10.1109/ICPP.2011.85
A service composition framework for market-oriented high performance computing cloud, HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp.284-287, 2010.
Magellan, Proceedings of the 2nd international workshop on Scientific cloud computing, ScienceCloud '11, pp.49-58, 2011.
DOI : 10.1145/1996109.1996119
Opening black boxes, Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, VEE '08, pp.111-120, 2008.
DOI : 10.1145/1346256.1346272
The design and implementation of a log-structured file system, ACM Transactions on Computer Systems, vol.10, issue.1, pp.26-52, 1992.
DOI : 10.1145/146941.146943
Checkpoint/restart of virtual machines based on Xen, HAPCW '06: Proceedings of the High Availability and Performance Workshop, 2006.
Scalable transparent checkpoint-restart of global address space applications on virtual machines over infiniband, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.197-206, 2009.
DOI : 10.1145/1531743.1531776
Characterizing cloud computing hardware reliability, SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing, pp.193-204, 2010.
Hybrid Checkpointing for MPI Jobs in HPC Environments, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.524-533, 2010.
DOI : 10.1109/ICPADS.2010.48
VirtCFT: A Transparent VM-Level Fault-Tolerant System for Virtual Clusters, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.147-154, 2010.
DOI : 10.1109/ICPADS.2010.125