DMTCP: Transparent checkpointing for cluster computations and the desktop, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI: 10.1109/IPDPS.2009.5161063
Science clouds: Early experiences in cloud computing for scientific applications, Cloud Computing and Applications, pp.825-830, 2008.
The Eucalyptus Open-Source Cloud-Computing System, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pp.124-131, 2009.
DOI: 10.1109/CCGRID.2009.93
OpenNebula: A Cloud Management Tool, IEEE Internet Computing, vol.15, issue.2, pp.11-14, 2011.
DOI: 10.1109/MIC.2011.44
Snooze: A Scalable and Autonomic Virtual Machine Management Framework for Private Clouds, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), 2012.
DOI: 10.1109/CCGrid.2012.71
URL: https://hal.archives-ouvertes.fr/hal-00651542
Improving Utilization of Infrastructure Clouds, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.205-214, 2011.
DOI: 10.1109/CCGrid.2011.56
CloudNet, ACM SIGPLAN Notices, vol.46, issue.7, pp.121-132, 2011.
DOI: 10.1145/2007477.1952699
FRIEDA: Flexible Robust Intelligent Elastic Data Management in Cloud Environments, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp.1096-1105, 2012.
DOI: 10.1109/SC.Companion.2012.132
Ceph: A scalable, high-performance distributed file system, Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI '06), USENIX Association, pp.307-320, 2006.
Adaptive checkpointing for master-worker style parallelism (extended abstract), Proc. of 2005 IEEE Computer Society International Conference on Cluster Computing, 2005.
ZooKeeper: Wait-free coordination for Internet-scale systems, USENIX Annual Technical Conference, p.9, 2010.
RESTful web framework for Java, http://www.restlet.org.
The Grid'5000 experimentation testbed, 2013.
The NAS Parallel Benchmarks, International Journal of High Performance Computing Applications, vol.5, issue.3, pp.63-73, 1991.
DOI: 10.1177/109434209100500306
A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems, The Journal of Supercomputing, vol.65, issue.3, pp.1302-1326, 2013.
DOI: 10.1007/s11227-013-0884-0
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI, 2007 IEEE International Parallel and Distributed Processing Symposium, 2007.
DOI: 10.1109/IPDPS.2007.370605
The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, International Journal of High Performance Computing Applications, vol.19, issue.4, pp.479-493, 2005.
DOI: 10.1177/1094342005056139
Application-transparent checkpoint/restart for MPI programs over InfiniBand, ICPP '06: Proceedings of the 2006 International Conference on Parallel Processing, pp.471-478, 2006.
MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI, International Journal of High Performance Computing Applications, vol.20, issue.3, pp.319-333, 2006.
DOI: 10.1177/1094342006067469
URL: https://hal.archives-ouvertes.fr/hal-00688637
Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters, Journal of Physics: Conference Series, vol.46, pp.494-499, 2006.
DOI: 10.1088/1742-6596/46/1/067
Approaches to cloud computing fault tolerance, 2012 International Conference on Computer, Information and Telecommunication Systems (CITS), pp.1-6, 2012.
DOI: 10.1109/CITS.2012.6220386
Fault Tolerance Middleware for Cloud Computing, 2010 IEEE 3rd International Conference on Cloud Computing, pp.67-74, 2010.
DOI: 10.1109/CLOUD.2010.26
A Fault Tolerance Framework for High Performance Computing in Cloud, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp.709-710, 2012.
DOI: 10.1109/CCGrid.2012.80
Optimization of cloud task processing with checkpoint-restart mechanism, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, ser. SC '13, pp.64:1-64:12, 2013.
BlobCR, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.34:1-34:12, 2011.
DOI: 10.1145/2063384.2063429
URL: https://hal.archives-ouvertes.fr/inria-00601865
VNsnap: Taking Snapshots of Virtual Networked Infrastructures in the Cloud, IEEE Transactions on Services Computing, pp.484-496, 2012.
DOI: 10.1109/TSC.2011.29
Checkpoint-restart for a network of virtual machines, 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013.
DOI: 10.1109/CLUSTER.2013.6702626
A large-scale study of failures in high-performance computing systems, IEEE Transactions on Dependable and Secure Computing, vol.7, issue.4, pp.337-350, 2010.
A Self-tuning Failure Detection Scheme for Cloud Computing Service, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.668-679, 2012.
DOI: 10.1109/IPDPS.2012.126
GAMoSe: An Accurate Monitoring Service For Grid Applications, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), pp.40-40, 2007.
DOI: 10.1109/ISPDC.2007.23
URL: https://hal.archives-ouvertes.fr/inria-00424023