M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz et al., Above the clouds: A Berkeley view of cloud computing, EECS Department, University of California, Berkeley, 2009.

J. E. Smith and R. Nair, Virtual Machines: Versatile Platforms For Systems And Processes, 2005.

D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat, Enforcing Performance Isolation Across Virtual Machines in Xen, Proceedings of 7th ACM/IFIP/USENIX Int'l Conf. on Middleware (Middleware'06), pp.342-362, 2006.
DOI : 10.1145/956993.956995

C. Evangelinos and C. N. Hill, Cloud Computing for Parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2, Cloud Computing and Its Applications (CCA'08), 2008.

D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman et al., Eucalyptus: an open-source cloud computing infrastructure, Journal of Physics: Conference Series, pp.1-14, 2009.
DOI : 10.1088/1742-6596/180/1/012051

J. N. Glosli, K. J. Caspersen, J. A. Gunnels, D. F. Richards, R. E. Rudd et al., Extending stability beyond CPU millennium, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, SC '07, pp.1-58, 2007.
DOI : 10.1145/1362622.1362700

J. Wilkes, More Google cluster data, Google Research Blog, 2011.

C. Reiss, J. Wilkes, and J. L. Hellerstein, Google cluster-usage traces: format + schema, Google Inc., 2011.

C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, Towards understanding heterogeneous clouds at scale: Google trace analysis, Intel Science and Technology Center for Cloud Computing, 2012.

S. Di, D. Kondo, and W. Cirne, Characterization and Comparison of Cloud versus Grid Workloads, 2012 IEEE International Conference on Cluster Computing, pp.230-238, 2012.
DOI : 10.1109/CLUSTER.2012.35

S. Yi, A. Andrzejak, and D. Kondo, Monetary Cost-Aware Checkpointing and Migration on Amazon Cloud Spot Instances, IEEE Trans. on Services Computing, pp.512-524, 2012.
DOI : 10.1109/TSC.2011.44
URL : https://hal.archives-ouvertes.fr/hal-00788761

W. Cirne, G. Chaudhry, and S. Johnson, Managing Descheduling Risk in the Google Cloud

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016

R. Subramaniyan, E. Grobelny, S. Studham, and A. George, Optimization of checkpointing-related I/O for high-performance parallel and distributed computing, The Journal of Supercomputing, vol.35, issue.1, pp.150-180, 2008.
DOI : 10.1007/s11227-007-0162-0

M. S. Bouguerra, T. Gautier, D. Trystram, and J. M. Vincent, A Flexible Checkpoint/Restart Model in Distributed Systems, Proceedings of the 8th international conference on Parallel processing and applied mathematics (PPAM'10), pp.206-215, 2010.
DOI : 10.1007/978-3-642-14390-8_22
URL : https://hal.archives-ouvertes.fr/hal-00788926

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, pp.530-531, 1974.
DOI : 10.1145/361147.361115

P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris et al., Xen and the art of virtualization, Proceedings of the 19th ACM symposium on Operating systems principles (SOSP '03), pp.164-177, 2003.

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

P. H. Hargrove and J. C. , Berkeley lab checkpoint/restart (BLCR) for Linux clusters, Journal of Physics: Conference Series, p.494, 2006.
DOI : 10.1088/1742-6596/46/1/067

L. A. Barroso, J. Dean, and U. Hölzle, Web search for a planet: the Google cluster architecture, IEEE Micro, vol.23, issue.2, pp.22-28, 2003.
DOI : 10.1109/MM.2003.1196112

L. Huang, J. Jia, B. Yu, B. G. Chun, P. Maniatis et al., Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression, Proceedings of 24th International Conference on Neural Information Processing Systems (NIPS'10), pp.1-9, 2010.

B. Nicolae and F. Cappello, BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, article 34, pp.1-12, 2011.
DOI : 10.1145/2063384.2063429
URL : https://hal.archives-ouvertes.fr/inria-00601865

S. Di and C. L. Wang, Error-Tolerant Resource Allocation and Payment Minimization for Cloud System, IEEE Transactions on Parallel and Distributed Systems, vol.24, issue.6, pp.1097-1106, 2013.
DOI : 10.1109/TPDS.2012.309

C. H. Leung and Q. H. Choo, On the Execution of Large Batch Programs in Unreliable Computing Systems, IEEE Transactions on Software Engineering, vol.10, issue.4, pp.444-450, 1984.
DOI : 10.1109/TSE.1984.5010258

K. Wolter, Stochastic models for checkpointing, in Stochastic Models for Fault Tolerance, pp.177-236, 2010.

T. Nakagawa, Optimum retrial number of reliability models, in Advanced Reliability Models and Maintenance Policies, Series in Reliability Engineering, pp.101-122, 2008.

A. Tchana, D. Broto, and . Hagimont, Fault Tolerant Approaches in Cloud Computing Infrastructures, Proceedings of the 8th International Conference on Autonomic and Autonomous Systems (ICAS'12), pp.42-48, 2012.

J. Barr, A. Narin, and J. Varia, Building Fault-Tolerant Applications on AWS, Tech. Rep., 2011.