J. Dongarra, P. Beckman, and T. Moore, The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, 2011.
DOI : 10.1177/1094342010391989

L. Kale and S. Krishnan, CHARM++: A Portable Concurrent Object Oriented System Based on C++, OOPSLA'93, 1993.

C. Coti and N. Grenèche, Os-level failure injection with systemtap, 1502.

L. Sarzyniec, T. Buchert, E. Jeanvoine, and L. Nussbaum, Design and Evaluation of a Virtual Experimental Environment for Distributed Systems, 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2013.
DOI : 10.1109/PDP.2013.32

URL : https://hal.archives-ouvertes.fr/hal-00724308

O. S. Gómez, N. Juristo, and S. Vegas, Replications types in experimental disciplines, Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '10, 2010.
DOI : 10.1145/1852786.1852790

G. Zheng, L. Shi, and L. V. Kalé, FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI, 2004.

Y. Robert, Models for fault-tolerance at very large scale, 2013.

I. Koren and C. M. Krishna, Fault-tolerant systems, 2010.

A. Gupta, O. Sarood, L. Kale, and D. Milojicic, Improving HPC Application Performance in Cloud through Dynamic Load Balancing, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2013.
DOI : 10.1109/CCGrid.2013.65

T. Buchert, L. Nussbaum, and J. Gustedt, Methods for Emulation of Multi-core CPU Performance, 2011 IEEE International Conference on High Performance Computing and Communications, 2011.
DOI : 10.1109/HPCC.2011.45

URL : https://hal.archives-ouvertes.fr/inria-00535534

H. Menon and L. V. Kale, A distributed dynamic load balancer for iterative applications, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '13, 2013.
DOI : 10.1145/2503210.2503284

H. Menon, B. Acun, S. De-gonzalo, O. Sarood, and L. Kale, Thermal aware automated load balancing for HPC applications, 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013.
DOI : 10.1109/CLUSTER.2013.6702627

X. Ni, E. Meneses, N. Jain, and L. V. Kale, ACR, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '13, 2013.
DOI : 10.1145/2503210.2503266

X. Ni, E. Meneses, and L. V. Kale, Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm, 2012 IEEE International Conference on Cluster Computing, 2012.
DOI : 10.1109/CLUSTER.2012.82

E. Totoni, N. Jain, and L. V. Kale, Toward Runtime Power Management of Exascale Networks by on/off Control of Links, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, 2013.
DOI : 10.1109/IPDPSW.2013.191