J. Dongarra, The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community, International Journal of High Performance Computing Applications, vol.23, issue.4, pp.309-322, 2009.
DOI : 10.1177/1094342009347714

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016

E. Gelenbe and D. Derochette, Performance of rollback recovery systems under intermittent failures, Communications of the ACM, vol.21, issue.6, pp.493-499, 1978.
DOI : 10.1145/359511.359531

A. Bouteiller, P. Lemarinier, K. Krawezik, and F. Cappello, Coordinated checkpoint versus message log for fault tolerant MPI, Proceedings IEEE International Conference on Cluster Computing CLUSTR-03, pp.242-250, 2003.
DOI : 10.1109/CLUSTR.2003.1253321

K. M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, pp.63-75, 1985.
DOI : 10.1145/214451.214456

E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525

S. Agarwal, R. Garg, M. S. Gupta, and J. E. Moreira, Adaptive incremental checkpointing for massively parallel systems, Proceedings of the 18th annual international conference on Supercomputing, ICS '04, 2004.
DOI : 10.1145/1006209.1006248

S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M. Su et al., Characterization of scientific workflows, 2008 Third Workshop on Workflows in Support of Large-Scale Science, pp.1-10, 2008.
DOI : 10.1109/WORKS.2008.4723958

S. Chakrabarti, J. Demmel, and K. Yelick, Modeling the benefits of mixed data and task parallelism, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, SPAA '95, 1995.
DOI : 10.1145/215399.215423

P. Dutot, L. Eyraud, G. Mounié, and D. Trystram, Scheduling on large scale distributed platforms: from models to implementations, International Journal of Foundations of Computer Science, vol.16, issue.02, pp.217-237, 2005.
DOI : 10.1142/S0129054105002954

URL : https://hal.archives-ouvertes.fr/hal-00005318

F. Suter, Scheduling delta-critical tasks in mixed-parallel applications on a national grid, Int. Conf. Grid Computing, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00165868

S. Toueg and Ö. Babaoglu, On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, pp.630-649, 1984.
DOI : 10.1137/0213039

M. Bouguerra, T. Gautier, D. Trystram, and J. Vincent, A Flexible Checkpoint/Restart Model in Distributed Systems, PPAM, 2010.
DOI : 10.1007/978-3-642-14390-8_22

URL : https://hal.archives-ouvertes.fr/hal-00788926

Y. Ling, J. Mi, and X. Lin, A variational calculus approach to optimal checkpoint placement, IEEE Trans. Computers, pp.699-708, 2001.

T. Ozaki, T. Dohi, H. Okamura, and N. Kaio, Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle, IEEE Transactions on Dependable and Secure Computing, vol.3, issue.2, pp.130-140, 2006.
DOI : 10.1109/TDSC.2006.22

M. Bougeret, H. Casanova, M. Rabie, Y. Robert, and F. Vivien, Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.
DOI : 10.1145/2063384.2063428

URL : https://hal.archives-ouvertes.fr/hal-00738504

E. Gelenbe and M. Hernández, Optimum checkpoints with age dependent failures, Acta Informatica, vol.27, issue.6, pp.519-531, 1990.
DOI : 10.1007/BF00277388

M. Bouguerra, D. Trystram, and F. Wagner, Complexity Analysis of Checkpoint Scheduling with Variable Costs, IEEE Transactions on Computers, vol.62, issue.6, pp.1269-1275, 2013.
DOI : 10.1109/TC.2012.57

URL : https://hal.archives-ouvertes.fr/hal-00788101

Y. Robert, F. Vivien, and D. Zaidouni, On the complexity of scheduling checkpoints for computational workflows, IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012), pp.1-6, 2012.
DOI : 10.1109/DSNW.2012.6264675

URL : https://hal.archives-ouvertes.fr/hal-00763382

G. Aupy, A. Benoit, H. Casanova, and Y. Robert, Scheduling Computational Workflows on Failure-Prone Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2014.
DOI : 10.1109/IPDPSW.2015.33

URL : https://hal.archives-ouvertes.fr/hal-01075100

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

G. Aupy, Source code and data, 2014.

Pegasus, Pegasus workflow generator, 2014.

G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta et al., Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013.
DOI : 10.1016/j.future.2012.08.015