O. Weidner, M. Atkinson, A. Barker, and R. Vicente, Rethinking High Performance Computing Platforms: Challenges, Opportunities and Recommendations, Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, DIDC '16, 2016.

A. Bhatele, K. Mohror, S. H. Langer, and K. E. Isaacs, There goes the neighborhood: performance degradation due to nearby jobs, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, 2013.
DOI : 10.1145/2503210.2503247

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016

H. Luu, M. Winslett, W. Gropp, R. Ross, P. Carns et al., A Multiplatform Study of I/O Behavior on Petascale Supercomputers, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, 2015.

M. Dorier, G. Antoniu, R. Ross, D. Kimpe, and S. Ibrahim, CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.156-163, 2014.
DOI : 10.1109/IPDPS.2014.27
URL : https://hal.archives-ouvertes.fr/hal-00916091

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, 2010.

L. Bautista-Gomez, S. Tsuboi, D. Komatitsch, F. Cappello, N. Maruyama et al., FTI: high performance Fault Tolerance Interface for hybrid systems, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216

S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

S. Arunagiri, J. T. Daly, and P. J. Teller, Modeling and Analysis of Checkpoint I/O Operations, Analytical and Stochastic Modeling Techniques and Applications: 17th International Conference, pp.387-399, 2010.

J. Plank, K. Li, and M. Puening, Diskless checkpointing, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.10, pp.972-986, 1998.
DOI : 10.1109/71.730527

J. Cornwell and A. Kongmunvattana, Efficient System-Level Remote Checkpointing Technique for BLCR, 2011 Eighth International Conference on Information Technology: New Generations, pp.1002-1007, 2011.
DOI : 10.1109/ITNG.2011.172

N. H. Vaidya, A case for two-level distributed recovery schemes, ACM SIGMETRICS Joint Int. Conf. on Measurement and Modeling of Computer Systems, 1995.

A. Kougkas, M. Dorier, R. Latham, R. Ross, and X.-H. Sun, Leveraging burst buffer coordination to prevent I/O interference, 2016 IEEE 12th International Conference on e-Science (e-Science), 2016.
DOI : 10.1109/eScience.2016.7870922

J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski et al., PLFS: a checkpoint filesystem for parallel applications, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.21:1-21:12, 2009.
DOI : 10.1145/1654059.1654081

J. S. Plank, Y. Chen, K. Li, M. Beck, and G. Kingsley, Memory exclusion: optimizing the performance of checkpointing systems, Software: Practice and Experience, vol.29, issue.2, pp.125-142, 1999.

G. Bronevetsky, D. Marques, K. Pingali, S. McKee, and R. Rugina, Compiler-enhanced incremental checkpointing for OpenMP applications, IEEE International Symposium on Parallel & Distributed Processing, pp.1-12, 2009.

Y. Chen, K. Li, and J. S. Plank, CLIP: a checkpointing tool for message-passing parallel programs, Proceedings of the 1997 ACM/IEEE Conference on Supercomputing (CDROM), Supercomputing '97, 1997.
DOI : 10.1145/509593.509626

E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel, The performance of consistent checkpointing, Proceedings of the 11th Symposium on Reliable Distributed Systems, 1992.
DOI : 10.1109/RELDIS.1992.235144

K. Li, J. F. Naughton, and J. S. Plank, Low-Latency, Concurrent Checkpointing for Parallel Programs, IEEE Transactions on Parallel and Distributed Systems, vol.5, issue.8, pp.874-879, 1994.

J. S. Plank, M. Beck, G. Kingsley, and K. Li, Libckpt: Transparent Checkpointing under Unix, USENIX Winter 1995 Technical Conference, pp.213-224, 1995.

M. Paun, N. Naksinehaboon, R. Nassar, C. Leangsuksun, S. L. Scott et al., Incremental Checkpoint Schemes for Weibull Failure Distribution, International Journal of Foundations of Computer Science, vol.23, issue.03, pp.329-344, 2010.

S. Al-Kiswany, M. Ripeanu, S. Vazhkudai, and A. Gharaibeh, stdchk: A Checkpoint Storage System for Desktop Grid Computing, Proceedings of the 28th International Conference on Distributed Computing Systems (ICDCS '08), pp.613-624, 2008.
DOI : 10.1109/ICDCS.2008.19

K. B. Ferreira, R. Riesen, R. Brightwell, P. G. Bridges, and D. Arnold, Libhashckpt: Hash-based Incremental Checkpointing Using GPUs, Proceedings of the 18th EuroMPI Conference, 2011.

D. Ibtesham, D. Arnold, P. G. Bridges, K. B. Ferreira, and R. Brightwell, On the Viability of Compression for Reducing the Overheads of Checkpoint/Restart-Based Fault Tolerance, 2012 41st International Conference on Parallel Processing, pp.148-157, 2012.
DOI : 10.1109/ICPP.2012.45

C. Li and W. Fuchs, CATCH: compiler-assisted techniques for checkpointing, Digest of Papers, 20th International Symposium on Fault-Tolerant Computing, pp.74-81, 1990.
DOI : 10.1109/FTCS.1990.89337

J. S. Plank and K. Li, ickp: A Consistent Checkpointer for Multicomputers, IEEE Parallel & Distributed Technology: Systems & Applications, pp.62-67, 1994.

J. S. Plank, J. Xu, and R. H. Netzer, Compressed Differences: An Algorithm for Fast Incremental Checkpointing, 1995.

T. Z. Islam, K. Mohror, S. Bagchi, A. Moody, B. R. de Supinski et al., MCRENGINE: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '12, 2012.

N. Sasaki, K. Sato, T. Endo, and S. Matsuoka, Exploration of Lossy Compression for Application-Level Checkpoint/Restart, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.67

X. Ni, E. Meneses, N. Jain, and L. V. Kalé, ACR: automatic checkpoint/restart for soft and hard error protection, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, 2013.
DOI : 10.1145/2503210.2503266

G. Aupy, A. Gainaru, and V. Le Fèvre, Periodic I/O Scheduling for Super-computers, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01474553

S. Groot, K. Goda, D. Yokoyama, M. Nakano, and M. Kitsuregawa, Modeling I/O interference for data intensive distributed applications, Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC '13, pp.343-350, 2013.
DOI : 10.1145/2480362.2480434

H. Sim, Y. Kim, S. S. Vazhkudai, D. Tiwari, A. Anwar et al., AnalyzeThis: an analysis workflow-aware storage system, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.

A. Gainaru, G. Aupy, A. Benoit, F. Cappello, Y. Robert et al., Scheduling the I/O of HPC Applications Under Congestion, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.116
URL : https://hal.archives-ouvertes.fr/hal-01251938

Z. Zhou, X. Yang, D. Zhao, P. Rich, W. Tang et al., I/O-Aware Batch Scheduling for Petascale Computing Systems, 2015 IEEE International Conference on Cluster Computing, pp.254-263, 2015.
DOI : 10.1109/CLUSTER.2015.45

S. Herbein, D. H. Ahn, D. Lipari, T. R. Scogland, M. Stearman et al., Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters, Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC '16, pp.70-79, 2016.

A. J. Oliner, L. Rudolph, and R. K. Sahoo, Cooperative checkpointing: a robust approach to large-scale systems reliability, Proceedings of the 20th Annual International Conference on Supercomputing, ICS '06, pp.14-23, 2006.
DOI : 10.1145/1183401.1183406

D. Tiwari, S. Gupta, and S. S. Vazhkudai, Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp.25-36, 2014.
DOI : 10.1109/DSN.2014.101

Inria, Inovallée, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex. Publisher: Inria, Domaine de Voluceau - Rocquencourt, BP 105, 78153 Le Chesnay Cedex, inria.fr. ISSN 0249-6399.