Rethinking High Performance Computing Platforms, Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, DIDC '16, 2016.
DOI : 10.1109/PDP.2013.41
There goes the neighborhood, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, 2013.
DOI : 10.1145/2503210.2503247
A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115
A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016
A Multiplatform Study of I/O Behavior on Petascale Supercomputers, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, 2015.
DOI : 10.1109/SC.2008.5222721
CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.156-163, 2015.
DOI : 10.1109/IPDPS.2014.27
URL : https://hal.archives-ouvertes.fr/hal-00916091
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, Proc. Supercomputing (SC '10), 2010.
FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216
Convex Optimization, 2004.
Modeling and Analysis of Checkpoint I/O Operations, Analytical and Stochastic Modeling Techniques and Applications: 17th International Conference, pp.387-399, 2010.
Diskless checkpointing, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.10, pp.972-986, 1998.
DOI : 10.1109/71.730527
Efficient System-Level Remote Checkpointing Technique for BLCR, 2011 Eighth International Conference on Information Technology: New Generations, pp.1002-1007, 2011.
DOI : 10.1109/ITNG.2011.172
A case for two-level distributed recovery schemes, ACM SIGMETRICS Joint Int. Conf. on Measurement and Modeling of Computer Systems, 1995.
Leveraging burst buffer coordination to prevent I/O interference, 2016 IEEE 12th International Conference on e-Science (e-Science).
DOI : 10.1109/eScience.2016.7870922
PLFS, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-2112, 2009.
DOI : 10.1145/1654059.1654081
Memory exclusion: optimizing the performance of checkpointing systems, Software: Practice and Experience, vol.43, issue.2, pp.125-142, 1999.
DOI : 10.1006/jpdc.1997.1338
Compiler-enhanced incremental checkpointing for OpenMP applications, IEEE International Symposium on Parallel & Distributed Processing, pp.1-12, 2009.
CLIP, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '97, 1997.
DOI : 10.1145/509593.509626
The performance of consistent checkpointing, [1992] Proceedings 11th Symposium on Reliable Distributed Systems, 1992.
DOI : 10.1109/RELDIS.1992.235144
Low-Latency, Concurrent Checkpointing for Parallel Programs, IEEE Transactions on Parallel and Distributed Systems, vol.5, issue.8, pp.874-879, 1994.
Libckpt: Transparent Checkpointing under Unix, USENIX Winter 1995 Technical Conference, pp.213-224, 1995.
Incremental Checkpoint Schemes for Weibull Failure Distribution, International Journal of Foundations of Computer Science, vol.23, issue.03, pp.329-344, 2010.
DOI : 10.1109/12.936236
stdchk: A Checkpoint Storage System for Desktop Grid Computing, 2008 The 28th International Conference on Distributed Computing Systems, pp.613-624, 2008.
DOI : 10.1109/ICDCS.2008.19
Libhashckpt: Hash-based Incremental Checkpointing Using GPUs, Proceedings of the 18th EuroMPI Conference, 2011.
On the Viability of Compression for Reducing the Overheads of Checkpoint/Restart-Based Fault Tolerance, 2012 41st International Conference on Parallel Processing, pp.148-157, 2012.
DOI : 10.1109/ICPP.2012.45
CATCH-compiler-assisted techniques for checkpointing, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium, pp.74-81, 1990.
DOI : 10.1109/FTCS.1990.89337
ickp: A Consistent Checkpointer for Multicomputers, IEEE Parallel & Distributed Technology: Systems & Applications, pp.62-67, 1994.
Compressed Differences: An Algorithm for Fast Incremental Checkpointing, 1995.
MCRENGINE: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression, Proc. Supercomputing (SC '12), 2012.
Exploration of Lossy Compression for Application-Level Checkpoint/Restart, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.67
ACR, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, 2014.
DOI : 10.1145/2503210.2503266
Periodic I/O Scheduling for Super-computers, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01474553
Modeling I/O interference for data intensive distributed applications, Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC '13, pp.343-350, 2013.
DOI : 10.1145/2480362.2480434
AnalyzeThis, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.
DOI : 10.1109/MSST.2012.6232372
Scheduling the I/O of HPC Applications Under Congestion, 2015 IEEE International Parallel and Distributed Processing Symposium, 2016.
DOI : 10.1109/IPDPS.2015.116
URL : https://hal.archives-ouvertes.fr/hal-01251938
I/O-Aware Batch Scheduling for Petascale Computing Systems, 2015 IEEE International Conference on Cluster Computing, pp.254-263, 2015.
DOI : 10.1109/CLUSTER.2015.45
Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters, Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC '16, pp.70-79, 2017.
Cooperative checkpointing, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.14-23, 2006.
DOI : 10.1145/1183401.1183406
Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp.25-36, 2014.
DOI : 10.1109/DSN.2014.101