Improving Performance via Mini-applications, Sandia National Laboratories Research Report, vol.5574, 2009. ,
SPEEDUP-AWARE CO-SCHEDULES FOR EFFICIENT WORKLOAD MANAGEMENT, Parallel Processing Letters, vol.23, issue.02, p.1340001, 2013. ,
DOI : 10.1142/S012962641340001X
Co-scheduling algorithms for high-throughput workload execution, Journal of Scheduling, vol.23, issue.2, 2015. ,
DOI : 10.1109/DATE.2012.6176641
URL : https://hal.archives-ouvertes.fr/hal-01252366
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002. ,
DOI : 10.1145/568522.568525
A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974. ,
DOI : 10.1145/361147.361115
A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2004. ,
DOI : 10.1016/j.future.2004.11.016
Approximation Algorithms for Scheduling Independent Malleable Tasks, Euro-Par 2001 Parallel Processing, ser ,
DOI : 10.1007/3-540-44681-8_29
The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, ser. PLDI '98, pp.212-223, 1998. ,
Resilient Application Co-scheduling with Processor Redistribution, 2016 45th International Conference on Parallel Processing (ICPP), 2015. ,
DOI : 10.1109/ICPP.2016.21
URL : https://hal.archives-ouvertes.fr/hal-01219258
Batch Resizing Policies and Techniques for Fine-Grain Grid Tasks: The Nuts and Bolts, Journal of Information Processing Systems, vol.7, issue.2, 2011. ,
DOI : 10.3745/JIPS.2011.7.2.299
Fault-Tolerance Techniques for High- Performance Computing, 2015. ,
DOI : 10.1007/978-3-319-20943-2
URL : https://hal.archives-ouvertes.fr/hal-01200488
Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm, 2012 IEEE International Conference on Cluster Computing, pp.364-372, 2012. ,
DOI : 10.1109/CLUSTER.2012.82
Performance and reliability trade-offs for the double checkpointing algorithm, International Journal of Networking and Computing, vol.4, issue.1, pp.23-41, 2014. ,
DOI : 10.15803/ijnc.4.1_23
URL : https://hal.archives-ouvertes.fr/hal-01091928
A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974. ,
DOI : 10.1145/361147.361115
Efficient collective communication in distributed heterogeneous systems, JPDC, vol.63, issue.3, pp.251-263, 2003. ,
Graph theory with applications, 1976. ,
DOI : 10.1007/978-1-349-03521-2
Computers and Intractability, A Guide to the Theory of NP-Completeness, 1979. ,
Multigraph realizations of degree sequences: Maximization is easy, minimization is hard, Operations Research Letters, vol.36, issue.5, pp.594-596, 2008. ,
DOI : 10.1016/j.orl.2008.05.004
Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '11, pp.1-11, 2011. ,
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/hal-00738504
Unified model for assessing checkpointing protocols at extreme-scale, Concurrency and Computation: Practice and Experience, pp.2772-2791, 2014. ,
DOI : 10.1002/cpe.3173
URL : https://hal.archives-ouvertes.fr/hal-00696154