Energy-aware checkpointing of divisible tasks with soft or hard deadlines, 2013 International Green Computing Conference Proceedings, 2013.
DOI : 10.1109/IGCC.2013.6604467
URL : https://hal.archives-ouvertes.fr/hal-00857244
Speed scaling to manage energy and temperature, Journal of the ACM, vol.54, issue.1, pp.1-3, 2007.
DOI : 10.1145/1206035.1206038
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.550.7426
A Different Re-execution Speed Can Help, 2016 45th International Conference on Parallel Processing Workshops (ICPPW), 2016.
DOI : 10.1109/ICPPW.2016.45
URL : https://hal.archives-ouvertes.fr/hal-01297125
Optimal Resilience Patterns to Cope with Fail-Stop and Silent Errors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.39
URL : https://hal.archives-ouvertes.fr/hal-01354886
Efficient checkpoint/verification patterns, International Journal of High Performance Computing Applications, 2015.
DOI : 10.1177/1094342015594531
URL : https://hal.archives-ouvertes.fr/ensl-01252342
Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, 2014.
DOI : 10.1177/1094342014532297
Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002
Toward Exascale Resilience, International Journal of High Performance Computing Applications, vol.23, issue.4, pp.374-388, 2009.
DOI : 10.1177/1094342009347767
Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods, PPoPP, 2013.
A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016
Detection and correction of silent data corruption for large-scale high-performance computing, Proc. SC'12, p.78, 2012.
The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200
Design, modeling, and evaluation of a scalable multi-level checkpointing system, Proc. SC'10, pp.1-11, 2010.
DOI : 10.2172/984082
A cost model for selecting checkpoint positions in time warp parallel simulation, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.4, pp.346-362, 2001.
DOI : 10.1109/71.920586
Multiple Frequency Selection in DVFS-Enabled Processors to Minimize Energy Consumption, Energy-Efficient Distributed Computing Systems, 2012.
DOI : 10.1002/9781118342015.ch17
Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013.
DOI : 10.1145/2530268.2530272
A scheduling model for reduced CPU energy, Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
DOI : 10.1109/SFCS.1995.492493
A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115