G. Aupy, A. Benoit, R. Melhem, P. Renaud-Goud, and Y. Robert, Energy-aware checkpointing of divisible tasks with soft or hard deadlines, 2013 International Green Computing Conference Proceedings, 2013.
DOI : 10.1109/IGCC.2013.6604467

URL : https://hal.archives-ouvertes.fr/hal-00857244

N. Bansal, T. Kimbrel, and K. Pruhs, Speed scaling to manage energy and temperature, Journal of the ACM, vol.54, issue.1, pp.1-3, 2007.
DOI : 10.1145/1206035.1206038

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.550.7426

A. Benoit, A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, A Different Re-execution Speed Can Help, 2016 45th International Conference on Parallel Processing Workshops (ICPPW), 2016.
DOI : 10.1109/ICPPW.2016.45

URL : https://hal.archives-ouvertes.fr/hal-01297125

A. Benoit, A. Cavelan, Y. Robert, and H. Sun, Optimal Resilience Patterns to Cope with Fail-Stop and Silent Errors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.39

URL : https://hal.archives-ouvertes.fr/hal-01354886

A. Benoit, Y. Robert, and S. K. Raina, Efficient checkpoint/verification patterns, International Journal of High Performance Computing Applications, 2015.
DOI : 10.1177/1094342015594531

URL : https://hal.archives-ouvertes.fr/ensl-01252342

A. R. Benson, S. Schmit, and R. Schreiber, Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, 2014.
DOI : 10.1177/1094342014532297

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward Exascale Resilience, International Journal of High Performance Computing Applications, vol.23, issue.4, 2009.
DOI : 10.1177/1094342009347767

Z. Chen, Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods, PPoPP, 2013.

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, Proc. SC'12, p.78, 2012.

R. E. Lyons and W. Vanderkulk, The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, modeling, and evaluation of a scalable multi-level checkpointing system, Proc. SC'10, pp.1-11, 2010.
DOI : 10.2172/984082

F. Quaglia, A cost model for selecting checkpoint positions in time warp parallel simulation, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.4, pp.346-362, 2001.
DOI : 10.1109/71.920586

N. B. Rizvandi, A. Y. Zomaya, Y. C. Lee, A. J. Boloori, and J. Taheri, Multiple Frequency Selection in DVFS-Enabled Processors to Minimize Energy Consumption, Energy-Efficient Distributed Computing Systems, 2012.
DOI : 10.1002/9781118342015.ch17

P. Sao and R. Vuduc, Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013.
DOI : 10.1145/2530268.2530272

F. Yao, A. Demers, and S. Shenker, A scheduling model for reduced CPU energy, Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
DOI : 10.1109/SFCS.1995.492493

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115