Tradeoff exploration between reliability, power consumption, and execution time for embedded systems, International Journal on Software Tools for Technology Transfer, vol.7, issue.3, pp.229-245, 2013.
DOI : 10.1007/s10009-012-0263-9
URL : https://hal.archives-ouvertes.fr/hal-00923926
Frédéric Vivien, and Dounia Zaidouni. On the combination of silent error detection and checkpointing, Proceedings of the 19th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC), pp.11-20, 2013.
Energy-aware scheduling under reliability and makespan constraints, 2012 19th International Conference on High Performance Computing, pp.1-10, 2012.
DOI : 10.1109/HiPC.2012.6507482
URL : https://hal.archives-ouvertes.fr/hal-00763384
Speed scaling to manage energy and temperature, Journal of the ACM, vol.54, issue.1, pp.1-3, 2007.
DOI : 10.1145/1206035.1206038
Silent error detection in numerical timestepping schemes, The International Journal of High Performance Computing Applications.
DOI : 10.1177/1094342014532297
Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002
Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011.
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/hal-00738504
Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, pp.155-164, 2008.
DOI : 10.1145/1375527.1375552
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985.
DOI : 10.1145/214451.214456
Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods, Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp.167-176, 2013.
A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016
Combined DVFS and Mapping Exploration for Lifetime and Soft-error Susceptibility Improvement in MPSoCs, Proceedings of the Conference on Design, pp.1-61, 2014.
The impact of new technology on soft error rates, 2011 International Reliability Physics Symposium, pp.1-5, 2011.
DOI : 10.1109/IRPS.2011.5784522
Temperature Management in Data Centers: Why Some (Might) Like It Hot, SIGMETRICS Perform. Eval. Rev., vol.40, pp.1-163, 2012.
Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp.615-626, 2012.
DOI : 10.1109/ICDCS.2012.56
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, pp.375-408, 2002.
Making a Case for Efficient Supercomputing, pp.54-64, 2003.
Detection and correction of silent data corruption for large-scale high-performance computing, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC). 78, 2012.
Fault-tolerant iterative methods via selective reliability, 2011.
A Power-Aware Run-Time System for High-Performance Computing, Proceedings of the ACM, pp.1-9, 2005.
Algorithm-Based Fault Tolerance for Matrix Operations, IEEE Trans. Comput., vol.33, issue.6, pp.518-528, 1984.
Cosmic rays don't strike twice, ACM SIGARCH Computer Architecture News, vol.40, issue.1, pp.111-122, 2012.
DOI : 10.1145/2189750.2150989
When is multi-version checkpointing needed?, Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, FTXS '13, pp.49-56, 2013.
DOI : 10.1145/2465813.2465821
The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, Proc. of the ACM, pp.1-11, 2010.
ACR, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, 2013.
DOI : 10.1145/2503210.2503266
The effect of cosmic rays on the soft error rate of a DRAM at ground level, IEEE Transactions on Electron Devices, vol.41, issue.4, pp.553-557, 1994.
DOI : 10.1109/16.278509
Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle, IEEE Transactions on Dependable and Secure Computing, vol.3, issue.2, pp.130-140, 2006.
DOI : 10.1109/TDSC.2006.22
The effect of data center temperature on energy efficiency, 2008 11th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, pp.1167-1174, 2008.
DOI : 10.1109/ITHERM.2008.4544393
Multiple Frequency Selection in DVFS-Enabled Processors to Minimize Energy Consumption, Energy-Efficient Distributed Computing Systems, 2012.
Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013.
DOI : 10.1145/2530268.2530272
A 'cool' way of improving the reliability of HPC machines, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-5812, 2013.
DOI : 10.1145/2503210.2503228
Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.69-78, 2012.
DOI : 10.1145/2304576.2304588
On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, pp.630-649, 1984.
DOI : 10.1137/0213039
A scheduling model for reduced CPU energy, Proceedings of IEEE 36th Annual Foundations of Computer Science, p.374, 1995.
DOI : 10.1109/SFCS.1995.492493
A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115
Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems, 2008 IEEE International Conference on Computer Design, pp.633-639, 2008.
DOI : 10.1109/ICCD.2008.4751927
The Effects of Energy Management on Reliability in Real-Time Embedded Systems, Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.35-40, 2004.
Accelerated testing for cosmic soft-error rate, IBM Journal of Research and Development, vol.40, issue.1, pp.1-51, 1996.
DOI : 10.1147/rd.401.0051
Cosmic ray soft error rates of 16-Mb DRAM memory chips, IEEE Journal of Solid-State Circuits, vol.33, issue.2, pp.246-252, 1998.
DOI : 10.1109/4.658626