Scheduling Computational Workflows on Failure-Prone Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015.
DOI : 10.1109/IPDPSW.2015.33
URL : https://hal.archives-ouvertes.fr/hal-01075100
Detecting silent data corruption through data dynamic monitoring for scientific applications, SIGPLAN Notices, vol.49, issue.8, pp.381-382, 2014.
Detecting and correcting data corruption in stencil applications through multivariate interpolation, Proc. 1st Int. Workshop on Fault Tolerant Systems (FTS), 2015.
FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216
Assessing general-purpose algorithms to cope with fail-stop and silent errors, Proc. 5th Int. Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01358146
Optimal Resilience Patterns to Cope with Fail-Stop and Silent Errors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.39
URL : https://hal.archives-ouvertes.fr/hal-01354886
Lightweight Silent Data Corruption Detection Based on Runtime Data Analysis for HPC Applications, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, 2015.
DOI : 10.1145/2749246.2749253
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985.
DOI : 10.1145/214451.214456
Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods, Proc. PPoPP, pp.167-176, 2013.
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, Proc. SC'10, 2010.
DOI : 10.2172/984082
The effect of cosmic rays on the soft error rate of a DRAM at ground level, IEEE Transactions on Electron Devices, vol.41, issue.4, pp.553-557, 1994.
DOI : 10.1109/16.278509
On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, 1984.
DOI : 10.1137/0213039
Cosmic ray soft error rates of 16-Mb DRAM memory chips, IEEE Journal of Solid-State Circuits, vol.33, issue.2, pp.246-252, 1998.
DOI : 10.1109/4.658626
IBM experiments in soft fails in computer electronics, IBM J. Res. Dev., vol.40, issue.1, pp.3-18, 1996.