A linear algebraic model of algorithm-based fault tolerance, IEEE Trans. Comput, vol.37, p.1599-1604, 1988.
A simple strategy for varying the restart parameter in GMRES(m), J. Comput. Appl. Math, vol.230, issue.2, p.751-761, 2009.
Algorithm-based fault tolerance on a hypercube multiprocessor, IEEE Transactions on Computers, vol.39, issue.9, p.1132-1145, 1990.
DOI : 10.1109/12.57055
Algorithmic fault tolerance using the Lanczos method, SIAM J. Matrix Anal. Appl, vol.13, p.312-332, 1992.
Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/hal-00738504
Fault-tolerant linear solvers via selective reliability, 1206.
Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, p.155-164, 2008.
DOI : 10.1145/1375527.1375552
Automated application-level checkpointing of MPI programs, Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '03.
C3: A System for Automating Application-Level Checkpointing of MPI Programs, LCPC'03, p.357-373, 2003.
DOI : 10.1007/978-3-540-24644-2_23
Experimental evaluation of application-level checkpointing for OpenMP programs, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, p.213, 2006.
DOI : 10.1145/1183401.1183405
Preventive migration vs. preventive checkpointing for extreme scale supercomputers, Parallel Processing Letters, p.111-132, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00945068
Compiler-directed selective data protection against soft errors, Proceedings of the 2005 conference on Asia South Pacific design automation, ASP-DAC '05, p.713-716, 2005.
DOI : 10.1145/1120725.1121000
Algorithm-based recovery for iterative methods without checkpointing, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, p.73-84, 2011.
DOI : 10.1145/1996130.1996142
Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources, Proceedings of the 20th international conference on Parallel and distributed processing, IPDPS'06, p.9797, 2006.
Fault tolerant linear algebra: Recovering from fail-stop failures without checkpointing, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), p.14, 2010.
DOI : 10.1109/IPDPSW.2010.5470775
Algorithm 832, ACM Transactions on Mathematical Software, vol.30, issue.2.
DOI : 10.1145/992200.992206
Algorithm 915, SuiteSparseQR, ACM Transactions on Mathematical Software, vol.38, issue.1, p.122, 2011.
DOI : 10.1145/2049662.2049670
Self-adapting numerical software (SANS) effort, IBM J. Res. Dev, vol.50, p.223-238, 2006.
FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world, Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2000.
Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput, vol.33, p.518-528, 1984.
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment, SIAM Journal on Scientific Computing, vol.30, issue.1, p.102-116, 2007.
DOI : 10.1137/040620394
An optimal checkpoint/restart model for a large scale high performance computing system, IEEE Trans. Comput, vol.33, p.19, 2008.
Analyzing the soft error resilience of linear solvers on multicore multiprocessors, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), p.112, 2010.
DOI : 10.1109/IPDPS.2010.5470411
Solution of sparse indefinite systems of linear equations.
Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing, Journal of Parallel and Distributed Computing, vol.43, issue.2, p.125-138, 1997.
DOI : 10.1006/jpdc.1997.1336
Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2003.
DOI : 10.1137/1.9780898718003