C. Anfinson and F. Luk, A linear algebraic model of algorithm-based fault tolerance, IEEE Transactions on Computers, vol.37, issue.12, pp.1599-1604, 1988.
DOI : 10.1109/12.9736

A. Benoit, A. Cavelan, Y. Robert, and H. Sun, Assessing general-purpose algorithms to cope with fail-stop and silent errors, Workshop on Performance Modeling, Benchmarking and Simulation (PMBS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01358146

A. R. Benson, S. Schmit, and R. Schreiber, Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, vol.29, issue.4, 1312.
DOI : 10.1177/1094342014532297

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

M. Bougeret, H. Casanova, M. Rabie, Y. Robert, and F. Vivien, Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011.
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/hal-00738504

P. G. Bridges, K. B. Ferreira, M. A. Heroux, and M. Hoemmen, Fault-tolerant linear solvers via selective reliability, preprint, 2012.

G. Bronevetsky and B. R. de Supinski, Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd Annual International Conference on Supercomputing, ICS '08, pp.155-164, 2008.
DOI : 10.1145/1375527.1375552

Z. Chen, Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods, Proc. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pp.167-176, 2013.

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016

T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, vol.38, issue.1, pp.1:1-1:25, 2011.
DOI : 10.1145/2049662.2049663

P. Du, A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, Algorithm-based fault tolerance for dense matrix factorizations, PPoPP, pp.225-234, 2012.

I. S. Duff, R. G. Grimes, and J. G. Lewis, Sparse matrix test problems, ACM Transactions on Mathematical Software, vol.15, issue.1, pp.1-14, 1989.
DOI : 10.1145/62038.62043

J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira et al., Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012.
DOI : 10.1109/ICDCS.2012.56

J. Elliott, F. Mueller, M. Stoyanov, and C. Webster, Quantifying the impact of single bit flips on floating point arithmetic, preprint, 2013.

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, Proc. of the ACM/IEEE SC Int. Conf., SC '12, 2012.

M. Heroux and M. Hoemmen, Fault-tolerant iterative methods via selective reliability, Research report SAND2011-3915 C, Sandia National Laboratories, 2011.

N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2002.
DOI : 10.1137/1.9780898718027

N. J. Higham, Functions of Matrices: Theory and Computation, 2008.
DOI : 10.1137/1.9780898717778

M. Hoemmen and M. A. Heroux, Fault-tolerant iterative methods via selective reliability, 2011.

M. Hoemmen and M. A. Heroux, Fault-Tolerant Iterative Methods via Selective Reliability, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p.9, 2011.

K. Huang and J. A. Abraham, Algorithm-Based Fault Tolerance for Matrix Operations, IEEE Transactions on Computers, vol.33, issue.6, pp.518-528, 1984.

A. A. Hwang, I. A. Stefanovici, and B. Schroeder, Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design.

G. Lu, Z. Zheng, and A. A. Chien, When is multi-version checkpointing needed?, Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, FTXS '13, 2013.
DOI : 10.1145/2465813.2465821

R. E. Lyons and W. Vanderkulk, The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200

M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
DOI : 10.1017/CBO9780511813603

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.
DOI : 10.1109/SC.2010.18

Y. Saad, Iterative Methods for Sparse Linear Systems, 2003.
DOI : 10.1137/1.9780898718003

P. Sao and R. Vuduc, Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013.
DOI : 10.1145/2530268.2530272

M. Shantharam, S. Srinivasmurthy, and P. Raghavan, Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012.
DOI : 10.1145/2304576.2304588

J. Sloan, R. Kumar, and G. Bronevetsky, Algorithmic approaches to low overhead fault detection for sparse linear algebra, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), pp.1-12, 2012.
DOI : 10.1109/DSN.2012.6263938

M. Stoyanov and C. Webster, Quantifying the impact of single bit flips on floating point arithmetic, 2013.

E. W. Weisstein, Laplacian matrix, From MathWorld, A Wolfram Web Resource, 2014.

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115