C. Anfinson and F. Luk, A linear algebraic model of algorithm-based fault tolerance, IEEE Transactions on Computers, vol.37, issue.12, pp.1599-1604, 1988.
DOI : 10.1109/12.9736

G. Aupy, Y. Robert, F. Vivien, and D. Zaidouni, Checkpointing algorithms and fault prediction, Journal of Parallel and Distributed Computing, vol.74, issue.2, pp.2048-2064, 2014.
DOI : 10.1016/j.jpdc.2013.10.010
URL : https://hal.archives-ouvertes.fr/hal-00788313

A. Benoit, A. Cavelan, Y. Robert, and H. Sun, Assessing general-purpose algorithms to cope with fail-stop and silent errors, Workshop on Performance Modeling, Benchmarking and Simulation (PMBS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01358146

A. R. Benson, S. Schmit, and R. Schreiber, Silent error detection in numerical time-stepping schemes, International Journal of High Performance Computing Applications, vol.29, issue.4, 2015.
DOI : 10.1177/1094342014532297

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

M. Bougeret, H. Casanova, M. Rabie, Y. Robert, and F. Vivien, Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011.
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/hal-00738504

P. G. Bridges, K. B. Ferreira, M. A. Heroux, and M. Hoemmen, Fault-tolerant linear solvers via selective reliability, 2012.

G. Bronevetsky and B. de Supinski, Soft error vulnerability of iterative linear algebra methods, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, pp.155-164, 2008.
DOI : 10.1145/1375527.1375552

Z. Chen, Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods, Proc. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '13, pp.167-176, 2013.

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, FGCS, vol.22, issue.3, pp.303-312, 2004.

T. A. Davis and Y. Hu, The University of Florida Sparse Matrix Collection, ACM Trans. Math. Softw., vol.38, issue.1, 2011.

P. Du, A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, Algorithm-based fault tolerance for dense matrix factorizations, PPoPP, pp.225-234, 2012.

J. Elliott, F. Mueller, M. Stoyanov, and C. Webster, Quantifying the impact of single bit flips on floating point arithmetic, 2013.
DOI : 10.2172/1089338

J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira et al., Combining Partial Redundancy and Checkpointing for HPC, 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012.
DOI : 10.1109/ICDCS.2012.56
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.2542

M. Fasi, Y. Robert, and B. Uçar, Combining Algorithm-based Fault Tolerance and Checkpointing for Iterative Solvers, 2015.

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, Proc. of the ACM/IEEE SC Int. Conf., ser. SC '12, 2012.

M. Heroux and M. Hoemmen, Fault-tolerant iterative methods via selective reliability, Sandia National Laboratories, 2011.

N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2002.
DOI : 10.1137/1.9780898718027

M. Hoemmen and M. A. Heroux, Fault-tolerant iterative methods via selective reliability, Sandia Corporation, Tech. Rep., 2011.

K. Huang and J. A. Abraham, Algorithm-Based Fault Tolerance for Matrix Operations, IEEE Transactions on Computers, vol.33, issue.6, pp.518-528, 1984.

A. A. Hwang, I. A. Stefanovici, and B. Schroeder, Cosmic rays don't strike twice, ACM SIGARCH Computer Architecture News, vol.40, issue.1, pp.111-122, 2012.
DOI : 10.1145/2189750.2150989

K. Kaya, B. Uçar, and Ü. V. Çatalyürek, Analysis of Partitioning Models and Metrics in Parallel Sparse Matrix-Vector Multiplication, Parallel Processing and Applied Mathematics, pp.174-184, 2014.
DOI : 10.1007/978-3-642-55195-6_16
URL : https://hal.archives-ouvertes.fr/hal-00821523

G. Lu, Z. Zheng, and A. A. Chien, When is multi-version checkpointing needed?, Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, FTXS '13, 2013.
DOI : 10.1145/2465813.2465821

R. E. Lyons and W. Vanderkulk, The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200

M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
DOI : 10.1017/CBO9780511813603

A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, Proc. of the ACM/IEEE SC Int. Conf., ser. SC '10, pp.1-11, 2010.

Y. Saad, Iterative Methods for Sparse Linear Systems, 2003.
DOI : 10.1137/1.9780898718003

P. Sao and R. Vuduc, Self-stabilizing iterative solvers, Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '13, 2013.
DOI : 10.1145/2530268.2530272

M. Shantharam, S. Srinivasmurthy, and P. Raghavan, Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012.
DOI : 10.1145/2304576.2304588

J. Sloan, R. Kumar, and G. Bronevetsky, Algorithmic approaches to low overhead fault detection for sparse linear algebra, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), pp.1-12, 2012.
DOI : 10.1109/DSN.2012.6263938

M. Stoyanov and C. Webster, Quantifying the impact of single bit flips on floating point arithmetic, Oak Ridge National Laboratory, Tech. Rep., 2013.

E. W. Weisstein, Laplacian matrix, MathWorld, A Wolfram Web Resource, 2014.

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115