Journal article. SIAM Journal on Scientific Computing. Year: 2020

On soft errors in the conjugate gradient method: sensitivity and robust numerical detection

Abstract

The conjugate gradient (CG) method is the most widely used iterative scheme for solving large sparse systems of linear equations when the matrix is symmetric positive definite. Although more than 60 years old, it remains a serious candidate for extreme-scale computations on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation and thus diminish their reliability. One of the most common effects of natural radiation is the single event upset, which consists of a bit-flip in a memory cell and produces unexpected results at the application level. Consequently, future extreme-scale computing facilities will be more prone to errors of any kind, including bit-flips, during their calculations. These numerical and technological observations are the main motivations for this work, where we first investigate through extensive numerical experiments the sensitivity of CG to bit-flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We further propose numerical criteria to detect the occurrence of such soft errors and assess their robustness through extensive numerical experiments.
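To make the notion of a bit-flip in the matrix-vector product concrete, the following is a minimal sketch and not the authors' experimental setup. It assumes IEEE 754 double precision, an unpreconditioned CG, and a small 1D Laplacian test matrix. The names `flip_bit`, `matvec_laplacian`, `cg`, `true_residual_norm`, and the `fault` parameter are hypothetical and introduced only for illustration.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def matvec_laplacian(v):
    """y = A v for A = tridiag(-1, 2, -1), a symmetric positive definite matrix."""
    n = len(v)
    y = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * v[i] - left - right)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(b, tol=1e-10, max_iter=200, fault=None):
    """Unpreconditioned CG on the 1D Laplacian, starting from x = 0.

    fault is an optional (iteration, index, bit) triple (an assumption of
    this sketch): at that iteration, one entry of the matrix-vector
    product has a single bit flipped. Returns (x, iterations performed).
    """
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    stop = tol * math.sqrt(dot(b, b))
    for k in range(max_iter):
        # Stopping test on the recursively updated residual.
        if math.sqrt(rr) <= stop:
            return x, k
        q = matvec_laplacian(p)
        if fault is not None and fault[0] == k:
            q[fault[1]] = flip_bit(q[fault[1]], fault[2])
        pq = dot(p, q)
        if pq == 0.0:  # guard against division by zero after a fault
            return x, k
        alpha = rr / pq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


def true_residual_norm(x, b):
    """Euclidean norm of b - A x, recomputed from scratch."""
    diff = [bi - ai for bi, ai in zip(b, matvec_laplacian(x))]
    return math.sqrt(dot(diff, diff))


if __name__ == "__main__":
    b = [1.0] * 50
    x_ok, its_ok = cg(b)
    print("fault-free:", its_ok, true_residual_norm(x_ok, b))
    for bit in (0, 40, 62):  # low mantissa, high mantissa, top exponent bit
        x_f, its_f = cg(b, fault=(3, 0, bit))
        print("bit", bit, ":", its_f, true_residual_norm(x_f, b))
```

The position of the flipped bit matters: flipping bit 0 of `1.0` changes it by one unit in the last place, while flipping bit 62 turns it into infinity. Comparing the recomputed residual of the faulty runs with the fault-free one gives a rough feel for this sensitivity; it is not one of the detection criteria proposed in the paper.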

Dates and versions

hal-03022845, version 1 (25-11-2020)

Cite

Emmanuel Agullo, Siegfried Cools, Emrullah Fatih-Yetkin, Luc Giraud, Nick Schenkels, et al. On soft errors in the conjugate gradient method: sensitivity and robust numerical detection. SIAM Journal on Scientific Computing, 2020, 42 (6), ⟨10.1137/18M122858X⟩. ⟨hal-03022845⟩