Reliability of Checksum based Detection for Soft Errors in Conjugate Gradient Variants - Inria (National Institute for Research in Digital Science and Technology)
Conference paper, Year: 2015

Abstract

Soft errors that escape hardware detection mechanisms can be extremely difficult to detect at the software layer. One option is to fully duplicate the computation (and its data) and to check at regular intervals that the intermediate results are consistent. However, the cost of such duplication may be prohibitive. In a conjugate gradient (CG) solver, the most expensive operation to duplicate is the sparse matrix-vector product (SpMV). To avoid duplicating this operation, checksum mechanisms may be employed. In this presentation, we investigate the reliability of such an approach in finite precision arithmetic. We illustrate our discussion with the CGPOP code, a miniapp that performs the CG solve within the Parallel Ocean Program (POP), which is a candidate for exascale climate simulations.
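To make the checksum idea concrete, here is a minimal sketch in Python, not the method presented in the talk. It assumes the classical identity sum(A x) = (A^T 1) . x, a CSR storage layout, and a hypothetical relative tolerance `rel_tol`; the function names and the small tridiagonal test matrix are illustrative choices as well. Because both sides of the identity are computed in floating point, the comparison needs a tolerance, which is why the reliability of such a test in finite precision is a question in its own right.

```python
import math

def spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR format."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

def column_sums(indices, data, ncols, absolute=False):
    """Checksum vector c = A^T 1 (or |A|^T 1), computed once before the solve."""
    c = [0.0] * ncols
    for col, val in zip(indices, data):
        c[col] += abs(val) if absolute else val
    return c

def checksum_passes(y, x, c, c_abs, rel_tol=1e-12):
    """Compare sum(y) with c . x; rel_tol is an assumed, hypothetical threshold."""
    lhs = math.fsum(y)
    rhs = math.fsum(cj * xj for cj, xj in zip(c, x))
    # Scale the threshold by |A|^T 1 . |x| so it does not depend on the
    # possibly corrupted vector y.
    scale = math.fsum(aj * abs(xj) for aj, xj in zip(c_abs, x))
    return abs(lhs - rhs) <= rel_tol * scale

# Illustrative 4x4 tridiagonal matrix (2 on the diagonal, -1 off the diagonal).
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0, 4.0]

c = column_sums(indices, data, 4)
c_abs = column_sums(indices, data, 4, absolute=True)
y = spmv(indptr, indices, data, x)

clean_ok = checksum_passes(y, x, c, c_abs)

# Simulated soft errors: perturb one entry of y after the product.
y_large = list(y)
y_large[1] += 1e-3
large_fault_ok = checksum_passes(y_large, x, c, c_abs)

y_tiny = list(y)
y_tiny[1] += 1e-15
tiny_fault_ok = checksum_passes(y_tiny, x, c, c_abs)
```

In this sketch the unperturbed product passes the test and the perturbation of size 1e-3 is flagged, while the perturbation of size 1e-15 falls below the assumed threshold and goes unnoticed. Tightening `rel_tol` would catch smaller faults but risks flagging ordinary rounding error as a fault.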
No file deposited

Dates and versions

hal-01200706, version 1 (17-09-2015)

Identifiers

  • HAL Id: hal-01200706, version 1

Cite

Emmanuel Agullo, Luc Giraud, Emrullah Fatih Yetkin. Reliability of Checksum based Detection for Soft Errors in Conjugate Gradient Variants. SIAM Conference on Computational Science and Engineering (SIAM CSE 2015), Mar 2015, Salt Lake City, Utah, United States. ⟨hal-01200706⟩