
Which Verification for Soft Error Detection?

Abstract: Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost and recall (the fraction of all errors that it actually detects). The main contribution of this paper is to show which detector(s) to use, and to characterize the optimal computational pattern for the application: how many detectors of each type to use, together with the length of the work segment that precedes each of them. We conduct a comprehensive complexity analysis of this optimization problem, proving NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm whose performance is shown to be close to optimal for a realistic set of evaluation scenarios.
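The sketch below illustrates what recall means when an error may pass several detectors before it is caught. It rests on an assumption that belongs to this example and is not taken from the report: each detector catches a given silent error independently, with probability equal to its recall. Under that assumption, the probability that the error escapes every detector is the product of (1 - r_i). The function names `miss_probability` and `detection_probability` are hypothetical.

```python
from math import prod


def miss_probability(recalls):
    """Probability that one silent error escapes every detector in `recalls`.

    Assumption (illustrative only): each detector independently catches the
    error with probability equal to its recall r, so it misses with 1 - r.
    """
    for r in recalls:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"recall must lie in [0, 1], got {r}")
    # With no detectors at all, the error is never caught (product of nothing = 1).
    return prod(1.0 - r for r in recalls)


def detection_probability(recalls):
    """Probability that at least one detector in `recalls` catches the error."""
    return 1.0 - miss_probability(recalls)


# Hypothetical recalls: two partial detectors with recall 0.8 and 0.5.
print(detection_probability([0.8, 0.5]))
```

With these made-up values, the two partial detectors together miss an error with probability 0.2 × 0.5 = 0.1. A single detector with recall 1 would always catch it. Each detector's cost is the other half of the trade-off that the report optimizes.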

Cited literature: 33 references

https://hal.inria.fr/hal-01164445
Contributor: Equipe Roma
Submitted on: Monday, October 5, 2015 - 6:55:26 PM
Last modification on: Wednesday, February 26, 2020 - 11:14:25 AM
Long-term archiving on: Wednesday, April 26, 2017 - 10:21:54 PM

File

RR-8741.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01164445, version 2


Citation

Leonardo Bautista-Gomez, Anne Benoit, Aurélien Cavelan, Saurabh K. Raina, Yves Robert, et al. Which Verification for Soft Error Detection? [Research Report] RR-8741, INRIA Grenoble; ENS Lyon; Jaypee Institute of Information Technology, India; Argonne National Laboratory; University of Tennessee Knoxville, USA; INRIA. 2015, pp. 20. ⟨hal-01164445v2⟩
