Conference papers

Which Verification for Soft Error Detection?

Abstract: Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost and recall (the fraction of all errors that are actually detected). The main contribution of this paper is to characterize the optimal computational pattern for an application: which detector(s) to use, how many detectors of each type to use, and the length of the work segment that precedes each of them. We conduct a comprehensive complexity analysis of this optimization problem, showing NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm whose performance is shown to be close to optimal for a realistic set of evaluation scenarios.
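The trade-off between cost and recall can be made concrete with a small sketch. Assuming, purely for illustration, that detectors run back to back miss an error independently of one another, the chance that an error slips past all of them is the product of their individual miss rates. The function names and detector values below are hypothetical. This is not the cost model, FPTAS, or greedy algorithm from the paper. It only shows why stacking several cheap partial detectors can be compared against a single expensive one.

```python
from math import prod


def combined_recall(recalls):
    """Probability that at least one detector flags an error.

    Assumes (hypothetically) that each detector misses independently,
    so the probability that all of them miss is the product of (1 - r).
    """
    miss_all = prod(1.0 - r for r in recalls)
    return 1.0 - miss_all


def cost_and_recall(detectors):
    """Given (cost, recall) pairs, return total cost and combined recall."""
    total_cost = sum(cost for cost, _ in detectors)
    return total_cost, combined_recall([r for _, r in detectors])


# Hypothetical detectors: a cheap partial one and a costly near-perfect one.
cheap = (1.0, 0.5)
expensive = (10.0, 0.99)

print(cost_and_recall([cheap, cheap]))  # two cheap checks in a row
print(cost_and_recall([expensive]))     # one expensive check
```

Under these made-up numbers, two cheap detectors cost 2.0 and catch 75% of errors, while the expensive detector costs 10.0 and catches 99%. Deciding which mix is better requires the full model of work segments and re-execution costs studied in the paper.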

Cited literature: 33 references

https://hal.inria.fr/hal-01252382
Contributor: Equipe Roma <>
Submitted on: Thursday, January 7, 2016 - 3:04:16 PM
Last modification on: Wednesday, February 26, 2020 - 11:14:31 AM
Long-term archiving on: Friday, April 8, 2016 - 1:27:34 PM

File

hipc.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01252382, version 1


Citation

Leonardo Bautista-Gomez, Anne Benoit, Aurélien Cavelan, Saurabh K. Raina, Yves Robert, et al. Which Verification for Soft Error Detection? High Performance Computing 2015, Dec 2015, Bangalore, India. ⟨hal-01252382⟩


Metrics

Record views: 443

Files downloads: 214