HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information
Skip to Main content Skip to Navigation
Reports

Assessing the Impact of Partial Verifications Against Silent Data Corruptions

Abstract : Silent errors, or silent data corruptions, constitute a major threat on very large scale platforms. When a silent error strikes, it is not detected immediately but only after some delay, which prevents the use of pure periodic checkpointing approaches devised for fail-stop errors. Instead, checkpointing must be coupled with some verification mechanism to guarantee that corrupted data will never be written into the checkpoint file. Such a guaranteed verification mechanism typically incurs a high cost. In this paper, we investigate the use of partial verification mechanisms in addition to a guaranteed verification. The main objective is to investigate to which extent it is worthwhile to use some light-cost but less precise verification type in the middle of a periodic computing pattern, which ends with a guaranteed verification right before each checkpoint. Introducing partial verifications dramatically complicates the analysis, but we are able to analytically determine the optimal computing pattern (up to first-order approximation), including the optimal length of the pattern, the optimal number of partial verifications, as well as their optimal positions inside the pattern. Simulations based on a wide range of parameters confirm the benefits of partial verifications in certain scenarios, when compared to the baseline algorithm that uses only guaranteed verifications.
Complete list of metadata

Cited literature [28 references]  Display  Hide  Download

https://hal.inria.fr/hal-01143832
Contributor : Equipe Roma Connect in order to contact the contributor
Submitted on : Wednesday, June 17, 2015 - 2:00:58 PM
Last modification on : Monday, May 16, 2022 - 4:46:02 PM
Long-term archiving on: : Tuesday, April 25, 2017 - 11:15:13 AM

File

RR-8711.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01143832, version 2

Collections

Citation

Aurélien Cavelan, Saurabh K. Raina, Yves Robert, Hongyang Sun. Assessing the Impact of Partial Verifications Against Silent Data Corruptions. [Research Report] RR-8711, INRIA Grenoble - Rhône-Alpes; ENS Lyon; Université Lyon 1; Jaypee Institute of Information Technology, India; CNRS - Lyon (69); University of Tennessee Knoxville, USA; INRIA. 2015. ⟨hal-01143832v2⟩

Share

Metrics

Record views

122

Files downloads

145