Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes

Abstract : Erasure-correcting codes are widely used to ensure data persistence in distributed storage systems, where the required redundancy must be maintained over time to prevent permanent data loss. This paper addresses the repair of such codes in the presence of simultaneous failures. We go beyond existing work (i.e., the regenerating codes of Dimakis et al.) and propose coordinated regenerating codes, which allow devices to coordinate during simultaneous repairs and thereby reduce repair costs further. We provide closed-form expressions for the communication costs of our new codes as a function of the number of live devices and the number of devices being repaired. We prove that deliberately delaying repairs does not in itself bring additional gains, so regenerating codes are optimal as long as each failure can be repaired before a second one occurs. When multiple failures are detected simultaneously, however, we prove that our coordinated regenerating codes are optimal and outperform uncoordinated repairs with respect to both communication and storage costs. Finally, we define adaptive regenerating codes, which self-adapt to the system state, and prove that they are optimal.
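
The closed-form expressions themselves are in the report, not on this page. As a rough illustration of the comparison the abstract describes, the sketch below evaluates the repair traffic per repaired device at the two extreme operating points (minimum storage and minimum bandwidth), using the forms commonly quoted for coordinated regenerating codes. These formulas are assumed here and should be checked against the report. The function names and the parameters `file_size`, `k` (devices needed to recover the file), `d` (live devices contacted) and `t` (devices repaired together) are illustrative only. Setting `t = 1` gives the uncoordinated regenerating codes of Dimakis et al.

```python
from fractions import Fraction


def _check(k, d, t):
    # A repair needs at least k live devices and at least one device to repair.
    if k < 1 or d < k or t < 1:
        raise ValueError("require 1 <= k <= d and t >= 1")


def min_storage_repair_traffic(file_size, k, d, t=1):
    """Traffic received by each repaired device at the minimum-storage point.

    Assumed form: file_size * (d + t - 1) / (k * (d - k + t)).
    """
    _check(k, d, t)
    return Fraction(file_size) * (d + t - 1) / (k * (d - k + t))


def min_bandwidth_repair_traffic(file_size, k, d, t=1):
    """Traffic received by each repaired device at the minimum-bandwidth point.

    Assumed form: file_size * (2d + t - 1) / (k * (2d - k + t)).
    """
    _check(k, d, t)
    return Fraction(file_size) * (2 * d + t - 1) / (k * (2 * d - k + t))


if __name__ == "__main__":
    k, d, t = 4, 6, 3  # hypothetical system: 6 live devices, 3 being repaired
    for name, cost in (
        ("minimum storage", min_storage_repair_traffic),
        ("minimum bandwidth", min_bandwidth_repair_traffic),
    ):
        uncoordinated = cost(1, k, d, t=1)  # each device repaired on its own
        coordinated = cost(1, k, d, t=t)    # the t devices repaired together
        print(f"{name}: uncoordinated={uncoordinated}, coordinated={coordinated}")
```

For a unit-size file with these hypothetical parameters, the coordinated figure is lower than the uncoordinated one at both points, which matches the abstract's claim for simultaneous failures.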

https://hal.inria.fr/inria-00516647
Contributor : Nicolas Le Scouarnec
Submitted on : Thursday, July 7, 2011 - 12:19:27 PM
Last modification on : Tuesday, June 15, 2021 - 4:13:54 PM
Long-term archiving on : Sunday, December 4, 2016 - 1:48:24 PM

File

RR-7375.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00516647, version 4

Citation

Anne-Marie Kermarrec, Nicolas Le Scouarnec, Gilles Straub. Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes. [Research Report] RR-7375, INRIA. 2010. ⟨inria-00516647v4⟩
