Repair Time in Distributed Storage Systems

Frédéric Giroire; Sandeep Kumar Gupta; Remigiusz Modrzejewski; Julian Monteiro; Stéphane Perennes

doi:10.1007/978-3-642-40053-7_9

Conference paper. Year: 2013

Frédéric Giroire

Role: Author
PersonId : 5597
IdHAL : frederic-giroire
ORCID : 0000-0002-3727-051X
IdRef : 11611598X

Combinatorics, Optimization and Algorithms for Telecommunications

Sandeep Kumar Gupta

Role: Author
PersonId : 891944

Indian Institute of Technology Delhi

Remigiusz Modrzejewski

Role: Author

Combinatorics, Optimization and Algorithms for Telecommunications

Julian Monteiro

Role: Author
PersonId : 884907

Instituto de Astronomia, Geofísica e Ciências Atmosféricas [São Paulo]

Stéphane Perennes

Role: Author

Combinatorics, Optimization and Algorithms for Telecommunications

Abstract

In this paper, we analyze a highly distributed backup storage system realized by means of nano datacenters (NaDa). NaDa have recently been proposed as a way to mitigate the growing energy, bandwidth and device costs of traditional data centers, following the popularity of cloud computing. These service provider-controlled peer-to-peer systems take advantage of resources already committed to always-on set-top boxes, the fact that they do not generate heat dissipation costs, and their proximity to users. In this kind of system, redundancy is introduced to preserve the data in case of peer failures or departures. To ensure long-term fault tolerance, the storage system must have a self-repairing service that continuously reconstructs the fragments of redundancy that are lost. The speed of this reconstruction process is crucial for data survival. This speed is mainly determined by how much bandwidth, which is a critical resource of such systems, is available. In the literature, the reconstruction times are modeled as independent (e.g., Poissonian, deterministic, or more generally following any distribution). In practice, however, numerous reconstructions start at the same time (when the system detects that a peer has failed). Consequently, they are correlated with each other because concurrent reconstructions compete for the same bandwidth. This correlation negatively impacts the efficiency of the bandwidth utilization and hence the repair time. We propose a new analytical framework that takes this correlation into account when estimating the repair time and the probability of data loss. Mainly, we introduce a queuing model in which reconstructions are served by peers at a rate that depends on the available bandwidth. We show that the load is unbalanced among peers (young peers inherently store less data than the old ones). This leads us to introduce a correcting factor on the repair rate of the system.
The models and schemes proposed are validated by mathematical analysis, an extensive set of simulations, and experimentation using the GRID5000 test-bed platform. This new model allows system designers to make a more accurate choice of system parameters as a function of their targeted data durability.
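
The contention argument in the abstract can be illustrated with a toy sketch. This is not the paper's queuing model: it assumes, purely for illustration, that every reconstruction needs the same fixed service time and that each peer serves its own queue one job at a time. The function names, the service time of 10 time units and the queue lengths are all hypothetical.

```python
from typing import List


def repair_times_independent(n_repairs: int, service_time: float) -> List[float]:
    """Independent model: no contention, every reconstruction takes service_time."""
    return [service_time] * n_repairs


def repair_times_queued(queue_lengths: List[int], service_time: float) -> List[float]:
    """Queued model: each peer serves its own FIFO queue, one job at a time.

    The k-th job in a peer's queue finishes after k service times, because it
    waits for the k - 1 jobs ahead of it on the same peer.
    """
    times: List[float] = []
    for length in queue_lengths:
        times.extend(k * service_time for k in range(1, length + 1))
    return times


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical scenario: one peer failure triggers 12 reconstructions at once.
SERVICE_TIME = 10.0          # assumed time units per reconstruction
BALANCED = [3, 3, 3, 3]      # jobs spread evenly over 4 peers
UNBALANCED = [6, 4, 1, 1]    # older peers hold more data, so they get more jobs

independent = repair_times_independent(12, SERVICE_TIME)
balanced = repair_times_queued(BALANCED, SERVICE_TIME)
unbalanced = repair_times_queued(UNBALANCED, SERVICE_TIME)

print("independent:", mean(independent), max(independent))
print("balanced   :", mean(balanced), max(balanced))
print("unbalanced :", mean(unbalanced), max(unbalanced))
```

Under these assumptions the mean repair time grows from 10 (independent) to 20 (balanced queues) to 27.5 (unbalanced queues), and the slowest reconstruction takes 60 time units in the unbalanced case. The same total work gives longer repairs once jobs compete for peers, and longer still when the load is skewed toward older peers, which is the qualitative effect the abstract describes.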

Subject areas

Data Structures and Algorithms [cs.DS]

Main file

globe-preprint.pdf (591.78 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00866058

Submitted on: Wednesday, September 25, 2013, 17:23:42

Last modified on: Monday, February 26, 2024, 11:22:11

Long-term archiving on: Friday, April 7, 2017, 02:46:28

Dates and versions

hal-00866058 , version 1 (25-09-2013)

Identifiers

HAL Id : hal-00866058 , version 1
DOI : 10.1007/978-3-642-40053-7_9

Cite

Frédéric Giroire, Sandeep Kumar Gupta, Remigiusz Modrzejewski, Julian Monteiro, Stéphane Perennes. Repair Time in Distributed Storage Systems. 6th International Conference on Data Management in Cloud, Grid and P2P Systems (Globe 2013), Aug 2013, Prague, Czech Republic. pp.99-110, ⟨10.1007/978-3-642-40053-7_9⟩. ⟨hal-00866058⟩

Collections

CNRS INRIA I3S GRID5000 INRIA-CHILE INRIA2 UNIV-COTEDAZUR SILECS ALDYNET

348 views

278 downloads
