Repair Time in Distributed Storage Systems

Abstract: In this paper, we analyze a highly distributed backup storage system realized by means of nano datacenters (NaDa). NaDa have recently been proposed as a way to mitigate the growing energy, bandwidth, and device costs of traditional data centers, following the popularity of cloud computing. These service provider-controlled peer-to-peer systems take advantage of resources already committed to always-on set-top boxes, of the fact that these boxes do not generate heat dissipation costs, and of their proximity to users. In this kind of system, redundancy is introduced to preserve the data in case of peer failures or departures. To ensure long-term fault tolerance, the storage system must have a self-repairing service that continuously reconstructs the redundancy fragments that are lost. The speed of this reconstruction process is crucial for data survival. This speed is mainly determined by the available bandwidth, which is a critical resource of such systems. In the literature, reconstruction times are modeled as independent (e.g., Poissonian, deterministic, or more generally following any distribution). In practice, however, numerous reconstructions start at the same time (when the system detects that a peer has failed). Consequently, they are correlated with each other because concurrent reconstructions compete for the same bandwidth. This correlation negatively impacts the efficiency of bandwidth utilization and hence the repair time. We propose a new analytical framework that takes this correlation into account when estimating the repair time and the probability of data loss. Mainly, we introduce a queuing model in which reconstructions are served by peers at a rate that depends on the available bandwidth. We show that the load is unbalanced among peers (young peers inherently store less data than old ones). This leads us to introduce a correcting factor on the repair rate of the system.
The models and schemes proposed are validated by mathematical analysis, an extensive set of simulations, and experimentation using the GRID5000 test-bed platform. This new model allows system designers to make a more accurate choice of system parameters as a function of their targeted data durability.
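The effect described in the abstract, concurrent reconstructions queuing for the same peer bandwidth, can be illustrated with a toy calculation. This is a minimal sketch and not the paper's analytical model: it assumes each peer serves its queued reconstructions one at a time (FIFO) at a fixed upload rate, and the function names, queue lengths, block size, and bandwidth are all hypothetical values chosen for illustration.

```python
def repair_times(queue_lengths, block_size_mb, upload_mb_per_s):
    """Completion time of every reconstruction, assuming (hypothetically)
    that each peer serves its own queue one block at a time."""
    service_time = block_size_mb / upload_mb_per_s  # time to rebuild one block
    times = []
    for queued in queue_lengths:
        # The k-th reconstruction in a peer's queue waits for the k before it.
        times.extend(service_time * (k + 1) for k in range(queued))
    return times


def summarize(queue_lengths, block_size_mb, upload_mb_per_s):
    """Mean and worst-case repair time over all queued reconstructions."""
    times = repair_times(queue_lengths, block_size_mb, upload_mb_per_s)
    return {"mean": sum(times) / len(times), "max": max(times)}


# Twelve reconstructions triggered by one failure, spread over four peers.
# Illustrative numbers only: 100 MB blocks, 10 MB/s upload per peer.
balanced = summarize([3, 3, 3, 3], 100, 10)    # load spread evenly
unbalanced = summarize([6, 3, 2, 1], 100, 10)  # older peers hold more data

print("balanced:", balanced)
print("unbalanced:", unbalanced)
```

Under an independence assumption every reconstruction would take a single service time. In this sketch the queued reconstructions take longer, and the unbalanced assignment lengthens both the mean and the worst-case repair time, which is the kind of gap a correcting factor on the repair rate is meant to capture.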
Document type:
Conference paper
6th International Conference on Data Management in Cloud, Grid and P2P Systems (Globe 2013), Aug 2013, Prague, Czech Republic. Springer, 8059, pp.99-110, 2013, Data Management in Cloud, Grid and P2P Systems. 〈10.1007/978-3-642-40053-7_9〉

https://hal.inria.fr/hal-00866058
Contributor: Frédéric Giroire
Submitted on: Wednesday, September 25, 2013 - 17:23:42
Last modified on: Monday, December 4, 2017 - 15:14:19
Document(s) archived on: Friday, April 7, 2017 - 02:46:28

File

globe-preprint.pdf
Files produced by the author(s)

Citation

Frédéric Giroire, Sandeep Kumar Gupta, Remigiusz Modrzejewski, Julian Monteiro, Stéphane Perennes. Repair Time in Distributed Storage Systems. 6th International Conference on Data Management in Cloud, Grid and P2P Systems (Globe 2013), Aug 2013, Prague, Czech Republic. Springer, 8059, pp.99-110, 2013, Data Management in Cloud, Grid and P2P Systems. 〈10.1007/978-3-642-40053-7_9〉. 〈hal-00866058〉
