A Scalable Inline Cluster Deduplication Framework for Big Data Protection

Yinjin Fu; Hong Jiang; Nong Xiao

doi:10.1007/978-3-642-35170-9_18

Conference paper. Year: 2012

Yinjin Fu

Role: Author
PersonId: 1011566

National University of Defense Technology [China]

University of Nebraska–Lincoln

Hong Jiang

Role: Author
PersonId: 1011567

University of Nebraska–Lincoln

Nong Xiao

Role: Author
PersonId: 1011321

National University of Defense Technology [China]

Abstract

Cluster deduplication has become a widely deployed technology in data protection services for Big Data to satisfy the requirements of service-level agreements (SLAs). However, it remains a great challenge for cluster deduplication to strike a sensible tradeoff between the conflicting goals of scalable deduplication throughput and a high duplicate elimination ratio in cluster systems with low-end individual secondary storage nodes. We propose ∑-Dedupe, a scalable inline cluster deduplication framework, as a middleware deployable in cloud data centers, to meet this challenge by exploiting data similarity and locality to optimize cluster deduplication in inter-node and intra-node scenarios, respectively. Governed by a similarity-based stateful data routing scheme, ∑-Dedupe assigns similar data to the same backup server at the super-chunk granularity using a handprinting technique to maintain high cluster-deduplication efficiency without cross-node deduplication, and balances the workload of servers from backup clients. Meanwhile, ∑-Dedupe builds a similarity index over the traditional locality-preserved caching design to alleviate the chunk index-lookup bottleneck in each node. Extensive evaluation of our ∑-Dedupe prototype against state-of-the-art schemes, driven by real-world datasets, demonstrates that ∑-Dedupe achieves a cluster-wide duplicate elimination ratio almost as high as the high-overhead and poorly scalable traditional stateful routing scheme but at an overhead only slightly higher than that of the scalable but low duplicate-elimination-ratio stateless routing approaches.
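
The similarity-based routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the handprint or the load-balancing rule, so the code assumes a handprint is the k smallest chunk fingerprints of a super-chunk, that candidate nodes are picked by taking those fingerprints modulo the node count, and that ties in resemblance are broken in favor of the less loaded node. The names `fingerprint`, `handprint`, `Node`, and `route` are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> int:
    """Chunk fingerprint as an integer (SHA-1 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def handprint(chunks, k=4):
    """Assumed handprint: the k smallest distinct chunk fingerprints."""
    return sorted({fingerprint(c) for c in chunks})[:k]


class Node:
    """Hypothetical backup server with a similarity index of handprint entries."""

    def __init__(self, name):
        self.name = name
        self.similarity_index = set()
        self.stored_chunks = 0  # crude load measure, ignores intra-node dedup

    def resemblance(self, hp):
        """Number of handprint fingerprints already known to this node."""
        return sum(1 for fp in hp if fp in self.similarity_index)

    def store(self, hp, n_chunks):
        self.similarity_index.update(hp)
        self.stored_chunks += n_chunks


def route(chunks, nodes, k=4):
    """Send a super-chunk to the most similar candidate node.

    Only the candidate nodes are consulted, so routing needs at most k
    index queries rather than a lookup on every node in the cluster.
    """
    hp = handprint(chunks, k)
    candidates = sorted({fp % len(nodes) for fp in hp})
    # Prefer higher resemblance, then lower load (assumed tie-break rule).
    best = max(
        candidates,
        key=lambda i: (nodes[i].resemblance(hp), -nodes[i].stored_chunks),
    )
    nodes[best].store(hp, len(chunks))
    return nodes[best]


if __name__ == "__main__":
    cluster = [Node(f"node{i}") for i in range(4)]
    super_chunk = [bytes([i]) * 64 for i in range(16)]
    print(route(super_chunk, cluster).name)
```

Routing the same super-chunk a second time lands on the same node, because that node now matches the whole handprint while the other candidates match none of it. This is the property that lets duplicates be removed inside one node without cross-node lookups.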

Keywords

Big Data protection; cluster deduplication; data routing; superchunk; handprinting; similarity index; load balance

Domains

Computer Science [cs] / Networking and Internet Architecture [cs.NI]

Main file

978-3-642-35170-9_18_Chapter.pdf (1.2 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01555548

Submitted on: Tuesday, July 4, 2017, 11:32:59

Last modified on: Friday, August 5, 2022, 15:03:07

Long-term archiving on: Friday, December 15, 2017, 00:29:58

Dates and versions

hal-01555548, version 1 (04-07-2017)

License

Attribution

Identifiers

HAL Id: hal-01555548, version 1
DOI: 10.1007/978-3-642-35170-9_18

Cite

Yinjin Fu, Hong Jiang, Nong Xiao. A Scalable Inline Cluster Deduplication Framework for Big Data Protection. 13th International Middleware Conference (MIDDLEWARE), Dec 2012, Montreal, QC, Canada. pp.354-373, ⟨10.1007/978-3-642-35170-9_18⟩. ⟨hal-01555548⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-WG IFIP-TC6 IFIP-WG6-1 IFIP-MIDDLEWARE IFIP-LNCS-7662

68 views

202 downloads
