A Scalable Inline Cluster Deduplication Framework for Big Data Protection

Abstract: Cluster deduplication has become a widely deployed technology in data protection services for Big Data to satisfy the requirements of service level agreements (SLAs). However, it remains a great challenge for cluster deduplication to strike a sensible tradeoff between the conflicting goals of scalable deduplication throughput and high duplicate elimination ratio in cluster systems with low-end individual secondary storage nodes. We propose ∑-Dedupe, a scalable inline cluster deduplication framework, as a middleware deployable in cloud data centers, to meet this challenge by exploiting data similarity and locality to optimize cluster deduplication in inter-node and intra-node scenarios, respectively. Governed by a similarity-based stateful data routing scheme, ∑-Dedupe assigns similar data to the same backup server at the super-chunk granularity using a handprinting technique to maintain high cluster-deduplication efficiency without cross-node deduplication, and balances the workload that backup clients place on the servers. Meanwhile, ∑-Dedupe builds a similarity index over the traditional locality-preserved caching design to alleviate the chunk index-lookup bottleneck in each node. Extensive evaluation of our ∑-Dedupe prototype against state-of-the-art schemes, driven by real-world datasets, demonstrates that ∑-Dedupe achieves a cluster-wide duplicate elimination ratio almost as high as that of the high-overhead and poorly scalable traditional stateful routing scheme, but at an overhead only slightly higher than that of the scalable but low duplicate-elimination-ratio stateless routing approaches.
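The routing idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the algorithm's details, so everything concrete here is an assumption made for illustration: a handprint is taken to be the k smallest chunk fingerprints of a super-chunk, candidate nodes are picked by fingerprint modulo the node count, and the similarity score is discounted by relative storage usage to balance load. The names `Node`, `handprint`, `route`, and `backup` are hypothetical and are not taken from the ∑-Dedupe prototype.

```python
import hashlib
from typing import List, Set, Tuple


def chunk_fingerprint(chunk: bytes) -> int:
    """SHA-1 fingerprint of a chunk, as an integer."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def handprint(chunks: List[bytes], k: int = 8) -> List[int]:
    """Assumed definition: the k smallest distinct chunk fingerprints."""
    return sorted({chunk_fingerprint(c) for c in chunks})[:k]


class Node:
    """Hypothetical backup node with a similarity index and a chunk store."""

    def __init__(self) -> None:
        self.similarity_index: Set[int] = set()  # representative fingerprints
        self.stored_chunks: Set[int] = set()     # all stored chunk fingerprints


def route(chunks: List[bytes], nodes: List[Node], k: int = 8) -> int:
    """Pick the candidate node most similar to the super-chunk,
    discounted by how full that node is relative to the cluster average."""
    hp = set(handprint(chunks, k))
    # Only nodes addressed by the handprint are consulted, not the whole cluster.
    candidates = sorted({fp % len(nodes) for fp in hp})
    avg = sum(len(n.stored_chunks) for n in nodes) / len(nodes)

    def score(i: int) -> Tuple[float, int]:
        node = nodes[i]
        overlap = len(node.similarity_index & hp)
        relative_usage = (len(node.stored_chunks) + 1) / (avg + 1)
        # Ties on similarity are broken in favor of the less loaded node.
        return (overlap / relative_usage, -len(node.stored_chunks))

    return max(candidates, key=score)


def backup(chunks: List[bytes], nodes: List[Node], k: int = 8) -> Tuple[int, int]:
    """Route a super-chunk, deduplicate within the chosen node only,
    and return (node id, number of newly stored chunks)."""
    target = route(chunks, nodes, k)
    node = nodes[target]
    fps = {chunk_fingerprint(c) for c in chunks}
    new_chunks = fps - node.stored_chunks
    node.stored_chunks |= new_chunks
    node.similarity_index |= set(handprint(chunks, k))
    return target, len(new_chunks)
```

In this sketch, backing up the same super-chunk twice sends it to the same node, so the second pass stores nothing new even though no node ever queries another node's chunk index.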
Document type:
Conference paper
Priya Narasimhan; Peter Triantafillou. 13th International Middleware Conference (MIDDLEWARE), Dec 2012, Montreal, QC, Canada. Springer, Lecture Notes in Computer Science, LNCS-7662, pp.354-373, 2012, Middleware 2012. 〈10.1007/978-3-642-35170-9_18〉

Cited literature [17 references]

https://hal.inria.fr/hal-01555548
Contributor: Hal Ifip
Submitted on: Tuesday, July 4, 2017 - 11:32:59
Last modified on: Tuesday, July 4, 2017 - 11:34:12
Document(s) archived on: Friday, December 15, 2017 - 00:29:58

File

978-3-642-35170-9_18_Chapter.p...
Files produced by the author(s)

License


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Yinjin Fu, Hong Jiang, Nong Xiao. A Scalable Inline Cluster Deduplication Framework for Big Data Protection. Priya Narasimhan; Peter Triantafillou. 13th International Middleware Conference (MIDDLEWARE), Dec 2012, Montreal, QC, Canada. Springer, Lecture Notes in Computer Science, LNCS-7662, pp.354-373, 2012, Middleware 2012. 〈10.1007/978-3-642-35170-9_18〉. 〈hal-01555548〉

Metrics

Record views

52

File downloads

132