DDFlasks: Deduplicated Very Large Scale Data Store

Abstract : With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols have proved suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive scale systems. Ensuring resiliency in this data store comes with a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique to reduce both costs, but applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real-world scenario using Wikipedia snapshots, and the results are twofold. We show that deduplication is able to decrease storage consumption by up to 63% and network bandwidth consumption by up to 20%, while maintaining a fully decentralized and resilient design.
Document type :
Conference papers
Lydia Y. Chen; Hans Reiser. 17th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2017, Neuchâtel, Switzerland. Springer International Publishing, Lecture Notes in Computer Science, LNCS-10320, pp.51-66, 2017, Distributed Applications and Interoperable Systems. 〈10.1007/978-3-319-59665-5_4〉

Cited literature [24 references]

https://hal.inria.fr/hal-01800122
Contributor : Hal Ifip
Submitted on : Friday, May 25, 2018 - 3:17:26 PM
Last modification on : Friday, May 25, 2018 - 3:50:03 PM
Document(s) archived on : Sunday, August 26, 2018 - 1:57:28 PM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2020-01-01

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Francisco Maia, João Paulo, Fábio Coelho, Francisco Neves, José Pereira, et al. DDFlasks: Deduplicated Very Large Scale Data Store. Lydia Y. Chen; Hans Reiser. 17th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2017, Neuchâtel, Switzerland. Springer International Publishing, Lecture Notes in Computer Science, LNCS-10320, pp.51-66, 2017, Distributed Applications and Interoperable Systems. 〈10.1007/978-3-319-59665-5_4〉. 〈hal-01800122〉
