Conference Papers Year : 2017

DDFlasks: Deduplicated Very Large Scale Data Store

Francisco Maia, João Paulo, Fábio Coelho, Francisco Neves, José Pereira, Rui Oliveira

Abstract

With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols have proved suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive scale systems. Ensuring resiliency in this data store comes at a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique to reduce both costs, but applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real-world scenario using Wikipedia snapshots, and the results are twofold: deduplication decreases storage consumption by up to 63% and network bandwidth consumption by up to 20%, while maintaining a fully decentralized and resilient design.
Main file
450046_1_En_4_Chapter.pdf (178.6 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01800122 , version 1 (25-05-2018)

Licence

Attribution - CC BY 4.0

Identifiers

HAL Id: hal-01800122
DOI: 10.1007/978-3-319-59665-5_4

Cite

Francisco Maia, João Paulo, Fábio Coelho, Francisco Neves, José Pereira, et al. DDFlasks: Deduplicated Very Large Scale Data Store. 17th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2017, Neuchâtel, Switzerland. pp.51-66, ⟨10.1007/978-3-319-59665-5_4⟩. ⟨hal-01800122⟩