BlobSeer: Next Generation Data Management for Large Scale Infrastructures

Bogdan Nicolae 1, *, Gabriel Antoniu 1, Luc Bougé 1, Diana Moise 1, Alexandra Carpen-Amarie 1
* Corresponding author
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract: As data volumes grow rapidly across a widening range of applications in science, engineering, and information services, the challenges posed by data-intensive computing are becoming increasingly important. The emergence of highly scalable infrastructures, e.g. for cloud computing and for petascale computing and beyond, introduces additional issues for which scalable data management becomes an immediate need. This paper makes several contributions. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, we highlight the potentially large benefits of using versioning in this context. Second, based on these principles, we propose a set of versioning algorithms, for both data and metadata, that enable high throughput under concurrency. Finally, we implement and evaluate these algorithms in the BlobSeer prototype, which we integrate as a storage backend in the Hadoop MapReduce framework. We perform extensive microbenchmarks as well as experiments with real MapReduce applications; they demonstrate that applying the principles advocated in our approach brings substantial benefits to data-intensive applications.
Document type:
Journal article
Journal of Parallel and Distributed Computing, Elsevier, 2011, 71 (2), pp.168-184. 〈http://dx.doi.org/10.1016/j.jpdc.2010.08.004〉. 〈10.1016/j.jpdc.2010.08.004〉
Cited literature [38 references]

https://hal.inria.fr/inria-00511414
Contributor: Bogdan Nicolae
Submitted on: Tuesday, 24 August 2010 - 21:37:19
Last modified on: Tuesday, 16 January 2018 - 15:54:18
Document(s) archived on: Thursday, 25 November 2010 - 02:51:47

File

paper.pdf
Files produced by the author(s)

Citation

Bogdan Nicolae, Gabriel Antoniu, Luc Bougé, Diana Moise, Alexandra Carpen-Amarie. BlobSeer: Next Generation Data Management for Large Scale Infrastructures. Journal of Parallel and Distributed Computing, Elsevier, 2011, 71 (2), pp.168-184. 〈http://dx.doi.org/10.1016/j.jpdc.2010.08.004〉. 〈10.1016/j.jpdc.2010.08.004〉. 〈inria-00511414〉

Metrics

Record views

1133

File downloads

797