
inria-00457809, version 1

BlobSeer: Efficient Data Management for Data-Intensive Applications Distributed at Large-Scale

Bogdan Nicolae (corresponding author) a1, Gabriel Antoniu b1, Luc Bougé c1

IPDPS '10: Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing: Workshops and Phd Forum (2010) 1-4

Abstract: Large-scale data-intensive applications are a class of applications that acquire and maintain massive datasets while performing distributed computations on them. In this context, a key factor is the storage service responsible for data management, as it must handle massively parallel data access efficiently to ensure the scalability and performance of the whole system. This PhD thesis proposes BlobSeer, a data management service specifically designed to address the needs of large-scale data-intensive applications. Three key design factors (data striping, distributed metadata management, and versioning-based concurrency control) enable BlobSeer not only to efficiently support features commonly used to exploit data-level parallelism, but also to explore a set of new features that can be leveraged to further improve parallel data access. Extensive experiments, both in scale and scope, on the Grid'5000 testbed demonstrate clear benefits of using BlobSeer as the underlying storage for a variety of scenarios: data-intensive grid applications, grid file systems, MapReduce datacenters, and desktop grids. Further work targets providing efficient storage solutions for cloud computing as well.

  • a –  Université de Rennes I
  • b –  INRIA
  • c –  Ecole Normale Supérieure de Cachan
  • 1 :  KerData (INRIA - IRISA)
  • INRIA – CNRS : UMR6074 – École normale supérieure de Cachan - ENS Cachan – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
  • Collaboration: Grid'5000
  • Domain: Computer science/Parallel, distributed and shared computing
  • Keywords: data intensive applications – large scale – distributed data storage – high throughput – heavy access concurrency – versioning – efficient concurrency control – data striping – distributed metadata management
 
  • oai:hal.inria.fr:inria-00457809
  • Contributor: 
  • Submitted on: Thursday 18 February 2010, 17:14:33
  • Last modified on: Monday 4 February 2013, 14:59:01