BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications

Bogdan Nicolae; Diana Moise; Gabriel Antoniu; Luc Bougé; Matthieu Dorier

Reports (Research Report) Year : 2009



Bogdan Nicolae

Function : Author
PersonId : 862774

Scalable Storage for Clouds and Beyond

Diana Moise

Function : Author
PersonId : 856286

Scalable Storage for Clouds and Beyond

Gabriel Antoniu

Function : Correspondent author
PersonId : 746326
IdHAL : gabriel-antoniu
ORCID : 0000-0001-6525-3736
IdRef : 095615296


Scalable Storage for Clouds and Beyond

Luc Bougé

Function : Author
PersonId : 1264
IdHAL : bouge
ORCID : 0000-0002-5510-4443
IdRef : 032062591

Scalable Storage for Clouds and Beyond

Matthieu Dorier

Function : Author
PersonId : 865414

Scalable Storage for Clouds and Beyond

Abstract

Hadoop is a software framework supporting the Map/Reduce programming model. It relies on the Hadoop Distributed File System (HDFS) as its primary storage system. The efficiency of HDFS is crucial for the performance of Map/Reduce applications. We substitute the original HDFS layer of Hadoop with a new, concurrency-optimized data storage layer based on the BlobSeer data management service. This significantly improves the efficiency of Hadoop for data-intensive Map/Reduce applications, which naturally exhibit a high degree of data access concurrency. Moreover, BlobSeer's features (built-in versioning and support for concurrent append operations) open the possibility for Hadoop to further extend its functionality. We report on extensive experiments conducted on the Grid'5000 testbed. The results illustrate the benefits of our approach over the original HDFS-based implementation of Hadoop.

Keywords

Distributed file systems; High-performance systems; High throughput; Large-scale; Heavy access concurrency; Map/Reduce applications; Hadoop; BlobSeer

Domains

Programming Languages [cs.PL]

Main file

rr-7140.pdf (357.74 KB)

Origin : Files produced by the author(s)

Contributor : Luc Bougé

https://inria.hal.science/inria-00440312

Submitted on : Thursday, December 10, 2009, 11:41:16 AM

Last modification on : Friday, March 24, 2023, 2:52:52 PM

Long-term archiving on : Thursday, June 30, 2011, 11:19:23 AM

Dates and versions

inria-00440312 , version 1 (10-12-2009)

Identifiers

HAL Id : inria-00440312 , version 1

Cite

Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier. BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications. [Research Report] RR-7140, INRIA. 2009, pp.20. ⟨inria-00440312⟩


Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA INRIA-RRRT GRID5000 IRISA-D1 INRIA2 LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES SILECS UR1-MATH-NUM

