Improving the Hadoop Map/Reduce Framework to Support Concurrent Appends through the BlobSeer BLOB management system

Diana Moise 1, Gabriel Antoniu 1,*, Luc Bougé 1
* Corresponding author
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract: Hadoop is a reference software framework supporting the Map/Reduce programming model. It relies on the Hadoop Distributed File System (HDFS) as its primary storage system. Although HDFS does not support concurrently appending data to existing files, we argue that Map/Reduce applications, as well as other classes of applications, can benefit from such functionality. We provide support for concurrent appends by building a concurrency-optimized data storage layer based on the BlobSeer data management service. Moreover, we modify the Hadoop Map/Reduce framework to use the append operation in the "reduce" phase of the application. To validate this work, we perform experiments on a large number of nodes of the Grid'5000 testbed. We demonstrate that massively concurrent append and read operations have a low impact on each other. In addition, measurements with an application shipped with Hadoop show that support for concurrent appends to shared files is introduced at no extra cost, whereas the number of files managed by the Map/Reduce framework is substantially reduced.
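As a rough illustration of the idea summarized in the abstract, the sketch below models several "reduce" tasks appending their output to one shared file instead of each writing a separate output file. It is a toy in-memory model under stated assumptions: the names `SharedAppendFile` and `run_reducers` are hypothetical and are not part of Hadoop, HDFS, or BlobSeer, and a single lock stands in for whatever mechanism the real storage layer uses to order appends.

```python
import threading


class SharedAppendFile:
    """Hypothetical in-memory stand-in for a file supporting concurrent appends."""

    def __init__(self):
        self._chunks = []
        self._size = 0
        self._lock = threading.Lock()

    def append(self, data: bytes) -> int:
        """Atomically append data and return the offset at which it landed."""
        with self._lock:
            offset = self._size
            self._chunks.append(data)
            self._size += len(data)
            return offset

    def read(self) -> bytes:
        """Return a consistent snapshot of everything appended so far."""
        with self._lock:
            return b"".join(self._chunks)


def run_reducers(num_reducers: int, shared: SharedAppendFile) -> None:
    """Simulate reducers that all append their result to the same file."""

    def reducer(reducer_id: int) -> None:
        record = f"reducer-{reducer_id}\tresult\n".encode()
        shared.append(record)

    threads = [
        threading.Thread(target=reducer, args=(i,)) for i in range(num_reducers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


shared = SharedAppendFile()
run_reducers(8, shared)
lines = shared.read().decode().splitlines()
```

In this model, eight reducers produce a single file with eight records rather than eight separate output files. The order of the records depends on thread scheduling, so only the set of records is deterministic.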
Document type:
Conference paper
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC'10), Workshop on MapReduce and its Applications, Jun 2010, Chicago, United States. ACM, pp. 834-840, 2010, 〈http://doi.acm.org/10.1145/1851476.1851596〉. 〈10.1145/1851476.1851596〉

https://hal.inria.fr/inria-00476861
Contributor: Diana Moise
Submitted on: Tuesday, 27 April 2010 - 14:20:15
Last modified on: Wednesday, 16 May 2018 - 11:23:28
Document(s) archived on: Tuesday, 28 September 2010 - 12:45:25

File

mapreduce_MAB.pdf
Files produced by the author(s)

Citation

Diana Moise, Gabriel Antoniu, Luc Bougé. Improving the Hadoop Map/Reduce Framework to Support Concurrent Appends through the BlobSeer BLOB management system. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC'10), Workshop on MapReduce and its Applications, Jun 2010, Chicago, United States. ACM, pp. 834-840, 2010, 〈http://doi.acm.org/10.1145/1851476.1851596〉. 〈10.1145/1851476.1851596〉. 〈inria-00476861〉
