Optimizing data storage for MapReduce applications in the Azure Clouds

Radu Tudoran 1
1 KerData - Scalable Storage for Clouds and Beyond
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
Abstract: In this report we address the problem of data management in clouds for the MapReduce programming model. To improve the performance of data-intensive applications, we designed a distributed file system deployed on the computation nodes of public clouds. This approach exploits the data locality principle by moving the data close to the computation. Compared to the traditional remote storage techniques used in public clouds, read performance improves by up to 2 times and write performance by up to 5 times. Encouraged by these results, we developed a customized MapReduce platform that relies on our distributed file system and optimized it for data-intensive applications. We illustrate the benefits of our approach using a joint genetics and neuroimaging application that studies the variability between individuals, based on univariate data analysis. By adjusting the design of our MapReduce platform to meet the requirements of this application, we were able to reduce its computation time by up to 4 times.
Document type:
Student thesis -- Hal-inria+
Distributed, Parallel, and Cluster Computing [cs.DC]. 2011

Contributor: Radu Tudoran
Submitted on: Monday, 21 November 2011 - 16:15:53
Last modified on: Friday, 16 November 2018 - 01:38:16


  • HAL Id : hal-00643336, version 1


Radu Tudoran. Optimizing data storage for MapReduce applications in the Azure Clouds. Distributed, Parallel, and Cluster Computing [cs.DC]. 2011. 〈hal-00643336〉


