Master thesis

Optimizing data storage for MapReduce applications in the Azure Clouds

Radu Tudoran 1
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : In this report we address the problem of data management in clouds for the MapReduce programming model. To improve the performance of data-intensive applications, we designed a distributed file system deployed on the computation nodes of public clouds. This approach exploits the data locality principle by moving the data close to the computation. Compared to the traditional remote storage techniques used in public clouds, read performance improves by a factor of up to 2 and write performance by a factor of up to 5. Encouraged by these results, we developed a customized MapReduce platform that relies on our distributed file system and optimized it for data-intensive applications. We illustrate the benefits of our approach using a joint genetics and neuroimaging application that studies the variability between individuals, based on univariate data analysis. By adjusting the design of our MapReduce platform to meet the requirements of this application, we were able to reduce its computation time by a factor of up to 4.

https://hal.inria.fr/hal-00643336
Contributor : Radu Tudoran
Submitted on : Monday, November 21, 2011 - 4:15:53 PM
Last modification on : Saturday, July 11, 2020 - 3:23:03 AM

Identifiers

  • HAL Id : hal-00643336, version 1

Citation

Radu Tudoran. Optimizing data storage for MapReduce applications in the Azure Clouds. Distributed, Parallel, and Cluster Computing [cs.DC]. 2011. ⟨hal-00643336⟩
