Optimizing data storage for MapReduce applications in the Azure Clouds - Inria - Institut national de recherche en sciences et technologies du numérique
Student theses -- Hal-Inria+ Year: 2011

Optimizing data storage for MapReduce applications in the Azure Clouds

Radu Tudoran
  • Role: Author
  • PersonId: 914308

Abstract

In this report we address the problem of data management in clouds for the MapReduce programming model. To improve the performance of data-intensive applications, we designed a distributed file system deployed on the computation nodes of public clouds. This approach exploits the data locality principle by moving the data close to the computation. Read performance improves by up to 2 times and write performance by up to 5 times compared with the traditional remote storage techniques used in public clouds. Encouraged by these results, we developed a customized MapReduce platform that relies on our distributed file system and optimized it for data-intensive applications. We illustrate the benefits of our approach using a joint genetics and neuroimaging application that studies the variability between individuals, based on univariate data analysis. By adjusting the design of our MapReduce platform to meet the requirements of this application, we were able to reduce its computation time by up to a factor of 4.
No file deposited

Dates and versions

hal-00643336, version 1 (21-11-2011)

Identifiers

  • HAL Id: hal-00643336, version 1

Cite

Radu Tudoran. Optimizing data storage for MapReduce applications in the Azure Clouds. Distributed, Parallel, and Cluster Computing [cs.DC]. 2011. ⟨hal-00643336⟩
