To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload

Abstract: Research on cloud-based Big Data analytics has focused so far on optimizing the performance and cost-effectiveness of the computations, while largely neglecting an important aspect: users need to upload massive datasets to clouds for their computations. This paper studies the problem of running MapReduce applications when considering the simultaneous optimization of performance and cost of both the data upload and its corresponding computation taken together. We analyze the feasibility of incremental MapReduce approaches to advance the computation as much as possible during the data upload by using already transferred data to calculate intermediate results. Our key finding shows that overlapping the transfer time with as many incremental computations as possible is not always efficient: a better solution is to wait for enough data to fill the computational capacity of the MapReduce cluster. Results show significant performance and cost reduction compared with state-of-the-art solutions that leverage incremental computations in a naive fashion.
Document type:
Conference paper
DataCloud'14: The 5th International Workshop on Data-Intensive Computing in the Clouds (held in conjunction with SC14), Nov 2014, New Orleans, United States. pp.9-16, 2014, 〈10.1109/DataCloud.2014.7〉

Cited literature [20 references]

https://hal.inria.fr/hal-01094609
Contributor: Bogdan Nicolae
Submitted on: Thursday, January 8, 2015 - 16:07:59
Last modified on: Tuesday, January 16, 2018 - 15:54:18
Document(s) archived on: Thursday, April 9, 2015 - 10:06:46

File

main.pdf
Files produced by the author(s)

Citation

Stefan Ene, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu. To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload. DataCloud'14: The 5th International Workshop on Data-Intensive Computing in the Clouds (held in conjunction with SC14), Nov 2014, New Orleans, United States. pp.9-16, 2014, 〈10.1109/DataCloud.2014.7〉. 〈hal-01094609〉
