Handling Partitioning Skew in MapReduce using LEEN

Abstract: MapReduce is emerging as a prominent tool for big data processing. Data locality is a key feature in MapReduce that is extensively leveraged in data-intensive cloud systems: it avoids network saturation when processing large amounts of data by co-allocating computation and data storage, particularly for the map phase. However, our studies with Hadoop, a widely used MapReduce implementation, demonstrate that the presence of partitioning skew causes a huge amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input among different data nodes. (Partitioning skew refers to the case where the intermediate keys' frequencies, their distributions, or both vary among different data nodes.) As a result, applications experience severe performance degradation due to the long data transfer during the shuffle phase along with the computation skew, particularly in the reduce phase. In this paper, we develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop. Our experiments demonstrate that LEEN can efficiently achieve higher locality and reduce the amount of shuffled data. More importantly, LEEN guarantees fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 45% on different workloads.
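To make the two quantities named in the abstract concrete (the amount of shuffled data and the fairness of the reduce inputs), the following is a minimal sketch on a toy frequency table. Everything in it is an assumption made for illustration: the table `FREQ`, the function names, the modulo partitioner used as a deterministic stand-in for hash partitioning, and the greedy heuristic. The heuristic is not the LEEN algorithm, which the abstract does not specify; it only shows how key frequencies per node can inform the choice of partition.

```python
# Toy illustration of partitioning skew. All names and numbers are hypothetical.
# FREQ[key][node] = intermediate records for `key` produced by map tasks on `node`.
FREQ = {
    "a": [0, 1, 9],
    "b": [9, 1, 0],
    "c": [1, 8, 1],
    "d": [4, 4, 4],
    "e": [2, 0, 1],
    "f": [0, 3, 0],
}
NUM_NODES = 3


def evaluate(assignment, freq, num_nodes):
    """Return (shuffled_records, reduce_input_per_node) for a key -> node map."""
    shuffled = 0
    load = [0] * num_nodes
    for key, node in assignment.items():
        total = sum(freq[key])
        load[node] += total
        # Records already sitting on the target node need no network transfer.
        shuffled += total - freq[key][node]
    return shuffled, load


def modulo_partition(freq, num_nodes):
    """Deterministic stand-in for hash partitioning: ignores key frequencies."""
    return {key: i % num_nodes for i, key in enumerate(sorted(freq))}


def greedy_partition(freq, num_nodes):
    """Illustrative heuristic (NOT LEEN): place the heaviest keys first, on the
    node that holds the most local records, penalised by its current load."""
    load = [0] * num_nodes
    assignment = {}
    for key in sorted(freq, key=lambda k: -sum(freq[k])):
        # Score trades locality against fairness; ties go to the lowest node id.
        node = max(range(num_nodes), key=lambda n: (freq[key][n] - load[n], -n))
        assignment[key] = node
        load[node] += sum(freq[key])
    return assignment


if __name__ == "__main__":
    for name, partition in [("modulo", modulo_partition), ("greedy", greedy_partition)]:
        shuffled, load = evaluate(partition(FREQ, NUM_NODES), FREQ, NUM_NODES)
        print(f"{name}: shuffled={shuffled} reduce_input={load}")
```

On this toy input the frequency-oblivious partitioner shuffles 42 records and gives the busiest node 22 records of reduce input, while the greedy heuristic shuffles 25 and gives the busiest node 20. The numbers say nothing about LEEN's measured results; they only illustrate why knowing per-node key frequencies can help.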
Document type:
Journal article
Peer-to-Peer Networking and Applications, Springer, 2013

https://hal.inria.fr/hal-00822973
Contributor: Shadi Ibrahim
Submitted on: Tuesday, June 28, 2016 - 16:16:11
Last modified on: Wednesday, April 11, 2018 - 01:50:59
Document(s) archived on: Thursday, September 29, 2016 - 12:22:31

File

PPNA.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00822973, version 1

Citation

Shadi Ibrahim, Hai Jin, Lu Lu, Bingsheng He, Gabriel Antoniu, et al. Handling Partitioning Skew in MapReduce using LEEN. Peer-to-Peer Networking and Applications, Springer, 2013. 〈hal-00822973〉
