Handling Partitioning Skew in MapReduce using LEEN

Abstract : MapReduce is emerging as a prominent tool for big data processing. Data locality is a key feature in MapReduce that is extensively leveraged in data-intensive cloud systems: it avoids network saturation when processing large amounts of data by co-allocating computation and data storage, particularly for the map phase. However, our studies with Hadoop, a widely used MapReduce implementation, demonstrate that the presence of partitioning skew causes a huge amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input among different data nodes. (Partitioning skew refers to the case where there is variation in the intermediate keys' frequencies, in their distribution among different data nodes, or in both.) As a result, applications experience severe performance degradation due to the long data transfer during the shuffle phase along with the computation skew, particularly in the reduce phase. In this paper, we develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop. Our experiments demonstrate that LEEN can efficiently achieve higher locality and reduce the amount of shuffled data. More importantly, LEEN guarantees fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 45% on different workloads.
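
To make the trade-off in the abstract concrete, the following is a minimal sketch of partitioning keys by both locality and fairness. It is not the LEEN algorithm itself, since the abstract does not specify its scoring function. The frequency table `FREQ`, the function names, and the `balance_weight` parameter are all hypothetical and chosen only for illustration. The sketch compares a frequency-blind assignment (a stand-in for hash partitioning) with a simple greedy heuristic that, for each key, picks the node minimizing the records that must be shuffled plus a penalty on the spread of reduce inputs.

```python
from statistics import pstdev

# Hypothetical input: FREQ[key][node] is the number of intermediate records
# for `key` that map tasks produced on `node`.
FREQ = {
    "a": [5, 90, 5],
    "b": [10, 10, 80],
    "c": [20, 5, 5],
    "d": [30, 30, 30],
}
NUM_NODES = 3


def evaluate(assignment, freq, num_nodes):
    """Return (shuffled_records, reduce_input_per_node) for a key -> node map."""
    shuffled = 0
    load = [0] * num_nodes
    for key, node in assignment.items():
        total = sum(freq[key])
        load[node] += total
        # Records already on the chosen node need no network transfer.
        shuffled += total - freq[key][node]
    return shuffled, load


def baseline_partition(freq, num_nodes):
    """Frequency-blind assignment: keys go round-robin in sorted order.

    This is only a deterministic stand-in for hash partitioning.
    """
    return {key: i % num_nodes for i, key in enumerate(sorted(freq))}


def greedy_partition(freq, num_nodes, balance_weight=1.0):
    """Illustrative heuristic (an assumption, not LEEN's published method).

    Keys are visited from most to least frequent. Each key goes to the node
    that minimizes: remote records + balance_weight * stdev of reduce inputs.
    """
    load = [0] * num_nodes
    assignment = {}
    for key in sorted(freq, key=lambda k: sum(freq[k]), reverse=True):
        total = sum(freq[key])

        def cost(node):
            trial = load.copy()
            trial[node] += total
            remote = total - freq[key][node]
            return remote + balance_weight * pstdev(trial)

        best = min(range(num_nodes), key=cost)
        assignment[key] = best
        load[best] += total
    return assignment


if __name__ == "__main__":
    for name, partition in [("baseline", baseline_partition),
                            ("greedy", greedy_partition)]:
        shuffled, load = evaluate(partition(FREQ, NUM_NODES), FREQ, NUM_NODES)
        print(name, "shuffled:", shuffled, "reduce inputs:", load)
```

On this toy table the frequency-blind assignment shuffles 270 records and leaves reduce inputs of 190, 100, and 30 records, while the greedy heuristic shuffles 100 records with reduce inputs of 120, 100, and 100. The numbers only illustrate the kind of effect the abstract describes and say nothing about LEEN's measured results.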

https://hal.inria.fr/hal-00822973
Contributor : Shadi Ibrahim
Submitted on : Tuesday, June 28, 2016 - 4:16:11 PM
Last modification on : Monday, July 15, 2019 - 11:50:13 AM
Long-term archiving on: Thursday, September 29, 2016 - 12:22:31 PM

File

PPNA.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00822973, version 1

Citation

Shadi Ibrahim, Hai Jin, Lu Lu, Bingsheng He, Gabriel Antoniu, et al. Handling Partitioning Skew in MapReduce using LEEN. Peer-to-Peer Networking and Applications, Springer, 2013. ⟨hal-00822973⟩
