
mpCache: Accelerating MapReduce with Hybrid Storage System on Many-Core Clusters

Abstract : As a widely used programming model and implementation for processing large data sets, MapReduce does not scale well on many-core clusters, which, unfortunately, are common in current data centers. To address this problem, this paper: 1) analyzes the causes of MapReduce's poor scalability on many-core clusters and identifies the key one, namely that the underlying low-speed storage (hard disk) cannot meet the requirements of frequent I/O operations, and 2) proposes mpCache, an SSD-based hybrid storage system that caches both Input Data and Localized Data and dynamically tunes the cache space allocation between them to make full use of the space. mpCache has been incorporated into Hadoop and evaluated on a 7-node cluster with 13 benchmarks. The experimental results show that mpCache achieves an average speedup of 2.09 over the original Hadoop and an average speedup of 1.79 over PACMan, the latest in-memory optimization of MapReduce.
Document type :
Conference papers

Contributor : Hal Ifip
Submitted on : Friday, November 25, 2016 - 2:31:06 PM
Last modification on : Tuesday, June 1, 2021 - 2:34:10 PM
Long-term archiving on : Monday, March 20, 2017 - 5:31:37 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Bo Wang, Jinlei Jiang, Guangwen Yang. mpCache: Accelerating MapReduce with Hybrid Storage System on Many-Core Clusters. 11th IFIP International Conference on Network and Parallel Computing (NPC), Sep 2014, Ilan, Taiwan. pp.220-233, ⟨10.1007/978-3-662-44917-2_19⟩. ⟨hal-01403087⟩


