Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases

Jorge Veiga 1 Roberto Expósito 1 Bruno Raffin 2 Juan Tourino 1
2 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it has performance issues that hinder its use in many practical cases. Although existing alternatives like Spark or Hama can outperform Hadoop, they require rewriting the source code of the applications due to API incompatibilities. This paper studies the use of Flame-MR, an in-memory processing architecture for MapReduce applications, to improve the performance of real-world use cases transparently while keeping application compatibility. Flame-MR adapts to the characteristics of the workloads, efficiently managing custom data formats and iterative computations while also reducing workload imbalance. The experimental evaluation, conducted on high performance clusters and the Microsoft Azure cloud, shows that Flame-MR clearly outperforms Hadoop. In most cases, Flame-MR reduces execution times by more than half.
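To illustrate what "transparent" in-memory execution of an unchanged MapReduce application means in general terms, here is a minimal, hypothetical Python sketch. It is not Flame-MR's implementation (Flame-MR runs Hadoop's Java API applications), and the names `word_count_map`, `word_count_reduce` and `run_in_memory` are illustrative assumptions. The user code only implements map and reduce functions, so the engine that runs them can be swapped without touching that code. This toy engine keeps intermediate pairs in memory, hash-partitions them by key, and reduces each partition with its keys sorted.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical user-level "application": it depends only on the map/reduce
# signatures, not on how the engine stores or moves intermediate data.
def word_count_map(_offset: int, line: str) -> Iterable[Tuple[str, int]]:
    for word in line.split():
        yield word, 1

def word_count_reduce(word: str, counts: List[int]) -> Iterable[Tuple[str, int]]:
    yield word, sum(counts)

def run_in_memory(
    records: Iterable[Tuple[int, str]],
    mapper: Callable[[int, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Iterable[Tuple[str, int]]],
    num_partitions: int = 2,
) -> Dict[str, int]:
    """Toy engine: intermediate data never leaves process memory (an assumption
    made for illustration; real engines also handle spilling and distribution)."""
    # Map phase: emit pairs and group them by key inside hash partitions.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        for k, v in mapper(key, value):
            partitions[hash(k) % num_partitions][k].append(v)
    # Reduce phase: process each partition independently, keys in sorted order.
    output: Dict[str, int] = {}
    for part in partitions:
        for k in sorted(part):
            for out_k, out_v in reducer(k, part[k]):
                output[out_k] = out_v
    return output

lines = ["the quick brown fox", "the lazy dog", "the end"]
result = run_in_memory(enumerate(lines), word_count_map, word_count_reduce)
print(sorted(result.items()))
```

Because the engine only calls the user's functions through fixed signatures, a faster engine can replace a slower one without modifying the application code. That is the kind of compatibility the abstract describes, here reduced to a toy scale.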

Cited literature: 29 references

https://hal.inria.fr/hal-01955503
Contributor: Bruno Raffin
Submitted on: Friday, December 14, 2018 - 1:49:47 PM
Last modification on: Wednesday, July 17, 2019 - 10:24:03 AM
Long-term archiving on: Friday, March 15, 2019 - 4:08:08 PM



Citation

Jorge Veiga, Roberto Expósito, Bruno Raffin, Juan Tourino. Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases. IEEE Access, IEEE, 2018, 6, pp.69750-69762. ⟨10.1109/ACCESS.2018.2880842⟩. ⟨hal-01955503⟩

