Conference papers

Spark-based Cloud Data Analytics using Multi-Objective Optimization

Fei Song 1 Khaled Zaouk 1 Chenghao Lyu 2 Arnab Sinha 1 Qi Fan 1 Yanlei Diao 1 Prashant Shenoy 2 
1 CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France
Abstract : Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. This paper presents a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meets the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of job configurations to reveal tradeoffs between different user objectives, recommends a new job configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. We present efficient incremental algorithms based on the notion of a Progressive Frontier for realizing our MOO approach and implement them in a Spark-based prototype. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. When compared to OtterTune, a state-of-the-art performance tuning system, our approach recommends configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different application preferences on multiple objectives.
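As a rough illustration of what a Pareto optimal set of job configurations means, the following minimal Python sketch filters a handful of candidate configurations down to the non-dominated ones and then picks one according to user preference weights. It is not the paper's Progressive Frontier algorithm or its recommendation strategy: the `Config` type, the objective values, and the weighted-sum selection rule are all hypothetical stand-ins chosen for brevity.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """Hypothetical job configuration with two objectives, both minimized."""
    cores: int
    latency_s: float  # predicted running time (made-up value)
    cost: float       # predicted monetary cost (made-up value)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.latency_s <= b.latency_s and a.cost <= b.cost
    strictly_better = a.latency_s < b.latency_s or a.cost < b.cost
    return no_worse and strictly_better


def pareto_set(configs: list[Config]) -> list[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def normalize(x: float, lo: float, hi: float) -> float:
    """Scale x into [0, 1]; a degenerate range maps to 0."""
    return 0.0 if hi == lo else (x - lo) / (hi - lo)


def recommend(frontier: list[Config], w_latency: float, w_cost: float) -> Config:
    """Assumed selection rule: lowest weighted sum of normalized objectives."""
    lat = [c.latency_s for c in frontier]
    cost = [c.cost for c in frontier]
    return min(
        frontier,
        key=lambda c: w_latency * normalize(c.latency_s, min(lat), max(lat))
        + w_cost * normalize(c.cost, min(cost), max(cost)),
    )


# Made-up candidates; the second 16-core entry is dominated by the 8-core one.
candidates = [
    Config(4, 120.0, 4.0),
    Config(8, 70.0, 5.0),
    Config(16, 40.0, 8.0),
    Config(16, 75.0, 9.0),
    Config(32, 38.0, 16.0),
]
frontier = pareto_set(candidates)
balanced = recommend(frontier, w_latency=0.5, w_cost=0.5)
frugal = recommend(frontier, w_latency=0.1, w_cost=0.9)
```

With these made-up numbers, the frontier keeps four of the five candidates, and shifting weight from latency toward cost moves the recommendation to a configuration with fewer cores. The quadratic pairwise filter is only workable for small candidate sets; the abstract's emphasis on incremental algorithms suggests the real system avoids enumerating configurations this way.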
Document type :
Conference papers
Contributor : Fei Song
Submitted on : Sunday, February 14, 2021 - 10:53:36 AM
Last modification on : Friday, February 4, 2022 - 3:16:10 AM
Long-term archiving on : Saturday, May 15, 2021 - 6:05:52 PM


Files produced by the author(s)


  • HAL Id : hal-02549758, version 1


Fei Song, Khaled Zaouk, Chenghao Lyu, Arnab Sinha, Qi Fan, et al. Spark-based Cloud Data Analytics using Multi-Objective Optimization. ICDE - 37th IEEE International Conference on Data Engineering, Apr 2021, Chania, Greece. ⟨hal-02549758⟩


