Conference papers

CSAS: Cost-Based Storage Auto-Selection, a Fine Grained Storage Selection Mechanism for Spark

Abstract : To improve system performance, Spark keeps RDDs in memory for later access through its caching mechanism, and it provides a variety of storage levels for cached RDDs. However, storage levels are selected manually and at RDD granularity, so the choice cannot adapt to the computing resources available on each node. In this paper, we first present a fine-grained, automatic storage level selection mechanism. We then select a storage level for each partition based on a cost model that fully considers the system resource status as well as compression and serialization costs. Experiments show that our approach can offer up to a 77% performance improvement compared to the default storage level scheme provided by Spark.
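The idea of choosing a storage level per partition from a cost model can be illustrated with a minimal sketch. It is not the cost model from the paper, which the abstract does not spell out. The level names match Spark's built-in storage levels, but `NodeStatus`, `Partition`, `access_cost`, `select_level`, and all throughput figures are hypothetical, and compression is left out for brevity.

```python
from dataclasses import dataclass

# Level names mirror Spark's built-in storage levels; everything else
# in this sketch is a hypothetical illustration.
LEVELS = ("MEMORY_ONLY", "MEMORY_ONLY_SER", "DISK_ONLY")


@dataclass
class NodeStatus:
    free_memory_mb: float      # memory currently available for caching
    disk_read_mb_per_s: float  # assumed sequential disk read throughput
    deser_mb_per_s: float      # assumed deserialization throughput


@dataclass
class Partition:
    size_mb: float       # size when stored as deserialized objects
    ser_ratio: float     # serialized size / deserialized size
    expected_reads: int  # how many times the partition will be re-read


def access_cost(level, part, node):
    """Estimated seconds spent on future reads, or None if the level does not fit."""
    ser_mb = part.size_mb * part.ser_ratio
    if level == "MEMORY_ONLY":
        # Deserialized objects in memory: no per-read overhead in this model.
        return 0.0 if part.size_mb <= node.free_memory_mb else None
    if level == "MEMORY_ONLY_SER":
        # Smaller footprint, but every read pays for deserialization.
        if ser_mb > node.free_memory_mb:
            return None
        return part.expected_reads * ser_mb / node.deser_mb_per_s
    if level == "DISK_ONLY":
        # Every read pays for disk I/O plus deserialization.
        per_read = ser_mb / node.disk_read_mb_per_s + ser_mb / node.deser_mb_per_s
        return part.expected_reads * per_read
    raise ValueError(f"unknown storage level: {level}")


def select_level(part, node):
    """Pick the feasible storage level with the lowest estimated cost."""
    costs = {level: access_cost(level, part, node) for level in LEVELS}
    feasible = {level: cost for level, cost in costs.items() if cost is not None}
    return min(feasible, key=feasible.get)


node = NodeStatus(free_memory_mb=100, disk_read_mb_per_s=200, deser_mb_per_s=400)
for size_mb in (50, 150, 300):
    part = Partition(size_mb=size_mb, ser_ratio=0.5, expected_reads=3)
    print(size_mb, select_level(part, node))
```

Under these assumed numbers, a partition that fits in free memory stays deserialized, a larger one is kept serialized in memory, and one that does not fit even when serialized falls back to disk. A per-RDD manual choice applies one level to every partition regardless of the node it lands on.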
Document type :
Conference papers

https://hal.inria.fr/hal-01705452
Contributor : Hal Ifip
Submitted on : Friday, February 9, 2018 - 2:26:59 PM
Last modification on : Thursday, January 9, 2020 - 4:04:02 PM
Long-term archiving on : Thursday, May 10, 2018 - 12:42:05 PM

File

457609_1_En_18_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

HAL Id : hal-01705452
DOI : 10.1007/978-3-319-68210-5_18

Citation

Bo Wang, Jie Tang, Rui Zhang, Zhimin Gu. CSAS: Cost-Based Storage Auto-Selection, a Fine Grained Storage Selection Mechanism for Spark. 14th IFIP International Conference on Network and Parallel Computing (NPC), Oct 2017, Hefei, China. pp.150-154, ⟨10.1007/978-3-319-68210-5_18⟩. ⟨hal-01705452⟩
