Understanding Spark Performance in Hybrid and Multi-Site Clouds

Abstract : Hybrid multi-site big data analytics, which combines on-premise with off-premise resources, has recently gained popularity as a way to process large amounts of data on demand without additional capital investment to increase the size of a single datacenter. However, making the most of hybrid setups for big data analytics is challenging because on-premise resources communicate with off-premise resources at significantly lower throughput and higher latency. Understanding the impact of this aspect is not trivial, especially in the context of modern big data analytics frameworks that introduce complex communication patterns and are optimized to overlap communication with computation in order to hide data transfer latencies. This paper contributes a work-in-progress study that aims to identify and explain this impact relative to the known behavior on a single cloud. To this end, it analyses a representative big data workload on a hybrid Spark setup. Unlike previous experience that emphasized the low end-impact of network communications in Spark, we found significant overhead in the shuffle phase when the bandwidth between the on-premise and off-premise resources is sufficiently small.

https://hal.inria.fr/hal-01239140
Contributor : Alexandru Costan
Submitted on : Monday, December 14, 2015 - 2:14:39 PM
Last modification on : Thursday, February 7, 2019 - 2:59:59 PM
Long-term archiving on : Saturday, April 29, 2017 - 10:15:35 AM

File

main (1).pdf
Files produced by the author(s)

Licence


Public Domain

Identifiers

  • HAL Id : hal-01239140, version 1

Citation

Roxana-Ioana Roman, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu. Understanding Spark Performance in Hybrid and Multi-Site Clouds. BDAC-15 - 6th International Workshop on Big Data Analytics: Challenges and Opportunities (in conjunction with SC15), Nov 2015, Austin, TX, United States. ⟨hal-01239140⟩
