Conference papers

Understanding Spark Performance in Hybrid and Multi-Site Clouds

Abstract : Recently, hybrid multi-site big data analytics (which combines on-premise with off-premise resources) has gained increasing popularity as a tool to process large amounts of data on demand, without additional capital investment to increase the size of a single datacenter. However, making the most of hybrid setups for big data analytics is challenging because on-premise resources can communicate with off-premise resources only at significantly lower throughput and higher latency. Understanding the impact of this aspect is not trivial, especially in the context of modern big data analytics frameworks that introduce complex communication patterns and are optimized to overlap communication with computation in order to hide data transfer latencies. This paper contributes a work-in-progress study that aims to identify and explain this impact in relation to the known behavior on a single cloud. To this end, it analyses a representative big data workload on a hybrid Spark setup. Unlike previous experience that emphasized the low end impact of network communications in Spark, we found significant overhead in the shuffle phase when the bandwidth between the on-premise and off-premise resources is sufficiently low.
Contributor : Alexandru Costan
Submitted on : Monday, December 14, 2015 - 2:14:39 PM
Last modification on : Friday, November 18, 2022 - 9:27:28 AM
Long-term archiving on : Saturday, April 29, 2017 - 10:15:35 AM


Files produced by the author(s)


Public Domain


  • HAL Id : hal-01239140, version 1


Roxana-Ioana Roman, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu. Understanding Spark Performance in Hybrid and Multi-Site Clouds. BDAC-15 - 6th International Workshop on Big Data Analytics: Challenges and Opportunities (in conjunction with SC15), Nov 2015, Austin, TX, United States. ⟨hal-01239140⟩


