Experimental Study on the Performance and Resource Utilization of Data Streaming Frameworks

Subarna Chatterjee (1), Christine Morin (1)
(1) MYRIADS - Design and Implementation of Autonomous Distributed Systems
Inria Rennes – Bretagne Atlantique, IRISA-D1 - Large-Scale Systems
Abstract : With the advent of the Internet of Things (IoT), data stream processing has gained increased attention due to the ever-increasing need to process heterogeneous and voluminous data streams. This work addresses the problem of selecting an appropriate stream processing framework for a given application to be executed within a specific physical infrastructure. For this purpose, we present a thorough comparative analysis of three data stream processing platforms, Apache Flink, Apache Storm, and Twitter Heron (the enhanced version of Apache Storm), which were chosen for their potential to process both streams and batches in real time. The goal of the work is to inform cloud clients and cloud providers about which streaming platform is resource-efficient and adapts to the requirements of a given application, so that they can plan the allocation or assignment of Virtual Machines for application execution. For the comparative performance analysis of the chosen platforms, we experimented on 8-node clusters of the Grid5000 experimentation testbed and selected a wide variety of applications, ranging from a conventional benchmark to a sensor-based IoT application and a statistical batch processing application. In addition to various performance metrics related to the elasticity and resource usage of the platforms, this work presents a comparative study of the "green-ness" of the streaming platforms by analyzing their power consumption, one of the first attempts of its kind. The obtained results are thoroughly analyzed to illustrate the functional behavior of these platforms under different computing scenarios.

https://hal.inria.fr/hal-01823697
Contributor : Guillaume Pierre
Submitted on : Tuesday, June 26, 2018 - 12:56:25 PM
Last modification on : Friday, September 13, 2019 - 9:51:33 AM
Long-term archiving on : Wednesday, September 26, 2018 - 10:10:56 PM

Citation

Subarna Chatterjee, Christine Morin. Experimental Study on the Performance and Resource Utilization of Data Streaming Frameworks. CCGrid 2018 - 18th IEEE/ACM Symposium on Cluster, Cloud and Grid Computing, May 2018, Washington, DC, United States. pp.143-152, ⟨10.1109/CCGRID.2018.00029⟩. ⟨hal-01823697⟩