Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics

Abstract : We are witnessing unprecedented growth in the volume of data that must be processed at ever-increasing rates in order to extract valuable insights. Big Data streaming analytics tools have been developed to cope with the online dimension of data processing: they enable real-time handling of live data sources by means of stateful aggregations (operators). Current state-of-the-art frameworks (e.g., Apache Flink [1]) let each operator work in isolation by creating data copies, at the expense of increased memory utilization. In this paper, we explore the feasibility of deduplication techniques for reducing the memory footprint of window-based stream processing without a significant impact on performance. We design a deduplication method specifically for window-based operators that rely on key-value stores to hold a shared state. We experiment with a synthetically generated workload under several deduplication scenarios and, based on the results, identify several potential areas of improvement. Our key finding is that more fine-grained interactions between streaming engines and (key-value) stores need to be designed in order to better handle scenarios in which memory is scarce.
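
The abstract does not spell out the deduplication method itself, so the following is only an illustrative sketch of the general idea: several window operators over the same stream store each event once in a shared key-value store and keep only keys locally, instead of each holding its own copy. Everything here is an assumption made for illustration: the names `SharedStateStore`, `SlidingCountWindow`, and `run_demo` are hypothetical, a plain in-memory dictionary stands in for the key-value store, and reference counting is one possible eviction policy, not necessarily the one used in the paper.

```python
from collections import deque


class SharedStateStore:
    """In-memory stand-in for a key-value store (hypothetical, for illustration)."""

    def __init__(self):
        self._values = {}  # key -> event payload, stored once
        self._refs = {}    # key -> number of windows still referencing it

    def acquire(self, key, value):
        # Store the payload only on first sight; later callers just add a reference.
        if key not in self._values:
            self._values[key] = value
            self._refs[key] = 0
        self._refs[key] += 1

    def release(self, key):
        # Drop the payload once no window references it any more.
        self._refs[key] -= 1
        if self._refs[key] == 0:
            del self._refs[key]
            del self._values[key]

    def get(self, key):
        return self._values[key]

    def __len__(self):
        return len(self._values)


class SlidingCountWindow:
    """Count-based sliding window that keeps keys only; payloads live in the store."""

    def __init__(self, size, store):
        self.size = size
        self.store = store
        self.keys = deque()

    def add(self, key, value):
        self.store.acquire(key, value)
        self.keys.append(key)
        if len(self.keys) > self.size:
            # The oldest event leaves this window; release its shared payload.
            self.store.release(self.keys.popleft())

    def aggregate(self):
        # Example stateful aggregation: sum of the payloads currently in the window.
        return sum(self.store.get(k) for k in self.keys)


def run_demo():
    store = SharedStateStore()
    short = SlidingCountWindow(3, store)
    long_ = SlidingCountWindow(5, store)
    for seq, value in enumerate([10, 20, 30, 40, 50, 60]):
        for window in (short, long_):
            window.add(seq, value)
    return short.aggregate(), long_.aggregate(), len(store)


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the two windows together reference eight events, while the shared store holds only five payloads. The saving comes at the cost of an extra lookup per access, which is the kind of memory versus performance trade-off the abstract refers to.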

https://hal.inria.fr/hal-01530744
Contributor : Ovidiu-Cristian Marcu
Submitted on : Wednesday, May 31, 2017 - 6:29:39 PM
Last modification on : Friday, September 13, 2019 - 9:51:33 AM
Long-term archiving on : Wednesday, September 6, 2017 - 5:28:14 PM

File

PID4664669.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01530744, version 1

Citation

Ovidiu-Cristian Marcu, Radu Tudoran, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu, et al. Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics. Workshop on the Integration of Extreme Scale Computing and Big Data Management and Analytics in conjunction with IEEE/ACM CCGrid 2017, May 2017, Madrid, Spain. ⟨hal-01530744⟩
