Conference papers

Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics

Abstract: We are now witnessing an unprecedented growth of data that needs to be processed at ever-increasing rates in order to extract valuable insights. Big Data streaming analytics tools have been developed to cope with the online dimension of data processing: they enable real-time handling of live data sources by means of stateful aggregations (operators). Current state-of-the-art frameworks (e.g., Apache Flink [1]) enable each operator to work in isolation by creating data copies, at the expense of increased memory utilization. In this paper, we explore the feasibility of deduplication techniques to address the challenge of reducing the memory footprint of window-based stream processing without significant impact on performance. We design a deduplication method specifically for window-based operators that rely on key-value stores to hold a shared state. We experiment with a synthetically generated workload while considering several deduplication scenarios and, based on the results, identify several potential areas of improvement. Our key finding is that more fine-grained interactions between streaming engines and (key-value) stores need to be designed in order to better respond to scenarios that have to overcome memory scarcity.

Cited literature: 16 references
Contributor: Ovidiu-Cristian Marcu
Submitted on: Wednesday, May 31, 2017 - 6:29:39 PM
Last modification on: Friday, August 5, 2022 - 2:54:52 PM
Long-term archiving on: Wednesday, September 6, 2017 - 5:28:14 PM


Files produced by the author(s)



Ovidiu-Cristian Marcu, Radu Tudoran, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu, et al. Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics. Workshop on the Integration of Extreme Scale Computing and Big Data Management and Analytics in conjunction with IEEE/ACM CCGrid 2017, May 2017, Madrid, Spain. ⟨10.1109/ccgrid.2017.126⟩. ⟨hal-01530744⟩


