Conference papers

DIsCO: DynamIc Data COmpression in Distributed Stream Processing Systems

Abstract: Supporting high throughput in Distributed Stream Processing Systems (DSPSs) has been an important goal in recent years. Existing work either automatically increases the system resources whenever the current setup is inadequate, or applies load shedding techniques that discard some of the incoming data. Both approaches have significant shortcomings: the former requires on-the-fly application reconfiguration, in which the application must be stopped and re-uploaded to the cluster with the new configuration, and the latter can lead to significant information loss. One approach that has not yet been considered for improving the throughput of DSPSs is exploiting compression algorithms to minimize the communication overhead between components, especially when the data items are large, as with live CCTV camera reports. This work is the first to provide a framework, built on top of Apache Storm, that enables dynamic compression of incoming streaming data. Our approach uses a profiling algorithm to automatically determine which compression algorithm should be applied, and it supports both lossless and lossy compression techniques. Furthermore, we propose a novel algorithm for determining when profiling should be applied. Finally, our detailed experimental evaluation with commonly used stream processing applications indicates a clear improvement in the applications' throughput when our proposed techniques are applied.
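As an illustration of the profiling idea in the abstract, the following is a minimal sketch, not the authors' algorithm. It assumes that a codec can be chosen by compressing a small sample of recent tuples with each candidate and picking the one with the lowest estimated per-tuple cost, taken here as measured compression and decompression time plus compressed size divided by link bandwidth. The candidate set, the cost model, and the names `CODECS`, `profile`, and `bandwidth_bytes_per_s` are all hypothetical. Only lossless codecs from the Python standard library are used, and Python stands in for the Java of an actual Apache Storm topology.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical candidate set: name -> (compress, decompress).
# "none" sends the payload unchanged and serves as the baseline.
CODECS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def profile(samples, bandwidth_bytes_per_s):
    """Return (best_codec_name, {codec_name: estimated seconds per tuple}).

    Assumed cost model: CPU time to compress and decompress one tuple,
    plus the time to send the compressed tuple over the link.
    """
    costs = {}
    for name, (compress, decompress) in CODECS.items():
        total = 0.0
        for payload in samples:
            start = time.perf_counter()
            packed = compress(payload)
            unpacked = decompress(packed)
            cpu_seconds = time.perf_counter() - start
            if unpacked != payload:
                raise ValueError(f"{name} did not round-trip the payload")
            total += cpu_seconds + len(packed) / bandwidth_bytes_per_s
        costs[name] = total / len(samples)
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # Made-up, highly repetitive tuples standing in for sampled stream data.
    samples = [b"sensor=42;status=OK;" * 500 for _ in range(5)]
    best, costs = profile(samples, bandwidth_bytes_per_s=1_000_000)
    print("selected codec:", best)
    for name, seconds in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"  {name}: {seconds:.6f} s per tuple")
```

Under this assumed cost model the selection depends on both the data and the link: slow links favor codecs with higher compression ratios, fast links favor cheaper codecs or no compression at all, and already-compressed or random payloads fall back to `none`. The sketch covers neither lossy codecs nor the decision of when to re-run profiling, both of which the abstract mentions.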

Cited literature: 17 references

https://hal.inria.fr/hal-01800129
Contributor: Hal Ifip
Submitted on: Friday, May 25, 2018 - 3:17:53 PM
Last modification on: Friday, May 25, 2018 - 3:50:02 PM
Long-term archiving on: Sunday, August 26, 2018 - 1:49:57 PM

File

450046_1_En_2_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

HAL Id: hal-01800129
DOI: 10.1007/978-3-319-59665-5_2

Citation

Nikos Zacheilas, Vana Kalogeraki. DIsCO: DynamIc Data COmpression in Distributed Stream Processing Systems. 17th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2017, Neuchâtel, Switzerland. pp.19-33, ⟨10.1007/978-3-319-59665-5_2⟩. ⟨hal-01800129⟩
