Storage and Ingestion Systems in Support of Stream Processing: A Survey

Abstract : Under the pressure of massive, exponentially increasing amounts of heterogeneous data that are generated faster and faster, Big Data analytics applications have seen a shift from batch processing to stream processing, which can dramatically reduce the time needed to obtain meaningful insight. Stream processing is particularly well suited to address the challenges of fog/edge computing: much of this massive data comes from Internet of Things (IoT) devices and needs to be continuously funneled through an edge infrastructure towards centralized clouds. Thus, it is only natural to process data in transit as much as possible rather than wait for streams to accumulate in the cloud. Unfortunately, state-of-the-art stream processing systems are not well suited for this role: the data are accumulated (ingested), processed and persisted (stored) separately, often using different services hosted on different physical machines or clusters. Furthermore, there is only limited support for advanced data manipulations, which often forces application developers to introduce custom solutions and workarounds. In this survey article, we characterize the main state-of-the-art stream storage and ingestion systems. We identify their key aspects and discuss limitations and missing features in the context of stream processing for fog/edge and cloud computing. The goal is to help practitioners understand and prepare for potential bottlenecks when using such systems. In particular, we discuss both functional aspects (partitioning, metadata, search support, message routing, backpressure support) and non-functional aspects (high availability, durability, scalability, latency vs. throughput). As a conclusion of our study, we advocate for a unified stream storage and ingestion system to speed up data management and reduce I/O redundancy (both in terms of storage space and network utilization).

Cited literature [105 references]

https://hal.inria.fr/hal-01939280
Contributor : Ovidiu-Cristian Marcu
Submitted on : Friday, December 14, 2018 - 3:22:16 PM
Last modification on : Friday, September 13, 2019 - 9:51:33 AM

File

RT-0501v2.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01939280, version 2

Citation

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, María Pérez-Hernández, Radu Tudoran, et al. Storage and Ingestion Systems in Support of Stream Processing: A Survey. [Technical Report] RT-0501, INRIA Rennes - Bretagne Atlantique and University of Rennes 1, France. 2018, pp.1-33. ⟨hal-01939280v2⟩
