Kera: A Unified Storage and Ingestion Architecture for Efficient Stream Processing

Abstract: Big Data applications are rapidly moving from batch-oriented execution to a real-time model in order to extract value from data streams as fast as they arrive. Such stream-based applications need to ingest and analyze data immediately and, in many use cases, combine live data (i.e., real-time streams) with archived data in order to extract better insights. Current streaming architectures are designed with distinct components for the ingestion (e.g., Kafka) and the storage (e.g., HDFS) of stream data. Unfortunately, this separation is becoming a source of overhead, especially when data needs to be archived for later analysis (i.e., near real-time): in such use cases, stream data has to be written to disk twice and may cross high-latency networks twice. Moreover, current ingestion mechanisms offer no support for searching the acquired streams in real time, an important requirement for reacting promptly to fast data. In this paper we describe the design of Kera, a unified storage and ingestion architecture that could better serve the specific needs of stream processing. We identify a set of design principles for stream-based Big Data processing that guide us in designing a novel streaming architecture. We design Kera to significantly reduce storage and network utilization, which can lead to shorter stream processing and archival times. To this end, we propose a set of optimization techniques for handling streams with a log-structured (in-memory and on-disk) approach. On top of our envisioned architecture, we devise the implementation of an efficient interface for data ingestion, processing, and storage (DIPS), an interplay between processing engines and smart storage systems, with the goal of reducing end-to-end stream processing latency.
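The general idea of handling a stream with a log-structured approach can be illustrated with a minimal sketch. This is not Kera's implementation. The class names `StreamLog` and `Segment`, the fixed segment capacity, and the sealing policy are hypothetical choices made only for illustration. Records are appended to the newest (in-memory) segment and addressed by a monotonically increasing offset. Once a segment is full it is sealed and never modified again. In a real system, sealing is the point at which a segment could be persisted once and then serve both stream readers and archival, instead of the data being written separately by an ingestion layer and a storage layer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical fixed-capacity chunk of a stream partition."""
    base_offset: int          # offset of the first record in this segment
    capacity: int             # maximum number of records (illustrative policy)
    records: List[bytes] = field(default_factory=list)
    sealed: bool = False      # True once the segment stops accepting appends

    def is_full(self) -> bool:
        return len(self.records) >= self.capacity


class StreamLog:
    """Hypothetical append-only log for a single stream partition."""

    def __init__(self, segment_capacity: int = 4) -> None:
        if segment_capacity <= 0:
            raise ValueError("segment_capacity must be positive")
        self.segment_capacity = segment_capacity
        self.segments: List[Segment] = [Segment(0, segment_capacity)]
        self.next_offset = 0

    def append(self, record: bytes) -> int:
        """Append a record to the head segment and return its offset."""
        head = self.segments[-1]
        if head.is_full():
            # Only the head receives appends; a sealed segment is never
            # modified again (a real system might persist it at this point).
            head.sealed = True
            head = Segment(self.next_offset, self.segment_capacity)
            self.segments.append(head)
        head.records.append(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int, max_records: int) -> List[bytes]:
        """Return up to max_records records starting at the given offset."""
        out: List[bytes] = []
        for seg in self.segments:
            end = seg.base_offset + len(seg.records)
            if end <= offset:
                continue  # this segment lies entirely before the offset
            start = max(offset, seg.base_offset) - seg.base_offset
            for rec in seg.records[start:]:
                if len(out) == max_records:
                    return out
                out.append(rec)
        return out

    def sealed_segments(self) -> List[Segment]:
        """Segments that are immutable and could be archived as-is."""
        return [s for s in self.segments if s.sealed]
```

For example, with `segment_capacity=2`, appending five records yields offsets 0 through 4 and two sealed segments, and `read(1, 3)` returns the records at offsets 1, 2 and 3.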
Document type:
Report
[Research Report] RR-9074, INRIA Rennes - Bretagne Atlantique. 2017

Cited literature [46 references]

https://hal.inria.fr/hal-01532070
Contributor: Ovidiu-Cristian Marcu
Submitted on: Friday, 2 June 2017 - 16:23:36
Last modified on: Thursday, 11 January 2018 - 06:28:14
Document(s) archived on: Wednesday, 13 December 2017 - 09:11:02

File

RR-9074.pdf
Files produced by the author(s)

License

Copyright (All rights reserved)

Identifiers

  • HAL Id: hal-01532070, version 1

Citation

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, María S. Pérez-Hernández. Kera: A Unified Storage and Ingestion Architecture for Efficient Stream Processing. [Research Report] RR-9074, INRIA Rennes - Bretagne Atlantique. 2017. 〈hal-01532070〉

Metrics

Record views: 359
File downloads: 193