KerA: Scalable Data Ingestion for Stream Processing

Abstract: Big Data applications are increasingly moving from batch-oriented execution models to stream-based models that let them extract valuable insights close to real time. To support this model, an essential part of the stream processing pipeline is data ingestion, i.e., the collection of data from various sources (sensors, NoSQL stores, filesystems, etc.) and its delivery for processing. Data ingestion must sustain high throughput and low latency, and must scale to a large number of both data producers and consumers. Since the performance of the whole stream processing pipeline is bounded by that of the ingestion phase, meeting these performance goals is critical. However, state-of-the-art data ingestion systems such as Apache Kafka build on static stream partitioning and offset-based record access, trading performance for design simplicity. In this paper we propose KerA, a data ingestion framework that alleviates these limitations through a dynamic partitioning scheme and lightweight indexing, thereby improving throughput, latency and scalability. Experimental evaluations show that KerA outperforms Kafka by up to 4x in ingestion throughput and by up to 5x in overall stream processing throughput. Furthermore, they show that KerA delivers data fast enough to saturate the big data engine acting as the consumer.
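To make the baseline the abstract criticizes more concrete, the sketch below models "static stream partitioning and offset-based record access" in a few lines. It is a toy illustration only, not Kafka's code and not KerA's design: the class and method names are hypothetical, and the key-to-partition mapping (CRC32 modulo a fixed partition count) is an assumed stand-in for a real partitioner.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StaticallyPartitionedStream:
    """Hypothetical toy model: a stream split into a fixed number of append-only partitions."""
    num_partitions: int
    partitions: List[List[bytes]] = field(init=False)

    def __post_init__(self) -> None:
        # Static partitioning: the partition count is chosen once and never changes.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumed mapping: a stable hash of the key modulo the fixed partition count.
        return zlib.crc32(key) % self.num_partitions

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append(value)
        # The record's offset is simply its position within its partition.
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int, max_records: int) -> List[bytes]:
        # Offset-based access: a consumer pulls up to max_records starting at an offset it tracks itself.
        return self.partitions[partition][offset:offset + max_records]
```

For example, two records appended with the key b"sensor-1" land in the same partition at offsets 0 and 1, and stream.read(p, 0, 10) returns both. In this toy model the number of partitions fixed at creation also caps how many consumers can read disjoint partitions in parallel, which is the kind of rigidity a dynamic partitioning scheme aims to remove.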
Document type:
Conference paper
ICDCS 2018 - 38th IEEE International Conference on Distributed Computing Systems, Jul 2018, Vienna, Austria. IEEE, pp. 1480-1485, 2018. http://icdcs2018.ocg.at/. DOI: 10.1109/ICDCS.2018.00152


https://hal.inria.fr/hal-01773799
Contributor: Ovidiu-Cristian Marcu
Submitted on: Monday, 23 April 2018, 10:47:11
Last modified on: Thursday, 15 November 2018, 11:58:57
Document(s) archived on: Wednesday, 19 September 2018, 00:29:43

File

ICDCS_2018_paper_732.pdf
Files produced by the author(s)

License


Copyright (All rights reserved)


Citation

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, María Pérez-Hernández, Bogdan Nicolae, et al. KerA: Scalable Data Ingestion for Stream Processing. ICDCS 2018 - 38th IEEE International Conference on Distributed Computing Systems, Jul 2018, Vienna, Austria. IEEE, pp. 1480-1485, 2018. http://icdcs2018.ocg.at/. DOI: 10.1109/ICDCS.2018.00152. hal-01773799
