Decentralized Scaling for Stream Processing Engines

Mehdi Belkhiria (1), Cédric Tedeschi (1)
(1) MYRIADS - Design and Implementation of Autonomous Distributed Systems
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : Stream Processing was recently introduced as a paradigm to easily develop and deploy applications targeting the near real-time processing of continuously produced data. Stream Processing engines such as Storm, Flink or Spark Streaming are today regarded as mature software platforms offering high-level programming abstractions that ease the development of stream processing applications. Moreover, they ease the deployment of such applications over computing facilities such as clusters or clouds. Autoscaling, i.e., the ability to scale elastically and autonomously according to the incoming load (in our case the input data stream), has recently gained momentum in the stream processing context, where different mechanisms have been designed and implemented to finely adapt the amount of computing resources dedicated to each step of the processing. Yet, with the advent of new platforms gathering smaller resources at a larger geographic scale, as in the Fog computing paradigm, the need to decentralize such autoscaling mechanisms appears, a topic which, to our knowledge, has been mostly ignored. In this paper, we tackle the problem of decentralized autoscaling for stream processing applications. Based on the common assumption that a stream processing application is a combination of operators (in a pipeline) potentially distributed over distant compute nodes, we provide a protocol letting each operator take its own scaling decisions based on local information. While each operator maintains only a view of its neighbors in the pipeline, our protocol ensures that all operators keep a consistent view while these neighbors are scaled out or scaled in in response to load variations, ensuring global correctness, e.g., that no data record is lost due to an outdated connection to a deleted neighbor. Our protocol is presented in detail, and its correctness validated. Its performance is captured through both analysis and simulation experiments.
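The idea of each operator scaling itself from local information while its pipeline neighbors keep an up-to-date view can be illustrated with a minimal sketch. This is not the protocol from the paper: the class `Operator`, the function `apply_scaling`, the utilisation thresholds and the per-instance capacity are all hypothetical, and the sketch runs in a single process, so it does not address the concurrency and message-ordering issues that the actual protocol has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One pipeline stage; every name and threshold here is hypothetical."""
    name: str
    instances: int = 1
    capacity_per_instance: float = 100.0  # records/s per instance (assumed)
    high: float = 0.8  # scale out above this utilisation (assumed)
    low: float = 0.3   # scale in below this utilisation (assumed)
    # Local view: instance count last announced by each direct neighbor.
    neighbor_view: dict = field(default_factory=dict)

    def decide(self, input_rate: float) -> int:
        """Return the desired instance count using only local information."""
        utilisation = input_rate / (self.instances * self.capacity_per_instance)
        if utilisation > self.high:
            return self.instances + 1
        if utilisation < self.low and self.instances > 1:
            return self.instances - 1
        return self.instances


def apply_scaling(pipeline, index, input_rate):
    """Let one operator rescale, then refresh its direct neighbors' views."""
    op = pipeline[index]
    op.instances = op.decide(input_rate)
    # Only the predecessor and successor in the pipeline are notified.
    for j in (index - 1, index + 1):
        if 0 <= j < len(pipeline):
            pipeline[j].neighbor_view[op.name] = op.instances
    return op.instances


pipeline = [Operator("source"), Operator("map"), Operator("sink")]
apply_scaling(pipeline, 1, 250.0)  # "map" is overloaded and adds an instance
```

After the call, both `source` and `sink` record the new instance count of `map` in their `neighbor_view`, while no operator holds any global state about the pipeline.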
Document type :
Preprints, Working Papers, ...

Cited literature : 17 references

https://hal.inria.fr/hal-02127609
Contributor : Cédric Tedeschi
Submitted on : Monday, May 13, 2019 - 3:47:44 PM
Last modification on : Friday, September 13, 2019 - 9:51:33 AM

File

rr-distributed-scaling.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02127609, version 1

Citation

Mehdi Belkhiria, Cédric Tedeschi. Decentralized Scaling for Stream Processing Engines. 2019. ⟨hal-02127609⟩
