Merge, Split, and Cluster: Dynamic Deployment of Stream Processing Applications

Aymen Jlassi (1), Cédric Tedeschi (1)
(1) MYRIADS - Design and Implementation of Autonomous Distributed Systems
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract: Stream processing has become a major programming model for handling, in a timely manner, the large volumes of data generated at the edge of the Internet. In this context, stream processing engines (SPEs) are software tools that ease the specification, deployment and monitoring of stream processing applications. Such applications are typically programmed as a directed acyclic graph (DAG) of operators to be applied to each data item. However, SPEs are mostly equipped to deploy one application at a time, without seeking synergies between applications, even though in many domains the sets of operators composing different applications overlap to a non-negligible extent. We envision a platform on which applications are submitted dynamically, each new graph of operators potentially sharing some operators with those already running. We assume the platform comprises compute nodes of homogeneous capacity. When a graph has to be deployed over multiple nodes, we need to minimize the inter-node traffic while guaranteeing that the capacity of no node is exceeded. This paper presents the Merge, Split and Cluster approach: each time a new DAG of operators is submitted, i) its operators are first merged with the already running operators, ii) if the load of an operator thus created exceeds a node's capacity, the operator is split into several instances, and iii) a clustering algorithm groups the operators of the resulting graph into clusters, each cluster being hosted by a single node, so as to maximize intra-node traffic. The last phase is handled through a heuristic adapted from an optimal tree partitioning algorithm. The approach is validated through simulation. The results show that i) merging drastically reduces the need for computing resources, and ii) the heuristic provides an efficient clustering that minimizes the inter-node traffic.
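
The three phases named in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the graph encoding, the function names, the rule that loads and traffic add up on merge, the equal-share split, and above all the greedy edge-by-edge clustering (used here in place of the tree-partitioning heuristic described in the report) are assumptions made only for illustration.

```python
import math

# A graph is a dict with per-operator loads and per-edge traffic (hypothetical encoding).

def merge(running, dag):
    """Phase i: fold a new DAG into the running graph; same-named operators are shared."""
    loads = dict(running["loads"])
    edges = dict(running["edges"])
    for op, load in dag["loads"].items():
        loads[op] = loads.get(op, 0.0) + load  # assumption: loads add up on merge
    for edge, traffic in dag["edges"].items():
        edges[edge] = edges.get(edge, 0.0) + traffic  # assumption: traffic adds up too
    return {"loads": loads, "edges": edges}

def split(graph, capacity):
    """Phase ii: replace each overloaded operator by k equally loaded instances."""
    instances, loads = {}, {}
    for op, load in graph["loads"].items():
        k = max(1, math.ceil(load / capacity))
        names = [op] if k == 1 else [f"{op}#{i}" for i in range(k)]
        instances[op] = names
        for name in names:
            loads[name] = load / k
    edges = {}
    for (u, v), traffic in graph["edges"].items():
        # assumption: traffic is shared evenly among all instance pairs
        share = traffic / (len(instances[u]) * len(instances[v]))
        for ui in instances[u]:
            for vj in instances[v]:
                edges[(ui, vj)] = share
    return {"loads": loads, "edges": edges}

def cluster(graph, capacity):
    """Phase iii (greedy stand-in): co-locate the heaviest edges first, within capacity."""
    cluster_of = {op: op for op in graph["loads"]}
    members = {op: {op} for op in graph["loads"]}
    cluster_load = dict(graph["loads"])
    for (u, v), _ in sorted(graph["edges"].items(), key=lambda kv: -kv[1]):
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv and cluster_load[cu] + cluster_load[cv] <= capacity:
            for op in members[cv]:
                cluster_of[op] = cu
            members[cu] |= members.pop(cv)
            cluster_load[cu] += cluster_load.pop(cv)
    return list(members.values())

def inter_node_traffic(graph, clusters):
    """Sum the traffic of edges whose endpoints sit in different clusters."""
    where = {op: i for i, ops in enumerate(clusters) for op in ops}
    return sum(t for (u, v), t in graph["edges"].items() if where[u] != where[v])

# Two made-up applications sharing the "src" and "filter" operators.
CAPACITY = 1.0
running = {"loads": {"src": 0.125, "filter": 0.5, "sinkA": 0.25},
           "edges": {("src", "filter"): 4.0, ("filter", "sinkA"): 1.0}}
new_dag = {"loads": {"src": 0.125, "filter": 0.75, "sinkB": 0.25},
           "edges": {("src", "filter"): 4.0, ("filter", "sinkB"): 2.0}}

deployed = split(merge(running, new_dag), CAPACITY)
clusters = cluster(deployed, CAPACITY)
print(clusters, inter_node_traffic(deployed, clusters))
```

In this toy run the merged `filter` operator reaches a load of 1.25, so it is split into two instances, and the resulting six operators are packed onto three nodes. A greedy pass like this one offers no optimality guarantee; it only shows how the capacity constraint and the inter-node traffic objective interact.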

https://hal.inria.fr/hal-02418771
Contributor : Cédric Tedeschi
Submitted on : Thursday, December 19, 2019 - 10:13:55 AM
Last modification on : Saturday, July 11, 2020 - 3:15:17 AM
Long-term archiving on : Friday, March 20, 2020 - 2:13:11 PM

File

Clustering_CCGrid_2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02418771, version 1

Citation

Aymen Jlassi, Cédric Tedeschi. Merge, Split, and Cluster: Dynamic Deployment of Stream Processing Applications. [Research Report] Inria. 2019. ⟨hal-02418771⟩
