Reports (Research Report) Year: 2019

Merge, Split, and Cluster: Dynamic Deployment of Stream Processing Applications

Aymen Jlassi, Cédric Tedeschi

Abstract

Stream processing has become a major programming model for the timely handling of large volumes of data generated at the edge of the Internet. In this context, stream processing engines (SPEs) are software tools that ease the specification, deployment and monitoring of stream processing applications. Such applications are typically programmed as a directed acyclic graph (DAG) of operators to be applied to each data item. However, SPEs are mostly equipped to deploy one application at a time, without seeking synergies between applications. Yet in many domains, the sets of operators composing different applications overlap to a non-negligible extent. We envision a platform on which applications are submitted dynamically, each new graph of operators potentially sharing some operators with those already running. We assume the platform comprises compute nodes of homogeneous capacity. When a graph has to be deployed over multiple nodes, we need to minimize the inter-node traffic while guaranteeing that the capacity of no node is exceeded. This paper presents the Merge, Split and Cluster approach: each time a new DAG of operators is submitted, i) its operators are first merged with the already running operators, ii) if the load of an operator thus created exceeds a node's capacity, the operator is split into several instances, and iii) a clustering algorithm groups the operators of the resulting graph into clusters, each cluster being hosted by a single node, so as to maximize intra-node traffic. The last phase is handled by a heuristic adapted from an optimal tree partitioning algorithm. The approach is validated through simulation. The results show that i) merging drastically reduces the need for computing resources, and ii) the heuristic provides an efficient clustering that minimizes the inter-node traffic.
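The three phases named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the graph representation, the function names, the rule that a shared operator keeps the larger of the two declared loads, the equal split of load and traffic across instances, and the greedy "contract the heaviest edge first" clustering are all assumptions made for illustration. In particular, the greedy step is a simple stand-in for the report's heuristic adapted from optimal tree partitioning, which the abstract does not detail.

```python
from math import ceil

# A graph is a dict with "loads" (operator -> load) and
# "edges" ((src, dst) -> traffic). All names here are hypothetical.

def merge(running, new):
    """Union two graphs; operators with the same name are shared (assumption)."""
    loads = dict(running["loads"])
    for op, load in new["loads"].items():
        loads[op] = max(loads.get(op, 0.0), load)  # shared operator runs once
    edges = dict(running["edges"])
    for edge, traffic in new["edges"].items():
        edges[edge] = max(edges.get(edge, 0.0), traffic)
    return {"loads": loads, "edges": edges}

def split(graph, capacity):
    """Replace each operator exceeding the capacity by k equal instances."""
    parts, loads = {}, {}
    for op, load in graph["loads"].items():
        k = max(1, ceil(load / capacity))
        parts[op] = [op] if k == 1 else [f"{op}#{i}" for i in range(k)]
        for name in parts[op]:
            loads[name] = load / k
    edges = {}
    for (u, v), traffic in graph["edges"].items():
        pairs = [(a, b) for a in parts[u] for b in parts[v]]
        for pair in pairs:
            edges[pair] = traffic / len(pairs)  # traffic shared equally
    return {"loads": loads, "edges": edges}

def cluster(graph, capacity):
    """Greedy stand-in: contract heaviest edges while the cluster fits a node."""
    cid = {op: op for op in graph["loads"]}   # operator -> cluster id
    cload = dict(graph["loads"])              # cluster id -> total load
    by_traffic = sorted(graph["edges"].items(), key=lambda kv: -kv[1])
    for (u, v), _ in by_traffic:
        a, b = cid[u], cid[v]
        if a != b and cload[a] + cload[b] <= capacity:
            for op in cid:
                if cid[op] == b:
                    cid[op] = a
            cload[a] += cload.pop(b)
    return cid

def inter_node_traffic(graph, cid):
    """Sum of the traffic on edges whose endpoints sit in different clusters."""
    return sum(t for (u, v), t in graph["edges"].items() if cid[u] != cid[v])

# Two toy applications sharing the operators "src" and "parse".
app1 = {"loads": {"src": 1, "parse": 2, "count": 1},
        "edges": {("src", "parse"): 10, ("parse", "count"): 4}}
app2 = {"loads": {"src": 1, "parse": 2, "alert": 6},
        "edges": {("src", "parse"): 10, ("parse", "alert"): 3}}

CAPACITY = 4
merged = merge(app1, app2)
deployed = split(merged, CAPACITY)
placement = cluster(deployed, CAPACITY)
print(len(set(placement.values())), inter_node_traffic(deployed, placement))
```

In this toy input, merging lowers the total declared load from 13 to 10 because "src" and "parse" run once for both applications, and "alert" is split into two instances because its load of 6 exceeds the assumed node capacity of 4.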
Main file
Clustering_CCGrid_2020.pdf (458.25 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02418771, version 1 (19-12-2019)

Identifiers

  • HAL Id: hal-02418771, version 1

Cite

Aymen Jlassi, Cédric Tedeschi. Merge, Split, and Cluster: Dynamic Deployment of Stream Processing Applications. [Research Report] Inria. 2019. ⟨hal-02418771⟩