Report (Research Report) Year: 2019

Merge, Split, and Cluster: Dynamic Deployment of Stream Processing Applications

Abstract

Stream processing has become a major programming model for the timely handling of large volumes of data generated at the edge of the Internet. In this context, stream processing engines (SPEs) are software tools that ease the specification, deployment and monitoring of stream processing applications. Such applications are typically programmed as a directed acyclic graph (DAG) of operators to be applied to each data item. Yet SPEs are mostly equipped to deploy one application at a time, without seeking synergies between applications, even though in many domains the sets of operators composing applications overlap to a non-negligible extent. We envision a platform on which applications are submitted dynamically, each new graph of operators potentially sharing some operators with those already running. We assume the platform comprises compute nodes of homogeneous capacity. When a graph has to be deployed over multiple nodes, we need to minimize the inter-node traffic while guaranteeing that the capacity of a node is not exceeded. This paper presents the Merge, Split and Cluster approach: each time a new DAG of operators is submitted, i) its operators are first merged with the already running operators, ii) if the resulting load of an operator exceeds a node's capacity, the operator is split into several instances, and iii) a clustering algorithm groups the operators of the resulting graph into clusters, each cluster being hosted by a single node, so as to maximize intra-node traffic. The last phase is handled through a heuristic adapted from an optimal tree partitioning algorithm. The approach is validated through simulation. The results show that i) merging drastically reduces the need for computing resources, and ii) the heuristic provides an efficient clustering that minimizes the inter-node traffic.
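
The three phases named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how loads and traffic combine when operators are merged, so this sketch simply adds them; it assumes that a split operator shares its load and traffic evenly among its instances; and it replaces the tree-partitioning-based heuristic with a plain greedy edge contraction. All function names and the graph representation (a dict of operator loads and a dict of edge traffic) are hypothetical.

```python
import math

def merge(running, new_app):
    """Phase i: operators with the same name are deployed only once."""
    loads, edges = dict(running[0]), dict(running[1])
    for op, load in new_app[0].items():
        # Assumption: the load of a shared operator is the sum of the requests.
        loads[op] = loads.get(op, 0) + load
    for edge, traffic in new_app[1].items():
        # Assumption: traffic on a shared edge adds up as well.
        edges[edge] = edges.get(edge, 0) + traffic
    return loads, edges

def split(loads, edges, capacity):
    """Phase ii: replace each overloaded operator by k equally loaded instances."""
    instances = {}
    for op, load in loads.items():
        k = max(1, math.ceil(load / capacity))
        names = [f"{op}#{i}" for i in range(k)] if k > 1 else [op]
        instances[op] = [(name, load / k) for name in names]
    new_loads = {name: l for insts in instances.values() for name, l in insts}
    new_edges = {}
    for (u, v), traffic in edges.items():
        # Assumption: traffic is divided evenly among all instance pairs.
        pairs = [(a, b) for a, _ in instances[u] for b, _ in instances[v]]
        for pair in pairs:
            new_edges[pair] = traffic / len(pairs)
    return new_loads, new_edges

def cluster(loads, edges, capacity):
    """Phase iii (greedy stand-in): contract the heaviest edges first,
    as long as the merged cluster still fits on one node."""
    cluster_of = {op: op for op in loads}
    cluster_load = dict(loads)
    for (u, v), _ in sorted(edges.items(), key=lambda e: -e[1]):
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv and cluster_load[cu] + cluster_load[cv] <= capacity:
            for op in list(cluster_of):
                if cluster_of[op] == cv:
                    cluster_of[op] = cu
            cluster_load[cu] += cluster_load.pop(cv)
    return cluster_of

def inter_node_traffic(edges, cluster_of):
    """Total traffic on edges whose endpoints sit on different nodes."""
    return sum(t for (u, v), t in edges.items() if cluster_of[u] != cluster_of[v])

# Two made-up applications sharing the operators "src" and "filter".
running = ({"src": 2, "filter": 3}, {("src", "filter"): 10})
new_app = ({"src": 2, "filter": 3, "agg": 2},
           {("src", "filter"): 10, ("filter", "agg"): 4})
capacity = 5
loads, edges = split(*merge(running, new_app), capacity)
placement = cluster(loads, edges, capacity)
print(placement, inter_node_traffic(edges, placement))
```

In this made-up example the merged "filter" operator exceeds the node capacity and is split into two instances, one of which is then co-located with "agg". A greedy contraction of this kind gives no optimality guarantee; it only shows the objective (keep heavy edges inside a node) and the constraint (do not exceed the node capacity).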
Main file
Clustering_CCGrid_2020.pdf (458.25 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02418771, version 1 (19-12-2019)

Identifiers

  • HAL Id: hal-02418771, version 1

Cite

Aymen Jlassi, Cédric Tedeschi. Merge, Split, and Cluster: Dynamic Deployment of Stream Processing Applications. [Research Report] Inria. 2019. ⟨hal-02418771⟩