Proactive Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems

Nicoló Rivetti 1, 2 Emmanuelle Anceaume 3 Yann Busnel 4, 5 Leonardo Querzoni 2 Bruno Sericola 5
1 GDD - Gestion de Données Distribuées [Nantes]
LINA - Laboratoire d'Informatique de Nantes Atlantique
3 CIDRE - Confidentialité, Intégrité, Disponibilité et Répartition
CentraleSupélec, Inria Rennes – Bretagne Atlantique , IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
5 DIONYSOS - Dependability Interoperability and perfOrmance aNalYsiS Of networkS
Inria Rennes – Bretagne Atlantique , IRISA-D2 - RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES
Abstract : Shuffle grouping is a technique used by stream processing frameworks to share input load among parallel instances of stateless operators. With shuffle grouping each tuple of a stream can be assigned to any available operator instance, independently from any previous assignment. A common approach to implement shuffle grouping is to adopt a round robin policy, a simple solution that fares well as long as the tuple execution time is constant. However, such assumption rarely holds in real cases where execution time strongly depends on tuple content. As a consequence, parallel stateless operators within stream processing applications may experience unpredictable unbalance that, in the end, causes undesirable increase in tuple completion times. In this paper we propose Proactive Online Shuffle Grouping (POSG), a novel approach to shuffle grouping aimed at reducing the overall tuple completion time. POSG estimates the execution time of each tuple, enabling a proactive and online scheduling of input load to the target operator instances. Sketches are used to efficiently store the otherwise large amount of information required to schedule incoming load. We provide a probabilistic analysis and illustrate, through both simulations and a running prototype, its impact on stream processing applications.
Type de document :
Rapport
[Research Report] LINA-University of Nantes; Sapienza Università di Roma (Italie). 2015
Liste complète des métadonnées

Littérature citée [11 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01246701
Contributeur : Yann Busnel <>
Soumis le : lundi 4 janvier 2016 - 16:45:18
Dernière modification le : mercredi 11 avril 2018 - 02:01:18
Document(s) archivé(s) le : vendredi 15 avril 2016 - 16:22:55

Fichiers

main-tr.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01246701, version 2

Citation

Nicoló Rivetti, Emmanuelle Anceaume, Yann Busnel, Leonardo Querzoni, Bruno Sericola. Proactive Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems. [Research Report] LINA-University of Nantes; Sapienza Università di Roma (Italie). 2015. 〈hal-01246701v2〉

Partager

Métriques

Consultations de la notice

1348

Téléchargements de fichiers

206