Reuse-based Optimization for Pig Latin

Jesús Camacho-Rodríguez (1, 2); Dario Colazzo (3); Melanie Herschel (2, 1); Ioana Manolescu (1, 2); Soudip Roy Chowdhury (1, 2)
1 OAK - Database optimizations and architectures for complex large data
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: Pig Latin has become a popular language within the data management community interested in the efficient parallel processing of large data volumes. The dataflow-style primitives of Pig Latin provide an intuitive way for users to write complex analytical queries, which are in turn compiled into MapReduce jobs. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they occur, leading to avoidable MapReduce jobs. The current Pig Latin optimizer is not capable of recognizing, and thus optimizing, such repeated subexpressions. We present a novel approach for identifying and reusing common subexpressions occurring in Pig Latin scripts. In particular, we lay the foundation of our reuse-based algorithms by formalizing the semantics of the Pig Latin query language with an extended nested relational algebra for bags. Our algorithm, named PigReuse, operates on the algebraic representations of Pig Latin scripts, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and merges the other equivalent expressions so that they share the results. Our experimental results demonstrate the efficiency and effectiveness of our reuse-based algorithms and optimization strategies.
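The general idea of sharing repeated subexpressions can be illustrated with a minimal sketch. It is not the PigReuse algorithm: it assumes plans are written as nested Python tuples of the hypothetical form `(operator, parameter, child)`, it treats two subexpressions as equal only when they are syntactically identical, and it uses no cost function. All names below (`count_subexpressions`, `shared_subexpressions`, `operators_to_run`) are invented for illustration.

```python
from collections import Counter


def count_subexpressions(plan, counts):
    """Record every subtree of a plan, visiting children first."""
    _, *args = plan
    for arg in args:
        if isinstance(arg, tuple):  # a child operator, not a string parameter
            count_subexpressions(arg, counts)
    counts[plan] += 1


def shared_subexpressions(plans):
    """Return the subtrees that occur more than once across all plans."""
    counts = Counter()
    for plan in plans:
        count_subexpressions(plan, counts)
    return {expr for expr, n in counts.items() if n > 1}


def operators_to_run(plans, reuse):
    """Count operators executed; with reuse, each distinct subtree runs once."""
    seen = set()

    def visit(plan):
        if reuse and plan in seen:
            return 0  # result already computed, share it
        seen.add(plan)
        _, *args = plan
        return 1 + sum(visit(a) for a in args if isinstance(a, tuple))

    return sum(visit(p) for p in plans)


# Two hypothetical scripts that both filter the same input before grouping.
load = ("LOAD", "page_views")
adults = ("FILTER", "age >= 18", load)
by_country = ("GROUP", "country", adults)
by_url = ("GROUP", "url", adults)

plans = [by_country, by_url]
print(shared_subexpressions(plans) == {load, adults})
print(operators_to_run(plans, reuse=False), operators_to_run(plans, reuse=True))
```

In this toy setting the shared `LOAD` and `FILTER` steps are evaluated once instead of twice. Recognizing expressions that are equivalent without being identical, and choosing among alternatives by cost, is what the abstract attributes to PigReuse and is outside the scope of this sketch.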
Document type:
Conference paper
BDA'2014: 30e journées Bases de Données Avancées, Oct 2014, Grenoble-Autrans, France. 2014


https://hal.inria.fr/hal-01086497
Contributor: Soudip Roy Chowdhury
Submitted on: Monday, November 24, 2014 - 14:39:31
Last modified on: Monday, May 28, 2018 - 14:38:02
Document(s) archived on: Friday, April 14, 2017 - 20:34:55

File

PigReuse-CR-BDA.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01086497, version 1


Citation

Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury. Reuse-based Optimization for Pig Latin. BDA'2014: 30e journées Bases de Données Avancées, Oct 2014, Grenoble-Autrans, France. 2014. 〈hal-01086497〉


Metrics

Record views

526

File downloads

297