PigReuse: A Reuse-based Optimizer for Pig Latin

Abstract: Pig Latin is a popular language for the parallel processing of massive data sets. Currently, subexpressions that occur repeatedly in Pig Latin scripts are executed as many times as they appear, because the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach that identifies and reuses repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, operates on a particular algebraic representation of Pig Latin scripts. PigReuse identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
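The abstract describes PigReuse only at a high level. As a rough illustration of its first step, finding subexpressions that repeat across scripts, the Python sketch below hash-conses operator trees so that structurally identical subtrees receive one shared id. This is a simplified sketch under stated assumptions, not the report's algorithm: the `Op` node type, the `merge_plans` function and the example plans are hypothetical. The sketch only matches syntactically identical subexpressions, and it omits both PigReuse's algebraic representation and its cost-based choice of which merged subexpressions to execute.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical, simplified operator node (NOT PigReuse's actual representation).

    frozen=True makes nodes hashable; equality and hashing are structural,
    so two separately built but identical subtrees compare equal.
    """
    name: str                       # operator kind, e.g. "LOAD", "FILTER", "GROUP"
    args: Tuple[str, ...]           # operator parameters, e.g. a relation name or predicate text
    inputs: Tuple["Op", ...] = ()   # child operators


def merge_plans(roots: List[Op]) -> Tuple[Dict[Op, int], Dict[int, int]]:
    """Assign one shared id to each distinct subexpression across all plans.

    Returns (ids, uses): `ids` maps each distinct subtree to its shared id, and
    `uses` counts how many parent contexts reference that id, i.e. how many times
    it would be evaluated if no reuse took place.
    """
    ids: Dict[Op, int] = {}
    uses: Dict[int, int] = {}

    def visit(op: Op) -> int:
        if op in ids:
            # Structurally identical subexpression already seen: record a reuse
            # and skip its children, which are shared along with it.
            uses[ids[op]] += 1
            return ids[op]
        for child in op.inputs:
            visit(child)
        ids[op] = len(ids)
        uses[ids[op]] = 1
        return ids[op]

    for root in roots:
        visit(root)
    return ids, uses


# Two hypothetical scripts that both filter the same input relation.
filtered = Op("FILTER", ("user IS NOT NULL",), (Op("LOAD", ("page_views",)),))
script1 = Op("STORE", ("out1",), (Op("GROUP", ("user",), (filtered,)),))
# Built independently, but structurally identical to `filtered`.
filtered_again = Op("FILTER", ("user IS NOT NULL",), (Op("LOAD", ("page_views",)),))
script2 = Op("STORE", ("out2",), (Op("DISTINCT", (), (filtered_again,)),))

ids, uses = merge_plans([script1, script2])
shared = [op.name for op, i in ids.items() if uses[i] > 1]
print(shared)  # ['FILTER']
```

In this example, the `FILTER` subtree is the only subexpression referenced by both plans, so an optimizer could compute it once and feed its result to both `GROUP` and `DISTINCT`. Deciding whether such a merge actually pays off is the role the abstract assigns to PigReuse's cost function, which this sketch does not model.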
Document type: Reports

Cited literature: 28 references

https://hal.inria.fr/hal-01353891
Contributor: Jesús Camacho-Rodríguez
Submitted on: Thursday, August 18, 2016, 1:24:47 PM
Last modification on: Thursday, June 13, 2019, 11:34:02 AM
Long-term archiving on: Saturday, November 19, 2016, 8:31:18 PM

File

pigreuse-technical-report.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01353891, version 1


Citation

Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury. PigReuse: A Reuse-based Optimizer for Pig Latin. [Technical Report] Inria Saclay. 2016. ⟨hal-01353891⟩


Metrics: 626 record views, 386 file downloads