Reuse-based Optimization for Pig Latin

Abstract : Pig Latin is a popular language which is widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed in order to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
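
The general idea of detecting repeated subexpressions across scripts can be illustrated with a minimal sketch. This is not the PigReuse algorithm from the paper. The tuple-based plan encoding, the function names, the sample relation `page_views`, and the unit cost per operator are all assumptions made for illustration. The sketch treats two subexpressions as shareable only when they are syntactically identical, and it counts operator executions with and without sharing.

```python
from collections import Counter

# Hypothetical plan encoding: (operator, parameter, *input_plans).
# A LOAD leaf has no inputs. This is an illustration, not Pig's internal format.

def subexpressions(plan):
    """Yield every subexpression of a plan, including the plan itself."""
    yield plan
    for child in plan[2:]:
        yield from subexpressions(child)

def repeated_subexpressions(plans):
    """Return the subexpressions that occur more than once across all plans."""
    counts = Counter(sub for plan in plans for sub in subexpressions(plan))
    return {sub: n for sub, n in counts.items() if n > 1}

def operators_executed(plans, reuse):
    """Count operator executions, assuming a unit cost per operator."""
    if not reuse:
        # Every occurrence is executed, as many times as it appears.
        return sum(1 for plan in plans for _ in subexpressions(plan))
    # With sharing, each distinct subexpression is executed once.
    return len({sub for plan in plans for sub in subexpressions(plan)})

# Two hypothetical scripts that share the same LOAD and FILTER steps.
load = ("LOAD", "page_views")
filtered = ("FILTER", "timestamp >= 20160101", load)
script_a = ("GROUP", "user", filtered)
script_b = ("DISTINCT", "", ("FOREACH", "url", filtered))

plans = [script_a, script_b]
shared = repeated_subexpressions(plans)
without_reuse = operators_executed(plans, reuse=False)
with_reuse = operators_executed(plans, reuse=True)
```

In this toy setting the shared candidates are the `LOAD` and `FILTER` steps. A realistic cost function would weigh data sizes and operator types rather than count operators, and the abstract does not specify which cost function the paper uses.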
Document type :
Conference papers

https://hal.inria.fr/hal-01425321
Contributor : Ioana Manolescu
Submitted on : Tuesday, January 3, 2017 - 3:03:30 PM
Last modification on : Thursday, June 13, 2019 - 11:34:02 AM
Long-term archiving on : Tuesday, April 4, 2017 - 1:52:47 PM

Files

paper-forHal.pdf
Files produced by the author(s)

Citation

Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury. Reuse-based Optimization for Pig Latin. 25th ACM International Conference on Information and Knowledge Management, Oct 2016, Indianapolis, United States. pp.2215-2220, ⟨10.1145/2983323.2983669⟩. ⟨hal-01425321⟩
