Conference papers

Reuse-based Optimization for Pig Latin

Abstract : Pig Latin is a popular language which is widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed in order to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
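The idea of spotting repeated subexpressions across scripts can be illustrated with a small sketch. It is not the PigReuse algorithm, which the abstract describes only at a high level. The tuple-based plan representation, the function names, and the toy cost (number of operators) are all assumptions made for illustration.

```python
from collections import Counter

# A plan node is a tuple: (operator, argument, *child_plans).
# This representation is hypothetical and is not Pig's internal plan format.

def subexpressions(plan):
    """Yield the plan itself and every nested sub-plan."""
    yield plan
    for child in plan[2:]:
        yield from subexpressions(child)

def size(plan):
    """Toy cost (an assumption): the number of operators in the plan."""
    return 1 + sum(size(child) for child in plan[2:])

def reuse_candidates(plans):
    """Map each sub-plan seen more than once to the toy cost saved by running it once."""
    counts = Counter(sub for plan in plans for sub in subexpressions(plan))
    return {sub: (n - 1) * size(sub) for sub, n in counts.items() if n > 1}

# Two scripts that share the same LOAD + FILTER prefix.
load = ("LOAD", "users")
adults = ("FILTER", "age >= 18", load)
script_a = ("GROUP", "city", adults)
script_b = ("FOREACH", "name", adults)

savings = reuse_candidates([script_a, script_b])
best = max(savings, key=savings.get)  # the shared FILTER-over-LOAD sub-plan
```

The sketch scores each shared sub-plan independently, so nested candidates (here the LOAD inside the FILTER) are counted separately. A real optimizer would have to choose a consistent set of sub-plans to share.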
Document type :
Conference papers
Contributor : Ioana Manolescu
Submitted on : Tuesday, January 3, 2017 - 3:03:30 PM
Last modification on : Wednesday, January 6, 2021 - 11:30:12 AM
Long-term archiving on : Tuesday, April 4, 2017 - 1:52:47 PM





Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury. Reuse-based Optimization for Pig Latin. 25th ACM International Conference on Information and Knowledge Management, Oct 2016, Indianapolis, United States. pp. 2215-2220, ⟨10.1145/2983323.2983669⟩. ⟨hal-01425321⟩


