Reuse-based Optimization for Pig Latin

Abstract: Pig Latin is a popular language widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach that identifies and reuses repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
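
The general idea of detecting repeated subexpressions and ranking them by a cost estimate can be illustrated with a minimal Python sketch. This is not the PigReuse algorithm from the paper: the tuple encoding of operators, the function names, and the toy cost model (number of operators saved) are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical encoding: an operator is a tuple (NAME, argument, *child_operators).

def subexpressions(expr):
    """Yield expr and every nested operator tuple beneath it."""
    yield expr
    for child in expr[1:]:
        if isinstance(child, tuple):
            yield from subexpressions(child)

def size(expr):
    """Toy cost (an assumption): number of operators in the expression tree."""
    return sum(1 for _ in subexpressions(expr))

def reuse_candidates(scripts):
    """Return repeated subexpressions ranked by estimated saving."""
    counts = Counter(sub for script in scripts for sub in subexpressions(script))
    # Running a repeated subexpression once saves (occurrences - 1) executions.
    savings = {e: (n - 1) * size(e) for e, n in counts.items() if n > 1}
    return sorted(savings.items(), key=lambda kv: kv[1], reverse=True)

# Two toy scripts that share the same filtered input.
load = ("LOAD", "users")
adults = ("FILTER", "age >= 18", load)
script_a = ("GROUP", "city", adults)
script_b = ("FOREACH", "name", adults)

ranked = reuse_candidates([script_a, script_b])
for expr, saving in ranked:
    print(saving, expr)
```

In this sketch the shared `FILTER` over `LOAD` ranks first because reusing it avoids re-executing two operators. The savings of nested candidates overlap (the `LOAD` is already part of the `FILTER`), so a real optimizer would have to select a consistent set of merges rather than simply take every candidate.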
Document type: Conference papers
Contributor: Ioana Manolescu
Submitted on: Tuesday, January 3, 2017 - 3:03:30 PM
Last modification on: Tuesday, January 25, 2022 - 8:30:03 AM
Long-term archiving on: Tuesday, April 4, 2017 - 1:52:47 PM


Files produced by the author(s)



Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury. Reuse-based Optimization for Pig Latin. 25th ACM International Conference on Information and Knowledge Management, Oct 2016, Indianapolis, United States. pp. 2215-2220, ⟨10.1145/2983323.2983669⟩. ⟨hal-01425321⟩


