HAP: Building Pipelines with Heterogeneous Data and Hive
Abstract
The increasing number of available datasets gives opportunities to build large and complex applications which aggregate results coming from several sources. These emerging usecases require new systems where combinations of heterogeneous sources are both allowed and efficient.
To tackle these challenges, we provide a simple high-level set of primitives – called HAP – to easily describe processing chains. These descriptions are then compiled into optimized SQL queries executed by Hive.
Origin : Files produced by the author(s)
Loading...