
Learning Linear Regression Models over Factorized Joins

Abstract : We investigate the problem of building least squares regression models over training datasets defined by arbitrary join queries on database tables. Our key observation is that joins entail a high degree of redundancy in both computation and data representation, which is not required for the end-to-end solution to learning over joins. We propose a new paradigm for computing batch gradient descent that exploits the factorized computation and representation of the training datasets, a rewriting of the regression objective function that decouples the computation of cofactors of model parameters from their convergence, and the commutativity of cofactor computation with relational union and projection. We introduce three flavors of this approach: F/FDB computes the cofactors in one pass over the materialized factorized join; F avoids this materialization and intermixes cofactor and join computation; F/SQL expresses this mixture as one SQL query. Our approach has the complexity of join factorization, which can be exponentially lower than that of standard joins. Experiments with commercial, public, and synthetic datasets show that it outperforms MADlib, Python StatsModels, and R by up to three orders of magnitude.
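The decoupling described in the abstract can be illustrated on a flat (already materialized) dataset: for least squares, the gradient depends on the data only through the cofactors X^T X and X^T y, so they can be computed in a single pass and reused across all gradient descent iterations. This is only a minimal sketch of that rewriting; the paper's contribution is computing these cofactors directly over the factorized join, without materializing X, which this toy example does not show.

```python
import numpy as np

# Hypothetical flat training data standing in for a materialized join.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.01, size=1000)

# Cofactors: computed once, in one pass over the data,
# independently of how many gradient steps follow.
C = X.T @ X   # (d x d) cofactor matrix
c = X.T @ y   # (d,) cofactor vector

# Batch gradient descent touches only the cofactors, never the data again.
theta = np.zeros(3)
lr = 1e-3
for _ in range(5000):
    grad = (C @ theta - c) / len(y)
    theta -= lr * grad
```

With this rewriting, the per-iteration cost depends only on the number of parameters d, not on the (potentially huge) size of the join result.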
Document type :
Conference papers

https://hal.inria.fr/hal-01330113
Contributor : Radu Ciucanu
Submitted on : Friday, June 10, 2016 - 12:29:47 AM
Last modification on : Friday, June 10, 2016 - 12:29:47 AM

Identifiers

  • HAL Id : hal-01330113, version 1

Citation

Maximilian Schleich, Dan Olteanu, Radu Ciucanu. Learning Linear Regression Models over Factorized Joins. ACM SIGMOD, Jun 2016, San Francisco, United States. ⟨hal-01330113⟩
