Learning Linear Regression Models over Factorized Joins

Maximilian Schleich; Dan Olteanu; Radu Ciucanu

Conference paper, Year: 2016


Maximilian Schleich

Role: Author
PersonId: 983426

Department of Computer Science [Oxford]

Dan Olteanu

Role: Author
PersonId: 983427

Department of Computer Science [Oxford]

Radu Ciucanu

Role: Author
PersonId: 176966
IdHAL: radu-ciucanu
IdRef: 189245735

Department of Computer Science [Oxford]

Abstract

We investigate the problem of building least squares regression models over training datasets defined by arbitrary join queries on database tables. Our key observation is that joins entail a high degree of redundancy in both computation and data representation, which is not required for the end-to-end solution to learning over joins. We propose a new paradigm for computing batch gradient descent that exploits the factorized computation and representation of the training datasets, a rewriting of the regression objective function that decouples the computation of cofactors of model parameters from their convergence, and the commutativity of cofactor computation with relational union and projection. We introduce three flavors of this approach: F/FDB computes the cofactors in one pass over the materialized factorized join; F avoids this materialization and intermixes cofactor and join computation; F/SQL expresses this mixture as one SQL query. Our approach has the complexity of join factorization, which can be exponentially lower than that of standard joins. Experiments with commercial, public, and synthetic datasets show that it outperforms MADlib, Python StatsModels, and R by up to three orders of magnitude.
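The decoupling of cofactor computation from convergence can be illustrated with a minimal sketch. It is not the authors' implementation: it works on an already flat list of rows rather than a factorized join, and every name in it (`cofactors`, `add_cofactors`, `gradient_descent`, the toy data) is hypothetical. It only shows that the cofactors (sums of pairwise products of columns) can be computed once, that cofactors of a disjoint union are the sum of the per-part cofactors, and that batch gradient descent can then iterate on the cofactors alone without rereading the data.

```python
# Illustrative sketch only: names and data are hypothetical, not from the paper.


def cofactors(rows):
    """Return C with C[j][k] = sum over rows of z[j] * z[k].

    Each input row is (x_1, ..., x_n, y). A constant 1 is prepended for the
    intercept, so z = (1, x_1, ..., x_n, y) and the label is the last column.
    """
    width = len(rows[0]) + 1
    C = [[0.0] * width for _ in range(width)]
    for row in rows:
        z = (1.0,) + tuple(row)
        for j in range(width):
            for k in range(width):
                C[j][k] += z[j] * z[k]
    return C


def add_cofactors(A, B):
    """Cofactors of a disjoint union of row sets are the entrywise sum."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def gradient_descent(C, m, steps=5000, lr=0.05):
    """Batch gradient descent for least squares that reads only cofactors.

    For the objective (1 / 2m) * sum (theta . x - y)^2, the partial derivative
    with respect to theta_j is (sum_k theta_k * C[j][k] - C[j][label]) / m.
    The step count and learning rate are arbitrary choices for this toy data.
    """
    n = len(C) - 1  # number of parameters: intercept plus features
    theta = [0.0] * n
    for _ in range(steps):
        grad = [
            (sum(theta[k] * C[j][k] for k in range(n)) - C[j][n]) / m
            for j in range(n)
        ]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Toy training set generated from y = 1 + 2x, split into two disjoint parts.
part_a = [(0.0, 1.0), (1.0, 3.0)]
part_b = [(2.0, 5.0), (3.0, 7.0)]
rows = part_a + part_b

# One pass per part, then combine; the data is never touched again.
C = add_cofactors(cofactors(part_a), cofactors(part_b))
theta = gradient_descent(C, m=len(rows))
```

On this toy input the learned parameters approach an intercept of 1 and a slope of 2. The cost of each gradient step depends only on the number of parameters, not on the number of rows, which is the property the cofactor rewriting is meant to expose.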

Domains

Databases [cs.DB]


https://inria.hal.science/hal-01330113

Submitted on: Friday, 10 June 2016, 00:29:47

Last modified on: Tuesday, 14 February 2023, 14:32:07

Dates and versions

hal-01330113, version 1 (10-06-2016)

Identifiers

HAL Id: hal-01330113, version 1

Cite

Maximilian Schleich, Dan Olteanu, Radu Ciucanu. Learning Linear Regression Models over Factorized Joins. ACM SIGMOD, Jun 2016, San Francisco, United States. ⟨hal-01330113⟩
