Learning Join Queries from User Examples

Angela Bonifati 1, 2, 3 Radu Ciucanu 4, 3, * Slawomir Staworko 5, 3
* Corresponding author
1 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
3 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: We investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples as part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions as small as possible. We assume no prior knowledge of the integrity constraints across the involved relations. Inferring the join predicate across multiple relations when the referential constraints are unknown may occur in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, and we directly present to the user the informative ones, that is, those that allow the user to quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates and also allow disjunctive join predicates and projection in the queries. We precisely characterize the frontier between tractability and intractability for the following problems of interest in these settings: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for presenting tuples to the user in an order that minimizes the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets.
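As a rough illustration of two notions named in the abstract (consistency checking and the informativeness of a tuple), the following minimal sketch assumes the simplest setting only: two relations and a purely conjunctive equality join predicate, represented as a set of attribute-position pairs. The function names and the representation are illustrative assumptions, not taken from the article, and disjunction and projection are not covered.

```python
# Sketch under stated assumptions: two relations, conjunctive equi-join only.
# A predicate is a set of (i, j) pairs meaning r[i] == p[j]; it selects a
# candidate (r, p) when every pair in the predicate holds on that candidate.

def agreeing_pairs(r, p):
    """All position pairs (i, j) on which tuples r and p carry equal values."""
    return frozenset((i, j)
                     for i, a in enumerate(r)
                     for j, b in enumerate(p)
                     if a == b)

def most_specific(positives, all_pairs):
    """Largest predicate that still selects every positive example."""
    theta = set(all_pairs)
    for r, p in positives:
        theta &= agreeing_pairs(r, p)
    return frozenset(theta)

def is_consistent(positives, negatives, all_pairs):
    """True if some predicate selects all positives and no negative."""
    theta = most_specific(positives, all_pairs)
    return all(not theta <= agreeing_pairs(r, p) for r, p in negatives)

def is_informative(candidate, positives, negatives, all_pairs):
    """True if consistent predicates disagree on the candidate's label.

    Assumes the labeled sample is itself consistent.
    """
    theta = most_specific(positives, all_pairs)
    t = agreeing_pairs(*candidate)
    if theta <= t:
        return False  # every consistent predicate selects it
    narrowed = theta & t
    if any(narrowed <= agreeing_pairs(r, p) for r, p in negatives):
        return False  # no consistent predicate selects it
    return True

# Hypothetical data: each relation has two attributes.
all_pairs = {(i, j) for i in range(2) for j in range(2)}
positives = [((1, 5), (1, 5))]
negatives = [((2, 7), (3, 7))]
```

With this hypothetical sample, the candidate `((4, 8), (4, 9))` is informative because the two remaining consistent predicates disagree on it, whereas `((6, 6), (6, 6))` and `((2, 3), (9, 3))` already have a forced label and would not need to be shown to the user.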
Document type:
Journal article
ACM Transactions on Database Systems, Association for Computing Machinery, 2016, 40 (4), pp.24:1--24:38. 〈http://dl.acm.org/citation.cfm?id=2818637〉
https://hal.inria.fr/hal-01187986
Contributor: Radu Ciucanu
Submitted on: Friday, 28 August 2015 - 12:05:17
Last modified on: Wednesday, 19 September 2018 - 09:59:57

Identifiers

  • HAL Id: hal-01187986, version 1

Citation

Angela Bonifati, Radu Ciucanu, Slawomir Staworko. Learning Join Queries from User Examples. ACM Transactions on Database Systems, Association for Computing Machinery, 2016, 40 (4), pp.24:1--24:38. 〈http://dl.acm.org/citation.cfm?id=2818637〉. 〈hal-01187986〉
