Query enumeration and nowhere dense graphs

Alexandre Vigny 1, 2
1 VALDA - Value from Data
DI-ENS - Département d'informatique de l'École normale supérieure, Inria de Paris
Abstract : Query evaluation is one of the most central tasks of a database system, and a vast amount of literature is devoted to the complexity of this problem. Given a database D and a query q, the goal is to compute the set q(D) of all solutions for q over D. Unfortunately, the set q(D) might be much bigger than the database itself, as the number of solutions may be exponential in the arity of the query. It can therefore be insufficient to measure the complexity of answering q on D only in terms of the total time needed to compute the complete result set q(D). One can imagine many scenarios to overcome this situation. We could for instance only want to compute the number of solutions or just compute the k most relevant solutions relative to some ranking function. In this thesis we mainly consider the complexity of enumerating the set q(D), i.e., generating one by one all the solutions for q on D. In this context two parameters play an important role. The first one is the preprocessing time, i.e. the time it takes to produce the first solution. The second one is the delay, i.e. the maximum time between the output of any two consecutive solutions. An enumeration algorithm is then said to be efficient if these two parameters are small. For the delay, the best we can hope for is constant time: depending only on the query and independent from the size of the database. For the preprocessing time an ideal goal would be linear time: linear in the size of the database with a constant factor depending on the query. When both are achieved we say that the query can be enumerated with constant delay after linear preprocessing. A special case of enumeration is when the query is boolean. In this case the prepro- cessing computes the answer to the query (yes or no). In order to be able to enumerate queries of a given language efficiently, it is therefore necessary to be able to solve the boolean case efficiently. It has been shown recently that boolean first-order (FO) queries can be evaluated in pseudo-linear time over nowhere dense classes of databases [43]. The notion of nowhere dense classes was introduced in [60] as a formalization of classes of “sparse” graphs and generalizes many well known classes of databases. Among classes of databases that are closed under subdatabases, the nowhere dense classes are the largest possible classes enjoying efficient evaluation of FO queries [55] (modulo an assumption in parameterized complexity theory). It has also been shown that over nowhere dense classes of databases, counting the number of solutions to a given FO query can be achieved in pseudo-linear time [45]. In this thesis, we show that enumerating FO queries on nowhere dense classes of databases can be done with constant delay after pseudo-linear preprocessing. This com- pletes the picture of the complexity of FO query evaluation on nowhere dense classes and, due to the above mentioned result of [55], on all classes that are closed under sub- databases. We also show that for any nowhere dense class of databases, given a FO query q and a database D in the class, after a pseudo-linear time preprocessing we can test in constant time whether an arbitrary input tuple belongs to the result set q(D).
Document type :
Theses
Complete list of metadatas

Cited literature [96 references]  Display  Hide  Download

https://hal.inria.fr/tel-01963540
Contributor : Pierre Senellart <>
Submitted on : Friday, December 21, 2018 - 1:19:31 PM
Last modification on : Tuesday, May 14, 2019 - 11:02:58 AM
Long-term archiving on : Friday, March 22, 2019 - 4:07:11 PM

File

these-vigny.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01963540, version 1

Citation

Alexandre Vigny. Query enumeration and nowhere dense graphs. Databases [cs.DB]. Université Paris-Diderot, 2018. English. ⟨tel-01963540⟩

Share

Metrics

Record views

129

Files downloads

113