Database Optimization Techniques for Semantic Queries

Ioana Manolescu

Résumé

Techniques for efficiently managing Semantic Web data have attracted significant interest from the data management and knowledge representation communities. In particular, as RDF is the most widely used model for Semantic Web data, a great deal of effort has been invested, especially in the database community, into algorithms and tools for efficient RDF query evaluation. Semantic Web data can be seen as a colection of facts enriched with ontological schemas, or semantic constraints, based on which reasoning can be applied to infer new information. Taking into account this implicit information is required in order to produce complete answers to queries. The difficulty in doing so depends on the expressive power of the constraints being used to describe the semantics of the data. One of the simplest constraint languages currently used in conjunction with RDF databases is RDF Schema (RDFS, in short), whose core consists of the rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range predefined predicates, which allow characterizing the relationships between classes (unary relations) and properties (binary relations). More expressive formal constraints languages can be found in the DL-Lite family [6], the Datalog ± dialect [5] etc. The literature provides two classes of techniques for implementing reasoning , namely query reformulation and database saturation. The former consists of compiling the constraints into the query, making it syntactically more complex , while the latter compiles the constraints into the data, i.e., it adds all the consequences of the facts and the constraints to the database. The performance of these techniques depends on the expressive power of the ontological schema language, as well as on the characteristics of the data and queries. While saturation appears simple and robust, it is not always feasible and it may also perform poorly, especially in a distributed setting. Efficient query answering through FOL reformulation This talk describes some of our recent work (joint with François Goasdoué from Université de Rennes 1, France, and our PhD students Damian Bursztyn and Alexandra Roati¸n the topic of efficiently evaluating reformulated queries by adequately translating them to first-order logic (FOL). This translation enables their evaluation by relational database management systems (RDBMSs), leveraging thus their efficient storage and processing engines. Our first goal was to provide a FOL query reformulation algorithm for an expressive fragment of RDF in the presence of RDFS constraints. A difficulty in this context is due to the peculiar features of RDF, in particular the presence of the so-called blank nodes which can be seen as encoding a form of incomplete information [1]; further, going beyond the expressive power of typical DL dialects, RDF queries may span over both the data (facts) and the schema (constraints). In [7], we have identified the database fragment of RDF, extending the expressive power of previously formalized RDF fragments, in particular by the usage of blank nodes in the schema, and have provided a reformulation-based query answering algorithm for this fragment. Intuitively, the algorithm is based on intelligently exploiting the query and the schema constraints in order to obtain an SQL query whose evaluation through a standard RDBMS yields the query answer. Next, we have considered the issue of efficiently evaluating queries reformu-lated under RDFS constraints [4]. For relatively simple queries and constraints, such reformulated queries can be large – unions of hundreds, if not thousands of conjunctive queries (we call such unions UCQs, in short). It turns out that modern efficient RDBMSs are extremely poor at efficiently evaluating such huge UCQs, when they do not simply fail to evaluate them. To make it possible (or more efficient) to evaluate such reformulations, we have devised a cost-based optimization approach whereas we explore a set of syntactically equivalent, alternative Joins of Unions of Conjunctive Queries (JUCQ, in short) FOL refor-mulations of the original query. We estimate the performance of evaluating such alternative JUCQs through the RDBMS engine, and pick the one promising to lead to the best evaluation performance. We have shown that this technique improves reformulated query performance by up to four orders of magnitude.

Database Optimization Techniques for Semantic Queries

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager