Optimizing SPARQL query evaluation with a worst-case cardinality estimation based on statistics on the data

Louis Jachiet 1 Pierre Genevès 1 Nabil Layaïda 1
1 TYREX - Types and Reasoning for the Web
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : SPARQL is the w3c standard query language for querying data expressed in the Resource Description Framework (rdf). There exists a variety of sparql evaluation schemes and, in many of them, estimating the cardinality of intermediate results is key for performance, especially when the computation is distributed and the datasets very large. For example it helps in choosing join orders that minimize the size of intermediate subquery results. In this context, we propose a new cardinality estimation based on statistics about the data. Our cardinality estimation is a worst-case analysis tailored for sparql and capable of taking advantage of the implicit schema often present in rdf datasets (e.g. functional dependencies). This implicit schema is captured by statistics therefore our method does not need for the schema to be explicit or perfect (our system performs well even if there are a few " violations " of these implicit dependencies). We implemented our cardinality estimation and used it to optimize the evaluation of sparql queries: equipped with our cardinality estimation, the query evaluator performs better against most queries (sometimes by an order of magnitude) and is only ever slightly slower.
Document type :
Preprints, Working Papers, ...
Liste complète des métadonnées

Cited literature [16 references]  Display  Hide  Download

https://hal.archives-ouvertes.fr/hal-01524387
Contributor : Tyrex Equipe <>
Submitted on : Thursday, May 18, 2017 - 9:57:46 AM
Last modification on : Thursday, October 11, 2018 - 8:48:04 AM

File

stats.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01524387, version 1

Collections

Citation

Louis Jachiet, Pierre Genevès, Nabil Layaïda. Optimizing SPARQL query evaluation with a worst-case cardinality estimation based on statistics on the data. 2017. ⟨hal-01524387⟩

Share

Metrics

Record views

504

Files downloads

347