Optimizing SPARQL query evaluation with a worst-case cardinality estimation based on statistics on the data

Louis Jachiet; Pierre Genevès; Nabil Layaïda

Preprint, Working Paper. Year: 2017

Louis Jachiet

Role: Author
PersonId : 179861
IdHAL : louis-jachiet

Types and Reasoning for the Web

Pierre Genevès

Role: Author
PersonId : 9676
IdHAL : pierre-geneves
ORCID : 0000-0001-7676-2755
IdRef : 117936324

Types and Reasoning for the Web

Nabil Layaïda

Role: Author
PersonId : 21665
IdHAL : nabil-layaida
ORCID : 0000-0001-8472-9365
IdRef : 15031504X

Types and Reasoning for the Web

Abstract

SPARQL is the W3C standard query language for data expressed in the Resource Description Framework (RDF). There exists a variety of SPARQL evaluation schemes and, in many of them, estimating the cardinality of intermediate results is key to performance, especially when the computation is distributed and the datasets are very large. For example, it helps in choosing join orders that minimize the size of intermediate subquery results. In this context, we propose a new cardinality estimation based on statistics about the data. Our cardinality estimation is a worst-case analysis tailored for SPARQL and capable of taking advantage of the implicit schema often present in RDF datasets (e.g. functional dependencies). This implicit schema is captured by statistics; therefore, our method does not need the schema to be explicit or perfect (our system performs well even if there are a few "violations" of these implicit dependencies). We implemented our cardinality estimation and used it to optimize the evaluation of SPARQL queries: equipped with our cardinality estimation, the query evaluator performs better on most queries (sometimes by an order of magnitude) and is never more than slightly slower.
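To make the idea of a statistics-based worst-case bound concrete, here is a minimal Python sketch. It is not the estimator described in the paper, only an illustration of the general principle under simple assumptions: for each predicate we assume we know the maximum number of objects per subject (and subjects per object), and we bound the size of a join by multiplying the incoming row count by that maximum. The class `PredicateStats`, the function `worst_case_join_bound`, the predicates `born_in` and `knows`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateStats:
    """Hypothetical per-predicate statistics gathered in one pass over the data."""
    triples: int                  # number of triples using this predicate
    max_objects_per_subject: int  # 1 means the predicate behaves like a function
    max_subjects_per_object: int


def worst_case_join_bound(left_rows: int, stats: PredicateStats, bound_side: str) -> int:
    """Upper bound on the row count after joining `left_rows` solutions with a
    triple pattern whose subject or object variable is already bound."""
    if bound_side == "subject":
        fanout = stats.max_objects_per_subject
    elif bound_side == "object":
        fanout = stats.max_subjects_per_object
    else:
        raise ValueError("bound_side must be 'subject' or 'object'")
    # Each incoming row fixes one value of the join variable, so it can match
    # at most `fanout` triples of this predicate.
    return left_rows * fanout


# Made-up statistics for two illustrative predicates.
born_in = PredicateStats(triples=1_000, max_objects_per_subject=1, max_subjects_per_object=300)
knows = PredicateStats(triples=40_000, max_objects_per_subject=500, max_subjects_per_object=500)

persons = 1_000  # assumed number of solutions for ?p before the joins
after_born_in = worst_case_join_bound(persons, born_in, "subject")
after_knows = worst_case_join_bound(persons, knows, "subject")
```

With these made-up numbers, joining on `born_in` first keeps the intermediate result at no more than 1,000 rows, whereas joining on `knows` first could produce up to 500,000, so an optimizer using such bounds would prefer the first order. In this sketch, a single subject with two birthplaces would raise the maximum to 2 and merely double the bound rather than invalidate it, which gives some intuition for why a statistics-derived schema need not be perfect.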

Keywords

RDF System Distributed SPARQL Evaluation

Domains

Web

Main file

stats.pdf (360.16 KB)

Origin: Files produced by the author(s)

Contributor: Tyrex Equipe

https://hal.science/hal-01524387

Submitted on: Thursday, May 18, 2017, 09:57:46

Last modified on: Thursday, April 4, 2024, 21:10:45

Dates and versions

hal-01524387 , version 1 (18-05-2017)

Identifiers

HAL Id : hal-01524387 , version 1

Cite

Louis Jachiet, Pierre Genevès, Nabil Layaïda. Optimizing SPARQL query evaluation with a worst-case cardinality estimation based on statistics on the data. 2017. ⟨hal-01524387⟩

Collections

UGA CNRS INRIA LIG INRIA2 ANR LIG_SIDCH

369 views

730 downloads
