Inria - Institut national de recherche en sciences et technologies du numérique
Report (Contract/Project Report), Year: 2012

Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking

Abstract

This report introduces a novel method for analysing web datasets based on key dependencies. This particular kind of functional dependency, widely studied in database theory, makes it possible to evaluate whether a set of properties constitutes a key for the data under consideration. When it does, no two instances have identical values for these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys and pseudo-keys in an RDF dataset. We then use this algorithm to detect keys in datasets published as web data, and we apply this approach in two applications: (i) reducing the number of properties to compare in order to discover equivalent instances between two datasets, and (ii) detecting errors inside a dataset.
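The abstract does not spell out the detection algorithm, so the following is only a rough illustration in Python under stated assumptions, not the report's method. It assumes a hypothetical in-memory view of the dataset (each subject mapped to its property values); that two instances collide on a property set when their value sets are identical for every property in it; that instances lacking one of the properties are ignored; and that a pseudo-key is a property set whose "discriminability" (distinct value combinations divided by described instances) reaches a chosen threshold. These definitions and all function names (`signature`, `measures`, `minimal_keys`, `collisions`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical in-memory view of an RDF dataset: subject -> property -> set of values.
Dataset = Dict[str, Dict[str, Set[str]]]
Signature = Tuple[FrozenSet[str], ...]


def signature(props: Tuple[str, ...], desc: Dict[str, Set[str]]) -> Optional[Signature]:
    """Values of one instance on `props`, or None if any property is missing."""
    if any(p not in desc for p in props):
        return None
    return tuple(frozenset(desc[p]) for p in props)


def measures(data: Dataset, props: Tuple[str, ...]) -> Tuple[float, float]:
    """Assumed (support, discriminability) of a property set.

    support: share of all instances that have every property in `props`.
    discriminability: distinct value combinations / instances described.
    """
    sigs = [s for s in (signature(props, d) for d in data.values()) if s is not None]
    if not sigs:
        return 0.0, 0.0
    return len(sigs) / len(data), len(set(sigs)) / len(sigs)


def minimal_keys(data: Dataset, threshold: float = 1.0,
                 min_support: float = 0.0, max_size: int = 3) -> List[Tuple[str, ...]]:
    """Levelwise search for minimal property sets reaching `threshold`.

    threshold=1.0 yields keys (no two described instances share values);
    a lower threshold yields pseudo-keys that tolerate some collisions.
    """
    properties = sorted({p for desc in data.values() for p in desc})
    found: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for props in combinations(properties, size):
            # A superset of an already found set would not be minimal.
            if any(set(k) <= set(props) for k in found):
                continue
            support, disc = measures(data, props)
            if support >= min_support and disc >= threshold:
                found.append(props)
    return found


def collisions(data: Dataset, props: Tuple[str, ...]) -> List[List[str]]:
    """Groups of instances sharing identical values on `props` (candidate errors)."""
    groups: Dict[Signature, List[str]] = {}
    for subject, desc in data.items():
        sig = signature(props, desc)
        if sig is not None:
            groups.setdefault(sig, []).append(subject)
    return [members for members in groups.values() if len(members) > 1]


# Toy data for illustration only.
SAMPLE: Dataset = {
    "ex:a": {"name": {"Alice"}, "birth": {"1980"}, "city": {"Paris"}},
    "ex:b": {"name": {"Bob"}, "birth": {"1980"}, "city": {"Lyon"}},
    "ex:c": {"name": {"Alice"}, "birth": {"1990"}, "city": {"Paris"}},
    "ex:d": {"name": {"Dave"}, "birth": {"1990"}, "city": {"Lyon"}},
}

if __name__ == "__main__":
    print(minimal_keys(SAMPLE))                  # keys
    print(minimal_keys(SAMPLE, threshold=0.75))  # pseudo-keys
    print(collisions(SAMPLE, ("city", "name")))  # instances to inspect
```

On the toy data, this sketch reports (birth, city) and (birth, name) as minimal keys. At a threshold of 0.75, name alone qualifies as a pseudo-key, and `collisions` shows that ex:a and ex:c share both name and city, the kind of group one would inspect when looking for duplicates or errors. The search enumerates property subsets from small to large and is exponential in the worst case, which is why the hypothetical `max_size` cap is there.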
Main file
datalift-412.pdf (364.89 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00785745, version 1 (06-02-2013)

Identifiers

  • HAL Id: hal-00785745, version 1

Cite

François Scharffe, Jérôme David, Manuel Atencia. Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking. [Contract] scharffe2012b, 2012, pp.18. ⟨hal-00785745⟩
