Reports

Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking

François Scharffe 1 Jérôme David 2 Manuel Atencia 2, 3
1 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 EXMO - Computer mediated exchange of structured knowledge
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: This report introduces a novel method for analysing web datasets based on key dependencies. This particular kind of functional dependency, widely studied in database theory, makes it possible to evaluate whether a set of properties constitutes a key for a given set of data. When it does, no two instances have identical values for these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys and pseudo-keys in an RDF dataset. We then use this algorithm to detect keys in datasets published as web data, and we apply the approach in two applications: (i) reducing the number of properties to compare in order to discover equivalent instances between two datasets, and (ii) detecting errors inside a dataset.
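
The notion of a minimal key described in the abstract can be illustrated with a small brute-force sketch. This is not the algorithm proposed in the report, which is not reproduced on this page. The toy dataset, the property names, and the helper functions `signature`, `is_key`, and `minimal_keys` are all hypothetical. The sketch also assumes that "identical values" means equal sets of values for each property, and that a missing property counts as an empty set.

```python
from itertools import combinations

# Hypothetical toy dataset: each instance maps a property to a set of values,
# since RDF properties may be multi-valued.
instances = {
    "ex:p1": {"name": {"Ann"}, "city": {"Lyon"}, "email": {"ann@example.org"}},
    "ex:p2": {"name": {"Bob"}, "city": {"Lyon"}, "email": {"bob@example.org"}},
    "ex:p3": {"name": {"Ann"}, "city": {"Paris"}, "email": {"ann2@example.org"}},
}


def signature(description, properties):
    """Values of one instance restricted to the given properties.

    Assumption: a missing property is treated as an empty value set.
    """
    return tuple(frozenset(description.get(p, ())) for p in properties)


def is_key(instances, properties):
    """True if no two instances share identical values on all the properties."""
    seen = set()
    for description in instances.values():
        sig = signature(description, properties)
        if sig in seen:
            return False
        seen.add(sig)
    return True


def minimal_keys(instances):
    """Enumerate minimal keys by increasing size (exponential, for illustration only)."""
    properties = sorted({p for d in instances.values() for p in d})
    keys = []
    for size in range(1, len(properties) + 1):
        for candidate in combinations(properties, size):
            # A superset of a known key is a key, but not a minimal one.
            if any(set(k) <= set(candidate) for k in keys):
                continue
            if is_key(instances, candidate):
                keys.append(candidate)
    return keys


print(minimal_keys(instances))
```

On this toy data, `email` alone distinguishes every instance, while `name` and `city` only do so together. A property set that is expected to be a key but fails the `is_key` check points at instances that may be duplicates or contain errors, which is the spirit of application (ii).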

Cited literature [12 references]

https://hal.inria.fr/hal-00785745
Contributor: Jérôme Euzenat
Submitted on: Wednesday, February 6, 2013 - 7:06:31 PM
Last modification on: Tuesday, February 9, 2021 - 3:02:04 PM
Long-term archiving on: Saturday, April 1, 2017 - 5:43:19 PM

File

datalift-412.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00785745, version 1

Citation

François Scharffe, Jérôme David, Manuel Atencia. Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking. [Contract] scharffe2012b, 2012, pp.18. ⟨hal-00785745⟩

Metrics

Record views: 832
File downloads: 371