
Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking

François Scharffe 1 Jérôme David 2 Manuel Atencia 2, 3 
1 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 EXMO - Computer mediated exchange of structured knowledge
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : This report introduces a novel method for analysing web datasets based on key dependencies. This particular kind of functional dependency, widely studied in database theory, makes it possible to evaluate whether a set of properties constitutes a key for a given set of data. When it does, no two instances share identical values for these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys and pseudo-keys in an RDF dataset. We then use this algorithm to detect keys in datasets published as web data, and we apply the approach in two applications: (i) reducing the number of properties to compare in order to discover equivalent instances between two datasets, and (ii) detecting errors inside a dataset.
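
The notion of key described in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the report: it is a brute-force enumeration by increasing size, written only to make the definitions concrete. The names `signature`, `is_key`, `minimal_keys` and `max_exceptions`, the toy data, and the choice to model a pseudo-key as a key that tolerates a fixed number of colliding instances are all assumptions made for this illustration; the report's own definition of a pseudo-key may differ.

```python
from collections import Counter
from itertools import combinations


def signature(instance, props):
    """Values of an instance on the given properties.

    Each property maps to a set of values, since RDF properties can be
    multi-valued. A missing property is treated as an empty set
    (an assumption of this sketch).
    """
    return tuple(frozenset(instance.get(p, ())) for p in props)


def is_key(instances, props, max_exceptions=0):
    """True if at most `max_exceptions` instances share their signature
    with another instance. With max_exceptions=0 this is an exact key."""
    counts = Counter(signature(inst, props) for inst in instances.values())
    colliding = sum(n for n in counts.values() if n > 1)
    return colliding <= max_exceptions


def minimal_keys(instances, properties, max_exceptions=0):
    """Enumerate property sets by increasing size and keep those that are
    keys and contain no smaller key already found."""
    found = []
    ordered = sorted(properties)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            if any(set(k) <= set(combo) for k in found):
                continue  # a subset is already a key, so combo is not minimal
            if is_key(instances, combo, max_exceptions):
                found.append(combo)
    return found


# Hypothetical toy dataset: subject -> property -> set of values.
instances = {
    "ex:a": {"name": {"Ann"}, "city": {"Lyon"}, "email": {"ann@example.org"}},
    "ex:b": {"name": {"Ann"}, "city": {"Paris"}, "email": {"ann2@example.org"}},
    "ex:c": {"name": {"Bob"}, "city": {"Lyon"}, "email": {"bob@example.org"}},
    "ex:d": {"name": {"Bob"}, "city": {"Lyon"}, "email": {"bob2@example.org"}},
}
properties = {"name", "city", "email"}

exact = minimal_keys(instances, properties)
relaxed = minimal_keys(instances, properties, max_exceptions=2)
print(exact)    # [('email',)]
print(relaxed)  # [('email',), ('city', 'name')]
```

In this toy data, `ex:c` and `ex:d` are the only instances that collide on `(city, name)`. Under the relaxed setting that pair becomes a pseudo-key, and the two colliding instances are natural candidates for inspection as duplicates or errors, which is the spirit of application (ii). The enumeration is exponential in the number of properties, so it is only suitable for small illustrations.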

Cited literature : 12 references
Contributor : Jérôme Euzenat
Submitted on : Wednesday, February 6, 2013 - 7:06:31 PM
Last modification on : Sunday, June 26, 2022 - 9:34:00 AM
Long-term archiving on : Saturday, April 1, 2017 - 5:43:19 PM


Files produced by the author(s)


  • HAL Id : hal-00785745, version 1


François Scharffe, Jérôme David, Manuel Atencia. Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking. [Contract] scharffe2012b, 2012, pp.18. ⟨hal-00785745⟩


