
Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking

Manuel Atencia (1, 2), Jérôme David (1), François Scharffe (3)
1 EXMO - Computer mediated exchange of structured knowledge
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
3 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better handle web data of variable quality, the notion of a pseudo-key is introduced. An RDF vocabulary for representing keys is also provided, and an algorithm to discover keys and pseudo-keys is described. Experimental results show that the algorithm's runtime remains reasonable even for a large dataset such as DBpedia. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) dataset interlinking.
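As a rough illustration of the idea in the abstract, the sketch below checks whether a set of properties behaves like a key on a toy set of triples. It is not the paper's algorithm or its formal definitions: the data, the function names, the choice to skip subjects that lack a property, and the threshold-based reading of "pseudo-key" are all assumptions made for this example.

```python
from collections import defaultdict

# Toy (subject, property, value) triples; all data is invented for illustration.
TRIPLES = [
    ("ex:a", "ex:name", "Ada"),
    ("ex:a", "ex:born", "1815"),
    ("ex:b", "ex:name", "Alan"),
    ("ex:b", "ex:born", "1912"),
    ("ex:c", "ex:name", "Ada"),
    ("ex:c", "ex:born", "1815"),
    ("ex:d", "ex:name", "Grace"),
    ("ex:d", "ex:born", "1906"),
]


def group_by_values(triples, properties):
    """Group subjects that share the same value sets for every given property.

    Assumption: subjects missing any of the properties are ignored.
    """
    per_subject = defaultdict(lambda: defaultdict(set))
    for subject, prop, value in triples:
        per_subject[subject][prop].add(value)
    groups = defaultdict(set)
    for subject, by_prop in per_subject.items():
        if all(p in by_prop for p in properties):
            signature = tuple(frozenset(by_prop[p]) for p in properties)
            groups[signature].add(subject)
    return groups


def discriminability(triples, properties):
    """Fraction of value combinations that identify exactly one subject."""
    groups = group_by_values(triples, properties)
    if not groups:
        return 0.0
    singletons = sum(1 for subjects in groups.values() if len(subjects) == 1)
    return singletons / len(groups)


def is_key(triples, properties):
    """Strict reading: no two distinct subjects agree on all the properties."""
    groups = group_by_values(triples, properties)
    return bool(groups) and all(len(s) == 1 for s in groups.values())


def is_pseudo_key(triples, properties, threshold):
    """Relaxed reading (assumed): tolerate exceptions up to a threshold."""
    return discriminability(triples, properties) >= threshold


def violations(triples, properties):
    """Groups of subjects sharing all values: candidate duplicates or links."""
    groups = group_by_values(triples, properties)
    return sorted(sorted(s) for s in groups.values() if len(s) > 1)
```

On this toy data, `ex:a` and `ex:c` agree on both properties, so the pair `("ex:name", "ex:born")` is not a strict key, while `violations` reports the two subjects together. Such a group could point to either a data error or two resources that should be interlinked, which mirrors the two applications named in the abstract.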
Document type: Conference paper


https://hal.inria.fr/hal-00768412
Contributor: Jérôme Euzenat
Submitted on: Friday, December 21, 2012 - 2:33:37 PM
Last modification on: Sunday, June 26, 2022 - 9:33:58 AM
Long-term archiving on: Sunday, December 18, 2016 - 8:34:13 AM

File

atencia2012b.pdf
Files produced by the author(s)

Citation

Manuel Atencia, Jérôme David, François Scharffe. Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking. EKAW: Knowledge Engineering and Knowledge Management, Oct 2012, Galway, Ireland. pp.144-153, ⟨10.1007/978-3-642-33876-2_14⟩. ⟨hal-00768412⟩

