Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking

Manuel Atencia; Jérôme David; François Scharffe

doi:10.1007/978-3-642-33876-2_14

Conference paper, 2012


Manuel Atencia

IdHAL: manuelatencia

Computer mediated exchange of structured knowledge

Heterogeneous and Adaptive distributed DAta management Systems

Jérôme David


Computer mediated exchange of structured knowledge

François Scharffe

ORCID: 0000-0002-0010-0058

Fouille de données environnementales

Abstract

This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. In order to better deal with web data of variable quality, the definition of a pseudo-key is presented. An RDF vocabulary for representing keys is also provided. An algorithm to discover keys and pseudo-keys is described. Experimental results show that even for a big dataset such as DBpedia, the runtime of the algorithm is still reasonable. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) datasets interlinking.
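To make the notions of key and pseudo-key more concrete, here is a minimal Python sketch over a toy set of triples. The abstract does not state the formal definitions, so everything below is an illustrative assumption rather than the authors' algorithm: the measure (the fraction of subjects that are uniquely identified by their values, counted only over subjects that carry every property in the candidate set), the function names `discriminability`, `is_key` and `is_pseudo_key`, the `threshold` parameter, and the sample data.

```python
from collections import defaultdict

def discriminability(triples, properties):
    """Fraction of subjects uniquely identified by their values on `properties`.

    Assumption: only subjects that have at least one value for every
    property in the candidate set are taken into account.
    """
    props = list(properties)
    # subject -> property -> set of object values
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p in props:
            values[s][p].add(o)

    # Group subjects that share exactly the same values on all properties.
    groups = defaultdict(list)
    for s, by_prop in values.items():
        if all(p in by_prop for p in props):
            signature = tuple(frozenset(by_prop[p]) for p in props)
            groups[signature].append(s)

    total = sum(len(g) for g in groups.values())
    if total == 0:
        return 0.0
    unique = sum(1 for g in groups.values() if len(g) == 1)
    return unique / total

def is_key(triples, properties):
    # A key: no two subjects share the same values.
    return discriminability(triples, properties) == 1.0

def is_pseudo_key(triples, properties, threshold):
    # A pseudo-key (assumed reading): a key up to a tolerated share of exceptions.
    return discriminability(triples, properties) >= threshold

# Hypothetical sample data as (subject, property, object) triples.
triples = [
    ("ex:a", "name", "Ann"), ("ex:a", "born", "1980"),
    ("ex:b", "name", "Bob"), ("ex:b", "born", "1980"),
    ("ex:c", "name", "Ann"), ("ex:c", "born", "1975"),
    ("ex:d", "name", "Dan"), ("ex:d", "born", "1990"),
]
```

In this toy data, `name` alone fails as a key because `ex:a` and `ex:c` share a value, while the pair `name` and `born` separates every subject. Subjects that collide under a high-scoring pseudo-key are natural candidates for the two applications the abstract mentions: they may be errors within one dataset, or the same entity described in two datasets.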

Domains

Other [cs.OH]

Main file

atencia2012b.pdf (210.86 KB)

Origin: Files produced by the author(s)

Contributor: Jérôme Euzenat

https://inria.hal.science/hal-00768412

Submitted on: Friday, December 21, 2012, 2:33:37 PM

Last modification on: Thursday, April 4, 2024, 9:04:03 PM

Long-term archiving on: Sunday, December 18, 2016, 8:34:13 AM

Dates and versions

hal-00768412, version 1 (21-12-2012)

Identifiers

HAL Id: hal-00768412, version 1
DOI: 10.1007/978-3-642-33876-2_14

Cite

Manuel Atencia, Jérôme David, François Scharffe. Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking. EKAW: Knowledge Engineering and Knowledge Management, Oct 2012, Galway, Ireland. pp.144-153, ⟨10.1007/978-3-642-33876-2_14⟩. ⟨hal-00768412⟩


Collections

UGA CNRS INRIA LIG LIG_TDCGE LIG_TDCGE_HADAS LIRMM INRIA2 MIPS UNIV-MONTPELLIER ANR LIG_SIDCH
