SAKey: Scalable Almost Key Discovery in RDF Data

Exploiting identity links among RDF resources allows applications to integrate data efficiently. Keys can be very useful for discovering these identity links. A set of properties is considered a key when its values uniquely identify resources. However, such keys are usually not available. Existing approaches that attempt to discover keys automatically can easily be overwhelmed by the size of the data, and they require clean data. We present SAKey, an approach that efficiently discovers keys in RDF data. To prune the search space, SAKey exploits characteristics of the data that are detected dynamically during the process. Furthermore, our approach can discover keys in datasets that contain erroneous data or duplicates (i.e., almost keys). The approach has been evaluated on several synthetic and real datasets. The results show both the relevance of almost keys and the efficiency of discovering them.
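To make the notions of key and almost key concrete, here is a minimal brute-force sketch. It assumes two things the abstract does not spell out: (1) two resources "agree" on a property when they share at least one value for it, since RDF properties can be multi-valued, and (2) a property set is an n-almost key when at most n resources share values with some other resource on every property in the set (n = 0 gives an ordinary key). The names `exceptions` and `is_almost_key` and the toy data are hypothetical. This pairwise check only illustrates the definition; it does not reflect how SAKey prunes the search space.

```python
from itertools import combinations

# Hypothetical toy representation: resource -> property -> set of values.
Data = dict[str, dict[str, set[str]]]


def exceptions(data: Data, props: list[str]) -> set[str]:
    """Resources that share at least one value, for every property in props,
    with some other resource (assumed OWL2-style agreement)."""
    exc: set[str] = set()
    for r1, r2 in combinations(data, 2):
        # A missing property yields an empty set, so the pair cannot agree on it.
        if all(data[r1].get(p, set()) & data[r2].get(p, set()) for p in props):
            exc.update((r1, r2))
    return exc


def is_almost_key(data: Data, props: list[str], n: int) -> bool:
    """props is an n-almost key if it has at most n exceptions; n = 0 means a key."""
    return len(exceptions(data, props)) <= n


data: Data = {
    "p1": {"name": {"Ann"}, "born": {"1980"}},
    "p2": {"name": {"Ann"}, "born": {"1975"}},
    "p3": {"name": {"Bob"}, "born": {"1980"}},
}
print(exceptions(data, ["name"]))                # p1 and p2 share "Ann"
print(is_almost_key(data, ["name"], 2))          # True: 2 exceptions tolerated
print(is_almost_key(data, ["name", "born"], 0))  # True: a key on this data
```

Here `name` alone is not a key because two resources share the value "Ann", but it is a 2-almost key; the pair `name`, `born` identifies every resource.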

Keywords

Keys; Identity Links; Data Linking; RDF; OWL2

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]; Databases [cs.DB]; Web

Main file

ISWC2014.pdf (972.07 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01275954

Submitted on: Thursday, February 18, 2016, 15:22:09

Last modified on: Monday, February 12, 2024, 09:36:03

Long-term archiving on: Thursday, May 19, 2016, 10:50:29

Dates and versions

hal-01275954 , version 1 (18-02-2016)

Identifiers

HAL Id: hal-01275954, version 1
DOI: 10.1007/978-3-319-11964-9_3
PRODINRA: 361652
WOS: 000375525800003

Cite

Danai Symeonidou, Vincent Armant, Nathalie Pernelle, Fatiha Saïs. SAKey: Scalable Almost Key Discovery in RDF Data. In Proceedings of the 13th International Semantic Web Conference, ISWC 2014, Oct 2014, Riva del Garda, Italy. pp. 33-49, ⟨10.1007/978-3-319-11964-9_3⟩. ⟨hal-01275954⟩


Collections

EC-PARIS CNRS UMR8623 LRI-LAHDAK UNIV-PARIS-SACLAY ANR LISN LISN-LAHDAK

462 Views

274 Downloads