Big Data Entity Resolution: From Highly to Somehow Similar Entity Descriptions in the Web

Vasilis Efthymiou; Kostas Stefanidis; Vassilis Christophides

doi:10.1109/BigData.2015.7363781

Conference paper, Year: 2015


Vasilis Efthymiou

Role: Author

Institute of Computer Science [FORTH, Heraklion]

Kostas Stefanidis

Role: Author

Institute of Computer Science [FORTH, Heraklion]

Vassilis Christophides

Role: Author
PersonId : 4825
IdHAL : vassilis-christophides
ORCID : 0000-0002-2076-1881
IdRef : 198210221

Measuring networks for enhancing USer Experience

Computer Science Department [Crete]

Abstract

In the Web of data, entities are described by inter-linked data rather than documents on the Web. In this work, we focus on entity resolution in the Web of data, i.e., identifying descriptions that refer to the same real-world entity. To reduce the required number of pairwise comparisons, methods for entity resolution perform blocking as a pre-processing step. A blocking technique places similar entity descriptions into blocks and executes comparisons only between descriptions within the same block. We experimentally evaluate blocking techniques proposed for the Web of data and present dataset characteristics that determine the effectiveness and efficiency of such methods. Furthermore, we analyze the characteristics of the missed matching entity descriptions and examine different types of links that blocking techniques can potentially identify.

I. INTRODUCTION

Nowadays, knowledge bases (KBs) offer comprehensive, machine-readable descriptions of a large variety of real-world entities (e.g., persons, places) published on the Web as Linked Data (LD). Although KBs (e.g., DBpedia, Freebase) may be derived from the same data source (e.g., Wikipedia), they may provide multiple descriptions of the same entities. This is mainly due to the different information extraction tools and curation policies [3] employed by KBs, resulting in complementary and sometimes conflicting descriptions. Entity resolution (ER) aims to identify descriptions that refer to the same entity within or across KBs [2], [4]. Compared to data warehouses, the new ER challenges stem from the openness of the Web of data in describing entities by an unbounded number of KBs, the semantic and structural diversity of the descriptions provided across domains even for the same entities, and the autonomy of KBs in terms of adopted processes for creating and curating descriptions.

In general, comparing two descriptions effectively, and deciding efficiently whether they refer to the same entity, is challenged by the scale, diversity and graph structure of the descriptions in the Web. This requires an understanding of the relationships among somehow similar descriptions that goes beyond duplicate detection. Also, the huge volume of the entity collections that need to be resolved in the Web makes it prohibitive to examine all pairs of descriptions. In this context of big Web data, blocking is typically used as a pre-processing step for ER to reduce the number of required comparisons. After blocking, each description is compared only to the others placed within the same block. The desiderata of blocking are to place (i) similar
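As a rough illustration of the blocking idea described above, the following Python sketch applies token blocking: each description is placed in one block per token found in its attribute values, and only descriptions that share a block are compared. Token blocking is used here as one simple example; the excerpt above does not state which blocking techniques the paper evaluates. The function names `token_blocking` and `candidate_pairs`, the entity identifiers, and the attribute values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import re


def token_blocking(descriptions):
    """Group entity ids by the tokens that appear in their attribute values."""
    blocks = defaultdict(set)
    for entity_id, attributes in descriptions.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", value.lower()):
                blocks[token].add(entity_id)
    # A block holding a single description yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Return the distinct pairs of descriptions that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical descriptions from two knowledge bases (illustrative data only).
descriptions = {
    "kb1:Heraklion": {"label": "Heraklion", "country": "Greece"},
    "kb2:Iraklio": {"name": "Heraklion city", "locatedIn": "Crete"},
    "kb1:Santa_Clara": {"label": "Santa Clara", "state": "California"},
}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
print(pairs)
```

With these three descriptions, exhaustive comparison would examine three pairs, whereas the sketch proposes only the pair that shares the token "heraklion". A matching pair that shares no token at all would be missed, which is the kind of trade-off between efficiency and effectiveness the abstract refers to.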

Keywords

Web of Data; Near Similarity; Somehow Similarity; Blocking Algorithms; Entity Resolution

Domains

Web; Databases [cs.DB]; Machine Learning [cs.LG]; Digital Libraries [cs.DL]

Main file

Big Data Entity Resolution.pdf (416.02 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01199399

Submitted on: Tuesday, September 15, 2015, 12:59:45

Last modified on: Wednesday, October 26, 2022, 04:00:47

Long-term archiving on: Tuesday, December 29, 2015, 07:12:12

Dates and versions

hal-01199399, version 1 (15-09-2015)

Identifiers

HAL Id: hal-01199399, version 1
DOI: 10.1109/BigData.2015.7363781

Cite

Vasilis Efthymiou, Kostas Stefanidis, Vassilis Christophides. Big Data Entity Resolution: From Highly to Somehow Similar Entity Descriptions in the Web. 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Oct 2015, Santa Clara, CA, United States. ⟨10.1109/BigData.2015.7363781⟩. ⟨hal-01199399⟩


Collections

INRIA INRIA2

270 views

718 downloads
