Entity Discovery and Annotation in Tables

Abstract: The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured text, tables usually favour the automatic extraction of data because of their regular structure and properties. Data extraction is usually complemented by the annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology and determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest of cities that we extracted from Google Fusion Tables. We claim that our algorithm complements the existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities; as a result, they are unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
Document type:
Conference paper
EDBT: International Conference on Extending Database Technology, Mar 2013, Genoa, Italy. 2013

Cited literature: 23 references

https://hal.inria.fr/hal-00832639
Contributor: Chantal Reynaud
Submitted on: Tuesday, June 11, 2013 - 10:07:30
Last modified on: Thursday, January 11, 2018 - 06:27:11
Document(s) archived on: Tuesday, April 4, 2017 - 19:16:58

File

edbt2013.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00832639, version 1

Citation

Gianluca Quercini, Chantal Reynaud-Delaître. Entity Discovery and Annotation in Tables. EDBT: International Conference on Extending Database Technology, Mar 2013, Genoa, Italy. 2013. 〈hal-00832639〉
