Conference papers

Entity Discovery and Annotation in Tables

Abstract : The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured text, tables usually favour the automatic extraction of data because of their regular structure and properties. Data extraction is usually complemented by the annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology and determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest of cities that we extracted from Google Fusion Tables. We claim that our algorithm complements the existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities; as a result, they are unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
Document type :
Conference papers


https://hal.inria.fr/hal-00832639
Contributor : Chantal Reynaud
Submitted on : Tuesday, June 11, 2013 - 10:07:30 AM
Last modification on : Thursday, June 17, 2021 - 3:48:47 AM
Long-term archiving on : Tuesday, April 4, 2017 - 7:16:58 PM

File

edbt2013.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00832639, version 1

Citation

Gianluca Quercini, Chantal Reynaud-Delaître. Entity Discovery and Annotation in Tables. EDBT: International Conference on Extending Database Technology, Mar 2013, Genoa, Italy. ⟨hal-00832639⟩
