Conference papers

Entity Discovery and Annotation in Tables

Abstract : The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured text, tables usually favour the automatic extraction of data because of their regular structure and properties. Data extraction is usually complemented by the annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology and determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest of cities that we extracted from Google Fusion Tables. We claim that our algorithm complements the existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities; as a result, they are unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
Document type : Conference papers

Cited literature [23 references]
Contributor : Chantal Reynaud
Submitted on : Tuesday, June 11, 2013 - 10:07:30 AM
Last modification on : Sunday, June 26, 2022 - 11:59:03 AM
Long-term archiving on : Tuesday, April 4, 2017 - 7:16:58 PM


Files produced by the author(s)


  • HAL Id : hal-00832639, version 1


Gianluca Quercini, Chantal Reynaud-Delaître. Entity Discovery and Annotation in Tables. EDBT: International Conference on Extending Database Technology, Mar 2013, Genoa, Italy. ⟨hal-00832639⟩


