Knowledge extraction from webpages

Sylvain Tenier 1, 2, Amedeo Napoli 1, Xavier Polanco 2, Yannick Toussaint 1
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This article presents a system that extracts knowledge from webpages by producing semantic annotations. Taking into account semantic information from the domain to annotate an element in a webpage implies solving two problems: (1) identifying the syntactic structure of this element in the webpage and (2) identifying the most specific concept (in terms of subsumption) of the ontology that will be used to annotate this element. Our approach relies on a wrapper-based machine learning algorithm combined with reasoning that makes use of the formal structure of the ontology.
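
Problem (2) in the abstract, choosing the most specific concept with respect to subsumption, can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy ontology, the concept names, and the functions `ancestors` and `most_specific` are all hypothetical, and the ontology is assumed to be a simple acyclic hierarchy given as a mapping from each concept to its direct subsumers.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: each concept maps to its direct subsumers (parents).
PARENTS: Dict[str, Set[str]] = {
    "Thing": set(),
    "Person": {"Thing"},
    "Researcher": {"Person"},
    "PhDStudent": {"Researcher"},
    "Organization": {"Thing"},
}


def ancestors(concept: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every concept that strictly subsumes `concept`."""
    seen: Set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(candidates: Iterable[str],
                  parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep only the candidates that do not subsume another candidate."""
    cands = set(candidates)
    subsumers: Set[str] = set()
    for concept in cands:
        subsumers |= ancestors(concept, parents)
    return cands - subsumers


if __name__ == "__main__":
    # An element matched by three candidate concepts on one subsumption chain.
    print(most_specific({"Person", "Researcher", "PhDStudent"}, PARENTS))
    # Two unrelated candidates: neither subsumes the other, so both remain.
    print(most_specific({"Person", "Organization"}, PARENTS))
```

When several candidates lie on one subsumption chain, only the lowest one is kept. Candidates that are incomparable are all returned, so a real system would need a further criterion to choose among them.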

https://hal.inria.fr/inria-00000822
Contributor: Sylvain Tenier
Submitted on: Tuesday, November 22, 2005 - 11:36:12 AM
Last modification on: Thursday, January 11, 2018 - 6:19:53 AM
Long-term archiving on: Friday, April 2, 2010 - 6:13:36 PM

Identifiers

  • HAL Id: inria-00000822, version 1

Citation

Sylvain Tenier, Amedeo Napoli, Xavier Polanco, Yannick Toussaint. Knowledge extraction from webpages. 5th International Workshop on Knowledge Markup and Semantic Annotation - SemAnnot 2005, Siegfried Handschuh, Nov 2005, Galway, Ireland. ⟨inria-00000822⟩
