Conference paper, Year: 2004

Learning Node Selecting Tree Transducer from Completely Annotated Examples

Abstract

Web documents in HTML or XML form trees with nodes containing text. A basic problem in Web information extraction is to find appropriate queries for informative nodes in trees. We propose to learn queries for nodes in trees automatically from examples. We introduce node selecting tree transducers (NSTTs) for representing node queries in trees and show how to induce deterministic NSTTs in polynomial time from completely annotated examples by methods of grammatical inference. We have implemented learning algorithms for NSTTs, started applying them to Web information extraction, and present first experimental results.
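To make the setting in the abstract concrete, the following is a minimal sketch of what a completely annotated example and a node query could look like. It assumes that "completely annotated" means every node of a tree carries a Boolean mark saying whether it is selected. The names `Node`, `annotated_paths`, `run_query`, and `consistent` are hypothetical and chosen for illustration only. The query here is a plain Python predicate, not an NSTT, and the sketch does not reproduce the learning algorithm of the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

# A node position: the sequence of child indices leading from the root.
Path = Tuple[int, ...]


@dataclass
class Node:
    label: str                 # tag name or text content
    selected: bool = False     # annotation (assumed): is this node informative?
    children: List["Node"] = field(default_factory=list)


# A node query is modeled here as a predicate on a node and its parent.
Query = Callable[[Node, Optional[Node]], bool]


def annotated_paths(tree: Node, path: Path = ()) -> Set[Path]:
    """Collect the positions of all nodes annotated as selected."""
    found = {path} if tree.selected else set()
    for i, child in enumerate(tree.children):
        found |= annotated_paths(child, path + (i,))
    return found


def run_query(query: Query, tree: Node, path: Path = (),
              parent: Optional[Node] = None) -> Set[Path]:
    """Collect the positions of all nodes the query selects."""
    found = {path} if query(tree, parent) else set()
    for i, child in enumerate(tree.children):
        found |= run_query(query, child, path + (i,), tree)
    return found


def consistent(query: Query, example: Node) -> bool:
    """A query fits an example if it selects exactly the annotated nodes."""
    return run_query(query, example) == annotated_paths(example)


# A small annotated tree: the <b> nodes are marked as informative.
example = Node("ul", children=[
    Node("li", children=[Node("b", selected=True), Node("i")]),
    Node("li", children=[Node("b", selected=True)]),
])


def select_bold(node: Node, parent: Optional[Node]) -> bool:
    return node.label == "b"


def select_li_children(node: Node, parent: Optional[Node]) -> bool:
    return parent is not None and parent.label == "li"
```

In this sketch, `consistent(select_bold, example)` holds, while `select_li_children` also picks the unannotated `i` node and is therefore rejected. A learner in this setting searches for a query that is consistent with all given examples.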
Main file
icgi04.pdf (242.99 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00536528, version 1 (16-11-2010)

Identifiers

  • HAL Id: inria-00536528, version 1

Cite

Julien Carme, Aurélien Lemay, Joachim Niehren. Learning Node Selecting Tree Transducer from Completely Annotated Examples. 7th International Colloquium on Grammatical Inference, 2004, Athens, Greece. pp. 91-102. ⟨inria-00536528⟩
150 views
1330 downloads
