Learning Node Selecting Tree Transducer from Completely Annotated Examples

Abstract : Web documents in HTML or XML form trees whose nodes contain text. A basic problem in Web information extraction is to find appropriate queries for informative nodes in trees. We propose to learn queries for nodes in trees automatically from examples. We introduce node selecting tree transducers (NSTTs) for representing node queries in trees and show how to induce deterministic NSTTs in polynomial time from completely annotated examples by methods of grammatical inference. We have implemented learning algorithms for NSTTs, started applying them to Web information extraction, and present first experimental results.

https://hal.inria.fr/inria-00536528
Contributor : Joachim Niehren
Submitted on : Tuesday, November 16, 2010 - 1:41:26 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Thursday, February 17, 2011 - 2:26:16 AM

File

icgi04.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00536528, version 1

Citation

Julien Carme, Aurélien Lemay, Joachim Niehren. Learning Node Selecting Tree Transducer from Completely Annotated Examples. 7th International Colloquium on Grammatical Inference, 2004, Athens, Greece. pp. 91-102. ⟨inria-00536528⟩
