Conference papers

Learning Node Selecting Tree Transducer from Completely Annotated Examples

Abstract : Web documents in HTML or XML form trees whose nodes contain text. A basic problem in Web information extraction is to find appropriate queries for informative nodes in trees. We propose to learn queries for nodes in trees automatically from examples. We introduce node selecting tree transducers (NSTTs) for representing node queries in trees and show how to induce deterministic NSTTs in polynomial time from completely annotated examples by methods of grammatical inference. We have implemented learning algorithms for NSTTs, started applying them to Web information extraction, and present first experimental results.

Cited literature [23 references]
Contributor : Joachim Niehren
Submitted on : Tuesday, November 16, 2010 - 1:41:26 PM
Last modification on : Friday, February 4, 2022 - 3:16:00 AM
Long-term archiving on : Thursday, February 17, 2011 - 2:26:16 AM




  • HAL Id : inria-00536528, version 1



Julien Carme, Aurélien Lemay, Joachim Niehren. Learning Node Selecting Tree Transducer from Completely Annotated Examples. 7th International Colloquium on Grammatical Inference, 2004, Athens, Greece. pp. 91-102. ⟨inria-00536528⟩


