Query Induction with Schema-Guided Pruning Strategies

Joachim Niehren 1, 2; Jérôme Champavère 1; Rémi Gilleron 1, 3; Aurélien Lemay 1, 2
2 LINKS - Linking Dynamic Data
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
3 MAGNET - Machine Learning in Information Networks
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract : Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We will distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice on XML information extraction tasks.
Document type :
Journal articles

https://hal.inria.fr/inria-00607121
Contributor : Joachim Niehren
Submitted on : Friday, March 29, 2013 - 8:46:25 PM
Last modification on : Thursday, June 27, 2019 - 1:36:06 PM
Long-term archiving on : Sunday, April 2, 2017 - 10:50:53 PM

File

Files produced by the author(s)

Identifiers

  • HAL Id : inria-00607121, version 2

Citation

Joachim Niehren, Jérôme Champavère, Rémi Gilleron, Aurélien Lemay. Query Induction with Schema-Guided Pruning Strategies. Journal of Machine Learning Research, Microtome Publishing, 2013, 14, pp. 927-964. ⟨inria-00607121v2⟩
