Conditional Random Fields for XML Applications

Abstract : XML tree labeling is the problem of classifying elements in XML documents. It is a fundamental task for applications such as XML transformation, schema matching, and information extraction. In this paper we propose XCRFs, conditional random fields for XML tree labeling. Dealing with trees often raises complexity problems. We describe optimization methods based on constraints and combination techniques that allow XCRFs to be used in real tasks and in interactive machine learning programs. We show that domain knowledge in XML applications transfers easily to XCRFs through constraints and the combination of XCRFs. We describe an approach based on XCRFs for learning tree transformations. This approach makes it possible to solve XML data integration and restructuring tasks. We have developed an open source toolbox for XCRFs and use it to provide a Web service that generates personalized RSS feeds from HTML pages.
Document type :
Reports

https://hal.inria.fr/inria-00342279
Contributor : Marc Tommasi
Submitted on : Thursday, November 27, 2008 - 9:03:35 AM
Last modification on : Tuesday, May 21, 2019 - 3:38:13 PM
Long-term archiving on : Thursday, October 11, 2012 - 12:07:18 PM

File

RR-6738.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00342279, version 1

Citation

Rémi Gilleron, Florent Jousse, Marc Tommasi, Isabelle Tellier. Conditional Random Fields for XML Applications. [Research Report] RR-6738, INRIA. 2008. ⟨inria-00342279⟩
