Report (Research Report) Year: 2008

Conditional Random Fields for XML Applications

Abstract

XML tree labeling is the problem of classifying elements in XML documents. It is a fundamental task for applications such as XML transformation, schema matching, and information extraction. In this paper we propose XCRFs, conditional random fields for XML tree labeling. Dealing with trees often raises complexity problems. We describe optimization methods, based on constraints and combination techniques, that allow XCRFs to be used in real tasks and in interactive machine learning programs. We show that domain knowledge in XML applications transfers easily to XCRFs through constraints and combinations of XCRFs. We describe an approach based on XCRFs for learning tree transformations. This approach makes it possible to solve XML data integration and restructuring tasks. We have developed an open source toolbox for XCRFs. We use it to offer a Web service that generates personalized RSS feeds from HTML pages.
Main file
RR-6738.pdf (381.46 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00342279, version 1 (27-11-2008)

Identifiers

  • HAL Id: inria-00342279, version 1

Cite

Rémi Gilleron, Florent Jousse, Marc Tommasi, Isabelle Tellier. Conditional Random Fields for XML Applications. [Research Report] RR-6738, INRIA. 2008. ⟨inria-00342279⟩
194 views
241 downloads
