Conditional Random Fields for XML Applications

Abstract: XML tree labeling is the problem of classifying elements in XML documents. It is a fundamental task for applications such as XML transformation, schema matching, and information extraction. In this paper we propose XCRFs, conditional random fields for XML tree labeling. Dealing with trees often raises complexity problems. We describe optimization methods based on constraints and combination techniques that allow XCRFs to be used in real tasks and in interactive machine learning programs. We show that domain knowledge in XML applications transfers easily to XCRFs through constraints and the combination of XCRFs. We describe an XCRF-based approach to learning tree transformations, which can solve XML data integration and restructuring tasks. We have developed an open-source toolbox for XCRFs and use it to provide a Web service that generates personalized RSS feeds from HTML pages.
Document type:
Report
[Research Report] RR-6738, INRIA. 2008

Cited literature [45 references]

https://hal.inria.fr/inria-00342279
Contributor: Marc Tommasi
Submitted on: Thursday, November 27, 2008 - 09:03:35
Last modified on: Thursday, January 11, 2018 - 06:22:13
Document(s) archived on: Thursday, October 11, 2012 - 12:07:18

File

RR-6738.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00342279, version 1

Citation

Rémi Gilleron, Florent Jousse, Marc Tommasi, Isabelle Tellier. Conditional Random Fields for XML Applications. [Research Report] RR-6738, INRIA. 2008. 〈inria-00342279〉

Metrics

Record views: 380

File downloads: 240