Mining XML Documents

Laurent Candillier; Ludovic Denoyer; Patrick Gallinari; Marie-Christine Rousset; Alexandre Termier; Anne-Marie Vercoustre

doi:10.4018/978-1-59904-162-9.ch009

Book chapter, Year: 2007

Laurent Candillier

Role: Author

Groupe de Recherche en Apprentissage Automatique

Ludovic Denoyer

Role: Author
PersonId : 9178
IdHAL : ludovic-denoyer
ORCID : 0000-0002-7348-788X
IdRef : 089291255

Machine Learning and Information Retrieval

Patrick Gallinari

Role: Author
PersonId : 751615
IdHAL : patrick-gallinari
ORCID : 0000-0001-9060-9001
IdRef : 070709076

Machine Learning and Information Retrieval

Marie-Christine Rousset

Role: Author
PersonId : 950209

Laboratoire Logiciels Systèmes Réseaux

Alexandre Termier

Role: Author
PersonId : 1660
IdHAL : alexandre-termier
ORCID : 0000-0003-1784-0017
IdRef : 13741689X

Institute of Statistical Mathematics

Anne-Marie Vercoustre

Role: Author
PersonId : 830030

INRIA Rocquencourt

Usage-centered design, analysis and improvement of information systems

Abstract

XML documents are becoming ubiquitous because their rich and flexible format can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. Basically, XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms to support these tasks, together with experimental evaluation on a variety of large XML collections.
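As a rough illustration of what "simplifying the tree structure" can mean, the sketch below flattens one XML document into a bag of root-to-node tag paths, a flat representation that standard classification or clustering tools could consume. This is only one possible simplification and is not claimed to be the representation used in the chapter; the function name `tag_paths` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count root-to-node tag paths in one XML document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        # Extend the path with this node's tag and record one occurrence.
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


# Made-up sample document, used only to exercise the function.
doc = "<article><title>t</title><sec><p>a</p><p>b</p></sec></article>"
paths = tag_paths(doc)
for path, n in sorted(paths.items()):
    print(path, n)
```

Text content and sibling order are deliberately ignored here; a representation that keeps them would be closer to the full tree, at the cost of a larger feature space.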

Keywords

XML Mining; structured documents; tree-based model; document representation; classification; clustering; frequent patterns; frequent trees; feature selection; stochastic generative model; Bayesian networks

Domains

Artificial intelligence [cs.AI]; Text and document processing; Information retrieval [cs.IR]

Main file

XML-MiningChapter_final.pdf (912.65 KB)

Origin: Files produced by the author(s)

Contributor: Anne-Marie Vercoustre

https://inria.hal.science/inria-00188899

Submitted on: Monday, 19 November 2007, 15:55:34

Last modified on: Tuesday, 11 April 2023, 15:16:28

Long-term archiving on: Monday, 12 April 2010, 02:42:34

Dates and versions

inria-00188899 , version 1 (19-11-2007)

Identifiers

HAL Id : inria-00188899 , version 1
DOI : 10.4018/978-1-59904-162-9.ch009

Cite

Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie-Christine Rousset, Alexandre Termier, et al. Mining XML Documents. P. Poncelet, F. Masseglia, M. Teisseire. Data Mining Patterns: New Methods and Applications, Information Science Reference, pp.198-219, 2007, ⟨10.4018/978-1-59904-162-9.ch009⟩. ⟨inria-00188899⟩

Collections

UPMC UNIV-LILLE3 CNRS INRIA MOSTRARE LIP6 INRIA2 SORBONNE-UNIVERSITE SU-SCIENCES
