Mining XML Documents

Abstract: XML documents are becoming ubiquitous because of their rich and flexible format, which can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. The chapter presents some recent approaches and algorithms that support these tasks, together with experimental evaluations on a variety of large XML collections.
Document type:
Book chapter
P. Poncelet, F. Masseglia, M. Teisseire. Data Mining Patterns: New Methods and Applications, Information Science Reference, pp.198-219, 2007, 〈10.4018/978-1-59904-162-9.ch009〉

Cited literature [27 references]

https://hal.inria.fr/inria-00188899
Contributor: Anne-Marie Vercoustre
Submitted on: Monday, November 19, 2007 - 15:55:34
Last modified on: Friday, January 12, 2018 - 01:48:50
Document(s) archived on: Monday, April 12, 2010 - 02:42:34

File

XML-MiningChapter_final.pdf
Files produced by the author(s)

Citation

Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie-Christine Rousset, Alexandre Termier, et al. Mining XML Documents. P. Poncelet, F. Masseglia, M. Teisseire. Data Mining Patterns: New Methods and Applications, Information Science Reference, pp.198-219, 2007, 〈10.4018/978-1-59904-162-9.ch009〉. 〈inria-00188899〉

Metrics

Record views: 382

File downloads: 348