Conference Papers, Year: 2005

Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

Abstract

This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections with the official theme structure used by Inria.
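As an illustration of the final comparison step, the sketch below scores how closely a clustering agrees with a reference partition. It uses the Rand index (the fraction of item pairs that both partitions treat the same way), which is an assumption chosen for simplicity; the abstract does not say which comparison measure the authors used. The function name `rand_index`, the theme labels, and the two cluster assignments are all hypothetical toy data, not results from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of item pairs on which two partitions agree.

    A pair agrees when both partitions put the two items in the same
    group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items to compare partitions")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six projects with a reference theme each,
# and two clusterings obtained from two different feature selections.
official_themes = ["T1", "T1", "T2", "T2", "T3", "T3"]
clusters_from_keywords = [0, 0, 1, 1, 2, 2]
clusters_from_chapters = [0, 0, 0, 1, 2, 2]

score_keywords = rand_index(official_themes, clusters_from_keywords)
score_chapters = rand_index(official_themes, clusters_from_chapters)
print(score_keywords, score_chapters)
```

A score of 1.0 means the clustering reproduces the reference grouping exactly (cluster names do not need to match); lower scores indicate that a given feature selection groups some projects differently from the reference themes.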
Main file
I-Know.pdf (124.47 KB)
I-Know.ppt (502 KB)

Dates and versions

inria-00000002 , version 1 (28-04-2005)
inria-00000002 , version 2 (08-07-2005)
inria-00000002 , version 3 (09-08-2005)

Cite

Thierry Despeyroux, Yves Lechevallier, Brigitte Trousse, Anne-Marie Vercoustre. Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology. 5th International Conference on Knowledge Management (I-Know), Special Track on Knowledge Discovery and Semantic Technologies, Jun 2005, Graz, Austria. ⟨inria-00000002v3⟩

Collections

INRIA INRIA2