Text Clustering to Support Knowledge Acquisition from Documents

Stéphane Lapalut

Report, Year: 1995

Knowledge acquisition for aided design through agent interaction

Abstract

In the early stages of the knowledge acquisition process, interviews with experts produce a large amount of rich but ill-structured text. Knowledge engineers need tools to help them exploit these texts. We propose the use of a statistical method, top-down hierarchical classification, together with a new interpretation of its results. The initial statistical analysis proposed by M. Reinert (Reinert, 1979 and 1992) gives two kinds of results: first, a segmentation of the texts that reflects their "semantic contexts", which we use to bring out the structure of the texts; and second, classes of significant terms belonging to these contexts, which can be related to the experts or to their specialities. In this paper, we describe the method, its empirical validity, how it compares with similar approaches, and its uses, with examples and results. We conclude with some research directions for dealing with so-called "ontologies" of experts' domains.

Keywords

HIERARCHICAL TOP-DOWN CLASSIFICATION; STATISTICAL TEXT ANALYSIS; TEXT SEGMENTATION; TEXT STRUCTURE DISCOVERY; SEMANTIC CONTEXT

Domains

Other [cs.OH]

Main file

RR-2639.pdf (227.22 KB)

Inria Research Report

https://inria.hal.science/inria-00074051

Submitted on: Wednesday, 24 May 2006, 14:23:35

Last modified on: Wednesday, 15 March 2023, 08:58:08

Long-term archiving on: Sunday, 4 April 2010, 21:37:36

Dates and versions

inria-00074051, version 1 (24-05-2006)

Identifiers

HAL Id: inria-00074051, version 1

Cite

Stéphane Lapalut. Text Clustering to Support Knowledge Acquisition from Documents. RR-2639, INRIA. 1995. ⟨inria-00074051⟩

Collections

INRIA INRIA-RRRT ACACIA INRIA2 LARA