A semantic similarity measure for content-based classification of documents

Rim Al Hulou 1 Amedeo Napoli 1 Emmanuel Nauer 1
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In the framework of the Semantic Web, content-based processing of data is considered an essential component of semantic interoperability between applications. To enable such processing, many W3C standardisation proposals (RDF, RDF Schema, OWL, etc.) have been put forward. In this paper, we present an approach to semantic similarity based on content-based annotation of textual documents. The content of each document is represented by a labelled tree whose nodes correspond to concepts in an ontology that represents the knowledge of the data's domain. A reasoning process then compares the labelled trees representing the documents, and thereby the documents themselves. Once completed, this comparison makes it possible to compute a semantic similarity measure between documents. Finally, we show how this measure can be used to classify documents according to their content.
Document type:
[Intern report] A04-R-518 || al_hulou04c, 2004, 15 p

Contributor: Publications Loria
Submitted on: Tuesday, 26 September 2006 - 10:15:57
Last modified on: Thursday, 11 January 2018 - 06:19:52


  • HAL Id : inria-00100236, version 1



Rim Al Hulou, Amedeo Napoli, Emmanuel Nauer. A semantic similarity measure for content-based classification of documents. [Intern report] A04-R-518 || al_hulou04c, 2004, 15 p. 〈inria-00100236〉


