hal-00760105, version 1
Proper Noun Semantic Clustering using Bag-Of-Vectors
Ali-Reza Ebadat (1), Vincent Claveau (1), Pascale Sébillot (1)
ANLP - Applied Natural Language Processing conference, special track at the 25th International FLAIRS Conference (2012)
Abstract: In this paper, we propose a model for semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task. This model is based on a new method for computing the similarity between entities. Indeed, the classical way of calculating similarity is to build a feature vector, or Bag-of-Features, for each entity and then apply classical similarity functions such as cosine. In practice, the features are contextual ones, such as the words around the different occurrences of each entity.

Here, we propose an alternative representation for entities, called Bag-Of-Vectors, or Bag-of-Bags-of-Features. In this new model, each entity is defined not as a single vector but as a set of vectors, each built from the contextual features of one occurrence of the entity. To use Bag-Of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine, Jaccard and Scalar Products.

Experimentally, we show that the Bag-Of-Vectors representation always improves the clustering results compared to classical Bag-Of-Features representations.
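The abstract does not give the exact form of the Bag-of-Vectors similarity functions. As a minimal sketch, assuming that the contextual features of an occurrence are the words within a fixed window around it, and that a Bag-of-Vectors similarity aggregates a classical similarity (cosine, Jaccard or scalar product) over all pairs of occurrence vectors by maximum or mean, the two representations could be compared as follows. The function names (`occurrence_vectors`, `bov_similarity`, etc.), the window size, the set-based Jaccard variant and the aggregation choices are all illustrative assumptions, not the authors' definitions.

```python
from collections import Counter
from itertools import product
import math


def occurrence_vectors(tokens, entity, window=2):
    """Bag-of-Vectors: one sparse context vector per occurrence of `entity`.

    Assumption: the features are the words within `window` tokens of the
    occurrence (the occurrence itself excluded), weighted by raw counts.
    """
    bags = []
    for pos, tok in enumerate(tokens):
        if tok != entity:
            continue
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        bags.append(Counter(tokens[i] for i in range(lo, hi) if i != pos))
    return bags


def bag_of_features(vectors):
    """Classical Bag-of-Features: all occurrence vectors summed into one."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts feature by feature
    return total


def scalar(u, v):
    """Scalar (dot) product of two sparse vectors; missing features count as 0."""
    return sum(w * v[f] for f, w in u.items())


def cosine(u, v):
    norm = math.sqrt(scalar(u, u)) * math.sqrt(scalar(v, v))
    return scalar(u, v) / norm if norm else 0.0


def jaccard(u, v):
    """Set-based Jaccard on feature keys (one possible variant among several)."""
    union = set(u) | set(v)
    return len(set(u) & set(v)) / len(union) if union else 0.0


def mean(xs):
    return sum(xs) / len(xs)


def bov_similarity(bag_a, bag_b, sim=cosine, aggregate=max):
    """Hypothetical BoV similarity: aggregate `sim` over all occurrence pairs."""
    scores = [sim(u, v) for u, v in product(bag_a, bag_b)]
    return aggregate(scores) if scores else 0.0


if __name__ == "__main__":
    tokens = "mayor of Paris said flights to Paris left mayor of Lyon said".split()
    paris = occurrence_vectors(tokens, "Paris")  # two occurrence vectors
    lyon = occurrence_vectors(tokens, "Lyon")    # one occurrence vector
    print(round(cosine(bag_of_features(paris), bag_of_features(lyon)), 3))  # 0.667
    print(round(bov_similarity(paris, lyon), 3))                            # 0.866
```

In this toy example, the Bag-of-Features cosine between Paris and Lyon is about 0.667, while the max-aggregated Bag-of-Vectors cosine is about 0.866: the first occurrence of Paris shares its whole context with Lyon, and keeping occurrences separate prevents the unrelated second occurrence from diluting that match. This only illustrates the intuition behind the representation; it does not reproduce the paper's experiments.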
- 1: TEXMEX (INRIA - IRISA)
- CNRS: UMR6074 – INRIA – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
- Domain: Computer Science/Computation and Language
Computer Science/Multimedia
Computer Science/Document and Text Processing
- http://hal.archives-ouvertes.fr/hal-00760105
- oai:hal.archives-ouvertes.fr:hal-00760105
- Contributor: Vincent Claveau
- Submitted on: Monday, December 3, 2012, 14:48:54
- Last modified on: Thursday, January 17, 2013, 11:11:08