Proper Noun Semantic Clustering using Bag-Of-Vectors

Abstract : In this paper, we propose a model for semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task. This model is based on a new method to compute the similarity between entities. The classical way of calculating similarity is to build a feature vector, or Bag-of-Features, for each entity and then apply a classical similarity function such as cosine. In practice, the features are contextual ones, such as the words around the different occurrences of each entity. Here, we propose an alternative representation for entities, called Bag-Of-Vectors, or Bag-of-Bags-of-Features. In this new model, each entity is defined not as a single vector but as a set of vectors, in which each vector is built from the contextual features of one occurrence of the entity. In order to use Bag-Of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine, Jaccard and Scalar Product. Experimentally, we show that the Bag-Of-Vectors representation always improves the clustering results compared to classical Bag-Of-Features representations.
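
The difference between the two representations described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the two-word context window, and in particular the choice to generalize cosine as the mean of the pairwise cosines between occurrence vectors. The abstract does not specify how the paper defines its Bag-Of-Vectors similarity functions.

```python
from collections import Counter
from math import sqrt

def context_vector(tokens, index, window=2):
    """Count the words within `window` positions of one occurrence."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(left + right)

def bag_of_features(tokens, entity, window=2):
    """Classical representation: one vector summing all occurrence contexts."""
    total = Counter()
    for i, tok in enumerate(tokens):
        if tok == entity:
            total.update(context_vector(tokens, i, window))
    return total

def bag_of_vectors(tokens, entity, window=2):
    """Alternative representation: one context vector per occurrence."""
    return [context_vector(tokens, i, window)
            for i, tok in enumerate(tokens) if tok == entity]

def cosine(u, v):
    """Standard cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def bov_cosine(bag_a, bag_b):
    """ASSUMED generalization: mean of all pairwise cosines between two bags."""
    if not bag_a or not bag_b:
        return 0.0
    total = sum(cosine(u, v) for u in bag_a for v in bag_b)
    return total / (len(bag_a) * len(bag_b))
```

For example, with `tokens = "Paris is in France and Paris hosts museums".split()`, `bag_of_vectors(tokens, "Paris")` yields two separate context vectors, whereas `bag_of_features(tokens, "Paris")` merges them into a single vector. Other aggregations (maximum pairwise similarity, for instance) would be equally plausible readings of the abstract.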
Document type :
Conference papers
ANLP - Applied Natural Language Processing conference, special track at the 25th International FLAIRS Conference, May 2012, Marco Island, FL, United States.


https://hal.archives-ouvertes.fr/hal-00760105
Contributor : Vincent Claveau
Submitted on : Monday, December 3, 2012 - 2:48:54 PM
Last modification on : Monday, May 18, 2015 - 12:51:34 AM

File

Ebadat-Ali-Reza-vf.pdf

Identifiers

  • HAL Id : hal-00760105, version 1

Citation

Ali-Reza Ebadat, Vincent Claveau, Pascale Sébillot. Proper Noun Semantic Clustering using Bag-Of-Vectors. ANLP - Applied Natural Language Processing conference, special track at the 25th International FLAIRS Conference, May 2012, Marco Island, FL, United States. 2012. <hal-00760105>
