Semantic Clustering using Bag-of-Bag-of-Features

Ali-Reza Ebadat 1, Vincent Claveau 1, Pascale Sébillot 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Computing distances between textual representations is at the heart of many Natural Language Processing tasks. Standard approaches initially developed for Information Retrieval are typically used; most often they rely on a bag-of-words (or bag-of-features) description with TF-IDF (or a variant) weighting, a vector representation, and classical similarity functions such as cosine. In this paper, we are interested in one such task, namely the semantic clustering of entities extracted from a text. We argue that for this kind of task, better-suited representations and similarity measures can be used. In particular, we explore an alternative representation for entities called Bag-of-Vectors (or Bag-of-Bags-of-Features). In this model, each entity is defined not as a single vector but as a set of vectors, in which each vector is built from the contextual features of one occurrence of the entity. In order to use Bag-of-Vectors for clustering, we introduce new versions of classical similarity functions such as cosine, Jaccard, and scalar product. Experimentally, we show that the Bag-of-Vectors representation always improves the clustering results compared to classical Bag-of-Features representations.
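The contrast between the two representations can be illustrated with a minimal sketch. The abstract does not give the definitions of the new similarity functions, so the aggregation used here (averaging the cosine over all pairs of occurrence vectors) is an assumption chosen for illustration, not the authors' formulation. The toy entities, feature names, and function names are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bag_of_features(occurrences):
    """Classical representation: merge all occurrence vectors into one."""
    merged = Counter()
    for occ in occurrences:
        merged.update(occ)  # adds the weights feature by feature
    return dict(merged)


def bag_of_vectors_cosine(entity_a, entity_b):
    """Assumed aggregation: mean cosine over all pairs of occurrences."""
    if not entity_a or not entity_b:
        return 0.0
    total = sum(cosine(u, v) for u in entity_a for v in entity_b)
    return total / (len(entity_a) * len(entity_b))


# Hypothetical toy data: each entity is a list of occurrence vectors,
# one per mention, built from the contextual features of that mention.
entity_a = [{"scored": 1.0, "goal": 1.0}, {"passed": 1.0, "ball": 1.0}]
entity_b = [{"scored": 1.0, "goal": 1.0}, {"signed": 1.0, "contract": 1.0}]

# Single-vector similarity versus occurrence-level similarity.
bof_sim = cosine(bag_of_features(entity_a), bag_of_features(entity_b))
bov_sim = bag_of_vectors_cosine(entity_a, entity_b)
```

In this sketch the single-vector representation only sees the merged feature weights, whereas the set-of-vectors representation keeps each occurrence separate, so other aggregations (for example, taking the maximum over pairs) could be substituted without changing the data structure.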
Document type:
Conference paper
CORIA - COnférence en Recherche d'Information et Applications, Mar 2012, Bordeaux, France. pp.229-244, 2012

Cited literature: 18 references
Contributor: Pascale Sébillot
Submitted on: Monday, 19 November 2012 - 19:35:35
Last modified on: Wednesday, 21 February 2018 - 01:41:00
Document(s) archived on: Thursday, 21 February 2013 - 11:45:28


Files produced by the author(s)


  • HAL Id : hal-00753912, version 1


Ali-Reza Ebadat, Vincent Claveau, Pascale Sébillot. Semantic Clustering using Bag-of-Bag-of-Features. CORIA - COnférence en Recherche d'Information et Applications, Mar 2012, Bordeaux, France. pp.229-244, 2012. 〈hal-00753912〉


