Investigating word interactions in texts. Application to text categorization in genomics

Abstract: Words interacting in a text may be compared, to a certain extent, to molecules that interact and build “complexes”, i.e. multiwords, named entities, or longer-range semantic or syntactic associations. We call these associations “k-itemsets” below, k being their interaction level. We have shown (Cadot 06) that an adequately built subset of these k-itemsets is enough to describe all of the relations at work in a corpus, whatever the level k of these relations. Experimental assessment: on a subset of 120,000 abstracts from the Web of Science database in the domain of genomics, we have shown that a small proportion of these itemsets suffices to discriminate, with measurable precision, fifty sub-categories of genomic research. These sub-categories result from an unsupervised categorization process involving all 230,000 1-itemsets, i.e. individual words.
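
To make the notion of a k-itemset concrete, here is a minimal sketch that counts, for a given k, how many documents contain each set of k distinct words. It is plain co-occurrence counting with a support threshold, not the selection procedure of (Cadot 06); the function name `count_itemsets`, the toy corpus and the threshold of 2 are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(docs, k):
    """Count, for each set of k distinct words, the documents containing all of them."""
    counts = Counter()
    for text in docs:
        # Each word counts once per document; sorting gives a canonical tuple order.
        words = sorted(set(text.lower().split()))
        for itemset in combinations(words, k):
            counts[itemset] += 1
    return counts


# Toy corpus, invented for illustration (not taken from the study).
docs = [
    "gene expression in yeast",
    "gene expression profiling",
    "protein folding in yeast",
]

# 2-itemsets that occur in at least 2 documents (threshold chosen arbitrarily).
pairs = count_itemsets(docs, 2)
frequent_pairs = {itemset: n for itemset, n in pairs.items() if n >= 2}
```

With k = 1 the same function returns single-word counts, i.e. the 1-itemsets mentioned in the abstract. Exhaustive enumeration grows combinatorially with k and vocabulary size, so a corpus of the size described would need pruning or a dedicated itemset-mining method.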
Document type:
Poster
First SaarLorLux Workshop on Systems Biology 2009, Computational, Structural and Medical Approaches for Systems Biology, Dec 2009, Nancy, France

https://hal.inria.fr/inria-00442395
Contributor: Martine Cadot
Submitted on: Monday, 21 December 2009 - 11:14:03
Last modified on: Tuesday, 24 April 2018 - 13:37:26
Document(s) archived on: Thursday, 17 June 2010 - 23:59:43

File

Cadot_poster_SaarLorLux_Worksh...
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00442395, version 1

Citation

Martine Cadot, Michel Zitt, Gabriel Meurin, Alain Lelu. Investigating word interactions in texts. Application to text categorization in genomics. First SaarLorLux Workshop on Systems Biology 2009, Computational, Structural and Medical Approaches for Systems Biology, Dec 2009, Nancy, France. 〈inria-00442395〉

Metrics

Record views

480

File downloads

128