Investigating word interactions in texts. Application to text categorization in genomics - Inria - Institut national de recherche en sciences et technologies du numérique
Poster Year: 2009

Abstract

Words interacting in a text may be compared, to a certain extent, to molecules that interact and build “complexes”: multiword expressions, named entities, or longer-range semantic or syntactic associations. We call these complexes “k-itemsets” in what follows, k being their interaction level. We have shown (Cadot 06) that an adequately built subset of these k-itemsets is enough to describe all the relations at work in a corpus, whatever the level k of these relations. Experimental assessment: on a subset of 120,000 abstracts from the Web of Science database in the domain of genomics, we have shown that a small proportion of these itemsets suffices to discriminate, with measurable precision, fifty sub-categories of genomic research. These sub-categories result from an unsupervised categorization process involving all 230,000 1-itemsets, i.e. individual words.
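
To make the notion of a k-itemset concrete, here is a minimal Python sketch that treats a k-itemset as a set of k distinct words occurring together in the same abstract and counts the number of abstracts containing each one. It only illustrates the definition; it is not the selection procedure of (Cadot 06), which keeps an adequately built subset of itemsets. The function name `count_itemsets` and the toy corpus are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(documents, k):
    """Count in how many documents each set of k distinct words co-occurs.

    Hypothetical helper: a k-itemset is represented as a sorted tuple of words.
    """
    counts = Counter()
    for text in documents:
        # Deduplicate and sort so that each itemset has one canonical form.
        words = sorted(set(text.lower().split()))
        for itemset in combinations(words, k):
            counts[itemset] += 1
    return counts


# Toy corpus (hypothetical, not taken from the Web of Science data).
docs = [
    "gene expression in yeast",
    "gene expression profiling",
    "protein folding in yeast",
]

pairs = count_itemsets(docs, 2)
print(pairs[("expression", "gene")])  # 2
print(pairs[("in", "yeast")])         # 2
```

Exhaustive enumeration grows combinatorially with k and with vocabulary size, which is one reason a well-chosen subset of itemsets matters on a corpus of this scale.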
Main file
Cadot_poster_SaarLorLux_Workshop14_15dec09_v7.pdf (314.04 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00442395, version 1 (21-12-2009)

Identifiers

  • HAL Id: inria-00442395, version 1

Cite

Martine Cadot, Michel Zitt, Gabriel Meurin, Alain Lelu. Investigating word interactions in texts. Application to text categorization in genomics. First SaarLorLux Workshop on Systems Biology 2009, Computational, Structural and Medical Approaches for Systems Biology, Dec 2009, Nancy, France. ⟨inria-00442395⟩