Poster communications

Investigating word interactions in texts. Application to text categorization in genomics

Abstract : Words interacting in a text may be compared, to a certain extent, to molecules interacting and building “complexes”, i.e. multiwords, named entities, or longer-range semantic or syntactic associations. We call them “k-itemsets” in what follows, k being their interaction level. We have shown (Cadot 06) that an adequately built subset of these k-itemsets is enough to describe all the relations at work in a corpus, whatever the level k of these relations. Experimental assessment: we have shown, on a subset of 120,000 abstracts from the Web of Science database in the domain of genomics, that a small proportion of these itemsets suffices to discriminate, with a measurable precision, between fifty sub-categories of genomic research. These sub-categories result from an unsupervised categorization process involving all 230,000 1-itemsets, i.e. individual words.
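
To make the notion of a k-itemset concrete, here is a minimal sketch in Python that counts, for a toy corpus, the number of documents in which each combination of k distinct words co-occurs. The function name `count_itemsets` and the three example sentences are hypothetical and chosen only for illustration. The exhaustive enumeration shown here is not the selection procedure of (Cadot 06), which retains only an adequately built subset of itemsets.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(docs, k):
    """Count, for each set of k distinct words, the documents containing all of them.

    Hypothetical helper: brute-force enumeration, only practical on tiny corpora.
    """
    counts = Counter()
    for text in docs:
        # Each document is reduced to its set of distinct lowercase words;
        # sorting gives every itemset a single canonical tuple form.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, k))
    return counts


# Toy corpus (illustrative data, not taken from the study).
docs = [
    "gene expression in yeast",
    "gene expression profiling",
    "protein folding in yeast",
]

pairs = count_itemsets(docs, 2)  # 2-itemsets: pairs of co-occurring words
print(pairs[("expression", "gene")])
```

With k = 1 the same function reduces to plain document frequencies of individual words, i.e. the 1-itemsets mentioned in the abstract. The number of candidate itemsets grows combinatorially with k, which is why working with a small, well-chosen subset matters on a corpus of this size.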
Contributor : Martine Cadot
Submitted on : Monday, December 21, 2009 - 11:14:03 AM
Last modification on : Thursday, January 20, 2022 - 3:42:17 AM
Long-term archiving on : Thursday, June 17, 2010 - 11:59:43 PM




  • HAL Id : inria-00442395, version 1


Martine Cadot, Michel Zitt, Gabriel Meurin, Alain Lelu. Investigating word interactions in texts. Application to text categorization in genomics. First SaarLorLux Workshop on Systems Biology 2009, Computational, Structural and Medical Approaches for Systems Biology, Dec 2009, Nancy, France. ⟨inria-00442395⟩


