Poster communications

Investigating word interactions in texts. Application to text categorization in genomics

Abstract : Words interacting in a text may be compared, to a certain extent, to molecules interacting and building “complexes”, i.e. multiwords, named entities, or longer-range semantic or syntactic associations. We call them “k-itemsets” in what follows, k being their interaction level. We have shown (Cadot 06) that an adequately built subset of these k-itemsets is enough to describe all of the relations at work in a corpus, whatever the level k of these relations. Experimental assessment: we have shown, on a subset of 120,000 abstracts from the Web of Science database in the domain of genomics, that a small proportion of these itemsets suffices to discriminate, with measurable precision, fifty sub-categories of genomic research. These sub-categories result from an unsupervised categorization process involving all 230,000 1-itemsets, i.e. individual words.
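
To make the notion of a k-itemset concrete, here is a minimal sketch that counts, for a given k, in how many documents each set of k distinct words co-occurs. It assumes whitespace tokenization and document-level co-occurrence, and it uses a hypothetical three-document toy corpus; the function name `count_itemsets` is illustrative. It is a plain frequency count, not the subset-building procedure of (Cadot 06).

```python
from collections import Counter
from itertools import combinations


def count_itemsets(documents, k):
    """Count the number of documents in which each k-itemset occurs.

    A k-itemset is represented here as a sorted tuple of k distinct words
    (an assumption made for this sketch).
    """
    counts = Counter()
    for text in documents:
        # Each word counts once per document; sorting gives a canonical order.
        words = sorted(set(text.lower().split()))
        for itemset in combinations(words, k):
            counts[itemset] += 1
    return counts


# Hypothetical toy corpus, for illustration only.
docs = [
    "gene expression in yeast",
    "gene expression profiling",
    "protein folding in yeast",
]

pairs = count_itemsets(docs, 2)
# Keep the 2-itemsets that occur in at least two documents.
frequent = {itemset: n for itemset, n in pairs.items() if n >= 2}
```

With k = 1 the same function counts individual words, i.e. the 1-itemsets mentioned in the abstract. The number of candidate itemsets grows combinatorially with k, which is why working with a well-chosen subset matters on a real corpus.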

https://hal.inria.fr/inria-00442395
Contributor : Martine Cadot
Submitted on : Monday, December 21, 2009 - 11:14:03 AM
Last modification on : Thursday, January 7, 2021 - 3:44:26 PM
Long-term archiving on : Thursday, June 17, 2010 - 11:59:43 PM

File

Cadot_poster_SaarLorLux_Worksh...
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00442395, version 1

Citation

Martine Cadot, Michel Zitt, Gabriel Meurin, Alain Lelu. Investigating word interactions in texts. Application to text categorization in genomics. First SaarLorLux Workshop on Systems Biology 2009, Computational, Structural and Medical Approaches for Systems Biology, Dec 2009, Nancy, France. ⟨inria-00442395⟩
