Clustering optimal de gènes fondé sur une mesure de similarité sémantique [Optimal gene clustering based on a semantic similarity measure]

Rachid Hafiane 1, Malika Smaïl-Tabbone 2, Marie-Dominique Devignes 3, Salvatore Tabbone 4
1 QGAR - Querying Graphics through Analysis and Recognition
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
3 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
4 QGAR - Querying Graphics through Analysis and Recognition
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In many application domains of knowledge extraction and information retrieval, objects are represented not as feature vectors in a vector space but through a pairwise similarity matrix. In molecular biology, such a similarity measure captures either the structure of the objects (e.g. molecules, or proteins as sequences of amino acids) or the semantics of their description (e.g. genes or diseases described with ontology terms). Many existing similarity measures violate the properties of a metric. This is the case for our IntelliGO semantic similarity, defined as a generalized cosine between two vectors of Gene Ontology terms (the Gene Ontology is a directed acyclic graph representing the semantic relationships between terms). Specific techniques exist for embedding pairwise data into a Euclidean space to facilitate subsequent clustering of the objects. In this paper we report a comparison of gene clustering with and without embedding, using the IntelliGO measure and benchmarks. As the clustering algorithm, we use an implementation of the C-means algorithm that takes as input either a "distance" matrix or a set of vectors. We evaluate the clustering quality and discuss the results.
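
The abstract's remark that cosine-style similarities can violate metric properties can be illustrated with a small sketch. The code below is not the IntelliGO definition: it is a generic generalized cosine in which a hypothetical term-to-term similarity matrix (`term_sim`) replaces the identity matrix of the ordinary cosine. The toy genes, the term weights and the conversion `1 - similarity` into a dissimilarity are all assumptions made for illustration only.

```python
import math
from itertools import permutations


def generalized_cosine(u, v, term_sim):
    """Cosine of u and v under the bilinear form defined by term_sim.

    u, v: weight vectors over the same ordered list of terms.
    term_sim: symmetric matrix where term_sim[i][j] is the similarity of
    terms i and j (1.0 on the diagonal). With the identity matrix this
    reduces to the ordinary cosine.
    """
    def form(a, b):
        return sum(a[i] * term_sim[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    denom = math.sqrt(form(u, u) * form(v, v))
    return form(u, v) / denom if denom > 0 else 0.0


def triangle_violations(dist, tol=1e-12):
    """Return every (i, j, k) with dist[i][k] > dist[i][j] + dist[j][k]."""
    n = len(dist)
    return [(i, j, k) for i, j, k in permutations(range(n), 3)
            if dist[i][k] > dist[i][j] + dist[j][k] + tol]


# Hypothetical toy data: three terms, of which terms 0 and 1 are related.
term_sim = [[1.0, 0.8, 0.0],
            [0.8, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
genes = {"g1": [0, 1, 0], "g2": [1, 0, 1], "g3": [0, 0, 1]}
names = sorted(genes)

# Pairwise similarity matrix, then an assumed dissimilarity 1 - similarity.
sim = [[generalized_cosine(genes[a], genes[b], term_sim) for b in names]
       for a in names]
dist = [[1.0 - s for s in row] for row in sim]

# Index triples for which the triangle inequality fails.
violations = triangle_violations(dist)
```

In this toy case g1 and g2 share no term, yet their similarity is non-zero because terms 0 and 1 are related. The dissimilarity between g1 and g3 exceeds the sum of the dissimilarities through g2, so the matrix is not a metric; this is the kind of pairwise data for which an embedding step before clustering is considered.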
https://hal.inria.fr/hal-00920700
Contributor : Malika Smaïl-Tabbone
Submitted on : Thursday, December 19, 2013 - 9:23:17 AM
Last modification on : Tuesday, December 18, 2018 - 4:38:34 PM

Identifiers

  • HAL Id : hal-00920700, version 1

Citation

Rachid Hafiane, Malika Smaïl-Tabbone, Marie-Dominique Devignes, Salvatore Tabbone. Clustering optimal de gènes fondé sur une mesure de similarité sémantique. 10ème édition de la COnférence en Recherche d'Information et Applications - CORIA 2013, Apr 2013, Neufchâtel, Suisse. 15 p. ⟨hal-00920700⟩
