Benchmarking a new semantic similarity measure using fuzzy clustering and reference sets: Application to cancer expression data

Abstract: Clustering algorithms rely on a similarity or distance measure that directs the grouping of similar objects into the same cluster and the separation of distant objects into distinct clusters. Our recently described semantic similarity measure, IntelliGO, which applies to the functional comparison of genes, is tested here for the first time in clustering experiments. The dataset is composed of the genes contained in a benchmarking collection of reference sets. Heatmap visualization of hierarchical clustering illustrates the advantages of the IntelliGO measure over three other similarity measures. Because genes often belong to more than one cluster in functional clustering, fuzzy C-means clustering is also applied to the dataset. The optimal number of clusters is chosen, and clustering performance is evaluated, with the F-score method using the reference sets. Overlap analysis is proposed as a method for exploiting the matching between clusters and reference sets. Finally, our method is applied to a list of genes found to be dysregulated in cancer samples. In this case, the reference sets are provided by expression profiles. Overlap analysis between these profiles and the functional clusters obtained with fuzzy C-means clustering makes it possible to characterize subsets of genes that display consistent function and expression profiles.
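The abstract evaluates fuzzy C-means clusters against reference sets with an F-score. The following is a minimal sketch of one common way to do this, not the authors' implementation. It assumes (1) the standard clustering F-measure, in which each reference set is matched to its best-scoring cluster and the per-set scores are weighted by reference-set size, and (2) a hypothetical membership threshold that turns fuzzy memberships into overlapping gene clusters. The paper's exact formulation may differ, and the function names and toy data are illustrative only.

```python
from typing import List, Sequence, Set


def clusters_from_memberships(genes: Sequence[str],
                              memberships: Sequence[Sequence[float]],
                              threshold: float) -> List[Set[str]]:
    """Turn a fuzzy membership matrix (genes x clusters) into overlapping sets.

    ASSUMPTION: a gene joins every cluster where its membership degree is at
    least `threshold` (a hypothetical parameter), so it can land in several
    clusters at once.
    """
    n_clusters = len(memberships[0])
    clusters: List[Set[str]] = [set() for _ in range(n_clusters)]
    for gene, row in zip(genes, memberships):
        for k, degree in enumerate(row):
            if degree >= threshold:
                clusters[k].add(gene)
    return clusters


def f_measure(cluster: Set[str], reference: Set[str]) -> float:
    """Harmonic mean of precision and recall of one cluster vs one reference set."""
    overlap = len(cluster & reference)
    if overlap == 0:  # also covers empty clusters, avoiding division by zero
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def overall_f_score(clusters: List[Set[str]],
                    reference_sets: List[Set[str]]) -> float:
    """ASSUMPTION: size-weighted average over reference sets of the best F-measure."""
    total = sum(len(ref) for ref in reference_sets)
    return sum(
        len(ref) / total * max(f_measure(c, ref) for c in clusters)
        for ref in reference_sets
    )


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    # Toy membership degrees for two clusters (each row sums to 1, as in FCM).
    memberships = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
    clusters = clusters_from_memberships(genes, memberships, threshold=0.35)
    references = [{"g1", "g2"}, {"g3", "g4"}]
    print(clusters, overall_f_score(clusters, references))
```

In this toy run, gene `g2` falls into both clusters, which illustrates the overlapping assignments that motivate fuzzy clustering. To choose the number of clusters in the spirit of the abstract, one could rerun the clustering for several values of C and keep the value with the highest overall F-score.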
Document type: Conference paper

Cited literature: 30 references

https://hal.inria.fr/inria-00617692
Contributor: Sidahmed Benabderrahmane
Submitted on: Tuesday, November 29, 2011 - 7:12:15 PM
Last modification on: Friday, June 28, 2019 - 9:38:04 AM
Long-term archiving on: Sunday, December 4, 2016 - 6:44:32 AM

File

EGC_2011_sidahmed_V2-last.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00617692, version 1

Citation

Sidahmed Benabderrahmane, Marie-Dominique Devignes, Malika Smail-Tabbone, Olivier Poch, Amedeo Napoli, et al. Benchmarking a new semantic similarity measure using fuzzy clustering and reference sets: Application to cancer expression data. 11ème Conférence Internationale Francophone sur l'Extraction et la Gestion des Connaissances - EGC 2011, Jan 2011, Brest, France. ⟨inria-00617692⟩