A model selection criterion for model-based clustering of annotated gene expression data

Mélina Gallopin 1, Gilles Celeux 1, Florence Jaffrézic 2, Andrea Rau 2
1 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay, CNRS - Centre National de la Recherche Scientifique : UMR
Abstract: In co-expression analyses of gene expression data, it is often of interest to interpret clusters of co-expressed genes with respect to a set of external information, such as a potentially incomplete list of functional properties for which a subset of genes may be annotated. Based on the framework of finite mixture models, we propose a model selection criterion that takes into account such external gene annotations, providing an efficient tool for selecting a relevant number of clusters and clustering model. This criterion, called the Integrated Completed Annotated Likelihood (ICAL), is defined by adding an entropy term to a penalized likelihood to measure the concordance between a clustering partition and the external annotation information. The ICAL leads to the choice of a model that is more easily interpretable with respect to the known functional gene annotations. We illustrate the usefulness of this model selection criterion in conjunction with Gaussian mixture models on simulated gene expression data and on real RNA-seq data.
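The idea of adding a concordance term to a penalized likelihood can be illustrated with a minimal sketch. The abstract does not give the ICAL formula, so the form used here is an assumption made for illustration only: for each annotation term, the added quantity is the sum over clusters k and annotation values b of n_kb * log(n_kb / n_k), computed on annotated genes. The function names `annotation_concordance` and `annotated_criterion`, the toy data, and the penalized likelihood value of -100.0 are all hypothetical.

```python
import math

def annotation_concordance(labels, annotations):
    """Sum over clusters k and annotation values b of n_kb * log(n_kb / n_k).

    labels: cluster index per gene.
    annotations: one 0/1 flag per gene for a single functional term
    (None marks an unannotated gene, which is skipped).
    The value is <= 0 and equals 0 when every cluster is pure.
    This form is an assumption, not the formula from the paper.
    """
    counts = {}  # (cluster, annotation value) -> number of genes
    totals = {}  # cluster -> number of annotated genes
    for k, b in zip(labels, annotations):
        if b is None:
            continue
        counts[(k, b)] = counts.get((k, b), 0) + 1
        totals[k] = totals.get(k, 0) + 1
    return sum(n * math.log(n / totals[k]) for (k, _), n in counts.items())

def annotated_criterion(penalized_loglik, labels, annotation_terms):
    """Penalized likelihood plus the concordance term of each annotation."""
    return penalized_loglik + sum(
        annotation_concordance(labels, term) for term in annotation_terms
    )

# Toy example: six genes, one annotation term, two candidate partitions
# that are given the same (hypothetical) penalized likelihood.
term = [1, 1, 1, 0, 0, None]
aligned = [0, 0, 0, 1, 1, 1]  # clusters match the annotation
mixed = [0, 1, 0, 1, 0, 1]    # clusters cut across the annotation
score_aligned = annotated_criterion(-100.0, aligned, [term])
score_mixed = annotated_criterion(-100.0, mixed, [term])
```

With equal penalized likelihoods, the partition whose clusters agree with the annotation receives the higher score, which is the behaviour the abstract describes: the criterion favours models that are easier to interpret with respect to known annotations.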
Document type:
Preprint, working paper
2014

https://hal.inria.fr/hal-01088870
Contributor: Melina Gallopin
Submitted on: Friday, 28 November 2014 - 20:27:50
Last modified on: Thursday, 11 January 2018 - 06:22:14
Document(s) archived on: Friday, 14 April 2017 - 23:20:18

Files

draftICAL_gallopin.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01088870, version 1

Citation

Mélina Gallopin, Gilles Celeux, Florence Jaffrézic, Andrea Rau. A model selection criterion for model-based clustering of annotated gene expression data. 2014. 〈hal-01088870〉
