Variable Selection for Clustering with Gaussian Mixture Models

Cathy Maugis 1, Gilles Celeux 1, Marie-Laure Martin-Magniette 2
1 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay, CNRS - Centre National de la Recherche Scientifique : UMR
Abstract: This article addresses variable selection for cluster analysis. The problem is treated as a model selection problem in the model-based clustering context. A general model, extending that of Raftery and Dean (2006), is proposed to specify the role of each variable. This model requires no prior assumptions about the link between the selected and discarded variables. Models are compared using the Bayesian Information Criterion (BIC). The role of each variable is determined by an algorithm that embeds two backward stepwise variable selection algorithms, one for clustering and one for linear regression. The consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomics application illustrate the value of the proposed variable selection procedure.
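As a rough illustration of the BIC comparison mentioned in the abstract, the sketch below decides whether a candidate variable is better described as an independent Gaussian or as a linear regression on an already selected variable. It is a toy example, not the algorithm of the report: the function names (`bic`, `bic_independent`, `bic_regression`), the simulated data, the restriction to a single regressor, and the sign convention (BIC = 2 log-likelihood minus number of parameters times log n, so larger is better) are all assumptions made here.

```python
import math
import random


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood of residuals, using the MLE variance."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def bic(loglik, n_params, n):
    """BIC in the 'larger is better' convention (assumed here)."""
    return 2 * loglik - n_params * math.log(n)


def bic_independent(y):
    """BIC of modelling y as a single Gaussian (mean and variance: 2 parameters)."""
    n = len(y)
    mean = sum(y) / n
    return bic(gaussian_loglik([v - mean for v in y]), 2, n)


def bic_regression(y, x):
    """BIC of regressing y linearly on x (intercept, slope, variance: 3 parameters)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return bic(gaussian_loglik(residuals), 3, n)


# Simulated data (hypothetical): y_dep depends linearly on x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_dep = [2 * a + rng.gauss(0, 0.5) for a in x]

# The model with the larger BIC is preferred.
prefers_regression = bic_regression(y_dep, x) > bic_independent(y_dep)
print("regression preferred for y_dep:", prefers_regression)
```

In a stepwise procedure, a comparison of this kind would be repeated for each candidate variable, with the regression model penalized for each additional coefficient.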
Document type: Report
[Research Report] RR-6211, INRIA. 2007, pp.35

https://hal.inria.fr/inria-00153057
Contributor: Rapport de Recherche Inria
Submitted on: Monday, June 11, 2007 - 10:31:24
Last modified on: Thursday, January 11, 2018 - 06:22:14
Document(s) archived on: Friday, November 25, 2016 - 15:12:33

Files

RR-6211.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00153057, version 2

Citation

Cathy Maugis, Gilles Celeux, Marie-Laure Martin-Magniette. Variable Selection for Clustering with Gaussian Mixture Models. [Research Report] RR-6211, INRIA. 2007, pp.35. 〈inria-00153057v2〉
