Evolutionary Latent Class Clustering of Qualitative Data

Damien Tessier 1 Marc Schoenauer 1 Christophe Biernacki 2 Gilles Celeux 3 Gérard Govaert 4
1 TANC - Algorithmic number theory for cryptology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France, X - École polytechnique, CNRS - Centre National de la Recherche Scientifique : UMR7161
3 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay, CNRS - Centre National de la Recherche Scientifique : UMR
Abstract: The latent class model, or multivariate multinomial mixture, is a powerful model for clustering discrete data. It is expected to be useful for representing non-homogeneous populations, and it assumes that the variables are conditionally independent given the latent class to which a statistical unit belongs. A predictive approach to cluster analysis of qualitative data can easily be derived from a fully Bayesian analysis with Jeffreys non-informative prior distributions. However, it leads to a criterion (the integrated completed likelihood derived from the latent class model) that proves difficult to optimize with the standard approach based on the EM algorithm. An Evolutionary Algorithm is designed to tackle this discrete optimization problem, and an extensive parameter study on a large artificial dataset makes it possible to derive stable parameters. A Monte Carlo approach is used to validate those parameters on other artificial datasets, as well as on some well-known real data: the Evolutionary Algorithm seems to repeatedly perform better than other standard clustering techniques on the same data.
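To illustrate the conditional independence assumption mentioned in the abstract, here is a minimal sketch of the density of a single observation under a latent class model. It is not taken from the report: the function name `latent_class_log_density`, the nested-list layout of `probs`, and the toy parameter values are all assumptions made for illustration, and all probabilities are assumed strictly positive.

```python
import math


def latent_class_log_density(x, weights, probs):
    """Log-density of one observation under a latent class model.

    x       : list of category indices, one per qualitative variable
    weights : class proportions (assumed to sum to 1)
    probs   : probs[k][j][h] = P(variable j takes category h | class k)
    All names and the data layout are hypothetical, chosen for this sketch.
    """
    log_terms = []
    for pi_k, class_probs in zip(weights, probs):
        # Conditional independence given the class: the joint probability
        # is the product of the per-variable category probabilities.
        log_p = math.log(pi_k)
        for j, h in enumerate(x):
            log_p += math.log(class_probs[j][h])
        log_terms.append(log_p)
    # Mixture over classes, computed with log-sum-exp for stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


# Toy example (made-up values): two classes, two binary variables.
weights = [0.5, 0.5]
probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
density = math.exp(latent_class_log_density([0, 0], weights, probs))
```

In this toy setting the density of the observation `[0, 0]` is 0.5 × 0.9 × 0.8 + 0.5 × 0.2 × 0.3 = 0.39. The criterion optimized in the report (the integrated completed likelihood) is a different quantity, defined on partitions of the data; this sketch only shows the underlying mixture model.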
Document type:
Report
[Research Report] RR-6082, INRIA. 2006, pp.24

Cited literature [25 references]

https://hal.inria.fr/inria-00122088
Contributor: Marc Schoenauer
Submitted on: Wednesday, 27 December 2006 - 16:36:36
Last modified on: Wednesday, 4 July 2018 - 16:44:02
Document(s) archived on: Friday, 25 November 2016 - 14:00:20

File

latentEA-RR.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00122088, version 3

Citation

Damien Tessier, Marc Schoenauer, Christophe Biernacki, Gilles Celeux, Gérard Govaert. Evolutionary Latent Class Clustering of Qualitative Data. [Research Report] RR-6082, INRIA. 2006, pp.24. 〈inria-00122088v3〉
