
Evolutionary Latent Class Clustering of Qualitative Data

Damien Tessier 1 Marc Schoenauer 1 Christophe Biernacki 2 Gilles Celeux 3 Gérard Govaert 4
1 TANC - Algorithmic number theory for cryptology
Inria Saclay - Ile de France, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
3 SELECT - Model selection in statistical learning
LMO - Laboratoire de Mathématiques d'Orsay, Inria Saclay - Ile de France
Abstract : The latent class model, or multivariate multinomial mixture, is a powerful model for clustering discrete data. It is expected to be useful for representing non-homogeneous populations, and it assumes conditional independence of the variables given the latent class to which a statistical unit belongs. A predictive approach to cluster analysis of qualitative data can easily be derived from a fully Bayesian analysis with Jeffreys non-informative prior distributions. However, it leads to a criterion (the integrated completed likelihood derived from the latent class model) that proves difficult to optimize with the standard approach based on the EM algorithm. An Evolutionary Algorithm is designed to tackle this discrete optimization problem, and an extensive parameter study on a large artificial dataset makes it possible to derive stable parameters. A Monte Carlo approach is used to validate those parameters on other artificial datasets, as well as on some well-known real data: the Evolutionary Algorithm seems to perform repeatedly better than other standard clustering techniques on the same data.
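As an illustration of the kind of criterion the abstract refers to, the following is a minimal sketch of how an integrated completed log-likelihood could be evaluated for a fixed partition. It assumes symmetric Dirichlet(1/2) priors on the class proportions and on each within-class multinomial, which is one common reading of "Jeffreys non-informative priors" here; the report's exact formulation may differ. The function names and the data layout (rows of integer-coded levels) are hypothetical and not taken from the report.

```python
from math import lgamma, log


def log_dirichlet_multinomial(counts, a=0.5):
    """Log marginal probability of an ordered categorical sample with the
    given per-category counts, integrating the category probabilities
    over a symmetric Dirichlet(a) prior (assumption: a = 1/2)."""
    m = len(counts)
    n = sum(counts)
    return (lgamma(m * a) - m * lgamma(a)
            + sum(lgamma(c + a) for c in counts)
            - lgamma(n + m * a))


def integrated_completed_loglik(data, labels, n_classes, n_levels, a=0.5):
    """Score a partition of qualitative data.

    data      : list of rows, each row a list of integer-coded levels
    labels    : class index (0 .. n_classes-1) assigned to each row
    n_levels  : number of levels of each variable
    Variables are treated as conditionally independent given the class,
    so each (class, variable) pair contributes one marginal term.
    """
    class_sizes = [0] * n_classes
    level_counts = [[[0] * m for m in n_levels] for _ in range(n_classes)]
    for row, k in zip(data, labels):
        class_sizes[k] += 1
        for j, h in enumerate(row):
            level_counts[k][j][h] += 1

    # Term for the partition itself (class proportions integrated out).
    total = log_dirichlet_multinomial(class_sizes, a)
    # Terms for the data given the partition.
    for k in range(n_classes):
        for j in range(len(n_levels)):
            total += log_dirichlet_multinomial(level_counts[k][j], a)
    return total


# Toy usage: two binary variables, two clearly separated groups.
toy = [[0, 0]] * 4 + [[1, 1]] * 4
separated = [0, 0, 0, 0, 1, 1, 1, 1]
interleaved = [0, 1, 0, 1, 0, 1, 0, 1]
score_separated = integrated_completed_loglik(toy, separated, 2, [2, 2])
score_interleaved = integrated_completed_loglik(toy, interleaved, 2, [2, 2])
```

Because such a score depends only on the label vector, it is a function over a discrete search space, which is the setting in which the abstract proposes an evolutionary search instead of EM.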

Cited literature [25 references]
Contributor : Marc Schoenauer
Submitted on : Wednesday, December 27, 2006 - 4:36:36 PM
Last modification on : Tuesday, November 16, 2021 - 4:30:30 AM
Long-term archiving on : Friday, November 25, 2016 - 2:00:20 PM




  • HAL Id : inria-00122088, version 3


Damien Tessier, Marc Schoenauer, Christophe Biernacki, Gilles Celeux, Gérard Govaert. Evolutionary Latent Class Clustering of Qualitative Data. [Research Report] RR-6082, INRIA. 2006, pp.24. ⟨inria-00122088v3⟩


