
Evolutionary Latent Class Clustering of Qualitative Data

Damien Tessier 1 Marc Schoenauer 1 Christophe Biernacki 2 Gilles Celeux 3 Gérard Govaert 4
1 TANC - Algorithmic number theory for cryptology
Inria Saclay - Ile de France, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
3 SELECT - Model selection in statistical learning
LMO - Laboratoire de Mathématiques d'Orsay, Inria Saclay - Ile de France
Abstract: The latent class model, or multivariate multinomial mixture, is a powerful model for clustering discrete data. It is expected to be useful for representing non-homogeneous populations. It relies on a conditional independence assumption given the latent class to which a statistical unit belongs. However, although a predictive approach to cluster analysis of qualitative data can easily be derived from a fully Bayesian analysis with Jeffreys non-informative prior distributions, it leads to a criterion (the integrated completed likelihood derived from the latent class model) that proves difficult to optimize with the standard approach based on the EM algorithm. An Evolutionary Algorithm is designed to tackle this discrete optimization problem, and an extensive parameter study on a large artificial dataset makes it possible to derive stable parameters. A Monte Carlo approach is used to validate those parameters on other artificial datasets, as well as on some well-known real data: the Evolutionary Algorithm seems to repeatedly perform better than other standard clustering techniques on the same data.
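As a rough illustration of the criterion mentioned in the abstract: under Jeffreys priors, which for a multinomial are Dirichlet(1/2, ..., 1/2), both the mixing proportions and each class-conditional multinomial integrate out in closed form by Dirichlet-multinomial conjugacy, so the integrated completed likelihood log p(x, z) of a hard partition z can be computed exactly. The Python sketch below assumes each variable j is coded as integers 0 .. m_j - 1, computes that quantity (`exact_icl`), and then maximizes it over partitions with a deliberately simple elitist (1+λ) mutation-only loop (`evolve_partition`). Both function names and all parameters (generation count, offspring count, mutation rate) are hypothetical; this is not the Evolutionary Algorithm or the parameter settings studied in the report, only a sketch of optimizing the criterion directly over discrete partitions.

```python
import math
import random


def exact_icl(data, labels, n_classes, n_modalities):
    """Exact log p(x, z) for a latent class model (hypothetical helper).

    data[i][j] is the modality (0 .. n_modalities[j] - 1) of unit i on
    variable j; labels[i] is its class in 0 .. n_classes - 1. Assumes Jeffreys
    priors, i.e. Dirichlet(1/2, ..., 1/2), on the mixing proportions and on
    each class-conditional multinomial, so both integrate out in closed form.
    """
    lg = math.lgamma
    n = len(data)
    class_sizes = [0] * n_classes
    # counts[k][j][h]: number of units in class k with modality h on variable j
    counts = [[[0] * m for m in n_modalities] for _ in range(n_classes)]
    for row, k in zip(data, labels):
        class_sizes[k] += 1
        for j, h in enumerate(row):
            counts[k][j][h] += 1

    # log p(z): Dirichlet-multinomial marginal of the class labels
    icl = lg(n_classes / 2) - n_classes * lg(0.5) - lg(n + n_classes / 2)
    icl += sum(lg(nk + 0.5) for nk in class_sizes)

    # log p(x | z): conditional independence gives one term per (class, variable)
    for k in range(n_classes):
        for j, m in enumerate(n_modalities):
            icl += lg(m / 2) - m * lg(0.5) - lg(class_sizes[k] + m / 2)
            icl += sum(lg(c + 0.5) for c in counts[k][j])
    return icl


def evolve_partition(data, n_classes, n_modalities, generations=200,
                     offspring=10, mutation_rate=0.05, seed=0):
    """Toy (1 + lambda) evolutionary search over partitions (hypothetical).

    Mutation-only and elitist: the best of `offspring` mutated children
    replaces the parent whenever its criterion is at least as high.
    """
    rng = random.Random(seed)
    parent = [rng.randrange(n_classes) for _ in range(len(data))]
    parent_fit = exact_icl(data, parent, n_classes, n_modalities)
    for _ in range(generations):
        scored = []
        for _ in range(offspring):
            # Each label is redrawn uniformly with probability mutation_rate
            child = [rng.randrange(n_classes) if rng.random() < mutation_rate
                     else z for z in parent]
            scored.append((exact_icl(data, child, n_classes, n_modalities), child))
        best_fit, best_child = max(scored, key=lambda pair: pair[0])
        if best_fit >= parent_fit:
            parent, parent_fit = best_child, best_fit
    return parent, parent_fit


# Two well-separated groups on three binary variables
data = [[0, 0, 0]] * 20 + [[1, 1, 1]] * 20
labels, fit = evolve_partition(data, n_classes=2, n_modalities=[2, 2, 2])
print(labels, fit)
```

Because the criterion is evaluated on partitions, relabeling the classes leaves it unchanged, and an empty class contributes nothing to the log p(x | z) part. The elitist replacement rule guarantees the returned criterion is never worse than that of the random starting partition, but, like any such search, it offers no guarantee of reaching the global optimum.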
Document type: Reports

Cited literature: 25 references

https://hal.inria.fr/inria-00122088
Contributor: Marc Schoenauer
Submitted on: Wednesday, December 27, 2006, 4:36:36 PM
Last modification on: Wednesday, October 14, 2020, 3:59:01 AM
Long-term archiving on: Friday, November 25, 2016, 2:00:20 PM

File

latentEA-RR.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00122088, version 3

Citation

Damien Tessier, Marc Schoenauer, Christophe Biernacki, Gilles Celeux, Gérard Govaert. Evolutionary Latent Class Clustering of Qualitative Data. [Research Report] RR-6082, INRIA. 2006, pp.24. ⟨inria-00122088v3⟩


Metrics

Record views: 572
File downloads: 1080