Enhancing the selection of a model-based clustering with external qualitative variables

Jean-Patrick Baudry 1, Margarida Cardoso 2, Gilles Celeux 3, Maria-José Amorim 4, Ana Sousa Ferreira 5
2 BRU-UNIDE
IST - Instituto Superior Técnico, Universidade Técnica de Lisboa
3 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay
4 ISEL
IST - Instituto Superior Técnico, Universidade Técnica de Lisboa
5 BRU-UNIDE & CEAUL
IST - Instituto Superior Técnico, Universidade Técnica de Lisboa
Abstract: In cluster analysis, it is often useful to interpret the obtained partition with respect to external qualitative variables (defining known partitions) derived from alternative information. An approach is proposed, in the model-based clustering context, to select a model and a number of clusters so as to obtain a partition that both fits the data well and is related to the external variables. This approach makes use of the integrated joint likelihood of the data, the partition derived from the mixture model, and the known partitions. Note that the external qualitative variables are used only to select a relevant mixture model: each mixture model is fitted by maximum likelihood from the observed data alone. Numerical experiments illustrate the promising behaviour of the derived criterion.

Cited literature: 8 references

https://hal.inria.fr/hal-00747387
Contributor: Gilles Celeux
Submitted on: Wednesday, October 31, 2012 - 11:17:40 AM
Last modification on: Monday, January 13, 2020 - 1:59:27 PM
Long-term archiving on: Friday, February 1, 2013 - 3:37:53 AM

File

RR-8124.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00747387, version 1

Citation

Jean-Patrick Baudry, Margarida Cardoso, Gilles Celeux, Maria-José Amorim, Ana Sousa Ferreira. Enhancing the selection of a model-based clustering with external qualitative variables. [Research Report] RR-8124, INRIA. 2012, pp.14. ⟨hal-00747387⟩
