hal-00747387, version 1

Enhancing the selection of a model-based clustering with external qualitative variables

Jean-Patrick Baudry (a, 1), Margarida Cardoso (b, 2), Gilles Celeux (3), Maria-José Amorim (b, 2), Ana Sousa Ferreira (c, 2)

No. RR-8124 (2012)

Abstract: In cluster analysis, it is often useful to interpret the obtained partition with respect to external qualitative variables (defining known partitions) derived from alternative information. An approach is proposed, in the model-based clustering context, to select a model and a number of clusters so as to obtain a partition that both fits the data well and is related to the external variables. This approach makes use of the integrated joint likelihood of the data, the partition derived from the mixture model, and the known partitions. Note that the external qualitative variables are used only to select a relevant mixture model: each mixture model is fitted by maximum likelihood from the observed data. Numerical experiments illustrate the promising behaviour of the derived criterion.
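
As a rough illustration of the idea in the abstract, the sketch below scores how well a candidate clustering is related to external qualitative variables and adds that score to a base model-selection value. It is only a sketch under stated assumptions. The link term used here (the maximised conditional log-likelihood of each external partition given the clusters) is an assumption and is not stated in the abstract. The function names `external_link_term` and `combined_score` are hypothetical, and the base criterion (for example an ICL value on the log-likelihood scale, larger being better) is assumed to have been computed elsewhere.

```python
from collections import Counter
from math import log


def external_link_term(clusters, external):
    """Maximised conditional log-likelihood of an external partition given
    the cluster labels: sum over (k, l) of n_kl * log(n_kl / n_k).

    ASSUMPTION: this is one plausible way to measure the link between a
    clustering and a known partition; the abstract does not give a formula.
    """
    if len(clusters) != len(external):
        raise ValueError("label sequences must have the same length")
    joint = Counter(zip(clusters, external))  # n_kl: counts per (cluster, level)
    per_cluster = Counter(clusters)           # n_k: counts per cluster
    return sum(n * log(n / per_cluster[k]) for (k, _), n in joint.items())


def combined_score(base_criterion, clusters, externals):
    """Add one link term per external variable to a base criterion.

    ASSUMPTION: base_criterion is on the log-likelihood scale (larger is
    better) and was obtained from a mixture fitted to the data alone.
    """
    return base_criterion + sum(external_link_term(clusters, u) for u in externals)


if __name__ == "__main__":
    known = ["a", "a", "b", "b"]
    # Clusters that reproduce the known partition incur no loss.
    print(combined_score(-100.0, [0, 0, 1, 1], [known]))
    # A single cluster carries no information about the known partition.
    print(combined_score(-100.0, [0, 0, 0, 0], [known]))
```

The link term is zero when every cluster falls inside a single level of the external variable and becomes more negative as clusters mix levels, so among candidate models with similar base values this combination would favour the one whose partition agrees best with the known partitions.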

  • a –  Université Pierre et Marie Curie - Paris VI
  • b –  ISCTE-Lisbon University Institute
  • c –  Universidade de Lisboa
  • 1:  Laboratoire de Statistique Théorique et Appliquée (LSTA), Université Pierre et Marie Curie (UPMC) - Paris VI
  • 2:  Instituto Superior Técnico - Technical University of Lisbon (IST), Technical University of Lisbon
  • 3:  SELECT (INRIA Saclay - Ile de France), INRIA – Université Paris XI - Paris Sud – CNRS : UMR
  • Domain: Mathematics/Statistics
    Statistics/Statistics Theory
  • Keywords: Model-based Clustering – External Qualitative Variables – Model Selection – Integrated Completed Likelihood – ICL
  • Internal note: RR-8124
  • oai:hal.inria.fr:hal-00747387
  • Submitted on: Wednesday, 31 October 2012 11:17:40
  • Updated on: Wednesday, 7 November 2012 13:07:30