Model-based clustering of Gaussian copulas for mixed data - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, Communications in Statistics - Theory and Methods, Year: 2017

Model-based clustering of Gaussian copulas for mixed data

Abstract

Clustering mixed data is a challenging problem. In a probabilistic framework, the main difficulty is the shortage of conventional distributions for such data. In this paper, we propose to cluster mixed data with a mixture model of Gaussian copulas, since copulas, and Gaussian copulas in particular, are powerful tools for easily modelling the distribution of multivariate variables. Indeed, for a mix of continuous, integer and ordinal variables (all of which have a cumulative distribution function), this copula mixture model defines intra-component dependencies similar to those of a Gaussian mixture, so that they can be interpreted as classical correlations. At the same time, it preserves the standard margins associated with continuous, integer and ordinal features, namely the Gaussian, Poisson and ordered multinomial distributions. As a by-product, the proposed mixture model generalizes many well-known models and provides visualization tools based on its parameters. In practice, inference is carried out in a Bayesian framework with a Metropolis-within-Gibbs sampler. Experiments on simulated and real data sets illustrate the expected advantages of the proposed model for mixed data: a flexible and meaningful parametrization combined with visualization features.
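
To make the generative idea in the abstract concrete, here is a minimal sketch of how one might simulate data from a mixture of Gaussian copulas with one Gaussian, one Poisson and one ordered multinomial margin. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_copula_component`, `sample_mixture`), the parameter names and all numeric values are hypothetical, and the sketch only covers sampling, not the Metropolis-within-Gibbs inference described in the paper.

```python
import numpy as np
from scipy import stats


def sample_copula_component(n, corr, mu, sigma, lam, ord_probs, rng):
    """Draw n rows (continuous, count, ordinal) from one Gaussian-copula component.

    All parameter names are illustrative assumptions, not the paper's notation.
    """
    corr = np.asarray(corr, dtype=float)
    # Latent standard Gaussian vector whose correlation matrix is `corr`.
    z = rng.multivariate_normal(np.zeros(3), corr, size=n)
    # Map each latent coordinate to (0, 1) with the standard normal CDF.
    # Clipping keeps the inverse CDFs below finite in extreme tails.
    u = np.clip(stats.norm.cdf(z), 1e-12, 1 - 1e-12)
    # Apply the inverse CDF of each margin to the matching uniform.
    x_cont = stats.norm.ppf(u[:, 0], loc=mu, scale=sigma)    # Gaussian margin
    x_count = stats.poisson.ppf(u[:, 1], lam).astype(int)    # Poisson margin
    cum = np.cumsum(ord_probs)
    x_ord = np.searchsorted(cum[:-1], u[:, 2])               # ordinal level 0..K-1
    return x_cont, x_count, x_ord


def sample_mixture(n, weights, components, seed=0):
    """Draw n rows from a mixture whose components are Gaussian copulas."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)
    data = np.empty((n, 3))
    for k, params in enumerate(components):
        idx = np.flatnonzero(labels == k)
        if idx.size:
            xc, xn, xo = sample_copula_component(idx.size, rng=rng, **params)
            data[idx, 0], data[idx, 1], data[idx, 2] = xc, xn, xo
    return labels, data


# Two hypothetical components with different margins and correlations.
components = [
    dict(corr=[[1.0, 0.8, 0.3], [0.8, 1.0, 0.2], [0.3, 0.2, 1.0]],
         mu=0.0, sigma=1.0, lam=3.0, ord_probs=[0.2, 0.5, 0.3]),
    dict(corr=[[1.0, -0.5, 0.0], [-0.5, 1.0, 0.0], [0.0, 0.0, 1.0]],
         mu=4.0, sigma=0.5, lam=10.0, ord_probs=[0.6, 0.3, 0.1]),
]
labels, data = sample_mixture(500, [0.5, 0.5], components)
```

In this sketch the correlation matrix acts only on the latent Gaussian vector, so each observed variable keeps its own margin while the within-component dependence is still read off a correlation matrix, which is the property the abstract emphasizes.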
Main file: article.pdf (462.92 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00987760 , version 1 (06-05-2014)
hal-00987760 , version 2 (13-08-2014)
hal-00987760 , version 3 (29-09-2015)
hal-00987760 , version 4 (20-12-2016)

Cite

Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle. Model-based clustering of Gaussian copulas for mixed data. Communications in Statistics - Theory and Methods, 2017, 46 (23), pp.11635-11656. ⟨10.1080/03610926.2016.1277753⟩. ⟨hal-00987760v4⟩