Journal articles

Model-based clustering for conditionally correlated categorical data

Matthieu Marbac 1 Christophe Biernacki 1, 2 Vincent Vandewalle 1 
1 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract: An extension of the latent class model is presented for clustering categorical data by relaxing the classical "class conditional independence assumption" of variables. The model groups the variables into blocks that are independent of one another given the class, while variables inside the same block may be dependent, so as to capture the main intra-class correlations. The dependency between variables grouped inside the same block of a class is taken into account by mixing two extreme distributions: the independence distribution and the maximum dependency distribution. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, since it produces a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood by means of an EM algorithm. Moreover, a Gibbs sampler is used for model selection in order to overcome the computational intractability of the combinatorial problems raised by the block structure search. Two applications on medical and biological data sets show the relevance of this new model. The results strengthen the view that this model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
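The mixing of the two extreme distributions described in the abstract can be illustrated with a minimal Python sketch for a single block within a single class. The abstract does not give the paper's exact parameterization, so everything here is an assumption made for illustration: the mixing weight `rho`, the function names, and the encoding of "maximum dependency" as every other variable being a deterministic function of the first variable in the block.

```python
from itertools import product


def independence_pmf(marginals):
    """Joint pmf of a block under independence: product of per-variable marginals."""
    pmf = {}
    for levels in product(*(range(len(m)) for m in marginals)):
        p = 1.0
        for m, h in zip(marginals, levels):
            p *= m[h]
        pmf[levels] = p
    return pmf


def max_dependency_pmf(first_marginal, maps):
    """Joint pmf where each other variable is a deterministic function of the first.

    maps[j][h] is the level taken by variable j + 1 when the first variable
    takes level h (an illustrative encoding, not the paper's notation).
    """
    pmf = {}
    for h, p in enumerate(first_marginal):
        levels = (h,) + tuple(m[h] for m in maps)
        pmf[levels] = pmf.get(levels, 0.0) + p
    return pmf


def block_pmf(rho, indep, maxdep):
    """Mixture (1 - rho) * independence + rho * maximum dependency, 0 <= rho <= 1."""
    return {x: (1.0 - rho) * p + rho * maxdep.get(x, 0.0) for x, p in indep.items()}


# Hypothetical block of two binary variables with uniform marginals.
indep = independence_pmf([[0.5, 0.5], [0.5, 0.5]])
# Maximum dependency: the second variable always equals the first.
maxdep = max_dependency_pmf([0.5, 0.5], maps=[[0, 1]])
block = block_pmf(0.4, indep, maxdep)
```

With `rho = 0` the block reduces to the independence distribution of the classical latent class model, and with `rho = 1` one variable fully determines the others; intermediate values interpolate between the two with a single extra weight, which is in the spirit of the "few additional parameters" mentioned in the abstract.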
Cited literature: 43 references

https://hal.inria.fr/hal-00787757
Contributor: Matthieu Marbac
Submitted on: Thursday, July 10, 2014 - 4:17:46 PM
Last modification on: Wednesday, March 23, 2022 - 3:51:05 PM
Long-term archiving on: Friday, October 10, 2014 - 12:26:41 PM

File

ccm.pdf
Files produced by the author(s)

Citation

Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle. Model-based clustering for conditionally correlated categorical data. Journal of Classification, Springer Verlag, 2015, 32 (2), pp.145-175. ⟨10.1007/s00357⟩. ⟨hal-00787757v3⟩

Metrics

Record views: 417
File downloads: 422