Journal articles

Model-based clustering for conditionally correlated categorical data

Matthieu Marbac 1 Christophe Biernacki 1, 2 Vincent Vandewalle 1
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract: An extension of the latent class model is presented for clustering categorical data by relaxing the classical "class conditional independence" assumption on the variables. The model groups the variables into blocks that are independent of one another but dependent within, in order to capture the main intra-class correlations. The dependency between variables grouped inside the same block of a class is modeled by mixing two extreme distributions: independence and maximum dependency. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, since it produces a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood by means of an EM algorithm. Moreover, a Gibbs sampler is used for model selection in order to overcome the computational intractability of the combinatorial problems raised by the block structure search. Two applications on medical and biological data sets show the relevance of this new model. The results strengthen the view that this model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
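
The mixing idea described in the abstract can be illustrated with a toy sketch. It is not the paper's parameterization: it assumes a block of exactly two categorical variables with the same number of levels, and it takes "maximum dependency" to mean that the second variable always equals the first. The names `block_distribution`, `alpha1`, `alpha2`, `tau` and `rho` are hypothetical and chosen for this illustration only.

```python
def block_distribution(alpha1, alpha2, tau, rho):
    """Toy joint table for one block of two categorical variables.

    Assumptions (illustrative only, not taken from the paper):
      - both variables have the same number of levels m;
      - alpha1, alpha2 are the marginals of the independence component;
      - tau gives the level probabilities of the maximum dependency
        component, in which the second variable equals the first;
      - rho in [0, 1] is the weight of the maximum dependency component.
    """
    m = len(alpha1)
    if len(alpha2) != m or len(tau) != m:
        raise ValueError("alpha1, alpha2 and tau must have the same length")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")

    table = []
    for i in range(m):
        row = []
        for j in range(m):
            # Independence component: product of the two marginals.
            independent = alpha1[i] * alpha2[j]
            # Maximum dependency component: mass only on the diagonal.
            dependent = tau[i] if i == j else 0.0
            row.append((1.0 - rho) * independent + rho * dependent)
        table.append(row)
    return table


# Example: two binary variables, a quarter of the mass on full dependency.
joint = block_distribution([0.6, 0.4], [0.5, 0.5], [0.7, 0.3], rho=0.25)
total = sum(sum(row) for row in joint)
```

With `rho=0` the table reduces to the product of the marginals, which is the conditional independence case of the latent class model; with `rho=1` all the mass sits on the diagonal. The single extra weight `rho` is what makes this kind of dependency model cheap in parameters.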

https://hal.inria.fr/hal-00787757
Contributor: Matthieu Marbac
Submitted on: Thursday, July 10, 2014 - 4:17:46 PM
Last modification on: Friday, November 27, 2020 - 2:18:02 PM
Long-term archiving on: Friday, October 10, 2014 - 12:26:41 PM

File

ccm.pdf
Files produced by the author(s)

Citation

Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle. Model-based clustering for conditionally correlated categorical data. Journal of Classification, Springer Verlag, 2015, 2 (32), pp.145-175. ⟨10.1007/s00357⟩. ⟨hal-00787757v3⟩