
Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins

Alexandre Lourme (1), Christophe Biernacki (1)
(1) MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract: Gaussian mixture model-based clustering is now a standard tool for estimating a hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several datasets simultaneously in a setting where the underlying populations, although different, are not completely unrelated: all individuals are described by the same features, and partitions with identical meaning are expected. Using natural arguments to justify a stochastic linear link between the components of the mixtures associated with each dataset, we propose parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood estimates of the mixture parameters, subject to the linear link constraint, can be easily computed by a Generalized Expectation Maximization (GEM) algorithm that we describe. Promising results are obtained in a biological context, where simultaneous clustering outperforms independent clustering for partitioning three different subspecies of birds. Further results on ornithological data show that the proposed strategy is robust to relaxing exact descriptor concordance, which is one of its main assumptions.
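
To make the idea of a linear link between Gaussian components concrete, here is a minimal sketch of the standard fact it relies on: if a component follows N(mu, Sigma), then its image under x -> D x + b follows N(D mu + b, D Sigma D^T). The diagonal form of D, the function name `link_component`, and all numerical values are assumptions chosen for illustration. The abstract does not state them, and this is not the estimation procedure (GEM) described in the paper.

```python
import numpy as np

def link_component(mu, sigma, d, b):
    """Parameters of the Gaussian obtained by mapping N(mu, sigma)
    through the affine map x -> diag(d) @ x + b.

    The diagonal scaling is an illustrative assumption, not a detail
    taken from the abstract.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    b = np.asarray(b, dtype=float)
    D = np.diag(np.asarray(d, dtype=float))
    # An affine image of a Gaussian is Gaussian with these moments.
    return D @ mu + b, D @ sigma @ D.T

# Hypothetical component of a first population (made-up values).
mu1 = np.array([1.0, 2.0])
sigma1 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])

# Hypothetical link to the matching component of a second population.
d = np.array([2.0, 0.5])
b = np.array([1.0, -1.0])

mu2, sigma2 = link_component(mu1, sigma1, d, b)
print(mu2)
print(sigma2)
```

Under such a link, the matching component in a second dataset needs only the link parameters rather than a full mean and covariance of its own, which is one way to read the "parsimonious" models mentioned in the abstract.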
Document type: Journal article

https://hal.inria.fr/hal-00921041
Contributor: Christophe Biernacki
Submitted on: Thursday, December 19, 2013 - 3:38:20 PM
Last modification on: Thursday, July 1, 2021 - 10:17:27 AM
Long-term archiving on: Thursday, March 20, 2014 - 9:10:13 AM

File

classifsimul.pdf
Publisher files allowed on an open archive

Citation

Alexandre Lourme, Christophe Biernacki. Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins. Computational Statistics, Springer Verlag, 2013, 28, pp.371-391. ⟨10.1007/s00180-012-0305-5⟩. ⟨hal-00921041⟩
