Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins

Alexandre Lourme 1, 2 Christophe Biernacki 1, 2
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract: Gaussian mixture model-based clustering is now a standard tool for estimating a hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several different datasets at the same time in a context where the underlying populations, even though different, are not completely unrelated: all individuals are described by the same features, and partitions of identical meaning are expected. Using natural arguments to justify a stochastic linear link between the components of the mixtures associated with each dataset, we propose parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood mixture parameters, subject to the linear link constraint, can be easily estimated by a Generalized Expectation Maximization (GEM) algorithm that we describe. Promising results are obtained in a biological context where simultaneous clustering outperforms independent clustering for partitioning three different subspecies of birds. Further results on ornithological data show that the proposed strategy is robust to the relaxation of exact descriptor concordance, which is one of its main assumptions.
Document type:
Journal article
Computational Statistics, Springer Verlag, 2013, 152 (3), pp.371-391

Cited literature: 17 references
Contributor: Christophe Biernacki
Submitted on: Thursday, December 19, 2013 - 15:38:20
Last modified on: Wednesday, November 14, 2018 - 14:40:11
Document(s) archived on: Thursday, March 20, 2014 - 09:10:13


Publisher files allowed on an open archive


  • HAL Id: hal-00921041, version 1



Alexandre Lourme, Christophe Biernacki. Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins. Computational Statistics, Springer Verlag, 2013, 152 (3), pp.371-391. 〈hal-00921041〉


