Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins

Alexandre Lourme 1, 2, Christophe Biernacki 1, 2
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille, Université de Lille 1, IUT’A
Abstract: Gaussian mixture model-based clustering is now a standard tool for estimating a hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several different datasets at the same time in a context where the underlying populations, though different, are not completely unrelated: all individuals are described by the same features, and partitions of identical meaning are expected. Justifying, from some natural arguments, a stochastic linear link between the components of the mixtures associated with each dataset, we propose some parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood mixture parameters, subject to the linear link constraint, can be easily estimated by a Generalized Expectation Maximization (GEM) algorithm that we describe. Promising results are obtained in a biological context, where simultaneous clustering outperforms independent clustering for partitioning three different subspecies of birds. Further results on ornithological data show that the proposed strategy is robust to relaxing exact descriptor concordance, which is one of its main assumptions.
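The abstract does not spell out the form of the linear link, so the following is only a minimal sketch of the general idea, assuming the link is an affine map x -> D x + b applied to one Gaussian component. It relies on the standard fact that an affine image of a Gaussian is again Gaussian, with mean D mu + b and covariance D Sigma D^T. The function name `link_component`, the diagonal choice of `D`, and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def link_component(mean, cov, D, b):
    """Push Gaussian parameters (mean, cov) through the affine map x -> D @ x + b."""
    D = np.asarray(D, dtype=float)
    new_mean = D @ mean + b
    new_cov = D @ cov @ D.T
    return new_mean, new_cov


# Hypothetical parameters of one cluster in a first sample.
mean_1 = np.array([1.0, -2.0])
cov_1 = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Hypothetical link parameters (a diagonal D is an assumption made here).
D = np.diag([1.5, 0.8])
b = np.array([0.3, 1.0])

# Implied parameters of the matching cluster in a second sample.
mean_2, cov_2 = link_component(mean_1, cov_1, D, b)

# Monte Carlo check: transform draws from the first component and
# compare their empirical moments with the implied parameters.
rng = np.random.default_rng(0)
x1 = rng.multivariate_normal(mean_1, cov_1, size=200_000)
x2 = x1 @ D.T + b
emp_mean = x2.mean(axis=0)
emp_cov = np.cov(x2, rowvar=False)
```

Under such a link, the second sample's component parameters are determined by the first sample's parameters plus `D` and `b`, which is one way a model of this kind can stay parsimonious when several datasets are clustered together.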
Document type:
Journal article
Computational Statistics, Springer Verlag, 2013, 152 (3), pp.371-391

https://hal.inria.fr/hal-00921041
Contributor: Christophe Biernacki
Submitted on: Thursday, 19 December 2013 - 15:38:20
Last modified on: Tuesday, 3 July 2018 - 11:44:29
Document(s) archived on: Thursday, 20 March 2014 - 09:10:13

File

classifsimul.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-00921041, version 1

Citation

Alexandre Lourme, Christophe Biernacki. Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins. Computational Statistics, Springer Verlag, 2013, 152 (3), pp.371-391. 〈hal-00921041〉

Metrics

Record views: 338

File downloads: 194