An efficient SEM algorithm for Gaussian Mixtures with missing data

Vincent Vandewalle (1, 2), C. Biernacki (1, 3)
(1) MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille, Université de Lille 1, IUT’A
Abstract: The missing data problem is well known to statisticians, and it is becoming more frequent as modern datasets grow in size. In Gaussian model-based clustering, the EM algorithm handles such data easily by dealing with two kinds of latent levels: the components and the missing values of the variables. However, the well-known degeneracy problem of Gaussian mixtures is aggravated during EM runs. Numerical experiments clearly reveal that degeneracy develops quite slowly and occurs more frequently than with complete data. In practice, such situations are difficult to detect efficiently. Consequently, degenerate solutions may be mistaken for valid ones, and computing time may be wasted on runs that lead nowhere. A theoretical and practical study of degeneracy will be presented. Moreover, a simple condition on the latent partition that avoids degeneracy will be exhibited. This condition is used in a constrained version of the Stochastic EM (SEM) algorithm. Numerical experiments on real and simulated data illustrate the good behaviour of the proposed algorithm.
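As an illustration only, the following NumPy sketch shows one way a constrained SEM iteration could look for a Gaussian mixture with missing entries. It assumes diagonal covariance matrices, so each missing value can be drawn from its component's univariate Gaussian independently of the observed coordinates. The abstract does not state the authors' condition on the latent partition: the check in `partition_ok` (at least `min_obs` observed values per variable in every component) is a hypothetical stand-in, and redrawing the partition until it passes is likewise an assumption. All function and parameter names are hypothetical.

```python
import numpy as np

def log_post_observed(X, pi, mu, var):
    """Unnormalised log posterior log(pi_k) + log f_k(x_obs) for every row and
    component, using only the observed coordinates (valid for diagonal covariances)."""
    obs = ~np.isnan(X)                       # (n, d) mask of observed entries
    Xf = np.where(obs, X, 0.0)               # placeholder value for missing entries
    # per-entry Gaussian log-densities, shape (n, K, d)
    ll = -0.5 * (np.log(2 * np.pi * var)[None] + (Xf[:, None, :] - mu[None]) ** 2 / var[None])
    ll = np.where(obs[:, None, :], ll, 0.0)  # missing coordinates contribute nothing
    return np.log(pi)[None] + ll.sum(axis=2)

def partition_ok(z, obs, K, min_obs):
    """HYPOTHETICAL constraint (not the paper's): every component must hold at
    least `min_obs` observed values on every variable; also rules out empty components."""
    return all(obs[z == k].sum(axis=0).min() >= min_obs for k in range(K))

def sem_step(X, pi, mu, var, z_prev, rng, min_obs=2, max_tries=50):
    """One constrained SEM iteration: stochastic labels and imputations, then M-step."""
    n = X.shape[0]
    K = len(pi)
    obs = ~np.isnan(X)
    lp = log_post_observed(X, pi, mu, var)
    post = np.exp(lp - lp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    z = z_prev                               # fallback if no draw satisfies the check
    for _ in range(max_tries):
        # S-step: draw one label per row from its posterior (inverse-CDF sampling)
        u = rng.random(n)[:, None]
        cand = np.minimum((post.cumsum(axis=1) < u).sum(axis=1), K - 1)
        if partition_ok(cand, obs, K, min_obs):
            z = cand
            break
    # S-step (continued): draw each missing entry from its row's assigned component
    draws = rng.normal(mu[z], np.sqrt(var[z]))
    Xc = np.where(obs, X, draws)
    # M-step: complete-data maximum likelihood on the completed data and partition
    pi = np.bincount(z, minlength=K) / n
    mu = np.stack([Xc[z == k].mean(axis=0) for k in range(K)])
    var = np.stack([Xc[z == k].var(axis=0) for k in range(K)])
    return pi, mu, var, z

# Usage on simulated data: two 2-D clusters, about 20% of entries missing at random
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
X[rng.random(X.shape) < 0.2] = np.nan
z = np.repeat([0, 1], 100)                   # initial partition, satisfies the check
pi, mu, var = np.array([0.5, 0.5]), np.array([[0.0, 0.0], [5.0, 5.0]]), np.ones((2, 2))
for _ in range(20):
    pi, mu, var, z = sem_step(X, pi, mu, var, z, rng)
print(np.round(mu, 2))
```

With `min_obs >= 1`, this hypothetical check also keeps every component non-empty, so the M-step means and variances stay defined; it does not by itself guarantee strictly positive variances.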
Document type:
Conference communication
8th International Conference of the ERCIM WG on Computational and Methodological Statistics, Dec 2015, London, United Kingdom. 2015

https://hal.inria.fr/hal-01242588
Contributor: Vincent Vandewalle
Submitted on: Monday, December 14, 2015 - 15:56:14
Last modified on: Tuesday, July 3, 2018 - 11:49:02

Identifiers

  • HAL Id: hal-01242588, version 1


Citation

Vincent Vandewalle, C Biernacki. An efficient SEM algorithm for Gaussian Mixtures with missing data. 8th International Conference of the ERCIM WG on Computational and Methodological Statistics, Dec 2015, London, United Kingdom. 2015. 〈hal-01242588〉
