Estimation et sélection de modèle pour le modèle des blocs latents (Estimation and model selection for the latent block model)

Abstract : Classification aims to partition a data set into homogeneous subsets, so that observations within a class are more similar to one another than to observations from other classes. The problem becomes harder when the statistician wants a cross classification of both the individuals and the variables. The latent block model assigns a distribution to each block formed by crossing a class of individuals with a class of variables, and the observations are assumed to be independent conditionally on these classes. However, the joint distribution of the labels cannot be factorized, which prevents both the computation of the log-likelihood and the use of the EM algorithm. Several methods and criteria exist to find these partitions: some frequentist, some Bayesian, some stochastic. In this thesis, we first propose sufficient conditions for the identifiability of the model. We then study two algorithms proposed to work around the problem with the EM algorithm: the VEM algorithm (Govaert and Nadif (2008)) and the SEM-Gibbs algorithm (Keribin, Celeux and Govaert (2010)). In particular, we analyze the combination of the two and highlight why the algorithms degenerate, that is, return empty classes. By choosing the priors wisely, we then propose a Bayesian adaptation that limits this phenomenon. In particular, we use a Gibbs sampler and propose a stopping criterion based on the Brooks-Gelman statistic (1998). We also propose an adaptation of the Largest Gaps algorithm (Channarond et al. (2012)). Building on their proofs, we show that the resulting estimators of the labels and of the parameters are consistent when the numbers of rows and columns tend to infinity. Furthermore, we propose a method to select the number of row classes and column classes; this estimate is also consistent when the numbers of rows and columns are very large. To estimate the number of classes, we study the ICL (Integrated Completed Likelihood) criterion, for which we propose an exact expression.
After studying its asymptotic approximation, we propose a BIC (Bayesian Information Criterion), and we conjecture that the two criteria select the same results and that these estimates are consistent; this conjecture is supported by theoretical and empirical results. Finally, we compare the different combinations and propose a methodology for co-clustering.
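To make the model in the abstract concrete, here is a minimal sketch of a latent block model for binary data. The Bernoulli case is an assumption chosen for illustration, since the abstract does not fix the data type. The names `pi` (row class proportions), `rho` (column class proportions) and `alpha` (one Bernoulli parameter per block) are hypothetical and are not taken from the thesis. The sketch simulates row labels, column labels and a data matrix, evaluates the complete-data log-likelihood (tractable because the labels are known), and flags empty classes, which is the degenerate case mentioned in the abstract. The observed-data likelihood would instead require summing over every possible pair of row and column labellings, which this sketch does not attempt.

```python
import math
import random


def simulate_binary_lbm(n_rows, n_cols, pi, rho, alpha, seed=0):
    """Draw row labels z, column labels w and a binary matrix x.

    Assumed model (illustrative): z[i] ~ Categorical(pi),
    w[j] ~ Categorical(rho), x[i][j] ~ Bernoulli(alpha[z[i]][w[j]]).
    """
    rng = random.Random(seed)
    z = rng.choices(range(len(pi)), weights=pi, k=n_rows)
    w = rng.choices(range(len(rho)), weights=rho, k=n_cols)
    x = [
        [1 if rng.random() < alpha[z[i]][w[j]] else 0 for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return x, z, w


def complete_log_likelihood(x, z, w, pi, rho, alpha):
    """Log-likelihood of (x, z, w) when both sets of labels are known."""
    ll = sum(math.log(pi[k]) for k in z) + sum(math.log(rho[l]) for l in w)
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            p = alpha[z[i]][w[j]]
            ll += math.log(p) if x_ij == 1 else math.log(1.0 - p)
    return ll


def empty_classes(labels, n_classes):
    """Return the classes with no member; a non-empty list means degeneracy."""
    present = set(labels)
    return [k for k in range(n_classes) if k not in present]


# Hypothetical parameters: 2 row classes, 2 column classes.
pi = [0.5, 0.5]
rho = [0.3, 0.7]
alpha = [[0.9, 0.1], [0.2, 0.8]]

x, z, w = simulate_binary_lbm(20, 15, pi, rho, alpha, seed=1)
ll = complete_log_likelihood(x, z, w, pi, rho, alpha)
print("complete-data log-likelihood:", ll)
print("empty row classes:", empty_classes(z, len(pi)))
print("empty column classes:", empty_classes(w, len(rho)))
```

The standard library is used on purpose so the sketch runs without extra dependencies; for matrices of realistic size, a vectorized version would be preferable.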
Document type :
Theses
Cited literature [126 references]

https://hal.inria.fr/tel-01090340
Contributor : Vincent Brault
Submitted on : Wednesday, December 3, 2014 - 1:41:16 PM
Last modification on : Thursday, July 22, 2021 - 3:29:36 AM
Long-term archiving on : Saturday, April 15, 2017 - 2:36:53 AM

Identifiers

  • HAL Id : tel-01090340, version 1

Citation

Vincent Brault. Estimation et sélection de modèle pour le modèle des blocs latents. Statistiques [math.ST]. Université Paris Sud - Paris XI, 2014. Français. ⟨NNT : 2014PA112238⟩. ⟨tel-01090340⟩

Metrics

Record views : 769
File downloads : 556