
Modélisation et classification des données binaires en grande dimension : application à l'autopsie verbale

Seydou Nourou Sylla 1, 2, 3, 4 
2 URMITE - Unité de Recherche sur les Maladies Infectieuses Tropicales Emergentes
URMITE - Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes
4 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology, LJK - Laboratoire Jean Kuntzmann
Abstract : The lack of reliable data on causes of death remains an obstacle to development in poor regions of the world. In these countries, reliable information on morbidity and mortality is not always easy to obtain. Verbal autopsy has become the main source of information on causes of death in many places. This method relies on structured questionnaires to establish the symptoms and to gather information about the possible cause of death. These data have led to the development of diagnosis assistance systems, which are often based on classification methods. The problem we tackle is the development of a method for automatic diagnosis from survey data. The final objective is to provide a diagnosis that takes into account the presence or absence of symptoms as well as sociodemographic variables. The approach is based on building discrimination models from multi-class data with a large number of binary explanatory variables. The first part of this thesis uses a mixture model under the assumption of conditional independence, together with dimensionality reduction techniques. The binary nature of the answers calls for methods based on similarity measures; a generalization of several similarity and dissimilarity measures is therefore presented. Since kernels are of great importance in classification, we also present a technique for constructing a kernel from a similarity measure. The second part of this thesis presents a classification method that combines similarity measures and mixture models. The hierarchical structure of the questions asked during the interview, and their interactions, allow us to define a structure over the data. To better account for this structure, we present a hierarchical kernel that captures the interactions between variables. This kernel combines a hierarchical structure over the variables with a two-level tree structure and with interactions between variables up to a certain order.
Contributor : Stephane Girard
Submitted on : Thursday, January 5, 2017 - 12:07:50 PM
Last modification on : Tuesday, October 25, 2022 - 4:21:37 PM
Long-term archiving on : Thursday, April 6, 2017 - 1:06:18 PM


  • HAL Id : tel-01427119, version 1



Seydou Nourou Sylla. Modélisation et classification des données binaires en grande dimension : application à l'autopsie verbale. Statistiques [math.ST]. Université Gaston Berger de Saint-Louis (SENEGAL), 2016. Français. ⟨NNT : ⟩. ⟨tel-01427119⟩


