Modeling and classification of high-dimensional binary data: application to verbal autopsy

Abstract: The lack of reliable data on the causes of mortality remains an obstacle to the development of poor regions of the world. In these countries, reliable information about morbidity and mortality is not always easy to obtain, and verbal autopsy has become the main source of information on causes of death in many places. This method relies on structured questionnaires that record symptoms and collect information about the possible cause of death. These data have led to the development of diagnostic assistance systems, which are often based on classification methods. The problem we tackle is the development of a method for automatic diagnosis from survey data. The final objective is to provide a diagnosis that takes into account the presence or absence of symptoms as well as sociodemographic variables. The approach relies on building discrimination models from multi-class data with a large number of binary explanatory variables. The first part of this thesis uses a mixture model under the assumption of conditional independence, together with dimensionality reduction techniques. Because the answers are binary, methods based on similarity measures are required; this thesis therefore presents a generalization of several similarity and dissimilarity measures. Since kernels are of great importance in classification, we also present a technique for constructing a kernel from a similarity measure. The second part of this thesis presents a classification method that combines similarity measures and mixture models. The hierarchical structure of the questions asked during the interview, and the interactions between them, allow us to define a structure over the data. To better account for this structure, we present a hierarchical kernel that takes the interactions between variables into account. This kernel combines a hierarchical structure over the variables, organized as a two-level tree, with interactions between variables up to a given order.
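The abstract only names the ingredients (similarity measures on binary answers, kernels built from them, and a hierarchical kernel with interactions up to a given order), so the following is a minimal Python sketch of generic versions of those ideas, not the thesis's own definitions. It assumes the usual 2×2 contingency counts (a, b, c, d) between two binary vectors, uses the standard simple matching and Jaccard coefficients, and builds an interaction kernel that counts the variable subsets of size at most q on which two respondents agree. The function names, the grouping of variables into questionnaire sections, and the per-group weights are hypothetical illustrations.

```python
from math import comb


def match_counts(x, y):
    """2x2 contingency counts for two binary vectors of equal length:
    a = both 1, b = x is 1 and y is 0, c = x is 0 and y is 1, d = both 0."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = len(x) - a - b - c
    return a, b, c, d


def simple_matching(x, y):
    """Simple matching (Sokal-Michener): shared presences and absences both count."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """Jaccard similarity: shared absences (d) are ignored.
    On binary vectors it is known to be a valid (Tanimoto) kernel."""
    a, b, c, _ = match_counts(x, y)
    return 1.0 if a + b + c == 0 else a / (a + b + c)


def interaction_kernel(x, y, q):
    """Count the subsets S of variables with |S| <= q (empty set included) on
    which x and y agree. With m matching positions this is sum_{r<=q} C(m, r).
    It is the inner product of the indicator features 1[x_S == pattern],
    hence positive semi-definite."""
    a, _, _, d = match_counts(x, y)
    m = a + d
    return sum(comb(m, r) for r in range(q + 1))  # comb(m, r) is 0 when r > m


def hierarchical_kernel(x, y, groups, q, weights=None):
    """Hypothetical two-level tree: root -> groups of variables (for example
    questionnaire sections) -> variables. Interactions are only formed inside
    a group; group kernels are summed with non-negative weights, which keeps
    the result positive semi-definite."""
    if weights is None:
        weights = [1.0] * len(groups)
    if len(weights) != len(groups) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per group")
    total = 0.0
    for w, idx in zip(weights, groups):
        total += w * interaction_kernel([x[i] for i in idx], [y[i] for i in idx], q)
    return total


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]  # presence/absence of five symptoms, respondent 1
    y = [1, 1, 1, 0, 0]  # respondent 2
    print(match_counts(x, y))             # (2, 1, 1, 1)
    print(simple_matching(x, y))          # 0.6
    print(jaccard(x, y))                  # 0.5
    print(interaction_kernel(x, y, q=2))  # 1 + 3 + 3 = 7
    print(hierarchical_kernel(x, y, groups=[[0, 1, 2], [3, 4]], q=2))  # 4 + 2 = 6.0
```

Summing non-negatively weighted group kernels is the simplest way to respect a two-level tree while keeping a valid kernel; a Gram matrix built from `hierarchical_kernel` could then be passed to any classifier that accepts a precomputed kernel.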
Document type: Thesis

https://hal.inria.fr/tel-01427119
Contributor: Stephane Girard
Submitted on: Thursday, January 5, 2017 - 12:07:50 PM
Last modification on: Monday, July 8, 2019 - 4:56:07 PM
Long-term archiving on: Thursday, April 6, 2017 - 1:06:18 PM

Identifiers

  • HAL Id: tel-01427119, version 1

Citation

Seydou Nourou Sylla. Modélisation et classification des données binaires en grande dimension : application à l'autopsie verbale. Statistics [math.ST]. Université Gaston Berger de Saint-Louis (SENEGAL), 2016. French. ⟨tel-01427119⟩
