Theses

Lasso-type penalized regression for the analysis of high-dimensional biological data: application to HIV viral load censored by a limit of quantification and to compositional microbiota data

Perrine Soret 1, 2
2 SISTM - Statistics In System biology and Translational Medicine
Inria Bordeaux - Sud-Ouest, BPH - Bordeaux population health
Abstract : In clinical studies, technological advances have steadily increased the amount of information collected from each patient, leading to situations where the number of explanatory variables exceeds the number of individuals. The Lasso has proven well suited to the overfitting problems encountered in such high-dimensional settings. This thesis is devoted to the application and development of Lasso-type penalized regressions for clinical data with particular structures.

First, in patients infected with the human immunodeficiency virus (HIV), mutations in the viral genes may be associated with the development of resistance to particular treatments. Predicting viral load from a (potentially large) number of mutations helps guide the choice of treatment. Below a quantification threshold the viral load is undetectable, so the data are left-censored. We propose two new Lasso approaches based on the iterative Buckley-James algorithm, which imputes censored values by their conditional expectation. In the first approach, reversing the response turns the problem into one of right-censoring, for which nonparametric estimators of the conditional expectation have been proposed in survival analysis. In the second, we propose a parametric estimator based on a Gaussian assumption.

Second, we study the role of the microbiota in the deterioration of respiratory health. Microbiota data take the form of relative abundances (the proportion of each species per individual, known as compositional data) and have a phylogenetic structure. We reviewed the state of the art in statistical methods for analyzing microbiota data. Because these methods are recent, few recommendations exist on their applicability and effectiveness. A simulation study allowed us to compare the variable-selection performance of penalization methods proposed specifically for this type of data.
We then applied this work to the analysis of the association between bacteria/fungi and the decline in lung function in cystic fibrosis patients of the MucoFong project.
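The parametric Buckley-James Lasso described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation; it rests on our own assumptions: the response (e.g. log10 viral load) is left-censored at a known limit, errors are Gaussian, the Lasso is solved by plain cyclic coordinate descent, and the residual standard deviation is re-estimated with an EM-style update. The function names `gaussian_bj_lasso`, `lasso_cd` and `inv_mills`, and the simulated data, are hypothetical. Each iteration replaces every censored value by E[Y | Y < L] = mu - sigma * phi(a) / Phi(a), with a = (L - mu) / sigma, and then refits the Lasso on the imputed response.

```python
import math

import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def lasso_cd(Xc, y, penalty, beta=None, n_iter=100):
    """Lasso by cyclic coordinate descent on
    (1/2n) * ||y - b0 - Xc @ beta||^2 + penalty * ||beta||_1.
    Xc must be column-centered, so the unpenalized intercept is mean(y)."""
    n, p = Xc.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    b0 = y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n
    r = y - b0 - Xc @ beta                      # current residuals
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += Xc[:, j] * beta[j]             # partial residual without feature j
            beta[j] = soft_threshold(Xc[:, j] @ r / n, penalty) / col_sq[j]
            r -= Xc[:, j] * beta[j]
    return b0, beta


def inv_mills(a):
    """Return (a, phi(a) / Phi(a)) for a standard normal, with a clipped at -30
    so that the ratio stays finite in double precision."""
    a = max(a, -30.0)
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))
    return a, pdf / cdf


def gaussian_bj_lasso(X, y, censored, penalty, n_bj=100, tol=1e-6):
    """Buckley-James-style Lasso for a left-censored response (Gaussian errors
    assumed). For censored rows, y holds the limit L_i; the true value lies below.
    Returns the intercept (for centered X), coefficients, sigma and imputed y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cens = np.asarray(censored, dtype=bool)
    Xc = X - X.mean(axis=0)
    y_imp = y.copy()                            # initial guess: censored values at the limit
    b0, beta = lasso_cd(Xc, y_imp, penalty)
    sigma = float(np.std(y_imp - b0 - Xc @ beta)) or 1.0
    for _ in range(n_bj):
        mu = b0 + Xc @ beta
        y_new = y.copy()
        sq = (y - mu) ** 2                      # squared residuals of uncensored rows
        for i in np.flatnonzero(cens):
            a, ratio = inv_mills((y[i] - mu[i]) / sigma)
            # E[Y | Y < L] = mu - sigma * phi(a) / Phi(a), capped at L numerically
            y_new[i] = min(mu[i] - sigma * ratio, y[i])
            # E[(Y - mu)^2 | Y < L] = sigma^2 * (1 - a * phi(a) / Phi(a))
            sq[i] = sigma ** 2 * (1.0 - a * ratio)
        sigma = math.sqrt(sq.mean())            # EM-style update (assumption)
        delta = float(np.max(np.abs(y_new - y_imp)))
        y_imp = y_new
        b0, beta = lasso_cd(Xc, y_imp, penalty, beta=beta)
        if delta < tol:
            break
    return b0, beta, sigma, y_imp


# Usage on simulated data (hypothetical setting, not the thesis's data):
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
true_beta = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0])
y_true = 1.0 + X @ true_beta + rng.normal(scale=0.5, size=300)
limit = np.quantile(y_true, 0.3)                # about 30% of values fall below the limit
censored = y_true < limit
y_obs = np.where(censored, limit, y_true)
b0, beta, sigma, y_imp = gaussian_bj_lasso(X, y_obs, censored, penalty=0.05)
print(np.round(beta, 2), round(sigma, 2))
```

For the nonparametric variant, one would instead negate the response so that left-censoring becomes right-censoring, then replace the Gaussian conditional expectation with one computed from the Kaplan-Meier estimator of the residual distribution, as in the classical Buckley-James estimator.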

Cited literature [263 references]

https://hal.inria.fr/tel-02425113
Contributor : Marta Avalos
Submitted on : Sunday, December 29, 2019 - 5:58:56 PM
Last modification on : Tuesday, March 23, 2021 - 10:42:04 AM
Long-term archiving on : Monday, March 30, 2020 - 12:29:35 PM

File

Mémoire_de_Thèse_Perrine_Sor...
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02425113, version 1

Citation

Perrine Soret. Régression pénalisée de type Lasso pour l'analyse de données biologiques de grande dimension : application à la charge virale du VIH censurée par une limite de quantification et aux données compositionnelles du microbiote. Machine Learning [stat.ML]. Université de Bordeaux, 2019. In French. ⟨tel-02425113⟩

Metrics

Record views : 115

File downloads : 834