Theses

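Inférence de réseaux de régulation orientés pour les facteurs de transcription d'Arabidopsis thaliana et création de groupes de co-régulation (Inference of directed regulatory networks for the transcription factors of Arabidopsis thaliana and creation of co-regulation groups)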

Yann Vasseur 1, 2
2 SELECT - Model selection in statistical learning
LMO - Laboratoire de Mathématiques d'Orsay, Inria Saclay - Ile de France
Abstract: This thesis deals with the characterisation of transcription factors, key genes in the regulation of gene expression, in the plant Arabidopsis thaliana. Using expression data, our biological goal is to cluster transcription factors into groups of co-regulator transcription factors and groups of co-regulated transcription factors. To do so, we propose a two-step procedure. First, we infer the regulatory network between transcription factors. Second, we cluster transcription factors based on their connection patterns to other transcription factors. From a statistical point of view, the transcription factors are the variables and the samples are the observations. The regulatory network between the transcription factors is modelled as a directed graph whose nodes are the variables. Estimating the edges of this graph can be interpreted as a variable selection problem. To infer the network, we perform LASSO-type penalised linear regression. A preliminary approach selects a set of variables along the regularisation path using a penalised likelihood criterion. However, this approach is unstable and selects too many variables. To overcome this difficulty, we propose to set two selection procedures in competition, both designed for high-dimensional data and combining penalised linear regression with subsampling. The parameters of both procedures are estimated so as to select stable sets of variables. The stability of the results is evaluated on data simulated under a graphical model. Subsequently, we apply an unsupervised clustering method to each inferred directed graph to detect groups of co-regulators and groups of co-regulated transcription factors. To evaluate the proximity between the two classifications, we have developed an index for comparing pairs of partitions, whose relevance is tested and argued for.
From a practical point of view, we propose a cascade simulation method, inspired by the parametric bootstrap and required to respect the complexity of the model, to simulate data under our model. We have validated our model by inspecting the proximity between the two classifications on simulated and real data.
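
The general idea of combining penalised linear regression with subsampling can be illustrated with a minimal sketch. It is not the procedure developed in the thesis: the toy data, the function names (`lasso_cd`, `selection_frequencies`, `simulate`), the penalty `lam`, the number of subsamples and the 0.8 frequency threshold are all assumptions chosen for illustration. Each variable is regressed on the others with a LASSO penalty over random half-samples, and only predictors selected in most subsamples are kept as candidate regulators of the target.

```python
import random


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes the columns of X and the response y are roughly centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # Soft-thresholding update.
            if rho > lam:
                new = (rho - lam) / col_sq[j]
            elif rho < -lam:
                new = (rho + lam) / col_sq[j]
            else:
                new = 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_frequencies(data, target, lam=0.2, n_subsamples=50, seed=0):
    """Fraction of half-subsamples in which each predictor is selected."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    predictors = [j for j in range(p) if j != target]
    counts = {j: 0 for j in predictors}
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), n // 2)  # subsample without replacement
        X = [[data[i][j] for j in predictors] for i in rows]
        y = [data[i][target] for i in rows]
        beta = lasso_cd(X, y, lam)
        for j, b in zip(predictors, beta):
            if b != 0.0:
                counts[j] += 1
    return {j: c / n_subsamples for j, c in counts.items()}


def simulate(n=200, seed=1):
    """Hypothetical toy data: variable 2 depends on variable 0 only."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = rng.gauss(0, 1)
        x2 = 0.9 * x0 + rng.gauss(0, 0.3)
        data.append([x0, x1, x2])
    return data


data = simulate()
freq = selection_frequencies(data, target=2)
# Keep predictors selected in at least 80% of subsamples (arbitrary threshold).
stable = sorted(j for j, f in freq.items() if f >= 0.8)
print("selection frequencies:", freq)
print("stable candidate regulators of variable 2:", stable)
```

Note that a regression of this kind only says which variables help predict the target; how the edges are oriented, and how the two competing procedures and their parameters are defined, is specific to the thesis and not reproduced here.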

Cited literature: 68 references

https://hal.inria.fr/tel-01695660
Contributor: Yann Vasseur
Submitted on: Monday, January 29, 2018 - 4:11:40 PM
Last modification on: Friday, April 30, 2021 - 9:54:48 AM
Long-term archiving on: Friday, May 25, 2018 - 8:56:09 AM

File

Manuscrit_definitif.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-01695660, version 1

Citation

Yann Vasseur. Inférence de réseaux de régulation orientés pour les facteurs de transcription d'Arabidopsis thaliana et création de groupes de co-régulation. Methodology [stat.ME]. Université Paris Saclay; Laboratoire Select INRIA, 2017. French. ⟨tel-01695660⟩

Metrics

Record views: 292

File downloads: 164