Classification and clustering for network inference from RNA-seq data

Mélina Gallopin 1
1 SELECT - Model selection in statistical learning
LMO - Laboratoire de Mathématiques d'Orsay, Inria Saclay - Ile de France
Abstract : This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete, and the number of samples sequenced is usually small because of the cost of the technology. These two points are the main statistical challenges in modelling RNA-seq data.

The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model combined with a simple transformation of the data is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion that selects the model that best fits each dataset. In addition, we present a model selection criterion that takes external gene annotations into account. This criterion is not specific to RNA-seq data: it is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases.

The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution that takes into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.

Cited literature : 142 references
Contributor : Melina Gallopin
Submitted on : Monday, January 29, 2018 - 12:18:59 PM
Last modification on : Friday, April 30, 2021 - 9:54:48 AM
Long-term archiving on : Friday, May 25, 2018 - 9:26:19 AM




  • HAL Id : tel-01695408, version 1


Mélina Gallopin. Classification and clustering for network inference from RNA-seq data. Applications [stat.AP]. Université Paris Sud, 2015. English. ⟨tel-01695408⟩


