Modeling gene expression from DNA sequence data (Modélisation de l'expression des gènes à partir de données de séquence ADN)

Abstract : Gene expression is tightly controlled to ensure a wide variety of cell types and functions. The development of diseases, particularly cancers, is invariably related to deregulation of these controls. Our objective is to model the link between gene expression and the nucleotide composition of different regulatory regions in the genome. We address this problem in a regression framework using a Lasso approach coupled to a regression tree. We use sequence data exclusively and fit a different model for each cell type. We show that (i) different regulatory regions provide specific and complementary information and that (ii) the information contained in nucleotide composition alone is sufficient to predict gene expression with an error comparable to that obtained using experimental data. Moreover, the fitted linear model does not perform equally well for all genes: it fits certain groups of genes with particular nucleotide compositions better than others.
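
Illustrative sketch (not the authors' code): the abstract describes regressing gene expression on the nucleotide composition of regulatory regions with a Lasso penalty. The Python below shows one way such a pipeline could start, under assumptions that are not taken from the paper: features are mono- and dinucleotide frequencies, the Lasso is solved by plain coordinate descent, and the names composition, lasso_fit and alpha are hypothetical. The regression-tree step is omitted because the abstract does not say how it is coupled to the Lasso.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def composition(seq):
    """Mono- and dinucleotide frequencies of one sequence (4 + 16 features).

    Assumption: the paper's exact feature set is not given in the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    mono = [seq.count(b) / n for b in NUCLEOTIDES]
    # Overlapping pairs, so "AAA" contains "AA" twice.
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    di = [pairs.count(d) / max(len(pairs), 1) for d in DINUCLEOTIDES]
    return mono + di


def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso proximal operator)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - b0 - Xb||^2 + alpha * ||b||_1 by coordinate descent.

    X is a list of feature rows, y a list of expression values.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Centre features and response; the intercept is recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta
```

In use, one would build X as [composition(s) for s in sequences] for a given regulatory region, take y as the measured expression of the corresponding genes in one cell type, and fit one model per cell type as the abstract describes. A larger alpha drives more coefficients to exactly zero, which is what makes the Lasso act as a variable selector.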
Document type :
Conference papers

Contributor : May Taha
Submitted on : Thursday, March 14, 2019 - 5:39:34 PM
Last modification on : Saturday, March 30, 2019 - 2:05:12 AM
Long-term archiving on : Saturday, June 15, 2019 - 8:18:04 PM




  • HAL Id : hal-02068289, version 1



May Taha, Chloé Bessière, Florent Petitprez, Jimmy Vandel, Jean-Michel Marin, et al. Modélisation de l'expression des gènes à partir de données de séquence ADN. JdS 2017, 49èmes Journées de Statistique de la SFdS, May 2017, Avignon, France. ⟨hal-02068289⟩


