An EM algorithm for mapping short reads in multiple RNA structure probing experiments

Afaf Saaidi (1, 2), Yann Ponty (2, 1), Mathieu Blanchette (3), Mireille Regnier (1, 2), Bruno Sargueil (4)
2 AMIB - Algorithms and Models for Integrative Biology
CNRS - Centre National de la Recherche Scientifique : UMR8623, Polytechnique - X, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
Abstract: Accurate mapping of reads against a reference sequence is the first step of a sound NGS data analysis. However, when reads must be assigned to a set of RNA variants that were sequenced simultaneously, the task becomes hard to handle. Many algorithms have been developed to map reads against a set of homologous sequences at once, but the problem is not fully resolved, particularly for short reads. The issue addressed in our study is more challenging still: in addition to the parallel assignment of short reads, the RNA variant molecules used to prepare the sequencing library undergo a specific experimental treatment, SHAPE, which induces mutations at structurally unpaired nucleotides. SHAPE-induced mutations can lead to mis-mapping: a read derived from a given RNA variant i may, because of these mutations, be assigned instead to the variant j to which it has the shortest base distance. In ongoing work, we address this previously unaddressed mapping question through an Expectation Maximization (EM) algorithm in which each RNA variant in the reference set is characterized by a SHAPE mutational profile rather than merely by a nucleotide sequence. The EM algorithm aims to maximize the likelihood that a read derives from a specific RNA variant and to assess the read's contribution to building the mutational profile associated with that RNA.
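The idea of pairing each variant with a mutational profile can be illustrated with a minimal sketch. It is not the authors' implementation, and every modelling choice in it is an assumption made for illustration: reads are taken to be already placed at a known start position on equal-length variants, each position of each variant has an independent mutation rate, a mutated base is equally likely to be any of the three other nucleotides, and a small pseudocount keeps rates and weights away from 0 and 1. The names `em_assign`, `init_rate` and `pseudo` are hypothetical.

```python
import math


def em_assign(reads, variants, n_iter=50, init_rate=0.05, pseudo=1.0):
    """Soft-assign reads to variants with a toy EM (illustrative assumptions only).

    reads    -- list of (start, sequence) pairs, positions relative to the variants
    variants -- list of equal-length reference strings
    Returns (responsibilities, variant weights, per-position mutation rates).
    """
    n_var, length = len(variants), len(variants[0])
    weights = [1.0 / n_var] * n_var                       # mixture proportions
    rates = [[init_rate] * length for _ in range(n_var)]  # mutational profiles
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each read derives from each variant.
        resp = []
        for start, seq in reads:
            logp = []
            for v in range(n_var):
                lp = math.log(weights[v])
                for k, base in enumerate(seq):
                    m = rates[v][start + k]
                    if base == variants[v][start + k]:
                        lp += math.log(1.0 - m)
                    else:
                        lp += math.log(m / 3.0)
                logp.append(lp)
            top = max(logp)  # subtract the maximum for numerical stability
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights and profiles from the soft assignments.
        for v in range(n_var):
            weights[v] = (sum(r[v] for r in resp) + pseudo) / (len(reads) + n_var * pseudo)
            mismatches = [0.0] * length
            coverage = [0.0] * length
            for (start, seq), r in zip(reads, resp):
                for k, base in enumerate(seq):
                    pos = start + k
                    coverage[pos] += r[v]
                    if base != variants[v][pos]:
                        mismatches[pos] += r[v]
            rates[v] = [(mismatches[p] + pseudo * init_rate) / (coverage[p] + pseudo)
                        for p in range(length)]
    return resp, weights, rates
```

Each read contributes to every variant's profile in proportion to its posterior probability, which is one simple way to read the abstract's "contribution to build the RNA associated mutational profile"; the actual model used by the authors may differ.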
Document type:
Conference paper
Matbio2017, Sep 2017, London, United Kingdom. 2017
Contributor: Afaf Saaidi
Submitted on: Tuesday, September 19, 2017 - 16:52:27
Last modified on: Thursday, January 11, 2018 - 06:23:08


  • HAL Id : hal-01590528, version 1


Afaf Saaidi, Yann Ponty, Mathieu Blanchette, Mireille Regnier, Bruno Sargueil. An EM algorithm for mapping short reads in multiple RNA structure probing experiments. Matbio2017, Sep 2017, London, United Kingdom. 2017. 〈hal-01590528〉


