Master thesis

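Amélioration du positionnement de fragments ADN pour réalisation de séquences consensus dans le contexte de l’assemblage de novo de longues lectures [Improving the positioning of DNA fragments to build consensus sequences in the context of de novo assembly of long reads]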

Victor Epain 1
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : In silico analysis of a DNA molecule requires sequencing it in fragments, called reads, and then assembling them. Today's long-read sequencing technologies make it possible to overcome the problems posed by a genome's repeated regions, since reads can cover such regions entirely, but they produce data with a high error rate, including sequencing errors such as nucleotide insertions or deletions, called indels. De novo assembly is assembly without a reference. Although assemblers based on several methods already exist, for example using De Bruijn graphs or iteratively correcting the reads, we propose a two-step strategy: first, we assign a position on a common position axis to as many reads as possible; then we produce a consensus sequence from that positioning. To this end, we propose a model of the positioning problem using mixed integer linear programming (MILP), and we present first ideas for producing consensus sequences, also with MILP and with multiple sequence alignment derived from the positioning. As the final aim of this strategy is to formalize the genome assembly problem, we structured it according to the mathematical method, which makes it possible to target methodological choices precisely and thus to reduce the use of heuristics. Finally, we tested the strategy on bacterial genomes. The positioning results are positive; the consensus results are less so, but they do not rule out the potential of the associated methods.

https://hal.inria.fr/hal-03119772
Contributor : Victor Epain
Submitted on : Monday, January 25, 2021 - 3:07:10 PM
Last modification on : Wednesday, November 3, 2021 - 8:09:50 AM
Long-term archiving on : Monday, April 26, 2021 - 6:35:52 PM

File

rapport_EPAIN_M2_2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03119772, version 1

Citation

Victor Epain. Amélioration du positionnement de fragments ADN pour réalisation de séquences consensus dans le contexte de l’assemblage de novo de longues lectures. Bio-informatique [q-bio.QM]. 2020. ⟨hal-03119772⟩
