Skip to Main content Skip to Navigation
Master thesis

De novo long reads assembly using integer linear programming

Victor Epain 1
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : In silico studying a genome requires two steps: sequencing it with cloning and cutting the genome in several reads, and then, assembling the reads. It is well known that the number of sequencing errors is proportional to the reads' size. However, the use of long reads can be an advantage against genome repeated regions issues. De novo is an assembly method which does not use a reference. The purpose of the described here tool, named LOREAS, is to be a de novo assembler in two tasks: first, ordering the long reads, and then, obtaining a consensus sequence of the ordered reads. Currently, only the first task was realised. While other de novo long reads assemblers use heuristics and De Bruijn graphs, LOREAS is based on overlaps similarity between all the long reads. It uses integer linear programming, to find the heaviest path in a graph $G= (V,E,λ)$, where V is the vertices set corresponding to the long reads set, E the set of edges associated with the overlaps between long reads – weighted by λ: the overlap length. When this graph is too huge, the set of reads V is partitioned in several parts. Then, all the parts are solved sequentially. Here we present the solution concerning the first task related to ten bacteria genomes. Seven of them have been successfully solved for less than 12 minutes on a laptop.
Complete list of metadata

Cited literature [24 references]  Display  Hide  Download

https://hal.inria.fr/hal-02413832
Contributor : Victor Epain <>
Submitted on : Thursday, May 21, 2020 - 4:16:09 PM
Last modification on : Thursday, January 7, 2021 - 4:35:52 PM

File

M1_BioInformatic_Internship_re...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02413832, version 3

Citation

Victor Epain. De novo long reads assembly using integer linear programming. Operations Research [cs.RO]. 2019. ⟨hal-02413832v3⟩

Share

Metrics

Record views

47

Files downloads

114