Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph

Rayan Chikhi 1, Dominique Lavenier 1
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Next-generation de novo short-read assemblers typically follow a two-step strategy: (1) assemble unpaired reads into contigs using heuristics; (2) order the contigs using paired-read information to produce scaffolds. We propose to unify these two steps by introducing localized assembly: the direct construction of scaffolds from reads. To this end, we introduce the paired string graph structure, along with a formal framework for building scaffolds as paths of reads. This framework leads to the design of a novel greedy algorithm for memory-efficient, parallel assembly of paired reads. A prototype implementation of the algorithm has been developed and applied to the assembly of simulated and experimental short reads. Our experiments show that our method yields longer scaffolds than recent assemblers and is capable of assembling diploid genomes significantly better than other greedy methods.
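
As a rough illustration of what "building a sequence as a path of reads" can mean, the following toy sketch greedily extends a seed read by repeatedly appending the unused read with the longest suffix-prefix overlap. It is not the algorithm of the paper: it ignores read pairing, sequencing errors, and reverse complements, and the names `overlap`, `greedy_extend`, and `min_len` are hypothetical choices made for this example only.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` characters exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_extend(seed, reads, min_len=3):
    """Extend `seed` to the right, one read at a time (toy model, unpaired reads)."""
    contig = seed
    unused = [r for r in reads if r != seed]
    while True:
        best, best_len = None, 0
        for r in unused:
            n = overlap(contig, r, min_len)
            # Keep only reads that overlap and actually add new bases.
            if best_len < n < len(r):
                best, best_len = r, n
        if best is None:
            return contig  # no read extends the current sequence
        contig += best[best_len:]  # append only the non-overlapping part
        unused.remove(best)


reads = ["ACGTAC", "TACGGA", "GGATTC"]
result = greedy_extend("ACGTAC", reads)
print(result)
```

In this sketch each extension step only inspects reads that overlap the current end of the sequence, which is the sense in which the traversal is local. A paired variant would additionally require that extensions be consistent with mate-pair constraints, which is not modeled here.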

https://hal.inria.fr/inria-00637535
Contributor : Rayan Chikhi
Submitted on : Wednesday, November 2, 2011 - 11:53:07 AM
Last modification on : Friday, November 16, 2018 - 1:24:31 AM
Long-term archiving on : Thursday, November 15, 2012 - 10:56:23 AM

File

wabi11_camera.pdf
Files produced by the author(s)

Citation

Rayan Chikhi, Dominique Lavenier. Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph. WABI 2011, Sep 2011, Saarbrücken, Germany. ⟨10.1007/978-3-642-23038-7_4⟩. ⟨inria-00637535⟩