Parallel and memory-efficient reads indexing for genome assembly

Rayan Chikhi 1,*, Guillaume Chapuis 1, Dominique Lavenier 1
* Corresponding author
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: As genomes, transcriptomes and metagenomes are being sequenced at a faster pace than ever, there is a pressing need for efficient genome assembly methods. Two practical issues in assembly are heavy memory usage and long execution time during the read indexing phase. In this article, a parallel and memory-efficient method is proposed for indexing reads prior to assembly. Specifically, we design a hash-based structure that stores a reduced amount of read information. Erroneous entries are filtered on the fly during index construction. A prototype has been implemented and applied to real Illumina short reads. Benchmark evaluation shows that this indexing method requires significantly less memory than the indexing methods of popular assemblers.
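The abstract does not detail the index, so the Python sketch below only illustrates one common way to combine a hash-based index with on-the-fly filtering of likely erroneous entries. It is an assumption for illustration, not the authors' method. It assumes the index stores fixed-length substrings (k-mers) packed at 2 bits per base, with k at most 64. First sightings go into a Bloom filter, so k-mers seen only once (typically sequencing errors) never occupy a hash-table slot. K-mers are partitioned by value so that independent workers can each build a disjoint part of the index. All names (`encode_kmer`, `BloomFilter`, `build_filtered_index`, `part`, `num_parts`, `bloom_bits`) are hypothetical.

```python
import hashlib

# 2-bit encoding of nucleotides; k-mers containing other symbols (e.g. N) are skipped.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def kmers(read, k):
    """Yield the 2-bit codes of all k-mers of a read that contain only A/C/G/T."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if all(base in ENCODE for base in kmer):
            yield encode_kmer(kmer)


class BloomFilter:
    """Minimal Bloom filter over non-negative integers below 2**128 (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # at most 8, since blake2b digests are <= 64 bytes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from one blake2b digest (8 bytes each).
        digest = hashlib.blake2b(item.to_bytes(16, "little"),
                                 digest_size=8 * self.num_hashes).digest()
        for i in range(self.num_hashes):
            chunk = digest[8 * i:8 * (i + 1)]
            yield int.from_bytes(chunk, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filtered_index(reads, k, part=0, num_parts=1, bloom_bits=1 << 20):
    """Count k-mers seen at least twice, restricted to partition `part` of `num_parts`.

    First sightings go only into the Bloom filter, so k-mers occurring once
    never take a hash-table slot. A Bloom false positive can occasionally
    let a singleton in with count 2.
    """
    seen_once = BloomFilter(bloom_bits)
    index = {}  # 2-bit k-mer code -> occurrence count (>= 2)
    for read in reads:
        for code in kmers(read, k):
            if code % num_parts != part:
                continue  # another partition (worker) owns this k-mer
            if code in index:
                index[code] += 1
            elif code in seen_once:
                index[code] = 2  # second sighting: promote into the index
            else:
                seen_once.add(code)
    return index
```

Because partitions own disjoint sets of k-mers, calling `build_filtered_index` once per `part` (for example in separate processes) and merging the resulting dictionaries yields the same index as a single call, barring Bloom false positives. The trade-off in this sketch is that every worker scans all reads. In exchange, each worker only holds the filter and table for its own share of the k-mers.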

Cited literature: 18 references

https://hal.inria.fr/inria-00637536
Contributor: Rayan Chikhi
Submitted on: Wednesday, November 2, 2011 - 11:56:44 AM
Last modification on: Friday, November 16, 2018 - 1:24:33 AM
Long-term archiving on: Friday, February 3, 2012 - 2:25:47 AM

File: CP144.pdf (produced by the author(s))

Identifiers

  • HAL Id: inria-00637536, version 1

Citation

Rayan Chikhi, Guillaume Chapuis, Dominique Lavenier. Parallel and memory-efficient reads indexing for genome assembly. Parallel Bio-Computing 2011, Sep 2011, Toruń, Poland. ⟨inria-00637536⟩
