Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Abstract: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (> 30 GB). We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. An assembler implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
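
To make the idea in the abstract concrete, the following is a minimal Python sketch of a Bloom filter paired with an explicit set of critical false positives. It is an illustration under stated assumptions, not the Minia implementation: the names BloomFilter, neighbors, build and is_node are hypothetical, "critical" is assumed here to mean a false positive that is a graph neighbor of a true k-mer, reverse complements are ignored, and the true k-mer set is held in memory during construction purely for simplicity.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned for speed)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting BLAKE2b with the hash index.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def neighbors(kmer):
    """Yield the 8 possible successors and predecessors of a k-mer."""
    for base in "ACGT":
        yield kmer[1:] + base   # successor: shift left, append a base
        yield base + kmer[:-1]  # predecessor: shift right, prepend a base


def build(kmers, n_bits, n_hashes=4):
    """Return (bloom, critical_fp) for a collection of true k-mers."""
    true_kmers = set(kmers)  # assumption: small enough to hold in memory here
    bloom = BloomFilter(n_bits, n_hashes)
    for kmer in true_kmers:
        bloom.add(kmer)
    # Critical false positives (as assumed above): neighbors of true k-mers
    # that the Bloom filter accepts although they are not true k-mers.
    critical_fp = {
        nb
        for kmer in true_kmers
        for nb in neighbors(kmer)
        if nb in bloom and nb not in true_kmers
    }
    return bloom, critical_fp


def is_node(kmer, bloom, critical_fp):
    """Membership test that is exact for true k-mers and their neighbors."""
    return kmer in bloom and kmer not in critical_fp


# Toy input: all 4-mers of a short sequence, with a deliberately tiny filter
# so that false positives are likely.
seq = "ACGTACGGTCA"
k = 4
kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
bloom, critical_fp = build(kmers, n_bits=32, n_hashes=2)
```

In this sketch, a traversal that starts from a true k-mer and only queries its neighbors through is_node never follows a false edge, because every false positive reachable in one step has been recorded in critical_fp. False positives elsewhere in the filter are left alone, since such a traversal never queries them.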
Document type: Conference paper
WABI 2012, Sep 2012, Ljubljana, Slovenia. 7534, pp 236-248, 2012, <10.1007/978-3-642-33122-0_19>


https://hal.archives-ouvertes.fr/hal-00753930
Contributor: Rayan Chikhi
Submitted on: Monday, November 19, 2012 - 10:53:47 PM
Last modification on: Thursday, November 22, 2012 - 2:52:29 PM

File

minia.pdf

Citation

Rayan Chikhi, Guillaume Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. WABI 2012, Sep 2012, Ljubljana, Slovenia. 7534, pp 236-248, 2012, <10.1007/978-3-642-33122-0_19>. <hal-00753930>
