Space-efficient and exact de Bruijn graph representation based on a Bloom filter - Archive ouverte HAL Access content directly
Journal Articles Algorithms for Molecular Biology Year : 2013

Space-efficient and exact de Bruijn graph representation based on a Bloom filter

(1, 2) , (3)
1
2
3

Abstract

Background
The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB).
Results
We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives.
Conclusions
An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
Fichier principal
Vignette du fichier
1748-7188-8-22.pdf (561.47 Ko) Télécharger le fichier
Vignette du fichier
1748-7188-8-22.xml (81.22 Ko) Télécharger le fichier
Origin : Publisher files allowed on an open archive
Format : Other
Loading...

Dates and versions

hal-00868805 , version 1 (02-10-2013)

Identifiers

Cite

Rayan Chikhi, Guillaume Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms for Molecular Biology, 2013, 8 (1), pp.22. ⟨10.1186/1748-7188-8-22⟩. ⟨hal-00868805⟩
411 View
934 Download

Altmetric

Share

Gmail Facebook Twitter LinkedIn More