Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Rayan Chikhi 1, 2, * Guillaume Rizk 3
* Auteur correspondant
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : Background
The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB).
Results
We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives.
Conclusions
An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
Type de document :
Article dans une revue
Algorithms for Molecular Biology, BioMed Central, 2013, 8 (1), pp.22. <10.1186/1748-7188-8-22>
Liste complète des métadonnées

https://hal.inria.fr/hal-00868805
Contributeur : Ed. Bmc <>
Soumis le : mercredi 2 octobre 2013 - 05:40:08
Dernière modification le : jeudi 9 février 2017 - 15:38:50
Document(s) archivé(s) le : vendredi 3 janvier 2014 - 04:32:03

Fichiers

1748-7188-8-22.pdf
Fichiers éditeurs autorisés sur une archive ouverte

Identifiants

Citation

Rayan Chikhi, Guillaume Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms for Molecular Biology, BioMed Central, 2013, 8 (1), pp.22. <10.1186/1748-7188-8-22>. <hal-00868805>

Partager

Métriques

Consultations de
la notice

390

Téléchargements du document

469