Conference papers

Compressed Weighted de Bruijn Graphs

Abstract: We propose a new compressed representation for weighted de Bruijn graphs, based on delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: for example, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state-of-the-art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.
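
The delta-encoding idea mentioned in the abstract can be illustrated with a toy sketch. The code below is not the authors' data structure: it is an uncompressed illustration, assuming a hand-picked spanning tree over four hypothetical 3-mers with made-up abundances. Each node stores only the difference between its abundance and its parent's, and an abundance query is a partial sum of those differences along the path to the root. The names `PARENT`, `ABUNDANCE`, `delta_encode`, and `query_abundance` are illustrative assumptions.

```python
from typing import Dict, Optional

# Hypothetical spanning tree over a tiny de Bruijn graph (k = 3).
# Each k-mer maps to its parent in the tree; the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "ACG": None,   # root
    "CGT": "ACG",
    "GTA": "CGT",
    "CGA": "ACG",
}

# Made-up abundances; neighbouring k-mers tend to have similar counts.
ABUNDANCE: Dict[str, int] = {"ACG": 16, "CGT": 16, "GTA": 15, "CGA": 17}


def delta_encode(parent: Dict[str, Optional[str]],
                 abundance: Dict[str, int]) -> Dict[str, int]:
    """Store, for each node, its abundance minus its parent's abundance.

    The root keeps its full abundance (its "parent" counts as 0).
    """
    deltas = {}
    for node, par in parent.items():
        base = 0 if par is None else abundance[par]
        deltas[node] = abundance[node] - base
    return deltas


def query_abundance(node: str,
                    parent: Dict[str, Optional[str]],
                    deltas: Dict[str, int]) -> int:
    """Recover an abundance as the sum of deltas on the path to the root."""
    total = 0
    cur: Optional[str] = node
    while cur is not None:
        total += deltas[cur]
        cur = parent[cur]
    return total


DELTAS = delta_encode(PARENT, ABUNDANCE)

if __name__ == "__main__":
    for kmer in PARENT:
        print(kmer, DELTAS[kmer], query_abundance(kmer, PARENT, DELTAS))
```

In this sketch most deltas are small integers, which is what makes them cheap to encode. A query here walks the whole path to the root, so its cost grows with the depth of the tree. The paper's contribution concerns compressed structures that answer such partial-sum queries efficiently, which this toy version does not attempt.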

https://hal.inria.fr/hal-03395413
Contributor: Marie-France Sagot
Submitted on: Friday, October 22, 2021 - 1:58:54 PM
Last modification on: Tuesday, May 17, 2022 - 2:50:02 PM
Long-term archiving on: Sunday, January 23, 2022 - 8:02:28 PM

File

LIPIcs-CPM-2021-16.pdf
Publisher files allowed on an open archive

Citation

Giuseppe F. Italiano, Nicola Prezza, Blerina Sinaimeri, Rossano Venturini. Compressed Weighted de Bruijn Graphs. CPM 2021 - 32nd Annual Symposium on Combinatorial Pattern Matching, Jul 2021, Wroclaw, Poland. pp. 1-16, ⟨10.4230/LIPIcs.CPM.2021.16⟩. ⟨hal-03395413⟩
