Skip to Main content Skip to Navigation
Conference papers

findere: fast and precise approximate membership query

Lucas Robidou 1 Pierre Peterlongo 1 
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : Approximate membership query (AMQ) structures such as Cuckoo filters or Bloom filters are widely used for representing large sets of elements. Their lightweight space usage explains their success, mainly as they are the only way to scale hundreds of billions or trillions of elements. However, they suffer by nature from non-avoidable false-positive calls that bias downstream analyses of methods using these data structures. In this work we propose a simple strategy and its implementation for reducing the false-positive rate of any AMQ data structure indexing \kmers (words of length k). The method we propose, called findere, enables to speed-up the queries by a factor two and to decrease the false-positive rate by two order of magnitudes. This achievement is done on the fly at query time, without modifying the original indexing data-structure, without generating false-negative calls and with no memory overhead. This method yields so-called ``construction false positives'', but the amount of such false positives is negligible when the method is used within classical parameter ranges. This method, as simple as effective, reduces either the false-positive rate or the space required to represent a set given a user-defined false-positive rate.
Document type :
Conference papers
Complete list of metadata

https://hal.inria.fr/hal-03243791
Contributor : Pierre Peterlongo Connect in order to contact the contributor
Submitted on : Thursday, July 22, 2021 - 2:44:31 PM
Last modification on : Saturday, August 6, 2022 - 3:33:10 AM

File

findere_spire_2021.pdf
Files produced by the author(s)

Identifiers

Citation

Lucas Robidou, Pierre Peterlongo. findere: fast and precise approximate membership query. SPIRE 2021 - The 28th annual Symposium on String Processing and Information Retrieval, Oct 2021, Lille / Virtual, France. ⟨10.1101/2021.05.31.446182⟩. ⟨hal-03243791v2⟩

Share

Metrics

Record views

107

Files downloads

112