findere: fast and precise approximate membership query - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2021

findere: fast and precise approximate membership query

Résumé

Motivation: Approximate membership query (AMQ) structures as Cuckoo filters or Bloom filters are widely used for representing large sets of elements. Their lightweight space usage explains their success, mainly as they are the only way to scale hundreds of billions or trillions of elements. However, they suffer by nature from non-avoidable false-positive calls that bias downstream analyses of methods using these data structures. Results: In this work we propose a simple strategy and its implementation for reducing the false-positive rate of any AMQ data structure indexing k-mers (words of length k). The method we propose, called findere, enables to speed-up the queries by a factor two and to decrease the false-positive rate by two order of magnitudes. This achievement is done one the fly at query time, without modifying the original indexing data-structure, without generating false-negative calls and with no memory overhead. With no drawback, this method, as simple as it is effective, reduces either the false-positive rate or the space required to represent a set given a user-defined false-positive rate.
Fichier principal
Vignette du fichier
2021.05.31.446182v1.full.pdf (452.68 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-03243791 , version 1 (31-05-2021)
hal-03243791 , version 2 (22-07-2021)

Identifiants

Citer

Lucas Robidou, Pierre Peterlongo. findere: fast and precise approximate membership query. 2021. ⟨hal-03243791v1⟩
130 Consultations
169 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More