HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

Philippe Flajolet; Éric Fusy; Olivier Gandouet; Frédéric Meunier

doi:10.46298/dmtcs.3545

Conference paper, Discrete Mathematics and Theoretical Computer Science, Year: 2007



Philippe Flajolet (author), Algorithms

Éric Fusy (author), Algorithms

Olivier Gandouet (author), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Frédéric Meunier (author), Algorithms

Abstract

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes"), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality whose relative accuracy (the standard error) is typically about $1.04/\sqrt{m}$. This improves on LOGLOG, the best previously known cardinality estimator: HYPERLOGLOG matches LOGLOG's accuracy while using only 64% of the memory LOGLOG requires. For instance, the new algorithm makes it possible to estimate cardinalities well beyond $10^9$ with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
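The abstract describes m small registers, a single pass over the data, and a standard error of about $1.04/\sqrt{m}$. The minimal Python sketch below illustrates those ideas. It is not the authors' code, and it rests on several assumptions. It takes a 64-bit hash from SHA-1, whereas the paper works with a 32-bit hash. It uses m = 2^b registers, with b = 11 chosen here as a hypothetical default. It applies the small-range (linear counting) correction but omits the large-range correction that a 32-bit hash would need. The names `HyperLogLog`, `add`, `merge`, `estimate` and `standard_error` are illustrative only.

```python
import hashlib
import math


def _hash64(item: str) -> int:
    # Assumption: SHA-1 truncated to 64 bits stands in for the paper's hash function.
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def _alpha(m: int) -> float:
    # Bias-correction constant alpha_m used by the raw HyperLogLog estimator.
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)


def standard_error(m: int) -> float:
    # Typical relative accuracy quoted in the abstract: 1.04 / sqrt(m).
    return 1.04 / math.sqrt(m)


class HyperLogLog:
    def __init__(self, b: int = 11) -> None:
        if not 4 <= b <= 16:
            raise ValueError("b must be between 4 and 16")
        self.b = b
        self.m = 1 << b                 # number of registers, m = 2^b
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        x = _hash64(item)
        width = 64 - self.b
        j = x >> width                  # first b bits select a register
        w = x & ((1 << width) - 1)      # remaining 64 - b bits
        # rho(w): 1-based position of the leftmost 1-bit of w (width + 1 if w == 0)
        rho = width - w.bit_length() + 1
        if rho > self.registers[j]:
            self.registers[j] = rho

    def merge(self, other: "HyperLogLog") -> None:
        # Register-wise max: sketches built on separate parts of the data combine exactly.
        if other.b != self.b:
            raise ValueError("cannot merge sketches with different b")
        self.registers = [max(r, s) for r, s in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        raw = _alpha(m) * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)  # small-range (linear counting) correction
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog(b=11)
    for i in range(100_000):
        sketch.add(f"user-{i}")
    print(round(sketch.estimate()))
    print(f"{standard_error(sketch.m):.4f}")
```

With b = 11 there are 2048 registers, so the expected standard error is $1.04/\sqrt{2048} \approx 0.023$. The `merge` method shows why the algorithm parallelizes well: taking the register-wise maximum of two sketches gives the same registers as a single sketch built over the combined input.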

Keywords

cardinality estimation; probabilistic algorithm

Subjects

Data Structures and Algorithms [cs.DS]; Discrete Mathematics [cs.DM]; Combinatorics [math.CO]; Computational Geometry [cs.CG]

Main file

dmAH0110.pdf (515.14 KB)

Origin: publisher files authorized for deposit in an open archive


https://inria.hal.science/hal-00406166

Submitted on: Monday, 17 August 2015, 17:00:04

Last modified on: Thursday, 4 April 2024, 15:46:45

Long-term archiving on: Wednesday, 18 November 2015, 12:17:58

Dates and versions

hal-00406166 , version 1 (21-07-2009)

hal-00406166 , version 2 (17-08-2015)

Identifiers

HAL Id : hal-00406166 , version 2
DOI : 10.46298/dmtcs.3545

Cite

Philippe Flajolet, Éric Fusy, Olivier Gandouet, Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. AofA: Analysis of Algorithms, Jun 2007, Juan les Pins, France. pp.137-156, ⟨10.46298/dmtcs.3545⟩. ⟨hal-00406166v2⟩


Collections

CNRS INRIA LIRMM INRIA2 TDS-MACS MIPS UNIV-MONTPELLIER
