Skip to Main content Skip to Navigation
Conference papers

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

Abstract : This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG , dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, “short bytes”), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/√m. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
Complete list of metadata

https://hal.archives-ouvertes.fr/hal-00406166
Contributor : Eric Fusy Connect in order to contact the contributor
Submitted on : Tuesday, July 21, 2009 - 3:41:06 PM
Last modification on : Friday, October 22, 2021 - 3:07:09 PM
Long-term archiving on: : Monday, October 15, 2012 - 3:31:07 PM

File

FlFuGaMe07.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00406166, version 1

Collections

LIX | X-LIX | X | LIX_EMC

Citation

Philippe Flajolet, Eric Fusy, Olivier Gandouet, Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Analysis of Algorithms 2007 (AofA07), Jun 2007, Juan les pins, France. pp.127--146. ⟨hal-00406166v1⟩

Share

Metrics

Record views

3956

Files downloads

1988