An optimal cardinality estimation algorithm based on order statistics and its full analysis

Jérémie Lumbroso 1
1 APR - Algorithmes, Programmes et Résolution
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: Building on the ideas of Flajolet and Martin (1985), Alon et al. (1987), Bar-Yossef et al. (2002), and Giroire (2005), we develop a new algorithm for cardinality estimation based on order statistics, which, according to Chassaing and Gerin (2006), is optimal among similar algorithms. This algorithm has a remarkably simple analysis that allows us to take its $\textit{fine-tuning}$ and the $\textit{characterization of its properties}$ further than has been done until now. We prove that, asymptotically, it is $\textit{strictly unbiased}$ (unlike Probabilistic Counting, LogLog, and HyperLogLog), we verify that its relative precision is about $1/\sqrt{m-2}$ when $m$ words of storage are used, and we fully characterize the limit law of the estimates it provides in terms of the gamma distribution; this is the first such algorithm for which the limit law has been established. We also develop a Poisson analysis for the pre-asymptotic regime. In this way, we are able to devise a complete algorithm covering all cardinality ranges, from $0$ to very large.
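To make the idea of an order-statistics estimator concrete, here is a minimal Python sketch. The abstract does not spell out the estimator, so the form used here (hash each element to one of $m$ buckets, keep the minimum hashed value per bucket, and return $m(m-1)$ divided by the sum of the minima) is an assumption, chosen because under an idealized uniform-hash model it gives a relative precision of about $1/\sqrt{m-2}$, in line with the figure quoted above. The function names, the use of SHA-256 as the hash, and the default $m = 256$ are all hypothetical choices for illustration, and the sketch only targets the large-cardinality regime, not the small-cardinality corrections the abstract mentions.

```python
import hashlib


def hash_to_bucket_and_unit(item, m):
    """Map an item to (bucket index in [0, m), pseudo-uniform value in [0, 1)).

    SHA-256 is an illustrative stand-in for an idealized uniform hash.
    """
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % m
    unit = int.from_bytes(digest[8:16], "big") / 2**64
    return bucket, unit


def estimate_cardinality(stream, m=256):
    """Estimate the number of distinct items using m words of storage.

    Assumed estimator: m * (m - 1) / (sum of per-bucket minima).
    Empty buckets keep the initial value 1.0, so this sketch is only
    meaningful when the cardinality is much larger than m.
    """
    minima = [1.0] * m
    for item in stream:
        bucket, unit = hash_to_bucket_and_unit(item, m)
        if unit < minima[bucket]:
            minima[bucket] = unit
    return m * (m - 1) / sum(minima)


if __name__ == "__main__":
    # 50,000 distinct values, each seen twice; duplicates do not change the minima.
    data = [i % 50_000 for i in range(100_000)]
    print(round(estimate_cardinality(data, m=256)))
```

With $m = 256$, the expected relative error under the idealized model is about $1/\sqrt{254} \approx 6\%$. Because only per-bucket minima are stored, the result depends neither on how often an element repeats nor on the order of the stream.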
Document type:
Conference paper
Drmota, Michael and Gittenberger, Bernhard. 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), Jun 2010, Vienna, Austria. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), pp.489-504, 2010, DMTCS Proceedings

https://hal.inria.fr/hal-01185578
Contributor: Coordination Episciences Iam
Submitted on: Thursday, 20 August 2015 - 16:32:49
Last modified on: Thursday, 11 January 2018 - 06:26:46
Document(s) archived on: Wednesday, 26 April 2017 - 09:57:44

File

dmAM0134.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01185578, version 1

Citation

Jérémie Lumbroso. An optimal cardinality estimation algorithm based on order statistics and its full analysis. Drmota, Michael and Gittenberger, Bernhard. 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), Jun 2010, Vienna, Austria. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), pp.489-504, 2010, DMTCS Proceedings. 〈hal-01185578〉
