Conference papers

Efficient estimation of the cardinality of large data sets

Abstract : Giroire has recently proposed an algorithm that returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large databases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
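
As an illustration of how order statistics of uniform values in $[0,1]$ can yield a cardinality estimate, here is a minimal sketch of the standard "k-th minimum value" estimator. It is neither Giroire's algorithm nor the optimal estimator proposed in the paper. The function names, the choice of SHA-256 as hash function, and the default `k = 256` are assumptions made for this example only. If $n$ distinct words are hashed to independent uniform values, the $k$-th smallest value $U_{(k)}$ follows a Beta$(k, n-k+1)$ distribution, so $(k-1)/U_{(k)}$ is an unbiased estimate of $n$ for $k \geq 2$.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1) (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_estimate(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest hash values."""
    if k < 2:
        raise ValueError("k must be at least 2")
    heap = []        # negated values: -heap[0] is the largest of the kept minima
    members = set()  # hash values currently kept, to skip repeated words
    for word in words:
        u = hash_to_unit(word)
        if u in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -u)
            members.add(u)
        elif u < -heap[0]:
            # Replace the current k-th smallest value with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -u)
            members.discard(evicted)
            members.add(u)
    if len(heap) < k:
        # Fewer than k distinct hash values were seen: the count is exact,
        # barring hash collisions.
        return float(len(heap))
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest
```

The sketch reads the sequence once and stores at most `k` hash values, whatever the length of the input. Its relative error is of order $1/\sqrt{k}$, so a larger `k` trades memory for accuracy.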

https://hal.inria.fr/hal-00095370
Contributor : Coordination Episciences Iam
Submitted on : Monday, August 17, 2015 - 2:23:51 PM
Last modification on : Friday, February 26, 2021 - 3:22:08 AM
Long-term archiving on : Wednesday, November 18, 2015 - 12:07:29 PM

File

dmAG0137.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-00095370, version 5

Citation

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. pp.419-422. ⟨hal-00095370v5⟩
