# Efficient estimation of the cardinality of large data sets

Abstract : Giroire recently proposed an algorithm that returns the *approximate* number of distinct elements in a large sequence of words, under the strong constraints that arise in the analysis of large databases. His estimator is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimator, based on Kullback information and estimation theory.
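To make the idea of estimating a cardinality from uniform random variables concrete, here is a minimal Python sketch. It is not Giroire's algorithm nor the optimal estimator of this note; it is the textbook k-th minimum estimator, shown only as an illustration. It assumes that a hash function maps each distinct word to an approximately uniform value in $[0,1)$, so that the $k$-th smallest hash value $M_{(k)}$ among $n$ distinct words gives the estimate $(k-1)/M_{(k)}$. The names `hash_to_unit` and `estimate_cardinality`, the choice of SHA-256, and the default `k=256` are all assumptions made for this example.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1).

    Assumption: the first 64 bits of SHA-256 behave like a uniform draw.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest hash values.

    Memory use is O(k), independent of the length of the input sequence.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    heap = []      # negated values: heap[0] is minus the largest kept hash
    kept = set()   # the same hash values, for duplicate detection
    for word in words:
        h = hash_to_unit(word)
        if h in kept:
            continue  # repeated word (or hash collision): nothing to update
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h replaces the current largest of the k kept values
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen: the count is exact
        return float(len(heap))
    kth_min = -heap[0]
    return (k - 1) / kth_min


if __name__ == "__main__":
    stream = [f"word-{i % 5000}" for i in range(20000)]
    print(estimate_cardinality(stream))
```

Under the uniformity assumption, $M_{(k)}$ follows a Beta$(k, n-k+1)$ distribution, so $(k-1)/M_{(k)}$ has expectation $n$ and a relative standard error of roughly $1/\sqrt{k}$; increasing `k` trades memory for accuracy.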
Document type : Conference papers

Cited literature [10 references]

https://hal.inria.fr/hal-00095370
Contributor : Coordination Episciences Iam
Submitted on : Monday, August 17, 2015 - 2:23:51 PM
Last modification on : Friday, July 9, 2021 - 11:30:23 AM
Long-term archiving on : Wednesday, November 18, 2015 - 12:07:29 PM

### File

dmAG0137.pdf
Publisher files allowed on an open archive

### Citation

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. Fourth Colloquium on Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. pp.419-422, ⟨10.46298/dmtcs.3492⟩. ⟨hal-00095370v5⟩
