Efficient estimation of the cardinality of large data sets

Abstract : Giroire has recently proposed an algorithm which returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large data bases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
Keywords :
Type de document :
Communication dans un congrès
Chassaing, Philippe and others. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, pp.419-422, 2006, DMTCS Proceedings
Domaine :

Littérature citée [10 références]

https://hal.inria.fr/hal-00095370
Contributeur : Coordination Episciences Iam <>
Soumis le : lundi 17 août 2015 - 14:23:51
Dernière modification le : mardi 24 avril 2018 - 13:55:38
Document(s) archivé(s) le : mercredi 18 novembre 2015 - 12:07:29

Fichier

dmAG0137.pdf
Fichiers éditeurs autorisés sur une archive ouverte

Identifiants

• HAL Id : hal-00095370, version 5

Citation

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. Chassaing, Philippe and others. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, pp.419-422, 2006, DMTCS Proceedings. 〈hal-00095370v5〉

Métriques

Consultations de la notice

280

Téléchargements de fichiers