Efficient estimation of the cardinality of large data sets

Abstract : Giroire has recently proposed an algorithm which returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large data bases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
Type de document :
Communication dans un congrès
Chassaing, Philippe and others. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, pp.419-422, 2006, DMTCS Proceedings
Liste complète des métadonnées

Littérature citée [10 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-00095370
Contributeur : Coordination Episciences Iam <>
Soumis le : lundi 17 août 2015 - 14:23:51
Dernière modification le : jeudi 11 janvier 2018 - 06:12:15
Document(s) archivé(s) le : mercredi 18 novembre 2015 - 12:07:29

Fichier

dmAG0137.pdf
Fichiers éditeurs autorisés sur une archive ouverte

Identifiants

  • HAL Id : hal-00095370, version 5

Collections

Citation

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. Chassaing, Philippe and others. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, pp.419-422, 2006, DMTCS Proceedings. 〈hal-00095370v5〉

Partager

Métriques

Consultations de la notice

177

Téléchargements de fichiers

66