Efficient estimation of the cardinality of large data sets

Philippe Chassaing; Lucas Gerin

Conference paper, Year: 2006

Philippe Chassaing

Role: Author
PersonId : 7545
IdHAL : philippe-chassaing
IdRef : 060774274

Institut Élie Cartan de Nancy

Lucas Gerin

Role: Author
PersonId : 835101

Institut Élie Cartan de Nancy

Abstract

F. Giroire recently proposed an algorithm that returns the approximate number of distinct elements in a large sequence of words, under the strong constraints that arise in the analysis of large databases. His estimate is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.

Keywords

cardinality large multiset approximate counting data stream algorithms

Domains

Probability [math.PR]

Main file

EstimationEtendue.pdf (130.56 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00095370

Submitted on: Wednesday, 29 August 2007, 15:13:05

Last modified on: Wednesday, 3 April 2024, 13:54:02

Long-term archiving on: Thursday, 23 September 2010, 17:01:24

Dates and versions

hal-00095370 , version 1 (12-01-2007)

hal-00095370 , version 2 (28-08-2007)

hal-00095370 , version 3 (29-08-2007)

hal-00095370 , version 4 (22-04-2011)

hal-00095370 , version 5 (17-08-2015)

Identifiers

HAL Id : hal-00095370 , version 3
ARXIV : math/0701347

Cite

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. 4th Colloquium on Mathematics and Computer Science, 2006, France. pp.419-422. ⟨hal-00095370v3⟩
