Efficient estimation of the cardinality of large data sets

Philippe Chassaing; Lucas Gerin

doi:10.46298/dmtcs.3492

Conference paper, Discrete Mathematics and Theoretical Computer Science, Year: 2006



Philippe Chassaing

Role: Author
PersonId : 7545
IdHAL : philippe-chassaing
IdRef : 060774274

Institut Élie Cartan de Nancy

Lucas Gerin

Role: Author
PersonId : 835101

Institut Élie Cartan de Nancy

Abstract

Giroire has recently proposed an algorithm that returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints arising from the analysis of large databases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
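To illustrate how uniform random variables in $[0,1]$ can yield a cardinality estimate, here is a minimal sketch of the classical "k-th minimum value" idea: hash every word to a pseudo-uniform value and read the number of distinct words off the k-th smallest hash. This is an assumption-laden illustration only. It is not the optimal estimation proposed in this note, and it is not claimed to reproduce Giroire's algorithm; the function names, the use of SHA-256, and the parameter `k` are all hypothetical choices made for the example.

```python
import hashlib
import heapq


def hash_to_unit_interval(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1] (assumed hash: SHA-256)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a 64-bit integer and rescale.
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest hash values."""
    heap = []     # negated hashes, so heap[0] is the largest value kept
    kept = set()  # the same hashes, for duplicate detection
    for word in words:
        h = hash_to_unit_interval(word)
        if h in kept:
            continue  # repeated word: it carries no new information
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h displaces the current largest of the k smallest hashes.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes seen: the count is exact
        # (ignoring hash collisions).
        return float(len(heap))
    kth_smallest = -heap[0]
    # For n independent uniforms, (k - 1) / (k-th smallest) has mean n.
    return (k - 1) / kth_smallest
```

Memory use is bounded by `k` stored values regardless of the length of the sequence, and repeating a word any number of times does not change the result, since equal words hash to equal values. Larger `k` trades memory for a lower relative error.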

Keywords

cardinality; large multiset; approximate counting

Subject areas

Data Structures and Algorithms [cs.DS]; Discrete Mathematics [cs.DM]; Combinatorics [math.CO]

Main file

dmAG0137.pdf (168 KB)

Origin: Publisher files allowed on an open archive

Contributor: Coordination Episciences Iam

https://inria.hal.science/hal-00095370

Submitted on: Monday, 17 August 2015, 14:23:51

Last modified on: Thursday, 4 April 2024, 03:09:50

Long-term archiving on: Wednesday, 18 November 2015, 12:07:29

Dates and versions

hal-00095370 , version 1 (12-01-2007)

hal-00095370 , version 2 (28-08-2007)

hal-00095370 , version 3 (29-08-2007)

hal-00095370 , version 4 (22-04-2011)

hal-00095370 , version 5 (17-08-2015)

Identifiers

HAL Id : hal-00095370 , version 5
DOI : 10.46298/dmtcs.3492

Cite

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. pp.419-422, ⟨10.46298/dmtcs.3492⟩. ⟨hal-00095370v5⟩


Collections

CNRS INRIA IECN UNIV-LORRAINE TDS-MACS

