Conference paper, Discrete Mathematics and Theoretical Computer Science, Year: 2005

Order statistics and estimating cardinalities of massive data sets

Frédéric Giroire

Abstract

We introduce a new class of algorithms to estimate the cardinality of very large multisets using constant memory and doing only one pass on the data. It is based on order statistics rather than on bit patterns in binary representations of numbers. We analyse three families of estimators. They attain a standard error of $\frac{1}{\sqrt{M}}$ using $M$ units of storage, which places them in the same class as the best known algorithms so far. They have a very simple internal loop, which gives them an advantage in terms of processing speed. The algorithms are validated on internet traffic traces.
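To make the idea of an order-statistics estimator concrete, here is a minimal Python sketch of the well-known "k minimum values" approach: hash each element to a pseudo-uniform value in [0, 1], keep only the k smallest distinct hash values, and estimate the number of distinct elements from the k-th smallest one. This is an illustration under stated assumptions, not a reproduction of the three estimator families analysed in the paper. The names `hash01` and `estimate_cardinality`, the choice of SHA-1 as hash function, and the estimator $(k-1)/h_{(k)}$ are all assumptions made for this example.

```python
import hashlib
import heapq


def hash01(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1] (hypothetical helper)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(stream, k: int = 256) -> float:
    """Estimate the number of distinct items in one pass with O(k) memory.

    Assumption: uses the classical (k - 1) / h_(k) estimator, where h_(k)
    is the k-th smallest distinct hash value seen in the stream.
    """
    heap = []        # max-heap (values negated) of the k smallest hashes
    members = set()  # the same k hashes, used to skip duplicates

    for item in stream:
        h = hash01(item)
        if h in members:
            continue  # duplicate of a value already retained
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: replace it
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)

    if len(heap) < k:
        # Fewer than k distinct hashes were seen: the count is exact
        return float(len(heap))
    return (k - 1) / (-heap[0])
```

The inner loop does one hash, one set lookup, and one comparison for most elements, which reflects the kind of simple per-element work the abstract refers to. For this particular estimator the relative error is expected to shrink roughly like $1/\sqrt{k}$ as `k` grows.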
Main file
dmAD0115.pdf (136.54 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01184025, version 1 (12-08-2015)

Cite

Frédéric Giroire. Order statistics and estimating cardinalities of massive data sets. 2005 International Conference on Analysis of Algorithms, 2005, Barcelona, Spain. pp.157-166, ⟨10.46298/dmtcs.3353⟩. ⟨hal-01184025⟩