Order statistics and estimating cardinalities of massive data sets

Frédéric Giroire 1
1 ALGORITHMS - Algorithms
Inria Paris-Rocquencourt
Abstract: We introduce a new class of algorithms to estimate the cardinality of very large multisets using constant memory and a single pass over the data. The approach is based on order statistics rather than on bit patterns in binary representations of numbers. We analyse three families of estimators. They attain a standard error of $\frac{1}{\sqrt{M}}$ using $M$ units of storage, which places them in the same class as the best algorithms known so far. They have a very simple internal loop, which gives them an advantage in terms of processing speed. The algorithms are validated on internet traffic traces.
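To make the order-statistics idea concrete, here is a minimal sketch of a generic k-th minimum value estimator. It is not one of the three estimator families analysed in the paper, only an illustration of the same principle: hash each element to a pseudo-uniform value in [0, 1), keep the k smallest distinct hash values, and infer the number of distinct elements from the k-th smallest. The names `hash_to_unit` and `estimate_cardinality`, the parameter `k`, and the choice of SHA-256 as hash function are assumptions made for this example.

```python
import hashlib
import heapq


def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) (SHA-256 is an assumption)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(stream, k: int = 256) -> float:
    """Estimate the number of distinct items in one pass, storing k hash values."""
    heap = []        # negated values: heap[0] is minus the largest value kept
    members = set()  # the same k values, for O(1) duplicate detection
    for item in stream:
        h = hash_to_unit(item)
        if h in members:
            continue  # a repeated element hashes to the same value
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: swap it in
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hash values seen: the count is exact
        return float(len(heap))
    # If n uniform values fall in [0, 1), the k-th smallest is near k / n,
    # so (k - 1) / (k-th minimum) estimates n
    return (k - 1) / -heap[0]
```

Because duplicates hash to the same value, repeating elements in the stream does not change the result, and the memory used depends only on `k`, not on the length of the stream.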
Document type:
Conference paper
Conrado Martínez. 2005 International Conference on Analysis of Algorithms, 2005, Barcelona, Spain. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AD, International Conference on Analysis of Algorithms, pp.157-166, 2005, DMTCS Proceedings

https://hal.inria.fr/hal-01184025
Contributor: Coordination Episciences Iam
Submitted on: Wednesday, 12 August 2015 - 15:51:21
Last modified on: Tuesday, 17 April 2018 - 11:34:30
Document(s) archived on: Friday, 13 November 2015 - 11:39:58

File

dmAD0115.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01184025, version 1

Citation

Frédéric Giroire. Order statistics and estimating cardinalities of massive data sets. Conrado Martínez. 2005 International Conference on Analysis of Algorithms, 2005, Barcelona, Spain. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AD, International Conference on Analysis of Algorithms, pp.157-166, 2005, DMTCS Proceedings. 〈hal-01184025〉
