Order statistics and estimating cardinalities of massive data sets

Frédéric Giroire 1
1 MASCOTTE - Algorithms, simulation, combinatorics and optimization for telecommunications
CRISAM - Inria Sophia Antipolis - Méditerranée, COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués
Abstract: A new class of algorithms is introduced to estimate the cardinality of very large multisets using constant memory and a single pass over the data. It is based on order statistics rather than on bit patterns in binary representations of numbers. Three families of estimators are analyzed. They attain a standard error of $1/\sqrt{M}$ using $M$ units of storage, which places them in the same class as the best algorithms known so far. The algorithms have a very simple internal loop, which gives them an advantage in processing speed. For instance, a memory of only 12 kB and only a few seconds are sufficient to process a multiset with several million elements and to build an estimate with an accuracy of order 2 percent. The algorithms are validated both by mathematical analysis and by experiments on real internet traffic.
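The idea of estimating a cardinality from order statistics can be illustrated with a minimal Python sketch. It is not one of the three estimator families analyzed in the article; it is the generic "k-th minimum value" estimator, shown under the following assumptions: elements are strings, a 64-bit BLAKE2b hash stands in for a uniform random value in (0, 1], and the names `hash_to_int` and `estimate_cardinality` and the parameter `k` are hypothetical. If n distinct values are hashed uniformly, the k-th smallest hash is close to k/n, so (k - 1) divided by that minimum gives an estimate of n.

```python
import hashlib
import heapq


def hash_to_int(item: str) -> int:
    """Map an item to a pseudo-uniform 64-bit integer (assumed hash choice)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def estimate_cardinality(stream, k: int = 256) -> float:
    """Estimate the number of distinct items from the k-th smallest hash.

    One pass over the data; memory is bounded by k stored hash values.
    """
    heap = []        # negated hashes: a max-heap of the k smallest distinct hashes
    members = set()  # the hashes currently stored in the heap
    for item in stream:
        h = hash_to_int(item)
        if h in members:
            continue  # duplicate of a stored value: nothing to do
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen: the count is exact
        # (up to hash collisions).
        return float(len(heap))
    kth_min = (-heap[0] + 1) / 2**64  # k-th smallest hash scaled to (0, 1]
    return (k - 1) / kth_min
```

A call such as `estimate_cardinality(lines, k=256)` processes any iterable of strings in a single pass. Repeated elements hash to the same value, so they do not change the result, which is why the approach suits multisets. For this particular sketch the relative error shrinks roughly like the inverse square root of `k`, so larger `k` trades memory for accuracy.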
Document type:
Journal article
Discrete Applied Mathematics, Elsevier, 2009, 157 (2), pp.406-427

https://hal.inria.fr/hal-00646123
Contributor: Frédéric Giroire
Submitted on: Tuesday, 29 November 2011 - 11:37:07
Last modified on: Tuesday, 21 February 2012 - 16:37:53
Document(s) archived on: Monday, 5 December 2016 - 00:00:36

File

Gir09.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-00646123, version 1

Citation

Frédéric Giroire. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, Elsevier, 2009, 157 (2), pp.406-427. <hal-00646123>
