# Data Streams as Random Permutations: the Distinct Element Problem

2 APR - Algorithmes, Programmes et Résolution
LIP6 - Laboratoire d'Informatique de Paris 6
3 ALGORITHMS - Algorithms
Inria Paris-Rocquencourt
Abstract : In this paper, we show that data streams can sometimes usefully be studied as random permutations. This simple observation allows a wealth of classical and recent results from combinatorics to be recycled, with minimal effort, as estimators for various statistics over data streams. We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of $k$-records occurring in it. The algorithm has a score of interesting properties, such as providing a random sample of the set underlying the stream. To the best of our knowledge, a modified version of RECORDINALITY is the first cardinality estimation algorithm which, in the random-order model, uses neither sampling nor hashing.
Keywords :
Type de document :
Communication dans un congrès
Broutin, Nicolas and Devroye, Luc. 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'12), Jun 2012, Montreal, Canada. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AQ, 23rd Intern. Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), pp.323-338, 2012, DMTCS Proceedings
Domaine :

Littérature citée [27 références]

https://hal.inria.fr/hal-01197221
Contributeur : Coordination Episciences Iam <>
Soumis le : vendredi 11 septembre 2015 - 12:54:37
Dernière modification le : jeudi 11 janvier 2018 - 02:06:28
Document(s) archivé(s) le : mardi 29 décembre 2015 - 00:32:26

### Fichier

dmAQ0124.pdf
Fichiers éditeurs autorisés sur une archive ouverte

### Identifiants

• HAL Id : hal-01197221, version 1

### Citation

Ahmed Helmi, Jérémie Lumbroso, Conrado Martínez, Alfredo Viola. Data Streams as Random Permutations: the Distinct Element Problem. Broutin, Nicolas and Devroye, Luc. 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'12), Jun 2012, Montreal, Canada. Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AQ, 23rd Intern. Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), pp.323-338, 2012, DMTCS Proceedings. 〈hal-01197221〉

### Métriques

Consultations de la notice

## 275

Téléchargements de fichiers