
Data Streams as Random Permutations: the Distinct Element Problem

Abstract: In this paper, we show that data streams can sometimes usefully be studied as random permutations. This simple observation allows a wealth of classical and recent results from combinatorics to be recycled, with minimal effort, as estimators for various statistics over data streams. We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of $k$-records occurring in it. The algorithm has a number of interesting properties, such as providing a random sample of the set underlying the stream. To the best of our knowledge, a modified version of RECORDINALITY is the first cardinality estimation algorithm which, in the random-order model, uses neither sampling nor hashing.
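The record-counting idea in the abstract can be sketched in a few lines. The sketch below is an illustrative implementation, not the authors' reference code: it keeps the $k$ largest distinct values seen so far, counts each value that enters that set as a $k$-record, and applies the estimator stated in the paper, $k(1 + 1/k)^{r-k+1} - 1$, where $r$ is the number of $k$-records observed. It assumes the random-order model, i.e. the stream's distinct values arrive in a uniformly random order.

```python
import heapq

def recordinality(stream, k):
    """Estimate the number of distinct elements in `stream` by counting
    k-records (illustrative sketch of the RECORDINALITY idea; assumes
    the stream's distinct values arrive in random order)."""
    heap = []        # min-heap holding the k largest distinct values seen
    members = set()  # heap contents, for O(1) duplicate detection
    r = 0            # number of k-records observed so far
    for v in stream:
        if v in members:
            continue                         # duplicates are never records
        if len(heap) < k:
            heapq.heappush(heap, v)          # the first k distinct values
            members.add(v)                   # count as records
            r += 1
        elif v > heap[0]:                    # v beats the k-th largest seen,
            old = heapq.heapreplace(heap, v) # so it is a k-record
            members.discard(old)
            members.add(v)
            r += 1
    # Estimator from the paper: k * (1 + 1/k)^(r - k + 1) - 1
    return k * (1.0 + 1.0 / k) ** (r - k + 1) - 1.0
```

Note that the heap's contents at the end are exactly the $k$ largest distinct values, which is the random-sample property the abstract mentions: under the random-order model, the set retained is a sample of the distinct elements of the stream.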


Contributor: Coordination Episciences Iam
Submitted on: Friday, September 11, 2015 - 12:54:37 PM
Last modification on: Friday, February 4, 2022 - 3:12:56 AM
Long-term archiving on: Tuesday, December 29, 2015 - 12:32:26 AM





Ahmed Helmi, Jérémie Lumbroso, Conrado Martínez, Alfredo Viola. Data Streams as Random Permutations: the Distinct Element Problem. 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'12), Jun 2012, Montreal, Canada. pp.323-338, ⟨10.46298/dmtcs.3002⟩. ⟨hal-01197221⟩


