Skip to Main content Skip to Navigation
Conference papers

On Power-Law Distributed Balls in Bins and its Applications to View Size Estimation

Ioannis Atsonios 1, 2 Olivier Beaumont 1, 2 Nicolas Hanusse 1, 2 Yusik Kim 3, 4
1 CEPAGE - Algorithmics for computationally intensive applications over wide scale distributed platforms
Université Sciences et Technologies - Bordeaux 1, Inria Bordeaux - Sud-Ouest, École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), CNRS - Centre National de la Recherche Scientifique : UMR5800
3 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : The view size estimation plays an important role in query optimization. It has been observed that many data follow a power law distribution. In this paper, we consider the balls in bins problem where we place balls into $N$ bins when the bin selection probabilities follow a power law distribution. As a generalization to the coupon collector's problem, we address the problem of determining the expected number of balls that need to be thrown in order to have at least one ball in each of the $N$ bins. We prove that $\Theta(\frac{N^\alpha \ln N}{c_N^{\alpha}})$ balls are needed to achieve this where $\alpha$ is the parameter of the power law distribution and $c_N^{\alpha}=\frac{\alpha-1}{\alpha-N^{\alpha-1}}$ for $\alpha \neq 1$ and $c_N^{\alpha}=\frac{1}{\ln N}$ for $\alpha=1$. Next, when fixing the number of balls that are thrown to $T$, we provide closed form upper and lower bounds on the expected number of bins that have at least one occupant. For $n$ large and $\alpha>1$, we prove that our bounds are tight up to a constant factor of $\left(\frac{\alpha}{\alpha-1}\right)^{1-\frac{1}{\alpha}} \leq e^{1/e} \simeq 1.4$.
Complete list of metadatas

Cited literature [13 references]  Display  Hide  Download

https://hal.inria.fr/inria-00618785
Contributor : Olivier Beaumont <>
Submitted on : Friday, September 2, 2011 - 8:53:40 PM
Last modification on : Wednesday, September 16, 2020 - 5:04:16 PM
Long-term archiving on: : Saturday, December 3, 2011 - 2:27:11 AM

File

isaac.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00618785, version 1

Collections

Citation

Ioannis Atsonios, Olivier Beaumont, Nicolas Hanusse, Yusik Kim. On Power-Law Distributed Balls in Bins and its Applications to View Size Estimation. ISAAC, Dec 2011, Yokohama, Japan. ⟨inria-00618785⟩

Share

Metrics

Record views

471

Files downloads

792