# The weighted words collector

* Auteur correspondant
3 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, X - École polytechnique, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : We consider the word collector problem, i.e. the expected number of calls to a random weighted generator before all the words of a given length in a language are generated. The originality of this instance of the non-uniform coupon collector lies in the, potentially large, multiplicity of the words/coupons of a given probability/composition. We obtain a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application to a given language essentially necessitates knowledge on the number of words of a given composition/probability. We illustrate the application of our theorem, in a step-by-step fashion, on four exemplary languages, whose analyses reveal a large diversity of asymptotic waiting times, generally expressible as $\kappa \cdot m^p \cdot (\log{m})^q \cdot (\log \log{m})^r$, for $m$ the number of words, and $p, q, r$ some positive real numbers.
Keywords :
Type de document :
Communication dans un congrès
Nicolas, Broutin (INRIA, France) and Luc, Devroye (McGill, Cana. AOFA - 23rd International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms - 2012, Jun 2012, Montreal, Canada. DMTCS, DMTCS Proceedings vol. AQ, 23rd Intern. Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), pp.243--264, 2012, 〈http://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/view/dmAQ0120〉

Littérature citée [19 références]

https://hal.inria.fr/hal-00666399
Contributeur : Yann Ponty <>
Soumis le : lundi 16 avril 2012 - 23:16:52
Dernière modification le : jeudi 10 mai 2018 - 02:06:12
Document(s) archivé(s) le : mardi 17 juillet 2012 - 02:36:59

### Fichiers

WordCollector-Daniele-Jeremie-...
Fichiers produits par l'(les) auteur(s)

### Identifiants

• HAL Id : hal-00666399, version 2
• ARXIV : 1202.0920

### Citation

Jérémie Du Boisberranger, Danièle Gardy, Yann Ponty. The weighted words collector. Nicolas, Broutin (INRIA, France) and Luc, Devroye (McGill, Cana. AOFA - 23rd International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms - 2012, Jun 2012, Montreal, Canada. DMTCS, DMTCS Proceedings vol. AQ, 23rd Intern. Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), pp.243--264, 2012, 〈http://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/view/dmAQ0120〉. 〈hal-00666399v2〉

### Métriques

Consultations de la notice

## 467

Téléchargements de fichiers