Rare Events and Conditional Events on Random Strings

Abstract: Some strings, the texts, are assumed to be randomly generated according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over- or under-representation of a word or a set of words. The aim of this paper is twofold. First, given a single word, the tail distribution of its number of occurrences is studied, and sharp large deviation estimates are derived. Second, assuming that a given word is overrepresented, the distribution of a second word is studied; formulae for the expectation and the variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text.
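
To make the setting concrete, the following is a minimal sketch of the quantities the abstract refers to under a Bernoulli model. It is not the paper's method: the paper derives analytic large deviation estimates, whereas this sketch only counts (possibly overlapping) occurrences of a word, computes the elementary expectation (n - m + 1) P(w), and estimates a tail probability by naive simulation. All function names, parameters, and the example alphabet are hypothetical and chosen for illustration.

```python
import random


def count_occurrences(text, word):
    """Count possibly overlapping occurrences of `word` in `text`."""
    m = len(word)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == word)


def expected_occurrences(n, word, probs):
    """Expected number of overlapping occurrences of `word` in a
    Bernoulli (i.i.d. letters) text of length n: (n - m + 1) * P(word)."""
    p = 1.0
    for ch in word:
        p *= probs[ch]
    return max(n - len(word) + 1, 0) * p


def random_text(n, probs, rng):
    """Draw a text of length n with independent letters distributed as `probs`."""
    letters = list(probs)
    weights = [probs[c] for c in letters]
    return "".join(rng.choices(letters, weights=weights, k=n))


def empirical_tail(n, word, probs, threshold, trials=2000, seed=0):
    """Monte Carlo estimate of P(number of occurrences >= threshold).
    Assumption: plain simulation, used here only as a sanity check."""
    rng = random.Random(seed)
    hits = sum(
        count_occurrences(random_text(n, probs, rng), word) >= threshold
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Hypothetical uniform model on the DNA alphabet.
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(expected_occurrences(1000, "ACGT", uniform))
    print(empirical_tail(1000, "ACGT", uniform, threshold=10))
```

Simulation of this kind becomes impractical precisely for the rare events the paper targets, since very small tail probabilities need a very large number of trials; that is the motivation for computable analytic estimates.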
Document type:
Journal article
Discrete Mathematics and Theoretical Computer Science, DMTCS, 2004, 6 (2), pp.191-214

https://hal.inria.fr/hal-00959004
Contributor: Service Ist Inria Sophia Antipolis-Méditerranée / I3s
Submitted on: Thursday, March 13, 2014 - 17:05:07
Last modified on: Thursday, January 11, 2018 - 06:20:11
Document(s) archived on: Friday, June 13, 2014 - 12:10:54

File

dm060203.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00959004, version 1

Citation

Mireille Régnier, Alain Denise. Rare Events and Conditional Events on Random Strings. Discrete Mathematics and Theoretical Computer Science, DMTCS, 2004, 6 (2), pp.191-214. 〈hal-00959004〉
