Rare Events and Conditional Events on Random Strings

Abstract : Some strings, the texts, are assumed to be randomly generated according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over- or under-representation of a word or a set of words. The aim of this paper is twofold. First, given a single word, the tail distribution of its number of occurrences is studied, and sharp large deviation estimates are derived. Second, assuming that a given word is overrepresented, the distribution of a second word is studied, and formulae for its expectation and variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text.
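
As a minimal illustration of the setting (not the paper's large deviation formulae, which are not reproduced here), the sketch below counts the possibly overlapping occurrences of a word in a text and computes the expected count under a Bernoulli model, where each letter is drawn independently. For a word w of length m in a text of length n, this expectation is (n - m + 1) P(w). The function names, the four-letter alphabet and the uniform letter probabilities are assumptions chosen for the example.

```python
import random
from math import prod


def count_occurrences(text, word):
    """Count occurrences of `word` in `text`, overlapping ones included."""
    m = len(word)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == word)


def expected_count(n, word, probs):
    """Expected number of occurrences of `word` in a Bernoulli text of length n.

    Letters are independent, so P(word) is the product of its letter
    probabilities, and there are n - len(word) + 1 possible start positions.
    """
    p_word = prod(probs[c] for c in word)
    return (n - len(word) + 1) * p_word


def random_text(n, probs, rng):
    """Draw a text of length n with independent letters (Bernoulli model)."""
    letters = list(probs)
    weights = [probs[c] for c in letters]
    return "".join(rng.choices(letters, weights=weights, k=n))


if __name__ == "__main__":
    # Hypothetical parameters: uniform DNA alphabet, word "ACA", text length 1000.
    probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    rng = random.Random(0)
    trials = 2000
    mean = sum(
        count_occurrences(random_text(1000, probs, rng), "ACA")
        for _ in range(trials)
    ) / trials
    print("expected :", expected_count(1000, "ACA", probs))
    print("empirical:", mean)
```

A word would be called overrepresented when its observed count lies far above this expectation; quantifying how unlikely such a deviation is requires the tail estimates that the paper derives.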
Document type :
Journal articles

https://hal.inria.fr/hal-00959004
Contributor : Service Ist Inria Sophia Antipolis-Méditerranée / I3s
Submitted on : Thursday, March 13, 2014 - 5:05:07 PM
Last modification on : Friday, May 25, 2018 - 12:02:05 PM
Document(s) archived on : Friday, June 13, 2014 - 12:10:54 PM

File

dm060203.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00959004, version 1

Citation

Mireille Régnier, Alain Denise. Rare Events and Conditional Events on Random Strings. Discrete Mathematics and Theoretical Computer Science, DMTCS, 2004, 6 (2), pp.191-214. ⟨hal-00959004⟩
