Large deviation properties for patterns

Mireille Regnier 1, 2 Jérémie Bourdon 3
1 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
Abstract: Deciding whether a given pattern is over- or under-represented with respect to a given background model is a key question in computational biology. Such a decision is usually made by computing p-values that reflect the "exceptionality" of a pattern in a given sequence or set of sequences. In the simplest cases (short and simple patterns, a simple background model, a small number of sequences), an exact p-value can be computed with tractable complexity. Realistic cases are generally too complex for an exact p-value to be computed, so approximations are used instead (Gaussian, Poisson, or large deviation approximations). These approximations apply under specific conditions: Gaussian approximations are valid in the central domain, while Poisson and large deviation approximations are valid for rare events. In the present paper, we prove a large deviation approximation for the double-strand counting problem, that is, counting the occurrences of a given pattern in a set of sequences that arise from both strands of the genome. In that case, dependencies between a sequence and its reverse complement cannot be neglected. They are captured here for a Bernoulli model from general combinatorial properties of the pattern. A large deviation result is also provided for a set of small sequences.
Contributor: Mireille Regnier
Submitted on: Tuesday, October 1, 2013 - 2:54:27 PM
Last modification on: Wednesday, March 27, 2019 - 4:41:29 PM




Mireille Regnier, Jérémie Bourdon. Large deviation properties for patterns. Journal of Discrete Algorithms, Elsevier, 2013, ⟨10.1016/j.jda.2013.09.004⟩. ⟨hal-00868462⟩


