Accurate prediction of the statistics of repetitions in random sequences: a case study in Archaea genomes

Mireille Regnier (1, 2), Philippe Chassignet (2, 1)
1 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
Abstract : Repetitive patterns in genomic sequences have great biological significance as well as algorithmic implications. Analytic combinatorics makes it possible to derive formulas for the expected length of repetitions in a random sequence. The resulting asymptotic results, which generalize previous work on a binary alphabet, are easily computable. Simulations on random sequences confirm their accuracy. As an application, the case of Archaea genomes illustrates how biological sequences may differ from random sequences.
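The kind of simulation the abstract mentions can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a uniform i.i.d. model over a four-letter alphabet, and it compares the simulated mean length of the longest repeated word with the classical first-order estimate 2 log_k(n) for an alphabet of size k, which is a coarser approximation than the formulas derived in the paper. The names `longest_repeat_length` and `simulate` are hypothetical helpers introduced for this example.

```python
import math
import random


def longest_repeat_length(seq: str) -> int:
    """Length of the longest word occurring at least twice in seq (overlaps allowed)."""

    def has_repeat(k: int) -> bool:
        # Scan all words of length k and report whether any word is seen twice.
        seen = set()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in seen:
                return True
            seen.add(word)
        return False

    # If a word of length k is repeated, so is its prefix of length k - 1,
    # so the property is monotone in k and binary search applies.
    lo, hi = 0, len(seq) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return max(lo, 0)


def simulate(n: int, alphabet: str = "ACGT", trials: int = 20, seed: int = 0) -> float:
    """Mean longest-repeat length over `trials` uniform random sequences of length n."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choice(alphabet) for _ in range(n))
        total += longest_repeat_length(seq)
    return total / trials


if __name__ == "__main__":
    n = 5000
    observed = simulate(n, trials=10)
    # Classical first-order estimate under the uniform model (an assumption here).
    estimate = 2 * math.log(n, 4)
    print(f"simulated mean: {observed:.2f}, 2*log_4(n): {estimate:.2f}")
```

Running the same `longest_repeat_length` function on a real genome and comparing the result with the random-sequence value is one simple way to see the kind of deviation the abstract refers to.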
Document type :
Journal articles

Cited literature : 18 references

https://hal.inria.fr/hal-01304366
Contributor : Mireille Regnier
Submitted on : Tuesday, April 19, 2016 - 4:29:54 PM
Last modification on : Wednesday, March 27, 2019 - 4:41:29 PM
Long-term archiving on : Tuesday, November 15, 2016 - 6:28:56 AM

File

revision4.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01304366, version 1

Citation

Mireille Regnier, Philippe Chassignet. Accurate prediction of the statistics of repetitions in random sequences: a case study in Archaea genomes. Frontiers in Bioengineering and Biotechnology, Frontiers, 2016. ⟨hal-01304366⟩
