Reconsidering the significance of genomic word frequencies.

Abstract: By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to characterize precisely how often features occur in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence primarily follows a Pareto-lognormal distribution, which encapsulates the lognormal and power-law features found across all known genomes. Such a distribution could result from completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens the way to more accurate reasoning about their over- and underrepresentation in genomic sequences.
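
To make the object of study concrete, the following is a minimal sketch of how a word-frequency distribution could be tabulated from a sequence. It is not the authors' procedure: the function names `count_words` and `frequency_spectrum`, the toy sequence, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration. The sketch only builds the empirical distribution; it does not fit a Pareto-lognormal model.

```python
from collections import Counter


def count_words(sequence: str, k: int) -> Counter:
    """Count overlapping occurrences of every length-k word (hypothetical helper)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def frequency_spectrum(counts: Counter) -> dict:
    """Map each occurrence count c to the number of distinct words seen exactly c times."""
    return dict(sorted(Counter(counts.values()).items()))


# Toy input, not real genomic data.
toy = "ACGTACGTACGTTTTT"
counts = count_words(toy, 3)
spectrum = frequency_spectrum(counts)
```

On a real genome, the spectrum returned by `frequency_spectrum` is the empirical distribution whose shape the abstract describes; words with zero occurrences are not included in it and would need to be counted separately.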

https://hal.inria.fr/inria-00448737
Contributor: Laurent Noé
Submitted on: Tuesday, 26 January 2010, 09:49:20
Last modified on: Tuesday, 6 March 2018, 17:40:54

Citation

Miklós Csűrös, Laurent Noé, Gregory Kucherov. Reconsidering the significance of genomic word frequencies. Trends in Genetics, Elsevier, 2007, 23 (11), pp. 543-546. 〈http://linkinghub.elsevier.com/retrieve/pii/S0168952507002983〉. 〈10.1016/j.tig.2007.07.008〉. 〈inria-00448737〉