Efficient and accurate P-value computation for Position Weight Matrices

Helene Touzet 1, 2 Jean-Stéphane Varré 1, 2
2 SEQUOIA - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract : Background Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. The usage of PWMs needs as a prerequisite to knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem: Given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs, they fail to give accurate results in a reasonable amount of time. Results The contribution of this paper is two fold. First, we study the theoretical complexity of the problem, and we prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improves the final result step by step until some convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in a software called TFM-PVALUE, that is freely available. Conclusion We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.
Type de document :
Article dans une revue
Algorithms for Molecular Biology, BioMed Central, 2007, 2 (15), pp.1-12
Liste complète des métadonnées

Littérature citée [15 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00270263
Contributeur : Jean-Stéphane Varré <>
Soumis le : vendredi 4 avril 2008 - 11:28:34
Dernière modification le : jeudi 11 janvier 2018 - 06:22:13
Document(s) archivé(s) le : vendredi 28 septembre 2012 - 12:15:59

Fichier

Algorithms_for_Molecular_Biolo...
Fichiers éditeurs autorisés sur une archive ouverte

Identifiants

  • HAL Id : inria-00270263, version 1

Collections

Citation

Helene Touzet, Jean-Stéphane Varré. Efficient and accurate P-value computation for Position Weight Matrices. Algorithms for Molecular Biology, BioMed Central, 2007, 2 (15), pp.1-12. 〈inria-00270263〉

Partager

Métriques

Consultations de la notice

195

Téléchargements de fichiers

145