Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models

Mireille Régnier 1, 2 Evgenia Furletova 3 Victor Yakovlev 3 Mikhail Roytberg 4, 5
2 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
Abstract : We present a novel algorithm, SufPref, computing an exact pvalue for Hidden Markov models (HMM). The algorithm inductively traverses specific data structure, the overlap graph. Nodes of the graph are associated with the overlaps of words from a given set H. Edges are associated to the prefix and suffix relations between ovelaps. An originality of our data structure is that pattern H need not be explicitly represented in nodes or leaves. The algorithm relies on the Cartesian product of the overlap graph and the graph of HMM states; the approach is analogous to a weighted automaton approach. The gain in size of SufPref data structure leads to significant space and time complexity improvements. We suppose that all words in the pattern H are of the same length m. The algorithm SufPref was implemented as a C++ program; it can be used both as Web-server and a stand alone program for Linux and Windows.
Complete list of metadatas

Cited literature [37 references]  Display  Hide  Download

https://hal.inria.fr/hal-00858701
Contributor : Mireille Regnier <>
Submitted on : Thursday, September 5, 2013 - 5:46:05 PM
Last modification on : Friday, April 26, 2019 - 11:28:44 AM
Long-term archiving on : Thursday, April 6, 2017 - 4:00:35 PM

File

jwtemplate.pdf
Files produced by the author(s)

Identifiers

Collections

Citation

Mireille Régnier, Evgenia Furletova, Victor Yakovlev, Mikhail Roytberg. Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models. Algorithms for Molecular Biology, BioMed Central, 2014, 9 (1), ⟨10.1186/s13015-014-0025-1⟩. ⟨hal-00858701⟩

Share

Metrics

Record views

655

Files downloads

373