Abstract: We propose a new data mining method based on second-order hidden Markov models (HMM2). It couples a background model with dedicated a posteriori decoding algorithms to extract DNA heterogeneities. Unsupervised training and a state-splitting algorithm specify an HMM2 that observes fixed-length sequences (k-mers and k-d-k mers) rather than single nucleotides. The training process requires no a priori knowledge. We tested the method on Actinomycete genomes (Streptomyces and Mycobacterium) and found many sequences that appear to be parts of transcription factor binding sites.
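As a rough illustration of what observing fixed-length sequences rather than nucleotides could look like, the sketch below turns a DNA string into overlapping k-mers and into k-d-k mers. The abstract does not define a k-d-k mer; the code assumes it means two k-mers separated by a gap of d unspecified nucleotides. The function names `kmers` and `kdk_mers` are hypothetical and are not taken from the paper.

```python
def kmers(seq, k):
    """Return the overlapping k-mers of seq, in sequence order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kdk_mers(seq, k, d):
    """Return (left, right) k-mer pairs separated by a gap of d nucleotides.

    Assumption: this reading of "k-d-k mer" is not stated in the abstract.
    """
    span = 2 * k + d  # total window length covered by one k-d-k mer
    return [
        (seq[i:i + k], seq[i + k + d:i + span])
        for i in range(len(seq) - span + 1)
    ]


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(kmers(seq, 3))
    print(kdk_mers(seq, 2, 3))
```

Under this assumption, each position in the genome yields one observation symbol, so the model would see a sequence of words instead of a sequence of single bases.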
https://hal.inria.fr/inria-00001213
Contributor: Agnès Vidard
Submitted on: Wednesday, April 5, 2006 - 3:57:05 PM
Last modification on: Friday, March 11, 2022 - 9:52:27 AM
Long-term archiving on: Wednesday, March 29, 2017 - 12:13:01 PM
Files
Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed
until: never
Sébastien Hergalant, Bertrand Aigle, Pierre Leblond, Jean-François Mari. Fouille de données du génome à l'aide de modèles de Markov cachés. Extraction et Gestion de Connaissances - EGC 2005, Jan 2005, Paris, France. pp. 141-148. ⟨inria-00001213⟩