Conference papers

Data Mining Using Hidden Markov Models (HMM2) to Detect Heterogeneities into Bacteria Genomes

Abstract : The Streptococcus genus contains both pathogenic bacteria and bacteria used in the food-processing industry. We are developing a statistical segmentation method to identify heterogeneous sequences, such as sequences acquired through recent horizontal transfer or weakly or strongly expressed genes. The method is based on second-order Hidden Markov Models (HMM2). After automatic unsupervised training, it can demarcate particular regions within a genome. After checking the efficiency of such models on various controls and on chimeric sequences generated in silico, we chose an HMM2 (3-mer, 5 states) to analyse the complete genome sequence of S. thermophilus CNRZ1066 (1.8 Mb). More than 80 atypical segments were extracted and are currently being analysed further.
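As a rough illustration of how a second-order HMM can segment a sequence, the sketch below runs Viterbi decoding in which each state depends on the two preceding states. It is not the authors' implementation. It assumes a hypothetical two-state toy model (state 0 "typical", AT-rich; state 1 "atypical", GC-rich) with hand-picked probabilities and single-nucleotide emissions, whereas the paper trains its parameters without supervision and uses 3-mers with 5 states. All function names and parameter values here are illustrative assumptions.

```python
import math
from itertools import groupby


def viterbi_hmm2(obs, log_init, log_trans1, log_trans2, log_emit):
    """Most likely state path under a second-order HMM (log-space Viterbi).

    log_init[i]         : log P(q_1 = i)
    log_trans1[i][j]    : log P(q_2 = j | q_1 = i), used for the first step only
    log_trans2[i][j][k] : log P(q_t = k | q_{t-2} = i, q_{t-1} = j)
    log_emit[k][o]      : log P(symbol o | state k)
    """
    n, T = len(log_init), len(obs)
    if T == 0:
        return []
    if T == 1:
        return [max(range(n), key=lambda i: log_init[i] + log_emit[i][obs[0]])]
    # delta[(j, k)]: best log-score of any path whose last two states are (j, k)
    delta = {
        (i, j): log_init[i] + log_emit[i][obs[0]]
        + log_trans1[i][j] + log_emit[j][obs[1]]
        for i in range(n) for j in range(n)
    }
    back = []  # back[t - 2][(j, k)]: best state at t - 2 given (j, k) at (t - 1, t)
    for t in range(2, T):
        new_delta, ptr = {}, {}
        for j in range(n):
            for k in range(n):
                best_i = max(range(n),
                             key=lambda i: delta[(i, j)] + log_trans2[i][j][k])
                new_delta[(j, k)] = (delta[(best_i, j)] + log_trans2[best_i][j][k]
                                     + log_emit[k][obs[t]])
                ptr[(j, k)] = best_i
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final pair of states.
    path = list(max(delta, key=delta.get))
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path


def toy_model():
    """Hypothetical parameters, chosen by hand for illustration only."""
    log = math.log
    init = [log(0.5), log(0.5)]
    trans1 = [[log(0.9 if i == j else 0.1) for j in range(2)] for i in range(2)]

    def stay(i, j):
        # Second-order effect: an established segment (i == j) is stickier
        # than one that has only just started (i != j).
        return 0.95 if i == j else 0.8

    trans2 = [[[log(stay(i, j) if k == j else 1 - stay(i, j))
                for k in range(2)] for j in range(2)] for i in range(2)]
    # Emission order: A, C, G, T
    emit = [[log(p) for p in (0.4, 0.1, 0.1, 0.4)],
            [log(p) for p in (0.1, 0.4, 0.4, 0.1)]]
    return init, trans1, trans2, emit


def segment(dna):
    """Return (state, start, end) runs, with end exclusive, for a DNA string."""
    obs = ["ACGT".index(c) for c in dna]
    path = viterbi_hmm2(obs, *toy_model())
    runs, pos = [], 0
    for state, grp in groupby(path):
        length = len(list(grp))
        runs.append((state, pos, pos + length))
        pos += length
    return runs


if __name__ == "__main__":
    print(segment("A" * 10 + "G" * 10))
```

With these toy parameters, a run of ten A's followed by ten G's is split into two segments at position 10, while a single G inside a run of A's is not enough to trigger a switch. In a real analysis the parameters would be estimated from the genome itself rather than fixed by hand.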
Document type :
Conference papers

Contributor : Jean-François Mari
Submitted on : Sunday, July 3, 2005 - 6:35:06 PM
Last modification on : Wednesday, February 2, 2022 - 3:55:48 PM
Long-term archiving on : Thursday, April 1, 2010 - 9:50:23 PM


  • HAL Id : inria-00000142, version 1



Catherine Eng, Annabelle Thibessard, Sébastien Hergalant, Jean-François Mari, Pierre Leblond. Data Mining Using Hidden Markov Models (HMM2) to Detect Heterogeneities into Bacteria Genomes. Journées Ouvertes Biologie, Informatique et Mathématiques - JOBIM 2005, JOBIM, Jul 2005, Lyon, France. ⟨inria-00000142⟩


