Conference papers

Data Mining Using Hidden Markov Models (HMM2) to Detect Heterogeneities into Bacteria Genomes

Abstract : The Streptococcus genus contains both pathogenic bacteria and bacteria used in the food-processing industry. We are developing a statistical segmentation method to identify heterogeneous sequences, such as sequences acquired by recent horizontal transfer or genes that are weakly or strongly expressed. The method is based on second-order Hidden Markov Models (HMM2). After automatic unsupervised training, it demarcates particular regions within a genome. After checking the efficiency of such models on various controls and on chimeric sequences generated in silico, we chose an HMM2 (3-mer, 5 states) to analyse the complete genome sequence of S. thermophilus CNRZ1066 (1.8 Mb). More than 80 atypical segments were extracted and are currently being analysed further.
Document type : Conference papers

Contributor : Jean-François Mari
Submitted on : Sunday, July 3, 2005 - 6:35:06 PM
Last modification on : Friday, June 5, 2020 - 10:58:07 PM
Long-term archiving on : Thursday, April 1, 2010 - 9:50:23 PM


  • HAL Id : inria-00000142, version 1



Catherine Eng, Annabelle Thibessard, Sébastien Hergalant, Jean-François Mari, Pierre Leblond. Data Mining Using Hidden Markov Models (HMM2) to Detect Heterogeneities into Bacteria Genomes. Journées Ouvertes Biologie, Informatique et Mathématiques - JOBIM 2005, JOBIM, Jul 2005, Lyon, France. ⟨inria-00000142⟩


