A new data mining approach for the detection of bacterial promoters combining stochastic and combinatorial methods

Catherine Eng 1, * Charu Asthana 2 Bertrand Aigle 1 Sébastien Hergalant 2 Jean-Francois Mari 2 Pierre Leblond 1
* Corresponding author
2 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: We present a new data mining method that combines stochastic analysis (hidden Markov models, HMMs) with combinatorial methods to discover new transcriptional factors in bacterial genome sequences. Sigma factor binding sites (SFBSs) are described as box1 - spacer - box2 patterns, corresponding to the -35 and -10 DNA motifs of bacterial promoters. We used a high-order HMM in which the hidden process is a second-order Markov chain. When the model was applied to the genome of the model bacterium Streptomyces coelicolor (2), the a posteriori state probabilities revealed local maxima, or peaks, whose distribution was enriched in the intergenic sequences ("iPeaks", for intergenic peaks). Short DNA sequences underlying the iPeaks were extracted and clustered by a hierarchical classification algorithm based on Smith-Waterman local similarity. Selected motif consensuses were then used as box1 (-35 motif) to search for a potential neighbouring box2 (-10 motif) with a word enumeration algorithm. Applied to Streptomyces coelicolor, this SFBS mining methodology retrieved already known SFBSs and suggested new potential transcriptional factor binding sites (TFBSs). The well-defined SigR regulon (oxidative stress response) was also used as a test set to compare first- and second-order HMMs. Our approach also allowed the preliminary detection of known SFBSs in Bacillus subtilis.
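As an illustration of the box1 - spacer - box2 search mentioned in the abstract, the following is a minimal Python sketch of a word enumeration step. It is not the authors' implementation: the function name `enumerate_box2_candidates`, the exact-match lookup of box1, the word length, the spacer range, and the toy sequence are all assumptions chosen for readability. The motifs TTGACA and TATAAT are the textbook -35 and -10 consensuses, used here only as toy input.

```python
from collections import Counter


def enumerate_box2_candidates(sequence, box1, word_len=6,
                              spacer_min=15, spacer_max=20):
    """Count candidate box2 words found downstream of each box1 match.

    Hypothetical helper: for every exact occurrence of `box1`, every word
    of length `word_len` that starts between `spacer_min` and `spacer_max`
    bases after the end of box1 is counted.
    """
    counts = Counter()
    start = sequence.find(box1)
    while start != -1:
        box1_end = start + len(box1)
        for spacer in range(spacer_min, spacer_max + 1):
            word_start = box1_end + spacer
            word = sequence[word_start:word_start + word_len]
            # Skip truncated words at the end of the sequence.
            if len(word) == word_len:
                counts[word] += 1
        # Resume one base further so overlapping matches are not missed.
        start = sequence.find(box1, start + 1)
    return counts


# Toy sequence (assumed, not real genomic data): two promoter-like regions
# with spacers of 17 and 16 bases between box1 and box2.
toy_sequence = (
    "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
    + "GG" + "TTGACA" + "C" * 16 + "TATAAT" + "GG"
)
candidates = enumerate_box2_candidates(
    toy_sequence, "TTGACA", spacer_min=16, spacer_max=18
)
for word, count in sorted(candidates.items()):
    print(word, count)
```

Words that recur downstream of many box1 occurrences are the candidates for a box2 motif. A real analysis would presumably tolerate mismatches in box1 and compare the counts against a background model, neither of which this sketch attempts.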
Document type:
Journal article
Journal of Computational Biology, Mary Ann Liebert, 2009, 16 (9), pp.1211-1225. doi:10.1089/cmb.2008.0122

Cited literature: 77 references

https://hal.inria.fr/inria-00419969
Contributor: Jean-François Mari
Submitted on: Thursday, 17 February 2011 - 10:36:15
Last modified on: Friday, 20 July 2018 - 14:24:02
Document(s) archived on: Wednesday, 18 May 2011 - 02:21:16

File

jcb-eng.pdf
Publisher files allowed on an open archive

Citation

Catherine Eng, Charu Asthana, Bertrand Aigle, Sébastien Hergalant, Jean-Francois Mari, et al. A new data mining approach for the detection of bacterial promoters combining stochastic and combinatorial methods. Journal of Computational Biology, Mary Ann Liebert, 2009, 16 (9), pp.1211-1225. doi:10.1089/cmb.2008.0122. HAL Id: inria-00419969
