Probabilistic grammatical model for helix‐helix contact site classification

Witold Dyrka (1, 2, *), Jean‐Christophe Nebel (3), Malgorzata Kotulska (1, *)
* Corresponding author
2 MAGNOME - Models and Algorithms for the Genome
CNRS - Centre National de la Recherche Scientifique: UMR5800, UB - Université de Bordeaux, Inria Bordeaux - Sud-Ouest
Abstract
Background
Hidden Markov Models power many state‐of‐the‐art tools in protein bioinformatics. While they excel at their tasks, these methods of protein analysis do not directly convey information about medium‐ and long‐range residue‐residue interactions. Capturing such interactions requires the expressive power of at least context‐free grammars. However, the application of more powerful grammar formalisms to protein analysis has been surprisingly limited.
Results
In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to the classification of transmembrane helix‐helix pair configurations. The core of the model is a probabilistic context‐free grammar, inferred automatically by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was used to produce sequence‐based descriptors of four classes of transmembrane helix‐helix contact site configurations. The best‐performing classifiers reached an area under the ROC curve (AUC ROC) of 0.70. Analysis of the grammar parse trees showed that they can represent structural features of helix‐helix contact sites.
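A common way to score a sequence with a probabilistic context‐free grammar is the inside algorithm, which sums the probabilities of all parse trees that derive the sequence. The sketch below is a generic illustration, not the authors' implementation. It assumes a hypothetical toy grammar in Chomsky normal form over a two‐letter alphabet (`h` for hydrophobic, `p` for polar); the rule tables `BINARY` and `LEXICAL` and the function `inside_probability` are illustrative names introduced here.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (an assumption for illustration,
# not the grammar inferred in the paper). Terminals: 'h' = hydrophobic,
# 'p' = polar. For each left-hand side the rule probabilities sum to 1.
BINARY = {  # (lhs, left child, right child) -> probability
    ("S", "A", "B"): 0.6,
    ("S", "A", "S"): 0.4,
}
LEXICAL = {  # (lhs, terminal) -> probability
    ("A", "h"): 0.8,
    ("A", "p"): 0.2,
    ("B", "h"): 0.3,
    ("B", "p"): 0.7,
}


def inside_probability(seq, start="S"):
    """Return P(start =>* seq), summed over all parse trees (inside algorithm)."""
    n = len(seq)
    if n == 0:
        return 0.0
    # chart[i][j][X] = probability that nonterminal X derives seq[i..j] (inclusive)
    chart = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    # Spans of length 1: lexical rules X -> terminal
    for i, symbol in enumerate(seq):
        for (lhs, terminal), p in LEXICAL.items():
            if terminal == symbol:
                chart[i][i][lhs] += p
    # Longer spans: binary rules X -> Y Z, summed over every split point k
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for (lhs, left, right), p in BINARY.items():
                    p_left = chart[i][k].get(left, 0.0)
                    p_right = chart[k + 1][j].get(right, 0.0)
                    if p_left and p_right:
                        chart[i][j][lhs] += p * p_left * p_right
    return chart[0][n - 1].get(start, 0.0)


if __name__ == "__main__":
    for s in ["hp", "hhp", "phh"]:
        print(s, inside_probability(s))
```

With one grammar per class, such probabilities (for example as length‐normalized log‐likelihoods) could be compared to assign a sequence to a class; the exact way the descriptors were derived in the paper is not given here.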
Conclusions
We demonstrated that our probabilistic context‐free framework for the analysis of protein sequences outperforms the state of the art in helix‐helix contact site classification. However, this was achieved without necessarily modeling long‐range dependencies between interacting residues. A significant feature of our approach is that its grammar rules and parse trees are human‐readable, so they could provide biologically meaningful information to molecular biologists.
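A readable parse tree is usually obtained by keeping only the single most probable derivation. The following minimal sketch shows the Viterbi (max‐product) variant of CYK chart parsing with backpointers, again under stated assumptions: the toy grammar over `h`/`p` and the helper names `viterbi_parse` and `to_brackets` are hypothetical and are not taken from the paper.

```python
# Hypothetical toy PCFG in Chomsky normal form (illustrative assumption, not the
# grammar from the paper). Terminals: 'h' = hydrophobic, 'p' = polar.
BINARY = {("S", "A", "B"): 0.6, ("S", "A", "S"): 0.4}
LEXICAL = {("A", "h"): 0.8, ("A", "p"): 0.2, ("B", "h"): 0.3, ("B", "p"): 0.7}


def viterbi_parse(seq, start="S"):
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    n = len(seq)
    if n == 0:
        return 0.0, None
    # best[i][j][X] = (max probability, backpointer) for X deriving seq[i..j]
    best = [[{} for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(seq):
        for (lhs, terminal), p in LEXICAL.items():
            if terminal == symbol and p > best[i][i].get(lhs, (0.0, None))[0]:
                best[i][i][lhs] = (p, symbol)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for (lhs, left, right), p in BINARY.items():
                    if left in best[i][k] and right in best[k + 1][j]:
                        cand = p * best[i][k][left][0] * best[k + 1][j][right][0]
                        if cand > best[i][j].get(lhs, (0.0, None))[0]:
                            best[i][j][lhs] = (cand, (k, left, right))
    if start not in best[0][n - 1]:
        return 0.0, None

    def build(i, j, label):
        # Follow backpointers to rebuild the subtree rooted at `label` over seq[i..j]
        _, back = best[i][j][label]
        if i == j:
            return (label, back)  # leaf: (preterminal, residue class)
        k, left, right = back
        return (label, build(i, k, left), build(k + 1, j, right))

    return best[0][n - 1][start][0], build(0, n - 1, start)


def to_brackets(tree):
    """Render a parse tree in bracketed (Penn Treebank-like) notation."""
    if isinstance(tree[1], str):
        return f"({tree[0]} {tree[1]})"
    return f"({tree[0]} {to_brackets(tree[1])} {to_brackets(tree[2])})"


if __name__ == "__main__":
    prob, tree = viterbi_parse("hhp")
    print(prob, to_brackets(tree))
```

For a sequence of length n the chart parse takes O(n³·|R|) time, where |R| is the number of binary rules; the bracketed string is one simple way to present a derivation for human inspection.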
Document type: Journal article

Cited literature: 140 references

https://hal.inria.fr/hal-00925929
Contributor: Ed. Bmc
Submitted on: Wednesday, January 8, 2014 - 9:07:59 PM
Last modification on: Thursday, February 11, 2021 - 2:54:03 PM
Long-term archiving on: Wednesday, April 9, 2014 - 3:30:25 AM

Files

1748-7188-8-31.pdf
Publisher files allowed on an open archive

Citation

Witold Dyrka, Jean‐Christophe Nebel, Malgorzata Kotulska. Probabilistic grammatical model for helix‐helix contact site classification. Algorithms for Molecular Biology, BioMed Central, 2013, 8 (1), pp.31. ⟨10.1186/1748-7188-8-31⟩. ⟨hal-00925929⟩

Metrics

Record views: 419
File downloads: 833