Probabilistic context-free grammars for classification of helix-helix contact sites and recognition of amyloidogenic peptides

Witold Dyrka; Florence Thirion; Jean-Christophe Nebel; Malgorzata Kotulska

Poster, Year: 2013

Witold Dyrka

Role: Author
PersonId: 950372

Models and Algorithms for the Genome

Institute of Biomedical Engineering and Instrumentation - Group of Bioinformatics and Biophysics of Nanopores

Florence Thirion

Role: Author

Institute of Biomedical Engineering and Instrumentation - Group of Bioinformatics and Biophysics of Nanopores

Jean-Christophe Nebel

Role: Author
PersonId: 950379

Bioinformatics & Genomic Signal Processing Research Group

Malgorzata Kotulska

Role: Author
PersonId: 945076

Institute of Biomedical Engineering and Instrumentation - Group of Bioinformatics and Biophysics of Nanopores

Abstract

Hidden Markov models power many state-of-the-art tools in protein bioinformatics. While these methods excel at their tasks, they do not directly convey information about medium- and long-range residue-residue interactions. Capturing such interactions requires at least the expressive power of context-free grammars. However, the application of more powerful grammar formalisms to protein analysis has been surprisingly limited. We have developed a probabilistic grammatical framework for problem-specific protein languages, which has already been applied successfully to the recognition of ligand binding sites. The core of the model is a probabilistic context-free grammar (PCFG), automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training sequences. Here, we show that the PCFG approach matches state-of-the-art performance in two other tasks: classification of transmembrane helix-helix pairs and recognition of amyloidogenic peptides. First, the framework was applied to produce grammar descriptors of four classes of transmembrane helix-helix contact sites; the best classifiers reached an AUC ROC of 0.70. Second, an analogous approach was used to distinguish between amyloidogenic and non-amyloidogenic protein fragments. It yielded good results whether these fragments were isolated or embedded within an entire protein (AUC ROC up to 0.80). Finally, an attempt to model the pairing of amyloidogenic fragments resulted in classifiers reaching an AUC ROC of 0.70. A significant feature of the PCFG method is that grammar rules and parse trees are human-readable and thus could provide biologically meaningful information.

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

wdyrka_ptbi2013_abstrakt.pdf (98.09 KB)

Origin: Files produced by the author(s)

Contributor: Witold Dyrka

https://inria.hal.science/hal-00937763

Submitted on: Tuesday, 28 January 2014, 18:00:08

Last modified on: Friday, 24 March 2023, 14:52:58

Long-term archiving on: Tuesday, 29 April 2014, 09:55:10

Dates and versions

hal-00937763, version 1 (28-01-2014)

Identifiers

HAL Id: hal-00937763, version 1

Cite

Witold Dyrka, Florence Thirion, Jean-Christophe Nebel, Malgorzata Kotulska. Probabilistic context-free grammars for classification of helix-helix contact sites and recognition of amyloidogenic peptides. 11th Workshop on Bioinformatics and 6th Symposium of the Polish Bioinformatics Society, Sep 2013, Wroclaw, Poland. ⟨hal-00937763⟩

Collections

CNRS INRIA LABRI INRIA2

283 views

91 downloads
