Better scoring schemes for the recognition of functional proteins by protomata

Manon Ruffini

Student theses -- Hal-Inria+ Year: 2017

Dynamics, Logics and Inference for biological Systems and Sequences

École normale supérieure - Rennes

Abstract

Proteins perform essential functions within organisms, and predicting these functions is a major problem in biology. To address it, predictive models of functional families have been developed from the amino acid sequences that make up proteins. The Dyliss team developed a machine learning algorithm, Protomata-learner, that learns weighted automata representing these families and the possible disjunctions between their members. New sequences can be compared to these models and assigned a score that predicts whether they belong to the family. Despite good results, the sequence weighting strategy and the null-models in Protomata are rather basic. During my internship, I investigated alternative sequence weighting strategies and null-models. In addition, the expressivity of Protomata leads to a large variability in scores, and the choice of the classification threshold was left to the user. I therefore proposed a normalization of the score and a method to assess the significance of scores, in order to simplify prediction. I implemented these new strategies and compared them on several data sets. Preliminary results show a good improvement in the predictive power of the computed models.

Keywords

proteins; statistical modelling; automata; Dirichlet mixture; sequence weighting; null-model; significance

Domains

Bioinformatics, Systems Biology [q-bio.QM]

Main file

Rapport_de_stage_master.pdf (2.11 MB)

Contributor: François Coste

https://inria.hal.science/hal-01557941

Submitted on: Thursday, July 6, 2017, 16:43:31

Last modified on: Friday, March 24, 2023, 14:53:04

Long-term archiving on: Wednesday, January 24, 2018, 02:40:09

Dates and versions

hal-01557941, version 1 (July 6, 2017)

Identifiers

HAL Id: hal-01557941, version 1

Cite

Manon Ruffini. Better scoring schemes for the recognition of functional proteins by protomata. Quantitative Methods [q-bio.QM]. 2017. ⟨hal-01557941⟩

Collections

CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

216 views

137 downloads
