Better scoring schemes for the recognition of functional proteins by protomata - Inria - Institut national de recherche en sciences et technologies du numérique
Student theses -- Hal-Inria+ Year: 2017

Better scoring schemes for the recognition of functional proteins by protomata

Abstract

Proteins perform essential functions within organisms, and predicting these functions is a major problem in biology. To address it, predictive models of functional families have been developed from the amino acid sequences that form the proteins. The Dyliss team developed a machine learning algorithm, Protomata-learner, that learns weighted automata representing these families and the possible disjunctions between their members. New sequences can be compared to these models and assigned a score that predicts whether they belong to the family. Despite good results, the sequence weighting strategy and the null models in Protomata are rather basic. During my internship, I investigated alternative sequence weighting strategies and null models. In addition, the expressivity of Protomata leads to a large variability in scores, and the choice of the classification threshold was left to the user. I therefore proposed a normalization of the score and a method to assess the significance of scores, in order to simplify the prediction. I implemented these new strategies and compared them on several data sets. Preliminary results show a good improvement in the predictive power of the computed models.
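The abstract does not specify the scoring formulas, so the following is only a generic sketch of the three ideas it names: scoring a sequence against a null model, normalizing the score, and assessing its significance. It is not the Protomata implementation. Everything here is an assumption made for illustration: a toy position-specific model (`make_toy_model`) stands in for a weighted automaton, the null model is a uniform background over the 20 amino acids, normalization is a simple division by sequence length, and significance is an empirical p-value obtained by shuffling the sequence.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed null model: uniform background frequencies (hypothetical choice).
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def make_toy_model(consensus, p_match=0.81):
    """Hypothetical stand-in for a learned model: one distribution per position,
    giving p_match to the consensus residue and sharing the rest equally."""
    p_other = (1.0 - p_match) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: (p_match if aa == c else p_other) for aa in AMINO_ACIDS}
        for c in consensus
    ]


def log_odds_score(seq, model):
    """Sum over positions of log2(P_model(residue) / P_null(residue))."""
    return sum(
        math.log2(model[i][aa] / BACKGROUND[aa]) for i, aa in enumerate(seq)
    )


def normalized_score(seq, model):
    """Length-normalized score, so sequences of different lengths are comparable."""
    return log_odds_score(seq, model) / len(seq)


def empirical_p_value(seq, model, n_shuffles=1000, seed=0):
    """Fraction of shuffled versions of seq scoring at least as high as seq.
    The +1 terms keep the estimate strictly positive."""
    rng = random.Random(seed)
    observed = log_odds_score(seq, model)
    letters = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if log_odds_score(letters, model) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)
```

With a significance estimate of this kind, a user could set a threshold on the p-value rather than on the raw score, which is one way to read the simplification the abstract describes.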
Main file
Rapport_de_stage_master.pdf (2.11 MB)

Dates and versions

hal-01557941, version 1 (06-07-2017)

Identifiers

  • HAL Id: hal-01557941, version 1

Cite

Manon Ruffini. Better scoring schemes for the recognition of functional proteins by protomata. Quantitative Methods [q-bio.QM]. 2017. ⟨hal-01557941⟩
