Master thesis

Better scoring schemes for the recognition of functional proteins by protomata

Manon Ruffini 1, 2
1 Dyliss - Dynamics, Logics and Inference for biological Systems and Sequences
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract : Proteins perform essential functions within organisms, and predicting these functions is a major problem in biology. To address it, predictive models of functional families have been built from the amino acid sequences of proteins. The Dyliss team developed a machine learning algorithm, Protomata-learner, that learns weighted automata representing these families and the possible disjunctions between their members. New sequences can be scored against these models to predict whether they belong to the family. Despite good results, the sequence weighting strategy and the null models used in Protomata are rather basic. During my internship, I investigated alternative sequence weighting strategies and null models. In addition, the expressivity of Protomata leads to a large variability of scores, and the choice of the classification threshold was left to the user. I therefore proposed a normalization of the score and a method to assess the significance of scores, in order to simplify prediction. I implemented these new strategies and compared them on several data sets. Preliminary results show a good improvement in the predictive power of the computed models.
Document type :
Master thesis

Cited literature [31 references]

https://hal.inria.fr/hal-01557941
Contributor : François Coste
Submitted on : Thursday, July 6, 2017 - 4:43:31 PM
Last modification on : Saturday, July 11, 2020 - 3:23:06 AM
Long-term archiving on : Wednesday, January 24, 2018 - 2:40:09 AM

Identifiers

  • HAL Id : hal-01557941, version 1

Citation

Manon Ruffini. Better scoring schemes for the recognition of functional proteins by protomata. Quantitative Methods [q-bio.QM]. 2017. ⟨hal-01557941⟩
