Better scoring schemes for the recognition of functional proteins by protomata

Manon Ruffini 1, 2
1 Dyliss - Dynamics, Logics and Inference for biological Systems and Sequences
Inria Rennes – Bretagne Atlantique, IRISA_D7 - Data and Knowledge Management
Abstract: Proteins perform essential functions within organisms, and predicting these functions is a major problem in biology. To address it, predictive models of functional families have been built from the amino-acid sequences of proteins. The Dyliss team developed a machine learning algorithm, Protomata-learner, that learns weighted automata representing these families and the possible disjunctions between their members. New sequences can be compared to these models and assigned a score that predicts whether they belong to the family. Although the approach gives good results, the sequence weighting strategy and the null models used in Protomata are rather basic. During my internship, I investigated alternative sequence weighting strategies and null models. In addition, the expressiveness of Protomata leads to highly variable scores, and the choice of the classification threshold was left to the user. I therefore proposed a normalization of the score and a method to assess the significance of scores, in order to simplify prediction. I implemented these new strategies and compared them on several data sets. Preliminary results show a good improvement in the predictive power of the computed models.
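As a generic illustration of two ingredients named in the abstract, a null model for log-odds scoring and an estimate of score significance, here is a minimal Python sketch. It is not Protomata's scoring scheme and does not reproduce the internship's methods: the family model is a hypothetical first-order Markov chain standing in for a learned automaton, the null model is an assumed zero-order (i.i.d.) residue-frequency model, and significance is estimated empirically by shuffling the query. All function names and the toy sequences are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (assumed alphabet)


def train_markov(family, pseudo=1.0):
    """Hypothetical stand-in for a learned family model: a first-order
    Markov chain with add-`pseudo` smoothing (NOT a Protomata automaton)."""
    trans = defaultdict(Counter)
    for s in family:
        for a, b in zip(s, s[1:]):
            trans[a][b] += 1

    def logprob(seq):
        lp = math.log(1.0 / len(ALPHABET))  # uniform probability for the first residue
        for a, b in zip(seq, seq[1:]):
            row = trans[a]
            denom = sum(row.values()) + pseudo * len(ALPHABET)
            lp += math.log((row[b] + pseudo) / denom)
        return lp

    return logprob


def train_null(family, pseudo=1.0):
    """Assumed zero-order (i.i.d.) null model from smoothed residue frequencies."""
    counts = Counter("".join(family))
    total = sum(counts.values()) + pseudo * len(ALPHABET)
    freqs = {aa: (counts[aa] + pseudo) / total for aa in ALPHABET}
    return lambda seq: sum(math.log(freqs[aa]) for aa in seq)


def log_odds(seq, family_lp, null_lp):
    """Score = log P(seq | family model) - log P(seq | null model)."""
    return family_lp(seq) - null_lp(seq)


def empirical_pvalue(seq, score_fn, n_shuffles=200, seed=0):
    """Fraction of residue-shuffled copies of `seq` scoring at least as well.
    Shuffling keeps the composition, so only the ordering signal is tested."""
    rng = random.Random(seed)
    observed = score_fn(seq)
    letters = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if score_fn("".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # add-one correction avoids p = 0


if __name__ == "__main__":
    family = ["MKVLAAGMKVLAAG", "MKVLSAGMKVLAAG", "MKILAAGMKVLAAG"]  # toy data
    fam_lp, null_lp = train_markov(family), train_null(family)

    def score(s):
        return log_odds(s, fam_lp, null_lp)

    query = "MKVLAAGMKVLAAG"
    print(round(score(query), 2), empirical_pvalue(query, score))
```

Because shuffling preserves residue composition, the zero-order null term is unchanged (up to rounding) across shuffles, so in this sketch the p-value only measures the ordering signal captured by the family model; a different null model would call for a different randomization scheme.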
Document type:
Student dissertations -- Hal-inria+
Quantitative Methods [q-bio.QM]. 2017

Cited literature: 32 references

https://hal.inria.fr/hal-01557941
Contributor: François Coste
Submitted on: Thursday 6 July 2017, 16:43:31
Last modified on: Thursday 11 January 2018, 06:28:15

Identifiers

  • HAL Id: hal-01557941, version 1

Citation

Manon Ruffini. Better scoring schemes for the recognition of functional proteins by protomata. Quantitative Methods [q-bio.QM]. 2017. 〈hal-01557941〉
