Journal article, BMC Bioinformatics, Year: 2020

GrAPFI: predicting enzymatic function of proteins from domain similarity graphs

Bishnu Sarker, David Ritchie, Sabeur Aridhi

Abstract

Background: Thanks to recent developments in genomic sequencing technologies, the number of protein sequences in public databases is growing rapidly. To enrich and exploit this valuable data, it is essential to annotate these sequences with functional properties such as Enzyme Commission (EC) numbers. The January 2019 release of the UniProt Knowledgebase (UniProtKB) contains around 140 million protein sequences. However, only about half a million of these (UniProtKB/Swiss-Prot) have been reviewed and functionally annotated by expert curators using data extracted from the literature and computational analyses. To reduce the gap between annotated and unannotated protein sequences, it is essential to develop accurate automatic protein function annotation techniques.

Results: In this work, we present GrAPFI (Graph-based Automatic Protein Function Inference), which automatically annotates proteins with EC number functional descriptors using a protein domain similarity graph. We validated the performance of GrAPFI using six reference proteomes in UniProtKB/Swiss-Prot, namely Human, Mouse, Rat, Yeast, E. coli and Arabidopsis thaliana. We also compared GrAPFI with existing EC prediction approaches such as ECPred, DEEPre, and SVMProt. The comparison shows that GrAPFI achieves better accuracy and comparable or better coverage than these earlier approaches.

Conclusions: GrAPFI is a novel protein function annotation tool that performs automatic inference on a network of proteins that are related according to their domain composition. Our evaluation shows that GrAPFI performs better than other state-of-the-art methods. GrAPFI is available at https://gitlab.inria.fr/bsarker/bmc_grapfi.git as a standalone tool written in Python.
Main file: Sarker et al BMC Bioinformatics-2020.pdf (2.99 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03022601, version 1 (24-11-2020)

Cite

Bishnu Sarker, David Ritchie, Sabeur Aridhi. GrAPFI: predicting enzymatic function of proteins from domain similarity graphs. BMC Bioinformatics, 2020, ⟨10.1186/s12859-020-3460-7⟩. ⟨hal-03022601⟩