Conference Papers, Year: 2020

Graph Based Automatic Protein Function Annotation Improved by Semantic Similarity

Abstract

Functional annotation of proteins is a very challenging task, primarily because manual annotation requires a great amount of human effort, and even then it is nearly impossible to keep pace with the exponentially growing number of protein sequences entering public databases as a result of high-throughput sequencing technology. For example, the UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. According to the November 2019 release of UniProtKB, some 561,000 sequences are manually reviewed, but over 150 million sequences lack reviewed functional annotations. Manual annotation is also expensive in terms of both cost and time. Yet exploiting this huge quantity of data is important for understanding life at the molecular level, and is central to understanding human disease processes and to drug discovery. To be useful, protein sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. The ability to automatically annotate protein sequences in UniProtKB/TrEMBL, the non-reviewed UniProt sequence repository, would represent a major step towards bridging the gap between annotated and unannotated protein sequences. In this paper, we extend a neighborhood-based network inference technique for automatic GO annotation using a protein similarity graph built on protein domain and family information. The underlying assumption of our approach is that proteins can be linked through the domains, families, and superfamilies that they share. We propose an efficient pruning and post-processing technique that integrates the semantic similarity of GO terms. We show empirically that the proposed hierarchical post-processing can potentially improve the performance of other GO annotation tools as well.
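To make the general idea of neighborhood-based annotation over a domain-sharing graph concrete, here is a minimal sketch. It is not the method from the paper: the Jaccard edge weight, the weighted-vote scoring, the 0.5 threshold, the function names, and all identifiers in the toy data are assumptions chosen for illustration. The pruning and semantic-similarity post-processing mentioned in the abstract are not covered.

```python
from collections import defaultdict

# Hypothetical toy data: protein -> set of domain/family identifiers.
domains = {
    "P1": {"dom_a", "dom_b"},
    "P2": {"dom_a", "dom_b"},
    "P3": {"dom_b", "dom_c"},
    "Q1": {"dom_a", "dom_b"},  # unannotated query protein
}

# Hypothetical GO-style labels for the annotated proteins only.
annotations = {
    "P1": {"term_1"},
    "P2": {"term_1", "term_2"},
    "P3": {"term_3"},
}


def jaccard(a, b):
    """Edge weight (an assumption): overlap of two domain sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_terms(query, domains, annotations, threshold=0.5):
    """Score each term by the normalised weight of neighbours carrying it."""
    scores = defaultdict(float)
    total_weight = 0.0
    for neighbor, terms in annotations.items():
        if neighbor == query:
            continue
        weight = jaccard(domains[query], domains[neighbor])
        if weight == 0.0:
            continue  # no shared domain, so no edge in the graph
        total_weight += weight
        for term in terms:
            scores[term] += weight
    if total_weight == 0.0:
        return {}
    # Keep only terms whose normalised score reaches the threshold.
    return {
        term: score / total_weight
        for term, score in scores.items()
        if score / total_weight >= threshold
    }


predictions = predict_terms("Q1", domains, annotations)
```

In this toy graph, `Q1` shares both domains with `P1` and `P2` and one domain with `P3`, so only the term carried by both close neighbours passes the threshold.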
Origin: Files produced by the author(s)

Dates and versions

hal-03025827, version 1 (15-02-2022)

Cite

Bishnu Sarker, Navya Khare, Marie-Dominique Devignes, Sabeur Aridhi. Graph Based Automatic Protein Function Annotation Improved by Semantic Similarity. IWBBIO 2020 - 8th International Work-Conference on Bioinformatics and Biomedical Engineering, May 2020, Granada, Spain. pp. 261-272, ⟨10.1007/978-3-030-45385-5_24⟩. ⟨hal-03025827⟩