Exploiting Complex Protein Domain Networks for Protein Function Annotation

Bishnu Sarker 1, 2, David Ritchie 1, Sabeur Aridhi 2, 1
1 CAPSID - Computational Algorithms for Protein Structures and Interactions
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract : Huge numbers of protein sequences are now available in public databases. To exploit this valuable biological data more fully, these sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. In the March 2018 release of UniProtKB, some 556,000 sequences have been manually curated, but over 111 million sequences still lack functional annotations. The ability to annotate these sequences automatically would represent a major advance for the field of bioinformatics. Here, we present a novel network-based approach called GrAPFI for the automatic functional annotation of protein sequences. The underlying assumption of GrAPFI is that proteins may be related to each other by the protein domains, families, and super-families that they share. Several protein domain databases exist, such as InterPro, Pfam, SMART, CDD, Gene3D, and Prosite. Our approach uses InterPro domains because the InterPro database contains information from several other major protein family and domain databases. Our results show that GrAPFI achieves better EC number annotation performance than several other previously described approaches.

https://hal.inria.fr/hal-01920595
Contributor : Bishnu Sarker
Submitted on : Tuesday, November 13, 2018 - 1:23:31 PM
Last modification on : Tuesday, December 18, 2018 - 4:40:22 PM
Long-term archiving on : Thursday, February 14, 2019 - 2:16:17 PM

File

ComplexNetCameraReady.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01920595, version 1

Citation

Bishnu Sarker, David Ritchie, Sabeur Aridhi. Exploiting Complex Protein Domain Networks for Protein Function Annotation. Complex Networks 2018 - 7th International Conference on Complex Networks and Their Applications, Dec 2018, Cambridge, United Kingdom. ⟨hal-01920595⟩
