Neighborhood-Based Label Propagation in Large Protein Graphs

Sabeur Aridhi 1 Seyed Ziaeddin Alborzi 1 Malika Smaïl-Tabbone 2 Marie-Dominique Devignes 1 David Ritchie 1
1 CAPSID - Computational Algorithms for Protein Structures and Interactions
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in applied settings such as the study of human disease and drug discovery. In this age of rapid and affordable biological sequencing, the number of sequences accumulating in databases is rising at an increasing rate. This presents many challenges for biologists and computer scientists alike. To make sense of this huge quantity of data, these sequences should be annotated with functional properties. UniProtKB consists of two components: i) the UniProtKB/Swiss-Prot database, which contains protein sequences with reliable information manually reviewed by expert bio-curators, and ii) the UniProtKB/TrEMBL database, which stores and processes the unreviewed sequences. Hence, for every protein we have the sequence along with a small amount of additional information, such as the taxon and some structural domains. Pairwise similarity between proteins can be defined and computed from such attributes. Other important attributes, such as function and cellular localization, are present for proteins in Swiss-Prot but are often missing for proteins in TrEMBL. The enormous number of protein sequences now in TrEMBL calls for rapid procedures to annotate them automatically. In this work, we present DistNBLP, a novel Distributed Neighborhood-Based Label Propagation approach for large-scale annotation of proteins. The functional annotations of reviewed proteins are used to predict those of non-reviewed proteins by label propagation on a graph representation of the protein database. DistNBLP is built on top of Akka, a toolkit for building resilient, distributed, message-driven applications.
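To make the idea of label propagation on a protein similarity graph concrete, the following is a minimal single-machine sketch in Python. It is not the DistNBLP algorithm, whose details are not given in this abstract: the function name `propagate_labels`, the weighted-vote update rule, and the toy protein identifiers are all assumptions chosen for illustration. Reviewed proteins keep their labels, and each non-reviewed protein repeatedly receives label scores from its neighbours in proportion to edge similarity.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, max_rounds=10):
    """Spread labels from annotated nodes to unannotated neighbours.

    edges: iterable of (u, v, weight) undirected similarity links.
    seed_labels: dict mapping reviewed node -> set of labels.
    Returns a dict mapping each non-reviewed node that received at
    least one label to a {label: score} dict.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    neighbours = defaultdict(dict)
    for u, v, w in edges:
        neighbours[u][v] = w
        neighbours[v][u] = w

    # Reviewed nodes hold their labels with a fixed score of 1.0.
    scores = {n: {lab: 1.0 for lab in labs} for n, labs in seed_labels.items()}
    predicted = {}

    for _ in range(max_rounds):
        updates = {}
        for node, nbrs in neighbours.items():
            if node in seed_labels:
                continue  # never overwrite a reviewed annotation
            total = sum(nbrs.values())
            if total == 0:
                continue  # no usable similarity information
            votes = defaultdict(float)
            for nbr, w in nbrs.items():
                # Each neighbour votes with its current scores,
                # weighted by its normalised similarity to this node.
                for lab, s in scores.get(nbr, {}).items():
                    votes[lab] += w * s / total
            if votes:
                updates[node] = dict(votes)
        if updates == predicted:
            break  # scores stopped changing
        predicted = updates
        scores.update(predicted)  # applied only after the full round
    return predicted


# Toy graph: P1 and P2 stand for reviewed proteins, Q1 and Q2 for
# non-reviewed ones. All names and weights are made up.
edges = [("P1", "Q1", 0.9), ("P2", "Q1", 0.1), ("Q1", "Q2", 1.0)]
seeds = {"P1": {"kinase"}, "P2": {"transporter"}}
result = propagate_labels(edges, seeds, max_rounds=50)
```

In this toy graph, Q2 has no reviewed neighbour of its own and can only obtain labels through Q1, which is why more than one round is needed. A distributed version would partition the graph across workers and exchange the neighbour scores as messages, which is the kind of workload an actor toolkit such as Akka is suited to.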

https://hal.inria.fr/hal-01573381
Contributor: Sabeur Aridhi
Submitted on: Wednesday, 9 August 2017 - 12:44:54
Last modified on: Tuesday, 28 November 2017 - 15:18:04

File

aridhi-et-al-SIG-ISMB2017-Fina...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01573381, version 1
  • arXiv: 1708.07074

Citation

Sabeur Aridhi, Seyed Ziaeddin Alborzi, Malika Smaïl-Tabbone, Marie-Dominique Devignes, David Ritchie. Neighborhood-Based Label Propagation in Large Protein Graphs. Function SIG @ ISMB/ECCB 2017, Jul 2017, Prague, Czech Republic. pp.2. 〈hal-01573381〉
