Neighborhood-Based Label Propagation in Large Protein Graphs

Sabeur Aridhi 1 Seyed Ziaeddin Alborzi 1 Malika Smaïl-Tabbone 2 Marie-Dominique Devignes 1 David Ritchie 1
1 CAPSID - Computational Algorithms for Protein Structures and Interactions
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in several scenarios, including human disease and drug discovery. In this age of rapid and affordable biological sequencing, the number of sequences accumulating in databases is rising at an increasing rate. This presents many challenges for biologists and computer scientists alike. To make sense of this huge quantity of data, these sequences should be annotated with functional properties. UniProtKB consists of two components: i) the UniProtKB/Swiss-Prot database, which contains protein sequences with reliable information manually reviewed by expert bio-curators, and ii) the UniProtKB/TrEMBL database, which stores and processes the unreviewed sequences. Hence, for every protein, the sequence is available along with a little additional information, such as the taxon and some structural domains. Pairwise similarity between proteins can be defined and computed from such attributes. Other important attributes, such as function and cellular localization, are present for proteins in Swiss-Prot but are often missing for proteins in TrEMBL. The enormous number of protein sequences now in TrEMBL calls for rapid procedures to annotate them automatically. In this work, we present DistNBLP, a novel Distributed Neighborhood-Based Label Propagation approach for large-scale annotation of proteins. The functional annotations of reviewed proteins are used to predict those of non-reviewed proteins using label propagation on a graph representation of the protein database. DistNBLP is built on top of the "akka" toolkit for building resilient distributed message-driven applications.
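
The general idea of propagating annotations from reviewed to non-reviewed proteins over a similarity graph can be illustrated with a minimal single-machine sketch. It is not the DistNBLP algorithm itself: the scoring rule (a similarity-weighted vote among labeled neighbors), the threshold, the function name propagate_labels, and the protein and label identifiers are all assumptions made for illustration, and the distributed akka layer is left out entirely.

```python
from collections import defaultdict


def propagate_labels(edges, known_labels, threshold=0.5):
    """One-pass, neighborhood-based label propagation (illustrative only).

    edges: iterable of (u, v, weight) undirected similarity edges.
    known_labels: dict mapping a labeled ("reviewed") node to a set of labels.
    threshold: assumed cut-off on the normalized vote for keeping a label.
    Returns a dict mapping each unlabeled node to {label: score}.
    """
    # Build an undirected adjacency list from the weighted edges.
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    predictions = {}
    for node, adjacent in neighbors.items():
        if node in known_labels:
            continue  # only unlabeled nodes receive predictions
        votes = defaultdict(float)
        total = 0.0
        for other, w in adjacent:
            if other not in known_labels:
                continue  # unlabeled neighbors carry no vote in this sketch
            total += w
            for label in known_labels[other]:
                votes[label] += w
        if total == 0.0:
            continue  # no labeled neighbor, nothing to propagate
        # Normalize each vote by the total weight of labeled neighbors.
        scored = {}
        for label, weight_sum in votes.items():
            score = weight_sum / total
            if score >= threshold:
                scored[label] = score
        if scored:
            predictions[node] = scored
    return predictions


# Hypothetical data: P1 and P2 are labeled, Q1 and Q2 are not.
example_edges = [("P1", "Q1", 0.75), ("P2", "Q1", 0.25), ("P2", "Q2", 0.5)]
example_labels = {"P1": {"F1"}, "P2": {"F2"}}
print(propagate_labels(example_edges, example_labels))
```

In this toy graph, Q1 keeps only label F1 (normalized vote 0.75; F2 scores 0.25 and falls below the threshold), while Q2 receives F2 from its single labeled neighbor.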

https://hal.inria.fr/hal-01573381
Contributor : Sabeur Aridhi
Submitted on : Wednesday, August 9, 2017 - 12:44:54 PM
Last modification on : Tuesday, December 18, 2018 - 4:40:22 PM

File

aridhi-et-al-SIG-ISMB2017-Fina...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01573381, version 1
  • ARXIV : 1708.07074

Citation

Sabeur Aridhi, Seyed Ziaeddin Alborzi, Malika Smaïl-Tabbone, Marie-Dominique Devignes, David Ritchie. Neighborhood-Based Label Propagation in Large Protein Graphs. Function SIG @ ISMB/ECCB 2017, Jul 2017, Prague, Czech Republic. pp.2. ⟨hal-01573381⟩
