Connectivity Inference in Mass Spectrometry based Structure Determination

Deepesh Agarwal 1 Julio Araujo 2 Christelle Caillouet 2 Frédéric Cazals 1, * David Coudert 2 Stéphane Pérennes 2
* Corresponding author
1 ABS - Algorithms, Biology, Structure
CRISAM - Inria Sophia Antipolis - Méditerranée
2 COATI - Combinatorics, Optimization and Algorithms for Telecommunications
CRISAM - Inria Sophia Antipolis - Méditerranée , Laboratoire I3S - COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués
Abstract : We consider the following Minimum Connectivity Inference problem (MCI), which arises in structural biology: given vertex sets $V_i\subseteq V, i\in I$, find the graph $G=(V,E)$ minimizing the size of the edge set $E$, such that the sub-graph of $G$ induced by each $V_i$ is connected. This problem arises in structural biology, when one aims at finding the pairwise contacts between the proteins of a protein assembly, given the lists of proteins involved in sub-complexes. We present four contributions. First, using a reduction of the set cover problem, we establish that MCI is APX-hard. Second, we show how to solve the problem to optimality using a mixed integer linear programming formulation (MILP). Third, we develop a greedy algorithm based on union-find data structures (Greedy), yielding a $2(\log_2 |V|+\log_2 \kappa)$-approximation, with $\kappa$ the maximum number of subsets $V_i$ a vertex belongs to. Fourth, application-wise, we use the MILP and the greedy heuristic to solve the aforementioned connectivity inference problem in structural biology. We show that the solutions of MILP and Greedy are more parsimonious than those reported by the algorithm initially developed in biophysics, which are not qualified in terms of optimality. Since MILP outputs a set of optimal solutions, we introduce the notion of consensus solution. Using assemblies whose pairwise contacts are known exhaustively, we show an almost perfect agreement between the contacts predicted by our algorithms and the experimentally determined ones, especially for consensus solutions.
Complete list of metadatas

Cited literature [24 references]  Display  Hide  Download

https://hal.inria.fr/hal-00837496
Contributor : Frederic Cazals <>
Submitted on : Saturday, June 22, 2013 - 11:44:46 AM
Last modification on : Tuesday, September 10, 2019 - 1:12:47 AM
Long-term archiving on : Monday, September 23, 2013 - 3:05:08 AM

File

RR-8320-MCI.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00837496, version 1

Collections

Citation

Deepesh Agarwal, Julio Araujo, Christelle Caillouet, Frédéric Cazals, David Coudert, et al.. Connectivity Inference in Mass Spectrometry based Structure Determination. [Research Report] RR-8320, INRIA. 2013, pp.23. ⟨hal-00837496⟩

Share

Metrics

Record views

921

Files downloads

626