Distributed edge partitioning

Hlib Mykhailenko 1
1 NEO - Network Engineering and Operations
CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : In distributed graph computation, graph partitioning is an important preliminary step because the computation time can significantly depend on how the graph has been split among the different executors. In this thesis we explore the graph partitioning problem. Recently, edge partitioning approach has been advocated as a better approach to process graphs with a power-law degree distribution, which are very common in real-world datasets. That is why we focus on edge partition- ing approach. We start by an overview of existing metrics, to evaluate the quality of the graph partitioning. We briefly study existing graph processing systems: Hadoop, Giraph, Giraph++, Distributed GrahpLab, and PowerGraph with their key features. Next, we compare them to Spark, a popular big-data processing framework with its graph processing APIs — GraphX. We provide an overview of existing edge partitioning algorithms and introduce partitioner classification. We conclude that, based only on published work, it is not possible to draw a clear conclusion about the relative performances of these partitioners. For this reason, we have experimentally compared all the edge partitioners currently avail- able for GraphX. Results suggest that Hybrid-Cut partitioner provides the best performance. We then study how it is possible to evaluate the quality of a parti- tion before running a computation. To this purpose, we carry experiments with GraphX and we perform an accurate statistical analysis using a linear regression model. Our experimental results show that communication metrics like vertex-cut and communication cost are effective predictors in most of the cases. Finally, we propose a framework for distributed edge partitioning based on distributed simulated annealing which can be used to optimize a large family of partitioning metrics. We provide sufficient conditions for convergence to the optimum and discuss which metrics can be efficiently optimized in a distributed way. We implemented our framework with GraphX and performed a comparison with JA-BE-JA-VC, a state-of-the-art partitioner that inspired our approach. We show that our approach can provide significant improvements.
Document type :
Theses
Complete list of metadatas

Cited literature [44 references]  Display  Hide  Download

https://hal.inria.fr/tel-01551107
Contributor : Abes Star <>
Submitted on : Tuesday, October 3, 2017 - 3:53:07 PM
Last modification on : Thursday, November 14, 2019 - 5:38:06 PM

File

2017AZUR4042.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01551107, version 2

Collections

Citation

Hlib Mykhailenko. Distributed edge partitioning. Other [cs.OH]. Université Côte d'Azur, 2017. English. ⟨NNT : 2017AZUR4042⟩. ⟨tel-01551107v2⟩

Share

Metrics

Record views

329

Files downloads

356