Skip to Main content Skip to Navigation
New interface

Parallel Branch-and-Bound revisited for solving permutation combinatorial optimization problems on multi-core processors and coprocessors

Rudi Leroy 1 
1 DOLPHIN - Parallel Cooperative Multi-criteria Optimization
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : Solving large permutation Combinatorial Optimization Problems (COPs) using Branch-and-Bound (B&B) algorithms results in the generation of a very large pool of subproblems. Therefore, defining a dedicated data structure is crucial to store and manage efficiently that pool. In this Ph.D thesis, we propose an original data structure called Integer-Vector-Matrix (IVM) for permutation COPs based on the factorial number system. Consequently, we redefine the operators of the B&B algorithm acting on it. For performance evaluation in terms of memory footprint and CPU time usage, we conduct a complexity analysis and an extensive experimentation using the permutation Flow-Shop Scheduling Problem (FSP) as a case study. Compared to the Linked-List (LL) data structure usually used for B&B, IVM requires up to two orders of magnitude less memory than LL for large FSP instances such as scheduling 500 jobs on 20 machines. Moreover, the IVM-based B&B is up to one order of magnitude faster than its LL-based counterpart in managing the pool of subproblems. Another advantage of IVM over LL is that its memory requirement is constant and predictable. On the other hand, according to the Top500 international ranking the tendency of HPC technologies is to use multi-core processors and many-core accelerators/coprocessors as key-building blocks. Another contribution of this thesis is to revisit parallel B&B for multi-core processors, GPU accelerators and MIC coprocessors using IVM and LL. As the tree explored by a B&B applied to FSP is highly irregular in shape and size, the thread-based Work Stealing (WS) paradigm is used for parallelization on multi-core processors. Unlike most related works that use concurrent data structures, we propose a private IVM-based and LL-based WS mechanism. In addition, work units are coded in a coalesced thus optimized way using factoradic-based intervals. We also investigate five different WS strategies. Extensive experiments show that in overall IVM outperforms LL in memory footprint and CPU time usage whatever is the used WS strategy. For the many-core parallelization, the proposed approach consists in performing the branching and bounding operators on the coprocessor. Such approach raises some issues, addressed in this thesis, mainly thread mapping, thread/branch divergence and data placement optimization on GPU, and vectorization on Intel Xeon Phi. An extensive experimental study shows that IVM is particularly more efficient than LL within the many-core context. Moreover, even with vectorization (of the lower bound), which allows a significant performance improvement on Intel Xeon Phi, the GPU-based approach is faster. Finally, the many-core approaches are faster than their multi-core counterpart by one order of magnitude.
Complete list of metadata

Cited literature [51 references]  Display  Hide  Download
Contributor : Nouredine Melab Connect in order to contact the contributor
Submitted on : Sunday, December 27, 2015 - 6:29:06 PM
Last modification on : Tuesday, November 22, 2022 - 2:26:16 PM


  • HAL Id : tel-01248563, version 1


Rudi Leroy. Parallel Branch-and-Bound revisited for solving permutation combinatorial optimization problems on multi-core processors and coprocessors. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Lille 1, 2015. English. ⟨NNT : ⟩. ⟨tel-01248563⟩



Record views


Files downloads