Abstract : On the road to exascale, coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. In addition to their energy efficiency, these many-core devices boost the performance
of multi-core processors. In this paper, we revisit the design and implementation of Branch-and-Bound (B&B) algorithms for multi-core processors and Intel Xeon Phi coprocessors considering the offload mode as well as the native one. In addition, two major parallel models are considered: the master-worker and the work pool models. We address several parallel computing issues including processor-coprocessor data transfer optimization and vectorization. The proposed approaches have been experimented using the Flow-Shop scheduling problem (FSP) and two hardware configurations equivalent in terms of energy consumption: Intel Xeon E5-2670 processor and Intel Xeon Phi 5110P coprocessor. The reported results show that: (1) the proposed vectorization mechanism reduces the execution time by 42.1% (resp. 21.2%) in the many-core (resp. multi-core) approach ; (2) the native mode allows a faster execution on MIC than the offload mode for all FSP problem instances ; (3) the many-core approach (offload or native) is in average twice faster than the multi-core approach ; (4) the work pool parallel model is more suited for many/multi-core B&B applied to FSP than the master-worker model because
of its irregular nature.