Appendix. Execution times (seconds) for different problem instances and pool sizes (8192, 16384). All the matrices are located in the GPU global memory.

N. Melab, I. Chakroun and A. Bendjoudi. Concurrency Computat.: Pract. Exper., 2013. DOI: 10.1002/cpe

Table A.6 (Appendix). Execution times (seconds) for different problem instances and pool sizes (4096, 8192, 16384). Matrices JM, LM, RM, QM and MM are …

Table A.7 (Appendix). Execution times (seconds) for different problem instances and pool sizes (4096, 8192, 16384). Matrices PTM, LM, RM, QM and MM are located in the GPU global memory and JM in the shared memory.

Table A.8 (Appendix). Execution times (seconds) for different problem instances and pool sizes (4096, 8192, 16384). Matrices LM, RM, QM and MM are located in the GPU global memory and PTM and JM in the shared memory.
