[Appendix table] Execution times (seconds) for different problem instances and pool sizes (8192, 16384). All the matrices (PTM, JM, LM, RM, MM) are located in the GPU global memory.

Copyright © [...] Ltd. Concurrency Computat.: Pract. Exper., 2013. DOI: 10.1002/cpe. N. Melab, I. Chakroun and A. Bendjoudi.
Table A.6. Appendix: Execution times (seconds) for different problem instances and pool sizes (4096, 8192, 16384). Matrices JM, LM, RM, QM and MM are [...]

Table A.7. Appendix: Execution times (seconds) for different problem instances and pool sizes (4096, 8192, 16384). Matrices PTM, LM, RM, QM and MM are located in the GPU global memory and JM in the shared memory.

Table A.8. Appendix: Execution times (seconds) for different problem instances and pool sizes (4096, 8192, 16384). Matrices LM, RM, QM and MM are located in the GPU global memory and PTM and JM in the shared memory.

D. Bader, W. Hart, and C. Phillips, Parallel Algorithm Design for Branch and Bound, Tutorials on Emerging Methodologies and Applications in Operations Research, pp.1-5, 2005.
DOI : 10.1007/0-387-22827-6_5

R. Allen, L. Cinque, S. Tanimoto, L. Shapiro, and D. Yasuda, A parallel algorithm for graph matching and its MasPar implementation, IEEE Transactions on Parallel and Distributed Systems, vol.8, issue.5, pp.490-501, 1997.
DOI : 10.1109/71.598276

M. Quinn, Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer, IEEE Transactions on Computers, vol.39, issue.3, pp.384-387, 1990.
DOI : 10.1109/12.48868

S. Tschöke, R. Lüling, and B. Monien, Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network, Proceedings of 9th International Parallel Processing Symposium, pp.182-189, 1995.
DOI : 10.1109/IPPS.1995.395930

L. Casado, J. Martinez, I. Garcia, and E. Hendrix, Branch-and-Bound interval global optimization on shared memory multiprocessors, Optimization Methods and Software, vol.23, issue.5, pp.689-701, 2008.
DOI : 10.1080/10556780802086300

B. Gendron and T. Crainic, Parallel Branch-and-Bound Algorithms: Survey and Synthesis, Operations Research, vol.42, issue.6, pp.1042-1066, 1994.
DOI : 10.1287/opre.42.6.1042

M. Mezmaz, N. Melab, and E. Talbi, A Grid-enabled Branch and Bound Algorithm for Solving Challenging Combinatorial Optimization Problems, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-9, 2007.
DOI : 10.1109/IPDPS.2007.370217

URL : https://hal.archives-ouvertes.fr/inria-00083814

J. Kurzak, D. Bader, and J. Dongarra, Scientific Computing with Multicore and Accelerators, 2010.
DOI : 10.1201/b10376

T. Luong, N. Melab, and E. Talbi, GPU Computing for Parallel Local Search Metaheuristic Algorithms, IEEE Transactions on Computers, vol.62, issue.1, pp.173-185, 2013.
DOI : 10.1109/TC.2011.206

S. Johnson, Optimal two- and three-stage production schedules with setup times included, Naval Research Logistics Quarterly, vol.1, issue.1, pp.61-68, 1954.
DOI : 10.1002/nav.3800010110

B. Lageweg, J. Lenstra, and A. Rinnooy Kan, A general bounding scheme for the permutation flow-shop problem, Operations Research, vol.26, issue.1, 1978.

E. Taillard, Taillard's FSP benchmarks. (Available from: http://mistic, 2012.

I. Chakroun, M. Mezmaz, N. Melab, and A. Bendjoudi, Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm, Concurrency and Computation: Practice and Experience, vol.25, issue.8, pp.1121-1136, 2013.

M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

J. Jackson, An extension of Johnson's results on job lot scheduling, Naval Research Logistics Quarterly, vol.3, issue.3, pp.201-203, 1956.
DOI : 10.1002/nav.3800030307

L. Mitten, Sequencing n Jobs on Two Machines with Arbitrary Time Lags, Management Science, vol.5, issue.3, pp.293-298, 1959.
DOI : 10.1287/mnsc.5.3.293

NVIDIA, CUDA C Programming Best Practices Guide. (Available from: http://developer.download.nvidia.com/compute [...] NVIDIA_CUDA_BestPracticesGuide_2.3.pdf) [Accessed 2012].