FTH-B&B: A Fault-Tolerant Hierarchical Branch and Bound for Large Scale Unreliable Environments

Abstract : Solving large instances of combinatorial optimization problems to optimality using Branch and Bound (B&B) algorithms requires a huge amount of computing resources. In this paper, we investigate the design and implementation of such algorithms on computational grids. Most existing grid-based B&B algorithms are based on the Master-Worker paradigm, so their scalability is limited. In addition, even though the volatility of resources is a major issue in grids, fault tolerance is rarely addressed in these works. We therefore propose FTH-B&B, a fault-tolerant hierarchical B&B. FTH-B&B is based on several new mechanisms that make it possible to efficiently build the hierarchy and keep it balanced, and to store and recover work units (sub-problems). FTH-B&B has been implemented on top of the ProActive grid middleware and programming environment and applied to the Flow-Shop scheduling problem. Existing grid-based B&B works are very often validated either through simulation or on a very small real grid. In this paper, we evaluated FTH-B&B on Grid'5000, the French nation-wide computational grid, using up to 1,900 processor cores distributed over six sites. The reported results show that the overhead induced by the proposed mechanisms is very low and that an efficiency close to 100 percent can be achieved on some of Taillard's benchmarks of the Flow-Shop problem. In addition, the results demonstrate the robustness of the proposed mechanisms even in extreme failure situations.

https://hal.inria.fr/hal-01107787
Contributor : Nouredine Melab
Submitted on : Wednesday, January 21, 2015 - 3:28:55 PM
Last modification on : Thursday, April 4, 2019 - 10:18:06 AM

Citation

Ahcène Bendjoudi, Nouredine Melab, El-Ghazali Talbi. FTH-B&B: A Fault-Tolerant Hierarchical Branch and Bound for Large Scale Unreliable Environments. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2014, 63 (09), pp.2302 - 2315. ⟨10.1109/TC.2013.40⟩. ⟨hal-01107787⟩
