Journal article, IEEE Transactions on Computers. Year: 2014

FTH-B&B: A Fault-Tolerant Hierarchical Branch and Bound for Large Scale Unreliable Environments

Abstract

Solving large instances of combinatorial optimization problems to optimality using Branch and Bound (B&B) algorithms requires a huge amount of computing resources. In this paper, we investigate the design and implementation of such algorithms on computational grids. Most existing grid-based B&B algorithms are based on the Master-Worker paradigm, so their scalability is limited. In addition, even though the volatility of resources is a major issue in grids, fault tolerance is rarely addressed in these works. We therefore propose FTH-B&B, a fault-tolerant hierarchical B&B. FTH-B&B is based on several new mechanisms that make it possible to efficiently build the hierarchy and keep it balanced, and to store and recover work units (sub-problems). FTH-B&B has been implemented on top of the ProActive grid middleware and programming environment and applied to the Flow-Shop scheduling problem. Very often, existing grid-based B&B works are validated either through simulation or on a very small real grid. In this paper, we evaluated FTH-B&B on Grid'5000, the French nationwide computational grid, using up to 1,900 processor cores distributed over six sites. The reported results show that the overhead induced by the proposed mechanisms is very low and that an efficiency close to 100 percent can be achieved on some of Taillard's benchmarks for the Flow-Shop problem. In addition, the results demonstrate the robustness of the proposed mechanisms even in extreme failure situations.
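For readers unfamiliar with the technique, the following is a minimal sequential sketch of Branch and Bound on the permutation Flow-Shop problem (minimizing makespan). It is not FTH-B&B: it has no hierarchy, no distribution, and no fault tolerance. The function names, the simple machine-load lower bound, and the tiny instance are illustrative assumptions rather than anything taken from the paper. The explicit stack of (prefix, remaining jobs) pairs is meant to show what a "work unit (sub-problem)" can look like.

```python
from typing import List, Sequence, Tuple


def machine_finish_times(order: Sequence[int], proc: Sequence[Sequence[int]]) -> List[int]:
    """Finish time of each machine after scheduling `order`.

    proc[j][m] is the processing time of job j on machine m; every job
    visits the machines in the same order (permutation flow shop).
    """
    n_machines = len(proc[0])
    finish = [0] * n_machines
    for j in order:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the
            # job has finished on machine m - 1.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc[j][m]
    return finish


def makespan(order: Sequence[int], proc: Sequence[Sequence[int]]) -> int:
    return machine_finish_times(order, proc)[-1]


def lower_bound(prefix: Sequence[int], remaining: Sequence[int],
                proc: Sequence[Sequence[int]]) -> int:
    """Simple (assumed) bound: each machine must still process all remaining jobs."""
    finish = machine_finish_times(prefix, proc)
    return max(finish[m] + sum(proc[j][m] for j in remaining)
               for m in range(len(finish)))


def branch_and_bound(proc: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    n_jobs = len(proc)
    best_order = list(range(n_jobs))        # initial incumbent solution
    best_cost = makespan(best_order, proc)
    # Each stack entry is one work unit: a partial schedule plus unscheduled jobs.
    stack = [((), tuple(range(n_jobs)))]
    while stack:
        prefix, remaining = stack.pop()
        bound = lower_bound(prefix, remaining, proc)
        if bound >= best_cost:
            continue                        # prune: cannot beat the incumbent
        if not remaining:
            # For a complete schedule the bound equals its makespan.
            best_cost, best_order = bound, list(prefix)
            continue
        for j in remaining:                 # branch: fix the next job
            rest = tuple(r for r in remaining if r != j)
            stack.append((prefix + (j,), rest))
    return best_order, best_cost


# Hypothetical 3-job, 2-machine instance.
times = [[3, 2], [1, 4], [2, 1]]
order, cost = branch_and_bound(times)
print(order, cost)
```

In a grid setting such as the one the abstract describes, entries of a structure like `stack` are the kind of sub-problems that would be distributed among workers and that would need to be stored and recovered when a resource fails.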
No file deposited

Dates and versions

hal-01107787, version 1 (21-01-2015)

Cite

Ahcène Bendjoudi, Nouredine Melab, El-Ghazali Talbi. FTH-B&B: A Fault-Tolerant Hierarchical Branch and Bound for Large Scale Unreliable Environments. IEEE Transactions on Computers, 2014, 63 (9), pp. 2302-2315. ⟨10.1109/TC.2013.40⟩. ⟨hal-01107787⟩