Heuristic Search Value Iteration for zero-sum Stochastic Games - Archive ouverte HAL Access content directly
Journal Articles IEEE Transactions on Games Year : 2021

Heuristic Search Value Iteration for zero-sum Stochastic Games

(1) , (2) , (3) , (1)
1
2
3

Abstract

In sequential decision-making, heuristic search algorithms allow exploiting both the initial situation and an admissible heuristic to efficiently search for an optimal solution, often for planning purposes. Such algorithms exist for problems with uncertain dynamics, partial observability, multiple criteria, or multiple collaborating agents. Here we look at two-player zero-sum stochastic games with discounted criterion, in a view to propose a solution tailored to the fully observable case, while solutions have been proposed for particular, though still more general, partially observable cases. This setting induces reasoning on both a lower and an upper bound of the value function, which leads us to proposing zsSG-HSVI, an algorithm based on Heuristic Search Value Iteration (HSVI), and which thus relies on generating trajectories. We demonstrate that, each player acting optimistically, and employing simple heuristic initializations, HSVI's convergence in finite time to an ϵ-optimal solution is preserved. An empirical study of the resulting approach is conducted on benchmark problems of various sizes.
Fichier principal
Vignette du fichier
accepted.pdf (402.73 Ko) Télécharger le fichier
Origin : Files produced by the author(s)

Dates and versions

hal-03080314 , version 1 (27-05-2021)

Identifiers

Cite

Olivier Buffet, Jilles Dibangoye, Abdallah Saffidine, Vincent Thomas. Heuristic Search Value Iteration for zero-sum Stochastic Games. IEEE Transactions on Games, 2021, 13 (3), pp.1-10. ⟨10.1109/TG.2020.3005214⟩. ⟨hal-03080314⟩
115 View
110 Download

Altmetric

Share

Gmail Facebook Twitter LinkedIn More