Journal article. Automatica. Year: 2022

Whittle index based Q-learning for restless bandits with average reward

Abstract

A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, combining the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
Main file
QBanditV3.pdf (470.86 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03582664, version 1 (21-02-2022)

Cite

Konstantin E Avrachenkov, Vivek Borkar. Whittle index based Q-learning for restless bandits with average reward. Automatica, 2022, 139, pp.110186. ⟨10.1016/j.automatica.2022.110186⟩. ⟨hal-03582664⟩