Conference paper. Year: 2022

QWI: Q-learning with Whittle Index

Abstract

The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as multi-armed restless bandits. In this paper we develop QWI, a Q-learning-based algorithm that learns the Whittle indices. Its key feature is the use of two timescales: a relatively faster one to update the state-action Q-functions, and a relatively slower one to update the Whittle indices. In our main result, we show that the algorithm converges to the Whittle indices of the problem. Numerical computations show that our algorithm converges much faster than both the standard Q-learning algorithm and neural-network-based approximate Q-learning.
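The two-timescale idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy arm, the function name `learn_indices`, the constant step sizes `alpha` (fast) and `beta` (slow), and the exact form of the updates. The sketch keeps one Q-table and one index estimate `lam[x]` per reference state `x`, treats `lam[x]` as a subsidy paid for the passive action, and nudges it slowly toward the value at which both actions look equally good in state `x`.

```python
import numpy as np


def learn_indices(rewards, gamma=0.5, alpha=0.8, beta=0.001,
                  num_steps=40_000, seed=0):
    """Two-timescale Q-learning sketch for a toy single arm.

    Assumed toy model (hypothetical, not from the paper): the state
    moves deterministically s -> (s + 1) % S under both actions, the
    active action (a=1) pays rewards[s], and the passive action (a=0)
    pays only the subsidy lam[x]. Because the transitions do not depend
    on the action, the Whittle index of state x is rewards[x], which
    gives a simple sanity check.
    """
    rewards = np.asarray(rewards, dtype=float)
    num_states = len(rewards)
    rng = np.random.default_rng(seed)
    ref = np.arange(num_states)

    # One Q-table and one index estimate per reference state x.
    q = np.zeros((num_states, num_states, 2))   # q[x, s, a]
    lam = np.zeros(num_states)                  # lam[x]

    s = 0
    for _ in range(num_steps):
        a = int(rng.integers(2))                # uniform random exploration
        s_next = (s + 1) % num_states
        # Reward seen by each reference problem x:
        # rewards[s] if active, the current subsidy lam[x] if passive.
        r = np.full(num_states, rewards[s]) if a == 1 else lam.copy()
        # Fast timescale: one Q-learning step per reference state.
        target = r + gamma * q[:, s_next, :].max(axis=1)
        q[:, s, a] += alpha * (target - q[:, s, a])
        # Slow timescale: move lam[x] toward indifference in state x.
        lam += beta * (q[ref, ref, 1] - q[ref, ref, 0])
        s = s_next
    return lam


indices = learn_indices([1.0, 2.0, 3.0])
print(indices)
```

For this toy arm the learned values should approach the rewards themselves, since the action does not influence the transitions. Constant step sizes with `beta` much smaller than `alpha` are used only for simplicity; two-timescale convergence arguments usually rely on decaying step sizes whose ratio tends to zero.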
Main file
QWIpaper.pdf (1.05 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03601370, version 1 (08-03-2022)

License

Attribution

Cite

Francisco Robledo, Vivek Borkar, Urtzi Ayesta, Konstantin Avrachenkov. QWI: Q-learning with Whittle Index. RLNQ 2021 - Reinforcement Learning in Networks and Queues (workshop at ACM Sigmetrics 2021), Jun 2021, Beijing, China. pp.47-50, ⟨10.1145/3512798.3512816⟩. ⟨hal-03601370⟩