QWI: Q-learning with Whittle Index

Francisco Robledo; Vivek Borkar; Urtzi Ayesta; Konstantin Avrachenkov

doi:10.1145/3512798.3512816

Conference paper. Year: 2022


Francisco Robledo

Role: Author

Université de Pau et des Pays de l'Adour

Vivek Borkar

Role: Author

Department of Electrical Engineering [IIT-Bombay]

Urtzi Ayesta

Role: Author
PersonId: 14024
IdHAL: urtzi-ayesta
IdRef: 087245019

Réseaux, Mobiles, Embarqués, Sans fil, Satellites

Centre National de la Recherche Scientifique

Konstantin Avrachenkov

Role: Author
PersonId: 11963
IdHAL: konstantin-avrachenkov
ORCID: 0000-0002-8124-8272
IdRef: 087245280

Network Engineering and Operations

Abstract

The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as multi-armed restless bandits. In this paper we develop QWI, an algorithm based on Q-learning, to learn the Whittle indices. The key feature is the deployment of two timescales: a relatively faster one to update the state-action Q-functions, and a relatively slower one to update the Whittle indices. In our main result, we show that the algorithm converges to the Whittle indices of the problem. Numerical computations show that our algorithm converges much faster than both the standard Q-learning algorithm and neural-network-based approximate Q-learning.
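The two-timescale idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update equations, so everything below is an assumption made for illustration: the discounted single-arm formulation with a subsidy `lam[x]` paid for the passive action, the uniformly random exploration, the step-size schedules `alpha` and `beta`, and the hypothetical function name `qwi_sketch` with its `P`/`R` interface. The only property taken from the source is that the Q-values are updated on a faster timescale than the index estimates (here, `beta / alpha` tends to 0).

```python
import math
import random


def qwi_sketch(P, R, gamma=0.9, n_steps=20000, seed=0):
    """Illustrative two-timescale learner for per-state index estimates.

    P[a][s] is the list of transition probabilities out of state s under
    action a (0 = passive, 1 = active); R[a][s] is the reward.
    The interface and all schedules are assumptions, not the paper's.
    """
    rng = random.Random(seed)
    n_states = len(R[0])
    # One Q-table per reference state x, indexed as Q[x][s][a].
    Q = [[[0.0, 0.0] for _ in range(n_states)] for _ in range(n_states)]
    lam = [0.0] * n_states  # index estimate for each reference state
    s = 0
    for n in range(1, n_steps + 1):
        alpha = 1.0 / n ** 0.6                      # faster timescale (Q-values)
        beta = 1.0 / (1.0 + n * math.log(n + 1.0))  # slower timescale (indices)
        a = rng.randrange(2)                        # uniform exploration (assumed)
        s_next = rng.choices(range(n_states), weights=P[a][s])[0]
        for x in range(n_states):
            # The passive action earns the subsidy lam[x] on top of its reward.
            subsidy = lam[x] if a == 0 else 0.0
            target = R[a][s] + subsidy + gamma * max(Q[x][s_next])
            Q[x][s][a] += alpha * (target - Q[x][s][a])
            # Slow update: move lam[x] toward the subsidy at which the active
            # and passive actions are equally attractive in state x.
            lam[x] += beta * (Q[x][x][1] - Q[x][x][0])
        s = s_next
    return lam


# Toy two-state arm (made-up numbers, for illustration only).
P_toy = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]
R_toy = [[0.0, 0.0], [1.0, 2.0]]
estimates = qwi_sketch(P_toy, R_toy, n_steps=5000, seed=1)
```

Because `beta` decays faster than `alpha`, each Q-table sees an almost constant subsidy while it is being updated, which is the separation of timescales the abstract refers to. The sketch makes no claim about convergence speed or accuracy on this toy arm.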

Domains

Machine Learning [cs.LG]; Multiagent Systems [cs.MA]; Optimization and Control [math.OC]

Main file

QWIpaper.pdf (1.05 MB)

Origin: Files produced by the author(s)

Contributor: Konstantin Avrachenkov

https://inria.hal.science/hal-03601370

Submitted on: Tuesday, March 8, 2022, 11:14:49

Last modified on: Monday, April 15, 2024, 16:05:26

Long-term archiving on: Thursday, June 9, 2022, 19:11:11

Dates and versions

hal-03601370, version 1 (08-03-2022)

License

Attribution

Identifiers

HAL Id: hal-03601370, version 1
DOI: 10.1145/3512798.3512816

Cite

Francisco Robledo, Vivek Borkar, Urtzi Ayesta, Konstantin Avrachenkov. QWI: Q-learning with Whittle Index. RLNQ 2021 - Reinforcement Learning in Networks and Queues (workshop at ACM Sigmetrics 2021), Jun 2021, Beijing, China. pp.47-50, ⟨10.1145/3512798.3512816⟩. ⟨hal-03601370⟩


Collections

UNIV-TLSE2 CNRS INRIA UNIV-PAU UT1-CAPITOLE INRIA2 TDS-MACS UNIV-COTEDAZUR IRIT IRIT-RMESS IRIT-ASR IRIT-CNRS TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

77 views

121 downloads
