Whittle index based Q-learning for restless bandits with average reward

Konstantin E Avrachenkov; Vivek Borkar

doi:10.1016/j.automatica.2022.110186

Journal article, Automatica, 2022


Konstantin E Avrachenkov

Role: Author
PersonId: 11963
IdHAL: konstantin-avrachenkov
ORCID: 0000-0002-8124-8272
IdRef: 087245280

Network Engineering and Operations

Vivek Borkar

Role: Author
PersonId: 994265
ORCID: 0000-0003-0756-5402

Department of Electrical Engineering [IIT-Bombay]

Abstract

We introduce a new reinforcement learning algorithm for multi-armed restless bandits with average reward that combines Q-learning with the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, which yields major computational gains. We provide a rigorous convergence analysis, and numerical experiments show excellent empirical performance of the proposed scheme.
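The abstract relies on the structure of the Whittle index policy: once an index is available for every state of every arm, the policy activates the arms whose current states carry the largest indices. The Python sketch below illustrates only this selection step. It assumes the indices are already known and stored as a table `indices[arm][state]`, and that exactly `m` arms are activated at each decision epoch. The function name `whittle_index_policy` and the tie-breaking rule are hypothetical choices for illustration. This is not the authors' Q-learning scheme for learning the indices.

```python
from typing import List, Sequence


def whittle_index_policy(indices: Sequence[Sequence[float]],
                         states: Sequence[int],
                         m: int) -> List[int]:
    """Hypothetical helper: return the ids of the m arms to activate.

    Assumes indices[arm][state] holds a (known or learned) Whittle index
    and states[arm] is the current state of each arm.
    """
    if not 0 <= m <= len(states):
        raise ValueError("m must be between 0 and the number of arms")
    # Look up each arm's index in its current state.
    current = [(indices[arm][s], arm) for arm, s in enumerate(states)]
    # Rank by index (largest first); break ties by smaller arm id
    # so that the choice is deterministic (an assumed convention).
    ranked = sorted(current, key=lambda pair: (-pair[0], pair[1]))
    # Activate the top-m arms and return their ids in ascending order.
    return sorted(arm for _, arm in ranked[:m])
```

For example, with `indices = [[0.1, 0.9], [0.5, 0.2], [0.7, 0.3]]`, `states = [1, 0, 0]` and `m = 2`, the current indices are 0.9, 0.5 and 0.7, so arms 0 and 2 are activated. Because the decision reduces to ranking one scalar per arm, the policy avoids searching over all joint actions of the arms.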

Keywords

reinforcement learning; restless bandits; Whittle index; Q-learning; average reward

Subjects

Optimization and Control [math.OC]; Machine Learning [cs.LG]

Main file

QBanditV3.pdf (470.86 KB)

Origin: files produced by the author(s)

https://inria.hal.science/hal-03582664

Submitted on: Monday, 21 February 2022, 13:15:30

Last modified on: Monday, 15 April 2024, 10:53:24

Long-term archiving on: Sunday, 22 May 2022, 18:42:23

Dates and versions

hal-03582664, version 1 (21-02-2022)

Identifiers

HAL Id: hal-03582664, version 1
arXiv: 2004.14427
DOI: 10.1016/j.automatica.2022.110186

Cite

Konstantin E Avrachenkov, Vivek Borkar. Whittle index based Q-learning for restless bandits with average reward. Automatica, 2022, 139, pp.110186. ⟨10.1016/j.automatica.2022.110186⟩. ⟨hal-03582664⟩

Collections

INRIA INRIA2 TDS-MACS UNIV-COTEDAZUR
