BelMan: An Information-Geometric Approach to Stochastic Bandits

Debabrota Basu; Pierre Senellart; Stéphane Bressan

Conference paper, Year: 2019


Debabrota Basu

Role: Author
PersonId : 742129
IdHAL : debabrota-basu

Chalmers University of Technology [Göteborg]

Pierre Senellart

Role: Author
PersonId : 11778
IdHAL : pierre-senellart
ORCID : 0000-0002-7909-5369
IdRef : 124713769

Value from Data

Data, Intelligence and Graphs

Stéphane Bressan

Role: Author

School of computing [Singapore]

Abstract

We propose a Bayesian information-geometric approach to the exploration-exploitation trade-off in stochastic multi-armed bandits. The uncertainty on reward generation and belief is represented using the manifold of joint distributions of rewards and beliefs. Accumulated information is summarised by the barycentre of joint distributions, the pseudobelief-reward. While the pseudobelief-reward facilitates information accumulation through exploration, another mechanism is needed to increase exploitation by gradually focusing on higher rewards, the pseudobelief-focal-reward. Our resulting algorithm, BelMan, alternates between projection of the pseudobelief-focal-reward onto belief-reward distributions to choose the arm to play, and projection of the updated belief-reward distributions onto the pseudobelief-focal-reward. We theoretically prove BelMan to be asymptotically optimal and to incur a sublinear regret growth. We instantiate BelMan to stochastic bandits with Bernoulli and exponential rewards, and to a real-life application of scheduling queueing bandits. Comparative evaluation with the state of the art shows that BelMan is not only competitive for Bernoulli bandits but in many cases also outperforms other approaches for exponential and queueing bandits.
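The setting described in the abstract, a stochastic bandit with Bernoulli rewards and one Bayesian belief per arm, can be illustrated with a minimal sketch. This is not BelMan: the information-geometric projections are not reproduced here. The sketch swaps in Thompson sampling, a standard Bayesian baseline, only to show what a belief update and a cumulative regret curve look like in this setting. The function name `thompson_bernoulli` and the parameters `means`, `horizon`, and `seed` are assumptions made for illustration.

```python
import random


def thompson_bernoulli(means, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit (illustrative baseline, not BelMan).

    means:   true success probability of each arm (unknown to the learner)
    horizon: number of rounds to play
    Returns the cumulative pseudo-regret after each round.
    """
    rng = random.Random(seed)
    k = len(means)
    # Beta(1, 1) prior per arm, i.e. a uniform belief over each arm's mean.
    alpha = [1.0] * k
    beta = [1.0] * k
    best = max(means)
    regret = []
    total = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its current Beta belief.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < means[arm] else 0
        # Conjugate Beta-Bernoulli update of the belief for that arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Pseudo-regret: gap between the best mean and the played arm's mean.
        total += best - means[arm]
        regret.append(total)
    return regret
```

Calling `thompson_bernoulli([0.3, 0.5, 0.7], horizon=5000)` returns a regret curve with one entry per round. Sublinear regret growth, as claimed for BelMan in the abstract, means that the last entry divided by the horizon tends to zero as the horizon grows.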

Domains

Machine Learning [cs.LG]

Main file

main.pdf (3.99 MB)

Origin: Files produced by the author(s)

Contributor: Pierre Senellart

https://inria.hal.science/hal-02195539

Submitted on: Friday, 26 July 2019, 13:42:07

Last modified on: Monday, 11 December 2023, 11:31:00

Dates and versions

hal-02195539 , version 1 (26-07-2019)

Identifiers

HAL Id : hal-02195539 , version 1

Cite

Debabrota Basu, Pierre Senellart, Stéphane Bressan. BelMan: An Information-Geometric Approach to Stochastic Bandits. ECML/PKDD - The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2019, Würzburg, Germany. ⟨hal-02195539⟩

Collections

INSTITUT-TELECOM ENS-PARIS CNRS INRIA INRIA2 PSL LTCI INFRES DIG IP_PARIS

207 views

316 downloads
