BelMan: Bayesian Bandits on the Belief--Reward Manifold

Abstract: We propose a generic, Bayesian, information-geometric approach to the exploration--exploitation trade-off in multi-armed bandit problems. Our approach, BelMan, uniformly supports pure exploration, exploration--exploitation, and two-phase bandit problems. Knowledge of the bandit arms and their reward distributions is summarised by the barycentre of the arms' joint belief-reward distributions, the \emph{pseudobelief-reward}, within the belief-reward manifold. BelMan alternates an \emph{information projection} and a \emph{reverse information projection}: projection of the pseudobelief-reward onto the belief-rewards to choose the arm to play, and projection of the resulting belief-rewards back onto the pseudobelief-reward. It introduces a mechanism that infuses an exploitative bias by means of a \emph{focal distribution}, i.e., a reward distribution that gradually concentrates on higher rewards. Comparative performance evaluation with state-of-the-art algorithms shows that BelMan is not only competitive but can also outperform other approaches in specific setups, for instance those involving many arms and continuous rewards.
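The alternating-projection scheme described in the abstract can be illustrated with a deliberately simplified sketch for Bernoulli arms. Everything here is an assumption made for illustration, not the paper's actual construction: beliefs are discretized Beta posteriors over each arm's mean reward, the pseudobelief is taken as the plain mixture of arm beliefs, the focal distribution is an exponential tilt exp(tau * x) whose strength tau grows logarithmically in time, the I-projection step is approximated by playing the KL-closest arm, and the names `belman_step` and `run` are invented. The actual BelMan works on the exponential-family belief-reward manifold developed in the paper.

```python
import math
import random

GRID = [i / 200 for i in range(1, 200)]  # interior grid over mean reward in (0, 1)

def beta_density(a, b, grid):
    """Discretized Beta(a, b) density over the grid, normalized to sum to 1."""
    logp = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in grid]
    m = max(logp)
    w = [math.exp(v - m) for v in logp]
    s = sum(w)
    return [v / s for v in w]

def kl(p, q):
    """Discrete KL divergence D(p || q), clamping q away from zero."""
    return sum(pi * math.log(pi / max(qi, 1e-300)) for pi, qi in zip(p, q) if pi > 0)

def belman_step(params, t, tilt_rate=2.0):
    beliefs = [beta_density(a, b, GRID) for a, b in params]
    # rI-projection (sketch): summarise all arm beliefs by a barycentre,
    # taken here as their unweighted mixture
    pseudo = [sum(col) / len(beliefs) for col in zip(*beliefs)]
    # focal distribution (sketch): exponential tilt that concentrates on
    # higher rewards as t grows, infusing the exploitative bias
    tau = tilt_rate * math.log(t + 1)
    tilted = [p * math.exp(tau * x) for p, x in zip(pseudo, GRID)]
    s = sum(tilted)
    tilted = [p / s for p in tilted]
    # I-projection (sketch): play the arm whose belief is KL-closest to
    # the focally tilted pseudobelief
    return min(range(len(params)), key=lambda k: kl(tilted, beliefs[k]))

def run(means, horizon=1000, seed=0):
    """Simulate Bernoulli arms with the given means; return pull counts."""
    rng = random.Random(seed)
    params = [[1.0, 1.0] for _ in means]  # Beta(1, 1) priors
    pulls = [0] * len(means)
    for t in range(1, horizon + 1):
        k = belman_step(params, t)
        r = 1.0 if rng.random() < means[k] else 0.0
        params[k][0] += r          # conjugate Bayesian update of the belief
        params[k][1] += 1.0 - r
        pulls[k] += 1
    return pulls
```

In this toy version, early rounds are exploratory because the broad pseudobelief mixture is close to every arm's belief, while the growing tilt makes later rounds concentrate on the empirically best arm, mimicking the exploitative drift the focal distribution is meant to provide.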

https://hal.inria.fr/hal-01891813
Contributor: Pierre Senellart
Submitted on: Wednesday, October 10, 2018 - 8:28:51 AM
Last modification on: Friday, June 7, 2019 - 11:18:39 AM

Identifiers

  • HAL Id: hal-01891813, version 1
  • arXiv: 1805.01627

Citation

Debabrota Basu, Pierre Senellart, Stéphane Bressan. BelMan: Bayesian Bandits on the Belief--Reward Manifold. 2018. ⟨hal-01891813⟩
