Efficient learning by implicit exploration in bandit problems with side observations

Tomáš Kocák; Gergely Neu; Michal Valko; Rémi Munos

Conference paper, Year: 2014

Tomáš Kocák

Role: Author
PersonId: 955512

Sequential Learning

Gergely Neu

Role: Author
PersonId: 961171

Sequential Learning

Michal Valko

Role: Author
PersonId: 284
IdHAL: michal
IdRef: 22360934X

Sequential Learning

Rémi Munos

Role: Author
PersonId: 836863

Sequential Learning

Abstract

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner's action and a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot always be computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism. Both algorithms rely on a novel exploration strategy called implicit exploration, which is shown to be more efficient both computationally and information-theoretically than previously studied exploration strategies for the problem.
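The abstract gives no formulas, so the following is only a minimal sketch of how implicit exploration might look in the simplest (multi-armed) variant, under stated assumptions. It assumes an exponential-weights learner whose loss estimates divide each revealed loss by the probability of observing it plus a small bias `gamma`, instead of mixing explicit uniform exploration into the sampling distribution. The names `observation_probs`, `ix_estimates`, `exp_weights_ix`, `eta`, and `gamma`, and the encoding of the observation system as a list of sets, are hypothetical choices for illustration and are not taken from this record.

```python
import math
import random


def observation_probs(p, graph):
    """Probability that each action's loss is revealed.

    graph[j] is the set of actions whose losses are revealed when j is
    played (assumed here to include j itself).
    """
    o = [0.0] * len(p)
    for j, revealed in enumerate(graph):
        for i in revealed:
            o[i] += p[j]
    return o


def ix_estimates(losses, played, p, graph, gamma):
    """Implicit-exploration loss estimates (assumed form).

    A revealed loss is divided by (observation probability + gamma);
    unrevealed losses are estimated as zero. The gamma term biases the
    estimate downward but keeps it bounded by 1 / gamma.
    """
    o = observation_probs(p, graph)
    revealed = graph[played]
    return [
        losses[i] / (o[i] + gamma) if i in revealed else 0.0
        for i in range(len(losses))
    ]


def exp_weights_ix(loss_rounds, graph_rounds, eta, gamma, rng):
    """Exponential weights driven by the estimates above.

    The observation system of a round is read only after the action is
    drawn, so the learner never needs it in advance.
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    total_loss = 0.0
    for losses, graph in zip(loss_rounds, graph_rounds):
        shift = min(cumulative)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - shift)) for c in cumulative]
        norm = sum(weights)
        p = [w / norm for w in weights]
        played = rng.choices(range(n), weights=p)[0]
        total_loss += losses[played]
        estimates = ix_estimates(losses, played, p, graph, gamma)
        cumulative = [c + e for c, e in zip(cumulative, estimates)]
    return total_loss, cumulative
```

In this sketch, `graph_rounds` may change from round to round and is consulted only inside `ix_estimates`, after `played` has been drawn. The values of `eta` and `gamma` are left to the caller; the tuning analysed in the paper is not reproduced here.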

Domains

Machine Learning [stat.ML]

Main file

kocak2014efficient.pdf (409.14 KB)

Origin: Files produced by the author(s)

Contributor: Michal Valko

https://inria.hal.science/hal-01079351

Submitted on: Monday, November 3, 2014, 13:30:10

Last modified on: Friday, March 24, 2023, 14:52:59

Dates and versions

hal-01079351, version 1 (01-11-2014)

hal-01079351, version 2 (03-11-2014)

Identifiers

HAL Id: hal-01079351, version 2

Cite

Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos. Efficient learning by implicit exploration in bandit problems with side observations. Neural Information Processing Systems, Dec 2014, Montréal, Canada. ⟨hal-01079351v2⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS CRISTAL INRIA2 CRISTAL-SEQUEL
