Online Stochastic Optimization under Correlated Bandit Feedback

Mohammad Gheshlaghi Azar; Alessandro Lazaric; Emma Brunskill

Conference paper, Year: 2014

Mohammad Gheshlaghi Azar

Role: Author

Northwestern University [Chicago, Ill. USA]

Alessandro Lazaric

Role: Author
PersonId: 15817
IdHAL: romain-warlop
ORCID: 0000-0002-4432-9591
IdRef: 234172223

Sequential Learning

Emma Brunskill

Role: Author

Computer Science Department - Carnegie Mellon University

Abstract

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on the number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state of the art in terms of memory requirement, and it requires a weaker smoothness assumption on the mean-reward function than existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.
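The abstract does not describe the mechanics of HCT. As a rough illustration only, the Python sketch below shows the general shape of an optimistic tree search over X = [0, 1] in the spirit of HCT. It is written from a general understanding of this family of methods, not from the paper's pseudocode. Each cell keeps an optimistic U-value (empirical mean + nu1 * rho^h smoothness term + confidence width) and a B-value equal to the minimum of its own U-value and its best child's B-value. A leaf is split only after its pull count reaches a depth-dependent threshold tau_h, which keeps the tree small. All of the following are assumptions chosen for illustration: the parameters nu1, rho, c and delta, the log(t/delta)-style confidence schedule, midpoint arms, the visit-count recommendation rule, and the toy objective noisy_f. The sketch also refreshes B-values only along the traversed path. Importantly, it assumes conditionally independent noise, so it does not cover the correlated-feedback setting that is the paper's main contribution.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """A cell [lo, hi] of a binary partition of the arm space X = [0, 1]."""
    depth: int
    lo: float
    hi: float
    count: int = 0       # pulls of this cell's own arm (T_{h,i})
    total: float = 0.0   # sum of rewards observed at this cell's arm
    visits: int = 0      # traversals passing through this cell (used to recommend)
    b_value: float = math.inf
    children: List["Node"] = field(default_factory=list)

    @property
    def arm(self) -> float:
        # Hypothetical choice of representative arm: the cell midpoint.
        return 0.5 * (self.lo + self.hi)


def hct_sketch(f: Callable[[float, random.Random], float], n: int,
               nu1: float = 1.0, rho: float = 0.5, c: float = 1.0,
               delta: float = 0.05, seed: int = 0) -> Tuple[float, int]:
    """Simplified optimistic tree search in the spirit of HCT (illustrative only)."""
    rng = random.Random(seed)
    root = Node(depth=0, lo=0.0, hi=1.0)
    n_nodes = 1

    def tau(h: int, log_term: float) -> float:
        # Assumed expansion threshold: c^2 * log_term * rho^(-2h) / nu1^2.
        return c ** 2 * log_term * rho ** (-2 * h) / nu1 ** 2

    for t in range(1, n + 1):
        log_term = math.log(1.0 / delta) + math.log(t + 1)  # assumed schedule ~ log(t/delta)

        # Follow the largest B-value until reaching a leaf or an under-sampled cell.
        node, path = root, [root]
        while node.children and node.count >= tau(node.depth, log_term):
            node = max(node.children, key=lambda ch: ch.b_value)
            path.append(node)
        for nd in path:
            nd.visits += 1

        # Pull the selected cell's arm and record the noisy reward.
        node.count += 1
        node.total += f(node.arm, rng)

        # Split a leaf only once it has been pulled often enough (bounds tree size).
        if not node.children and node.count >= tau(node.depth, log_term):
            mid = node.arm
            node.children = [Node(node.depth + 1, node.lo, mid),
                             Node(node.depth + 1, mid, node.hi)]
            n_nodes += 2

        # Refresh U- and B-values bottom-up, but only along the traversed path.
        for nd in reversed(path):
            u = (nd.total / nd.count + nu1 * rho ** nd.depth
                 + c * math.sqrt(log_term / nd.count))
            best_child = max((ch.b_value for ch in nd.children), default=math.inf)
            nd.b_value = min(u, best_child)

    # Recommend: descend through the most-visited child while it has been visited.
    node = root
    while node.children:
        best = max(node.children, key=lambda ch: ch.visits)
        if best.visits == 0:
            break
        node = best
    return node.arm, n_nodes


def noisy_f(x: float, rng: random.Random) -> float:
    # Toy 1-Lipschitz objective with maximum 1.0 at x = 0.3, plus bounded noise.
    return 1.0 - abs(x - 0.3) + rng.uniform(-0.1, 0.1)


if __name__ == "__main__":
    x_hat, size = hct_sketch(noisy_f, n=5000)
    print(f"recommended arm {x_hat:.4f}, tree size {size}")
```

Running the script prints the recommended arm and the number of tree nodes. The exact values depend on the seed and the assumed parameters. Larger c or nu1 makes the search more conservative, with more exploration and slower splitting. The threshold tau_h also caps tree growth: every split requires at least tau_0 pulls of the node being split, so the tree has at most roughly 2n/tau_0 nodes.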

Domains

Machine Learning [stat.ML]

Main file

paper (1).pdf (692.72 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01080138

Submitted on: Tuesday, 4 November 2014, 15:26:14

Last modified on: Friday, 24 March 2023, 14:52:59

Long-term archiving on: Thursday, 5 February 2015, 11:05:37

Dates and versions

hal-01080138, version 1 (04-11-2014)

Identifiers

HAL Id: hal-01080138, version 1

Cite

Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill. Online Stochastic Optimization under Correlated Bandit Feedback. 31st International Conference on Machine Learning, Jun 2014, Beijing, China. ⟨hal-01080138⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS CRISTAL INRIA2 CRISTAL-SEQUEL

259 views

66 downloads
