Robust Risk-averse Stochastic Multi-Armed Bandits

Abstract: We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion. Further, we study the deviations of the regret instead of the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
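The abstract does not detail RA-UCB or which coherent risk measure it targets. Purely as an illustration of why a risk-averse criterion can select a different arm than the mean, the sketch below uses conditional value-at-risk (CVaR), one well-known coherent risk measure, applied to the lower tail of rewards. This is an assumption-laden sketch, not RA-UCB: it has no exploration bonus or confidence bounds, it uses a simplified empirical CVaR (the average of the worst ceil(alpha * n) observed rewards), and the names `empirical_cvar` and `most_risk_averse_arm` are hypothetical.

```python
import math
from typing import Sequence


def empirical_cvar(rewards: Sequence[float], alpha: float) -> float:
    """Simplified empirical CVaR: mean of the worst ceil(alpha * n) rewards.

    Higher rewards are better, so the 'worst' outcomes are the smallest ones.
    With alpha = 1 this reduces to the ordinary sample mean.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if len(rewards) == 0:
        raise ValueError("need at least one reward")
    ordered = sorted(rewards)             # ascending: worst rewards first
    k = math.ceil(alpha * len(ordered))   # size of the lower tail
    return sum(ordered[:k]) / k


def most_risk_averse_arm(samples: dict[str, list[float]], alpha: float) -> str:
    """Hypothetical greedy rule: pick the arm whose empirical CVaR is largest."""
    return max(samples, key=lambda arm: empirical_cvar(samples[arm], alpha))


# Toy data: "risky" has the higher mean (0.75 vs 0.5) but a bad lower tail.
samples = {
    "steady": [0.5, 0.5, 0.5, 0.5],
    "risky": [0.0, 1.0, 1.0, 1.0],
}

print(most_risk_averse_arm(samples, alpha=0.25))  # lower-tail criterion
print(most_risk_averse_arm(samples, alpha=1.0))   # alpha = 1 recovers the mean
```

With alpha = 0.25 the worst quarter of "risky" is 0.0, so "steady" is preferred; with alpha = 1 the criterion becomes the sample mean and "risky" wins. A bandit algorithm in this setting would additionally need optimistic confidence bounds on such a criterion to balance exploration and exploitation.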
Document type:
Other publication
Extended version, with supplementary material, of the paper submitted to the ALT 2013 conference. 2013


https://hal.inria.fr/hal-00821670
Contributor: Odalric-Ambrym Maillard
Submitted on: Saturday, 11 May 2013, 16:25:06
Last modified on: Monday, 13 May 2013, 09:12:57
Archived on: Monday, 19 August 2013, 15:45:24

File

RiskAwareKLMAB_Arxiv.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00821670, version 1

Citation

Odalric-Ambrym Maillard. Robust Risk-averse Stochastic Multi-Armed Bandits. Extended version with supplementary material of the same paper submitted to the conference ALT 2013. 2013. <hal-00821670>
