Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Abstract: We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite time horizon (pure exploration) and an infinite one (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of initial experimental studies.
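The core idea of the abstract — ranking arms by an empirical quantile rather than a mean, since only the ordering of rewards is meaningful — can be sketched in a few lines. The sketch below is a hypothetical optimistic-quantile variant for illustration only, not the paper's exact algorithm: `empirical_quantile` uses only comparisons (sorting), and `quantile_bandit` inflates the target quantile level by an exploration bonus so that rarely pulled arms look optimistic. The function names, the bonus term, and the tie-breaking are all assumptions.

```python
import math
import random

def empirical_quantile(samples, tau):
    """Smallest observed value x such that at least a tau fraction of
    samples are <= x. Needs only an ordering on values, no arithmetic."""
    ordered = sorted(samples)
    k = math.ceil(tau * len(ordered)) - 1
    return ordered[max(k, 0)]

def quantile_bandit(arms, tau=0.5, horizon=500, seed=0):
    """Play each arm according to an optimistic quantile index: the
    empirical quantile at level tau plus a confidence bonus, so arms
    with few pulls are evaluated at a higher (more optimistic) level.
    `arms` is a list of callables mapping an RNG to an ordinal reward."""
    rng = random.Random(seed)
    samples = {a: [] for a in range(len(arms))}
    for t in range(1, horizon + 1):
        def index(a):
            if not samples[a]:
                return float('inf')  # unpulled arms are maximally optimistic
            n = len(samples[a])
            bonus = math.sqrt(math.log(t) / (2 * n))
            level = min(tau + bonus, 1.0)
            return empirical_quantile(samples[a], level)
        a = max(range(len(arms)), key=index)
        samples[a].append(arms[a](rng))
    return samples
```

Because the index compares only ordinal reward levels (here encoded as integers for convenience), the same code works for any totally ordered qualitative scale. An arm whose reward distribution stochastically dominates another will have higher empirical quantiles at every level and therefore accumulate most of the pulls.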
Document type: Conference paper
32nd International Conference on Machine Learning, Jul 2015, Lille, France. Proceedings of The 32nd International Conference on Machine Learning, pp. 1660-1668, 〈http://icml.cc/2015/〉

Cited literature: 25 references

https://hal.inria.fr/hal-01204708
Contributor: Balazs Szorenyi
Submitted on: Thursday, September 24, 2015 - 14:36:36
Last modified on: Tuesday, July 3, 2018 - 11:43:30
Long-term archived on: Tuesday, December 29, 2015 - 09:49:32

File

qmab_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01204708, version 1

Citation

Balazs Szorenyi, Róbert Busa-Fekete, Paul Weng, Eyke Hüllermeier. Qualitative Multi-Armed Bandits: A Quantile-Based Approach. 32nd International Conference on Machine Learning, Jul 2015, Lille, France. Proceedings of The 32nd International Conference on Machine Learning, pp.1660-1668, 〈http://icml.cc/2015/〉. 〈hal-01204708〉


Metrics

Record views: 588
File downloads: 257