Conference paper. Year: 2015

Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Abstract

We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm by the mean of its underlying distribution, we use a quantile of that distribution as a representative value. We address the problem of quantile-based online learning for both a finite time horizon (pure exploration) and an infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by initial experimental studies.
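To make the idea of a quantile as an arm's representative value concrete, the following is a minimal sketch, not the algorithms proposed in the paper. It assumes a hypothetical four-level ordinal scale (`SCALE`) and hypothetical helper names (`empirical_quantile`, `best_arm_by_quantile`), and it simply compares arms by the empirical τ-quantile of a fixed batch of samples, using only order comparisons on the rewards.

```python
import math

# Hypothetical ordinal scale, listed from worst to best (an assumption for
# illustration). Only the order of the labels matters, not any numeric value.
SCALE = ["poor", "fair", "good", "excellent"]
RANK = {label: i for i, label in enumerate(SCALE)}


def empirical_quantile(samples, tau):
    """Return the empirical tau-quantile of a list of ordinal samples.

    This is the smallest observed value x such that at least a fraction tau
    of the samples are <= x. Rewards are only sorted, never added or averaged.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < tau <= 1:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(samples, key=RANK.__getitem__)
    index = math.ceil(tau * len(ordered)) - 1
    return ordered[index]


def best_arm_by_quantile(samples_per_arm, tau):
    """Return the arm whose empirical tau-quantile ranks highest on SCALE.

    Ties are broken in favor of the arm that appears first in the mapping.
    """
    return max(
        samples_per_arm,
        key=lambda arm: RANK[empirical_quantile(samples_per_arm[arm], tau)],
    )


# Illustrative, made-up observations for two arms.
observations = {
    "a": ["poor", "poor", "excellent"],
    "b": ["fair", "good", "good"],
}
print(best_arm_by_quantile(observations, tau=0.5))  # prints "b"
```

With τ = 0.5, arm "a" has empirical median "poor" and arm "b" has empirical median "good", so "b" is preferred even though "a" produced the single best observation. A sequential bandit algorithm would additionally decide which arm to sample next and account for the uncertainty of these estimates; that part is deliberately left out of this sketch.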
Main file
qmab_final.pdf (484.63 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01204708, version 1 (24-09-2015)

Identifiers

  • HAL Id: hal-01204708, version 1

Cite

Balazs Szorenyi, Róbert Busa-Fekete, Paul Weng, Eyke Hüllermeier. Qualitative Multi-Armed Bandits: A Quantile-Based Approach. 32nd International Conference on Machine Learning, Jul 2015, Lille, France. pp.1660-1668. ⟨hal-01204708⟩
