Journal article: Proceedings of the European Conference on Machine Learning, Year: 2014

Sub-sampling for Multi-armed Bandits

Abstract

The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm based on sub-sampling. Despite its simplicity, the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it needs to know neither a set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, present the main intuition that explains the striking results, and conclude with a finite-time regret analysis of the algorithm in the simplified two-arm bandit setting.
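
To make the sub-sampling idea concrete, below is a minimal Python sketch of a two-arm selection rule in the spirit of the algorithm described above. The mechanism shown (sub-sampling the longer reward history down to the size of the shorter one, then playing the arm with the larger sub-sampled mean, with ties favoring the less-pulled arm) is one reading of the sub-sampling approach; the function names, initialization, and tie-breaking details are illustrative assumptions, not the authors' exact pseudocode.

```python
import random

def besa_two_arms(pull, horizon, seed=0):
    """Two-arm sub-sampling bandit sketch (illustrative, BESA-style).

    pull(arm) returns a stochastic reward for arm 0 or arm 1.
    Returns the full reward history of each arm after `horizon` pulls.
    """
    rng = random.Random(seed)
    history = [[], []]
    for arm in (0, 1):                       # play each arm once to initialize
        history[arm].append(pull(arm))
    for _ in range(horizon - 2):
        n0, n1 = len(history[0]), len(history[1])
        m = min(n0, n1)
        # Sub-sample (without replacement) m rewards from each arm's history,
        # so both arms are compared on equally sized samples.
        mean0 = sum(rng.sample(history[0], m)) / m
        mean1 = sum(rng.sample(history[1], m)) / m
        if mean0 != mean1:
            arm = 0 if mean0 > mean1 else 1  # play the better sub-sampled mean
        else:
            arm = 0 if n0 <= n1 else 1       # tie: favor the less-pulled arm
        history[arm].append(pull(arm))
    return history

# Example usage with two Bernoulli arms of means 0.5 and 0.6 (illustrative):
means = (0.5, 0.6)
hist = besa_two_arms(lambda a: float(random.random() < means[a]), horizon=1000)
print(len(hist[0]), len(hist[1]))  # pulls per arm; arm 1 should dominate
```

Note that the decision relies only on comparing sample means over equally sized sub-samples, which is consistent with the flexibility claimed in the abstract: such a rule is invariant under rescaling of the rewards and requires no knowledge of the reward range or distribution family.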
Main file: BESA2_corrected.pdf (900.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01025651, version 1 (18-07-2014)
hal-01025651, version 2 (16-12-2014)

Identifiers

  • HAL Id: hal-01025651, version 2

Cite

Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor. Sub-sampling for Multi-armed Bandits. Proceedings of the European Conference on Machine Learning, 2014, pp.13. ⟨hal-01025651v2⟩
