Journal article: Proceedings of the European Conference on Machine Learning, Year: 2014

Sub-sampling for Multi-armed Bandits

Abstract

The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm based on sub-sampling. Despite its simplicity, the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it needs to know neither a set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, present the main intuition that explains the striking results, and conclude with a finite-time regret analysis of the algorithm in the simplified two-arm bandit setting.
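
To make the sub-sampling idea concrete, below is a minimal Python sketch of a two-arm selection rule in the spirit of the algorithm described above. The mechanism shown (sub-sampling the longer reward history down to the size of the shorter one, then playing the arm with the larger sub-sampled mean, with ties favoring the less-pulled arm) is one reading of the sub-sampling approach; the function names, initialization, and tie-breaking details are illustrative assumptions, not the authors' exact pseudocode.

```python
import random

def besa_two_arms(pull, horizon, seed=0):
    """Two-arm sub-sampling bandit sketch (illustrative, BESA-style).

    pull(arm) returns a stochastic reward for arm 0 or arm 1.
    Returns the full reward history of each arm after `horizon` pulls.
    """
    rng = random.Random(seed)
    history = [[], []]
    for arm in (0, 1):                       # play each arm once to initialize
        history[arm].append(pull(arm))
    for _ in range(horizon - 2):
        n0, n1 = len(history[0]), len(history[1])
        m = min(n0, n1)
        # Sub-sample (without replacement) m rewards from each arm's history,
        # so both arms are compared on equally sized samples.
        mean0 = sum(rng.sample(history[0], m)) / m
        mean1 = sum(rng.sample(history[1], m)) / m
        if mean0 != mean1:
            arm = 0 if mean0 > mean1 else 1  # play the better sub-sampled mean
        else:
            arm = 0 if n0 <= n1 else 1       # tie: favor the less-pulled arm
        history[arm].append(pull(arm))
    return history

# Example usage with two Bernoulli arms of means 0.5 and 0.6 (illustrative):
means = (0.5, 0.6)
hist = besa_two_arms(lambda a: float(random.random() < means[a]), horizon=1000)
print(len(hist[0]), len(hist[1]))  # pulls per arm; arm 1 should dominate
```

Note that the decision relies only on comparing sample means over equally sized sub-samples, which is consistent with the flexibility claimed in the abstract: such a rule is invariant under rescaling of the rewards and requires no knowledge of the reward range or distribution family.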
Main file: BESA2_corrected.pdf (900.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01025651, version 1 (18-07-2014)
hal-01025651, version 2 (16-12-2014)

Identifiers

  • HAL Id: hal-01025651, version 2

Cite

Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor. Sub-sampling for Multi-armed Bandits. Proceedings of the European Conference on Machine Learning, 2014, pp.13. ⟨hal-01025651v2⟩
