Generalized Bayesian Pursuit: A Novel Scheme for Multi-Armed Bernoulli Bandit Problems

Xuan Zhang; B. John Oommen; Ole-Christoffer Granmo

doi:10.1007/978-3-642-23960-1_16

Conference paper, Year: 2011

Xuan Zhang

Role: Author

University of Agder

B. John Oommen

Role: Author

Carleton University

University of Agder

Ole-Christoffer Granmo

Role: Author

University of Agder

Abstract

Over recent decades, a myriad of approaches to the multi-armed bandit problem have appeared in several different fields. The current top-performing algorithms from the field of Learning Automata reside in the Pursuit family, while UCB-Tuned and the ε-greedy class of algorithms can be seen as state-of-the-art regret-minimizing algorithms. Recently, however, the Bayesian Learning Automaton (BLA) outperformed all of these, and other schemes, in a wide range of experiments. Although the two approaches are seemingly incompatible, in this paper we integrate the foundational learning principles motivating the design of the BLA with the principles of the so-called Generalized Pursuit algorithm (GPST), leading to the Generalized Bayesian Pursuit algorithm (GBPST). As in the BLA, the estimates are truly Bayesian in nature. However, instead of basing exploration upon direct sampling from the estimates, GBPST explores by means of the arm selection probability vector of GPST. Further, as in the GPST, in the interest of higher rates of learning, a set of arms that are currently perceived as being optimal is pursued to minimize the probability of pursuing a wrong arm. It turns out that GBPST is superior to GPST and that, when its learning speed is suitably controlled, it even performs better than the BLA. We thus believe that GBPST constitutes a new avenue of research, in which the performance benefits of the GPST and the BLA are mutually augmented, opening the way to improved performance in a number of applications that are currently being tested.
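
The abstract names three ingredients: Bayesian (Beta) estimates for Bernoulli arms, exploration through an arm selection probability vector, and pursuit of a set of arms currently perceived as optimal. The following Python sketch shows how these pieces might fit together; it is not the authors' algorithm. Several details are assumptions not stated above: the function name `gbpst_sketch`, the uniform Beta(1, 1) prior, the use of the posterior mean as each arm's estimate, the choice of "arms whose estimate exceeds that of the arm just pulled" as the pursued set, and the linear update with learning speed `lam`.

```python
import random


def gbpst_sketch(true_probs, steps=5000, lam=0.05, seed=0):
    """Illustrative Bayesian-pursuit-style loop; details are assumptions."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # Beta pseudo-counts for rewards (uniform prior assumed)
    beta = [1.0] * n   # Beta pseudo-counts for penalties
    p = [1.0 / n] * n  # arm selection probability vector

    for _ in range(steps):
        # Explore via the selection vector, not by sampling the posteriors.
        i = rng.choices(range(n), weights=p)[0]

        # Bernoulli feedback followed by the conjugate Beta update.
        if rng.random() < true_probs[i]:
            alpha[i] += 1.0
        else:
            beta[i] += 1.0

        # Assumed estimate: posterior mean of each arm's reward probability.
        est = [alpha[j] / (alpha[j] + beta[j]) for j in range(n)]

        # Assumed pursued set: arms that currently look better than arm i.
        better = [j for j in range(n) if j != i and est[j] > est[i]]

        # Shrink every component, then hand the freed mass lam to the
        # pursued set (or back to arm i if no arm looks better).
        new_p = [(1.0 - lam) * pj for pj in p]
        if better:
            for j in better:
                new_p[j] += lam / len(better)
        else:
            new_p[i] += lam

        # Renormalize to guard against floating-point drift.
        total = sum(new_p)
        p = [x / total for x in new_p]

    return p, alpha, beta
```

For instance, `gbpst_sketch([0.2, 0.8])` returns the final selection vector together with the Beta pseudo-counts. A larger `lam` moves probability mass faster at the cost of a higher risk of committing to a wrong arm, which is one way to read the abstract's remark about controlling the learning speed.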

Keywords

Bandit Problems; Estimator Algorithms; Generalized Bayesian Pursuit Algorithm; Beta Distribution; Conjugate Priors

Domains

Computer Science [cs]

Main file

978-3-642-23960-1_16_Chapter.pdf (180.73 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01571485

Submitted on: Wednesday, August 2, 2017, 16:22:26

Last modified on: Friday, June 5, 2020, 17:10:10

Dates and versions

hal-01571485, version 1 (02-08-2017)

License

Attribution

Identifiers

HAL Id: hal-01571485, version 1
DOI: 10.1007/978-3-642-23960-1_16

Cite

Xuan Zhang, B. John Oommen, Ole-Christoffer Granmo. Generalized Bayesian Pursuit: A Novel Scheme for Multi-Armed Bernoulli Bandit Problems. 12th Engineering Applications of Neural Networks (EANN 2011) and 7th Artificial Intelligence Applications and Innovations (AIAI), Sep 2011, Corfu, Greece. pp.122-131, ⟨10.1007/978-3-642-23960-1_16⟩. ⟨hal-01571485⟩

Collections

IFIP IFIP-AICT IFIP-TC IFIP-WG IFIP-TC12 IFIP-AIAI IFIP-WG12-5 IFIP-AICT-364

65 views

68 downloads
