A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2015

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane
Tanguy Urvoy
  • Fonction : Auteur
  • PersonId : 933382

Résumé

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

Domaines

Autre [cs.OH]
Fichier principal
Vignette du fichier
rex3_icml.pdf (1.06 Mo) Télécharger le fichier
gajane15-supp.zip (1.4 Mo) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01225614 , version 1 (14-01-2016)

Identifiants

Citer

Pratik Gajane, Tanguy Urvoy, Fabrice Clérot. A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits. Proceedings of the 32nd International Conference on Machine Learning , Jul 2015, Lille, France. pp.218-227. ⟨hal-01225614⟩
207 Consultations
86 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More