A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits - Archive ouverte HAL Access content directly
Conference Papers Year : 2015

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

(1, 2) , (2) , (2)
1
2
Pratik Gajane
Tanguy Urvoy
  • Function : Author
  • PersonId : 933382

Abstract

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.
Fichier principal
Vignette du fichier
rex3_icml.pdf (1.06 Mo) Télécharger le fichier
Vignette du fichier
gajane15-supp.zip (1.4 Mo) Télécharger le fichier
Origin : Files produced by the author(s)
Origin : Files produced by the author(s)
Loading...

Dates and versions

hal-01225614 , version 1 (14-01-2016)

Identifiers

Cite

Pratik Gajane, Tanguy Urvoy, Fabrice Clérot. A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits. Proceedings of the 32nd International Conference on Machine Learning , Jul 2015, Lille, France. pp.218-227. ⟨hal-01225614⟩
197 View
79 Download

Altmetric

Share

Gmail Facebook Twitter LinkedIn More