A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane 1, 2 Tanguy Urvoy 2 Fabrice Clérot 2
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.
Type de document :
Communication dans un congrès
Proceedings of the 32nd International Conference on Machine Learning , Jul 2015, Lille, France. pp.218-227, 2015
Liste complète des métadonnées

Littérature citée [26 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01225614
Contributeur : Pratik Gajane <>
Soumis le : jeudi 14 janvier 2016 - 13:37:23
Dernière modification le : jeudi 11 janvier 2018 - 06:27:32
Document(s) archivé(s) le : vendredi 15 avril 2016 - 14:31:39

Identifiants

  • HAL Id : hal-01225614, version 1
  • ARXIV : 1601.03855

Citation

Pratik Gajane, Tanguy Urvoy, Fabrice Clérot. A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits. Proceedings of the 32nd International Conference on Machine Learning , Jul 2015, Lille, France. pp.218-227, 2015. 〈hal-01225614〉

Partager

Métriques

Consultations de la notice

221

Téléchargements de fichiers

108