Online learning with Erdős-Rényi side-observation graphs

Tomáš Kocák 1 Gergely Neu 1, 2 Michal Valko 1
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : We consider adversarial multi-armed bandit problems where the learner is allowed to observe losses of a number of arms beside the arm that it actually chose. We study the case where all non-chosen arms reveal their loss with an unknown probability rt, independently of each other and the action of the learner. Moreover, we allow rt to change in every round t, which rules out the possibility of estimating rt by a well-concentrated sample average. We propose an algorithm which operates under the assumption that rt is large enough to warrant at least one side observation with high probability. We show that after T rounds in a bandit problem with N arms, the expected regret of our algorithm is of order O(sqrt(sum(t=1)T (1/rt) log N )), given that rt less than log T / (2N-2) for all t. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know exact values of rt.
Type de document :
Communication dans un congrès
Uncertainty in Artificial Intelligence, Jun 2016, New York City, United States
Liste complète des métadonnées

Littérature citée [20 références]  Voir  Masquer  Télécharger
Contributeur : Michal Valko <>
Soumis le : mardi 24 mai 2016 - 10:37:20
Dernière modification le : mardi 3 juillet 2018 - 11:45:55


Fichiers produits par l'(les) auteur(s)


  • HAL Id : hal-01320588, version 1



Tomáš Kocák, Gergely Neu, Michal Valko. Online learning with Erdős-Rényi side-observation graphs. Uncertainty in Artificial Intelligence, Jun 2016, New York City, United States. 〈hal-01320588〉



Consultations de la notice


Téléchargements de fichiers