Multi-Player Bandits Revisited

Abstract: Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing works assume that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound on the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.
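The abstract names the algorithms but does not describe them; for context, the standard multi-player bandit collision model this line of work builds on can be sketched as follows. This is a minimal illustration with uniform-random players, not the paper's RandTopM or MCTopM; the collision rule (colliding players get zero reward) is the usual one in the Cognitive Radio setting, and all function names here are hypothetical:

```python
import random

def simulate(means, n_players, horizon, seed=0):
    """Simulate the standard multi-player MAB collision model:
    each round, every player picks one arm; a player alone on its arm
    draws a Bernoulli reward, while colliding players receive 0."""
    rng = random.Random(seed)
    n_arms = len(means)
    total_reward = 0
    for _ in range(horizon):
        # Uniform-random baseline policy: each player picks an arm at random.
        picks = [rng.randrange(n_arms) for _ in range(n_players)]
        for arm in picks:
            if picks.count(arm) == 1:  # no collision on this arm
                total_reward += rng.random() < means[arm]
    return total_reward

# Random players collide often, so their total reward stays well below
# the oracle benchmark of horizon * (sum of the n_players best means).
print(simulate([0.9, 0.8, 0.5, 0.1], n_players=2, horizon=1000))
```

A decentralized algorithm such as those studied in the paper would replace the uniform-random choice with an adaptive rule per player, using only that player's own observations.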
Document type: Conference papers

https://hal.inria.fr/hal-01629733
Contributor: Lilian Besson
Submitted on: Monday, March 12, 2018 - 6:48:06 PM
Last modification on: Tuesday, April 2, 2019 - 2:17:09 AM

Files

BK__ALT_2018.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License

Identifiers

  • HAL Id: hal-01629733, version 2
  • arXiv: 1711.02317

Citation

Lilian Besson, Emilie Kaufmann. Multi-Player Bandits Revisited. Algorithmic Learning Theory, Mehryar Mohri; Karthik Sridharan, Apr 2018, Lanzarote, Spain. ⟨hal-01629733v2⟩
