Multi-Player Bandits Revisited

Lilian Besson; Emilie Kaufmann

Conference paper, Year: 2018

French title: Modèles de Bandits Multi-Joueurs Revisités

Lilian Besson

Role: Author
PersonId: 14893
IdHAL: lilian-besson
ORCID: 0000-0003-2767-2563
IdRef: 24252883X

Institut d'Électronique et des Technologies du numéRique

SUPELEC-Campus Rennes

Sequential Learning

CentraleSupélec

Signal, Communication et Electronique Embarquée

Emilie Kaufmann

Role: Author
PersonId: 10422
IdHAL: emilie-kaufmann
ORCID: 0000-0002-5496-824X
IdRef: 197040810

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Sequential Learning

Centre National de la Recherche Scientifique

Abstract

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by the same applications, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound on the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for understanding its behavior.
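The abstract does not spell out how any of the algorithms work, so the following is only a rough sketch of the setting it describes, not the authors' method. It assumes Bernoulli arms, assumes that a collision (two or more players on the same arm) yields zero reward for the players involved, and lets each player run a standard UCB1 index on its own observed rewards without any sensing or collision signal. The function name `simulate_selfish`, its parameters, and the choice of UCB1 are all hypothetical and made for illustration.

```python
import math
import random


def simulate_selfish(means, n_players, horizon, seed=0):
    """Simulate independent UCB1 learners sharing Bernoulli arms.

    Assumptions (not taken from the record above): a collision gives
    reward 0 to every player involved, and a player only sees its own
    reward, so it cannot tell a collision from an unlucky draw.
    Returns (total_reward, number_of_collided_pulls).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [[0] * n_arms for _ in range(n_players)]   # per-player pull counts
    sums = [[0.0] * n_arms for _ in range(n_players)]  # per-player reward sums
    total_reward = 0.0
    collisions = 0

    for t in range(1, horizon + 1):
        choices = []
        for j in range(n_players):
            untried = [k for k in range(n_arms) if pulls[j][k] == 0]
            if untried:
                # Try every arm once before using the index.
                arm = rng.choice(untried)
            else:
                # UCB1 index: empirical mean plus exploration bonus.
                ucb = [
                    sums[j][k] / pulls[j][k]
                    + math.sqrt(2.0 * math.log(t) / pulls[j][k])
                    for k in range(n_arms)
                ]
                best = max(ucb)
                arm = rng.choice([k for k in range(n_arms) if ucb[k] == best])
            choices.append(arm)

        # One Bernoulli draw per arm, shared by all players this round.
        draws = [1.0 if rng.random() < means[k] else 0.0 for k in range(n_arms)]
        for j, arm in enumerate(choices):
            collided = choices.count(arm) > 1
            reward = 0.0 if collided else draws[arm]
            pulls[j][arm] += 1
            sums[j][arm] += reward
            total_reward += reward
            collisions += int(collided)

    return total_reward, collisions
```

For instance, `simulate_selfish([0.1, 0.5, 0.9], n_players=2, horizon=1000)` returns the cumulative reward and the number of collided pulls for two players on three arms. Because a collision looks like a zero reward, a contested arm appears worse to each player, which is one intuition for why such a scheme might separate players without sensing.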

Keywords

Multi-Armed Bandits; Cognitive Radio; Opportunistic Spectrum Access; Reinforcement learning; Decentralized algorithms

Domains

Machine Learning [stat.ML]

Main file

BK__ALT_2018.pdf (1.11 MB)

Origin: Files produced by the author(s)

Contributor: Lilian Besson

https://inria.hal.science/hal-01629733

Submitted on: Monday, March 12, 2018, 18:48:06

Last modified on: Wednesday, January 24, 2024, 09:54:23

Dates and versions

hal-01629733, version 1 (November 6, 2017)

hal-01629733, version 2 (March 12, 2018)

License

Attribution - NonCommercial - ShareAlike (CC BY-NC-SA)

Identifiers

HAL Id: hal-01629733, version 2
arXiv: 1711.02317

Cite

Lilian Besson, Emilie Kaufmann. Multi-Player Bandits Revisited. Algorithmic Learning Theory, Mehryar Mohri; Karthik Sridharan, Apr 2018, Lanzarote, Spain. ⟨hal-01629733v2⟩

Collections

UNIV-NANTES UNIV-RENNES1 CNRS INRIA INSA-RENNES IETR SUP_SCEE SUP_IETR CENTRALESUPELEC CRISTAL INRIA2 CRISTAL-SEQUEL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UNIV-LILLE INSA-GROUPE ANR UR1-MATH-NUM NANTES-UNIVERSITE

995 views

1145 downloads
