Approximate information maximization for bandit games

Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system’s properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.

Mots clés

Multi-armed Bandit Problem Physic-based modelling Information maximization principle Statistical inference

Domaines

Machine Learning [stat.ML]

Fichier principal

Approximate information maximization for bandit games.pdf (777.89 Ko)

Figure1.pdf (299.37 Ko)

Figure2.pdf (206.65 Ko)

Figure_bernoulli_Aistats.pdf (206.56 Ko)

Figure_bernoulli_Aistats_close_arms.pdf (206.14 Ko)

Figure_close_arms.pdf (205.9 Ko)

Figure_multiple_arms.pdf (206 Ko)

Origine : Fichiers produits par l'(les) auteur(s)

Alex Barbier--Chebbah : Connectez-vous pour contacter le contributeur

https://hal.science/hal-04246907

Soumis le : lundi 30 octobre 2023-10:16:41

Dernière modification le : jeudi 2 mai 2024-11:46:16

Dates et versions

hal-04246907 , version 1 (17-10-2023)

hal-04246907 , version 2 (23-10-2023)

hal-04246907 , version 3 (30-10-2023)

Identifiants

HAL Id : hal-04246907 , version 3
ARXIV : 2310.12563

Citer

Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson, Etienne Boursier. Approximate information maximization for bandit games. 2023. ⟨hal-04246907v3⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

PASTEUR CNRS INRIA INSMI LM-ORSAY INRIA2 UNIV-PARIS-SACLAY ANR PRAIRIE-IA GS-MATHEMATIQUES GS-COMPUTER-SCIENCE DEC-PROC-BAYES PASTEUR_UMR3571

140 Consultations

85 Téléchargements