
Bellmanian Bandit Network

Antoine Bureau (1, 2), Michèle Sebag (1, 3, 2)
2 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique: UMR8623
Abstract: This paper presents a new reinforcement learning (RL) algorithm called Bellmanian Bandit Network (BBN), in which action selection in each state is formalized as a multi-armed bandit problem. The first contribution is the definition of an exploratory reward, inspired by the intrinsic motivation criterion [1], that is combined with the RL reward. The second contribution is the use of a network of multi-armed bandits to achieve convergence toward the optimal Q-value function. The BBN algorithm is validated in stationary and non-stationary grid-world environments and compared with [1].
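To make the idea of "one bandit per state" concrete, here is a minimal Python sketch. It is not the BBN algorithm itself: the abstract does not give the definition of the exploratory reward, so a standard UCB1 exploration bonus is swapped in as a stand-in. The arms of each bandit are scored with a one-step Bellman target, so the bandits are coupled through their value estimates. The 5-state chain environment, the class `StateBandit`, the functions `step`, `train` and `greedy_policy`, and all parameter values are assumptions chosen for illustration only.

```python
import math

class StateBandit:
    """One bandit per state; each arm corresponds to one action (illustrative)."""

    def __init__(self, n_actions):
        self.q = [0.0] * n_actions  # running estimate of the Bellman target per arm
        self.n = [0] * n_actions    # number of pulls per arm

    def select(self, c=1.0):
        # Pull every arm once, then pick the arm with the highest UCB1 index.
        # The UCB1 bonus is a stand-in for the paper's exploratory reward.
        for a, count in enumerate(self.n):
            if count == 0:
                return a
        total = sum(self.n)
        return max(
            range(len(self.q)),
            key=lambda a: self.q[a] + c * math.sqrt(2 * math.log(total) / self.n[a]),
        )

    def update(self, a, target, alpha=0.5):
        # Move the arm estimate toward the observed Bellman target.
        self.n[a] += 1
        self.q[a] += alpha * (target - self.q[a])

    def value(self):
        # State value = best arm estimate; this couples neighbouring bandits.
        return max(self.q)


def step(state, action, n_states):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(n_states=5, episodes=200, gamma=0.9, max_steps=50):
    bandits = [StateBandit(2) for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = bandits[s].select()
            s2, r, done = step(s, a, n_states)
            # One-step Bellman target: reward plus discounted value of the next state.
            target = r + (0.0 if done else gamma * bandits[s2].value())
            bandits[s].update(a, target)
            s = s2
            if done:
                break
    return bandits


def greedy_policy(bandits):
    # Best-estimated action in every non-terminal state.
    return [max(range(len(b.q)), key=lambda a: b.q[a]) for b in bandits[:-1]]


if __name__ == "__main__":
    trained = train()
    print(greedy_policy(trained))
    print([round(b.value(), 3) for b in trained[:-1]])
```

In this sketch the exploration bonus only influences which arm is pulled, while the stored estimates track the Bellman target alone. How BBN actually combines its exploratory reward with the RL reward is specified in the paper, not in this abstract.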
Document type: Conference papers

Cited literature: 13 references
Contributor: Antoine Bureau
Submitted on: Tuesday, January 13, 2015 - 5:18:23 PM
Last modification on: Tuesday, October 25, 2022 - 4:20:44 PM
Long-term archiving on: Saturday, April 15, 2017 - 5:22:40 PM




HAL Id: hal-01102970, version 1


Antoine Bureau, Michèle Sebag. Bellmanian Bandit Network. Autonomously Learning Robots, at NIPS 2014, Gerhard Neumann (TU-Darmstadt); Joelle Pineau (McGill University); Peter Auer (Uni Leoben); Marc Toussaint (Uni Stuttgart), Dec 2014, Montréal, Canada. ⟨hal-01102970⟩


