Bellmanian Bandit Network

Antoine Bureau 1, 2 Michèle Sebag 1, 3, 2
2 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : This paper presents a new reinforcement learning (RL) algorithm called Bellmanian Bandit Network (BBN), in which action selection in each state is formalized as a multi-armed bandit problem. The first contribution is the definition of an exploratory reward, inspired by the intrinsic motivation criterion [1], which is combined with the RL reward. The second contribution is the use of a network of multi-armed bandits to achieve convergence toward the optimal Q-value function. The BBN algorithm is validated in stationary and non-stationary grid-world environments and compared with [1].
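The abstract describes action selection as one bandit per state, with the bandits linked through Bellman backups and an exploratory reward added to the RL reward. The following is a minimal Python sketch of that general idea under stated assumptions; it is not the authors' BBN. It assumes UCB1 as the per-state bandit, a Q-learning style update as the Bellman link between a state's bandit and its successor's, and a count-based novelty bonus beta / sqrt(N(s')) as a stand-in for the intrinsic-motivation reward of [1]. The names `StateBandit`, `grid_step` and `run_bandit_network`, the 5x5 deterministic grid world and all parameter values are hypothetical.

```python
import math
from collections import defaultdict

# Grid moves: right, left, down, up (row, col offsets).
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


class StateBandit:
    """A bandit attached to one state (hypothetical): arms are actions, values are Q estimates."""

    def __init__(self, n_arms, c=1.0):
        self.counts = [0] * n_arms
        self.q = [0.0] * n_arms
        self.c = c  # UCB exploration constant (illustrative value)

    def select(self):
        # UCB1: play each arm once, then maximise Q + exploration term.
        for a, n in enumerate(self.counts):
            if n == 0:
                return a
        total = sum(self.counts)
        return max(
            range(len(self.q)),
            key=lambda a: self.q[a] + self.c * math.sqrt(2 * math.log(total) / self.counts[a]),
        )


def grid_step(state, action, size, goal):
    """Deterministic grid world: moves are clipped at the walls, reward 1 on reaching the goal."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), size - 1), min(max(c + dc, 0), size - 1))
    return nxt, (1.0 if nxt == goal else 0.0), nxt == goal


def run_bandit_network(episodes=200, size=5, gamma=0.95, alpha=0.1, beta=0.1, max_steps=200):
    goal = (size - 1, size - 1)
    bandits = defaultdict(lambda: StateBandit(len(ACTIONS)))  # one bandit per visited state
    visits = defaultdict(int)  # state visit counts for the novelty bonus
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        while not done and steps < max_steps:
            bandit = bandits[state]
            a = bandit.select()
            nxt, reward, done = grid_step(state, a, size, goal)
            visits[nxt] += 1
            # Exploratory reward (assumption): count-based bonus, not the criterion of [1].
            bonus = beta / math.sqrt(visits[nxt])
            # Bellman backup: the arm value bootstraps from the successor state's bandit.
            target = reward + bonus + (0.0 if done else gamma * max(bandits[nxt].q))
            bandit.counts[a] += 1
            bandit.q[a] += alpha * (target - bandit.q[a])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return bandits, steps_per_episode
```

Calling `run_bandit_network()` returns the per-state bandits and the number of steps taken in each episode; `beta` sets how strongly the exploratory bonus is mixed into the environment reward, and `c` trades off the bandit's own exploration against its Q estimates.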
Document type :
Conference papers


https://hal.inria.fr/hal-01102970
Contributor : Antoine Bureau
Submitted on : Tuesday, January 13, 2015 - 5:18:23 PM
Last modification on : Thursday, April 5, 2018 - 12:30:12 PM
Long-term archiving on : Saturday, April 15, 2017 - 5:22:40 PM

File

nips14_BBN.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01102970, version 1


Citation

Antoine Bureau, Michèle Sebag. Bellmanian Bandit Network. Autonomously Learning Robots, at NIPS 2014, Gerhard Neumann (TU-Darmstadt); Joelle Pineau (McGill University); Peter Auer (Uni Leoben); Marc Toussaint (Uni Stuttgart), Dec 2014, Montréal, Canada. ⟨hal-01102970⟩
