Optimistic planning in Markov decision processes using a generative model

Balázs Szörényi 1, 2, Gunnar Kedenburg 1, Rémi Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for Stochastic-Optimistic Planning), based on the "optimism in the face of uncertainty" principle. StOP can be used in the general setting, requires only a generative model, and enjoys a complexity bound that only depends on the local structure of the MDP.
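To make the setting concrete, the following is a minimal sketch of planning with a generative model. It is not StOP: it uses plain uniform sparse sampling, which spends the same number of samples on every branch, whereas the abstract describes an optimistic strategy that adapts to the local structure of the MDP. The names `CountingModel`, `toy_model`, `estimate_q` and `plan`, the two-state toy MDP, and the parameter values are all illustrative assumptions.

```python
import random
from typing import Callable, Tuple

# Hypothetical interface: a generative model maps (state, action)
# to one sampled (reward, next_state) pair.
GenerativeModel = Callable[[int, int], Tuple[float, int]]


class CountingModel:
    """Wraps a generative model and counts how often it is called."""

    def __init__(self, model: GenerativeModel) -> None:
        self.model = model
        self.calls = 0

    def __call__(self, state: int, action: int) -> Tuple[float, int]:
        self.calls += 1
        return self.model(state, action)


def toy_model(state: int, action: int) -> Tuple[float, int]:
    """Toy two-state MDP (illustrative assumption): reward 1 when the
    action matches the state, next state drawn uniformly at random."""
    reward = 1.0 if action == state else 0.0
    return reward, random.randrange(2)


def estimate_q(model, state, action, actions, gamma, depth, width) -> float:
    """Estimate the depth-truncated discounted value of taking `action`
    in `state`, averaging over `width` sampled transitions per node."""
    if depth == 0:
        return 0.0
    total = 0.0
    for _ in range(width):
        reward, next_state = model(state, action)
        best_next = max(
            estimate_q(model, next_state, a, actions, gamma, depth - 1, width)
            for a in actions
        )
        total += reward + gamma * best_next
    return total / width


def plan(model, state, actions, gamma=0.9, depth=3, width=8) -> int:
    """Return the action with the highest estimated value at `state`."""
    return max(
        actions,
        key=lambda a: estimate_q(model, state, a, actions, gamma, depth, width),
    )


model = CountingModel(toy_model)
best_action = plan(model, state=0, actions=[0, 1])
print(best_action, model.calls)
```

The call counter is the quantity the PAC sample complexity question is about: in this uniform scheme it grows exponentially with the planning depth regardless of the MDP, which is the cost a bound depending only on local structure aims to avoid.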
https://hal.inria.fr/hal-01079366
Contributor : Balazs Szorenyi
Submitted on : Saturday, November 1, 2014 - 11:24:24 AM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Document(s) archived on : Monday, February 2, 2015 - 4:51:50 PM

File

StOP_nips.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01079366, version 1

Citation

Balázs Szörényi, Gunnar Kedenburg, Rémi Munos. Optimistic planning in Markov decision processes using a generative model. Advances in Neural Information Processing Systems 27, Dec 2014, Montréal, Canada. ⟨hal-01079366⟩
