Conference papers

Optimistic planning in Markov decision processes using a generative model

Balázs Szörényi 1, 2 Gunnar Kedenburg 1 Rémi Munos 1 
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for Stochastic-Optimistic Planning), based on the "optimism in the face of uncertainty" principle. StOP can be used in the general setting, requires only a generative model, and enjoys a complexity bound that only depends on the local structure of the MDP.
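To make the setting concrete, the following is a minimal sketch of what "calls to the generative model" can look like in code. It is not StOP and carries no PAC guarantee: it is a naive baseline that scores each first action by uniform random rollouts. The two-state MDP, the class `CountingGenerativeModel`, and the functions `rollout_value` and `naive_plan` are all hypothetical names introduced for illustration only.

```python
import random


class CountingGenerativeModel:
    """Hypothetical toy MDP (2 states, 2 actions) exposed as a generative model.

    Each call to sample() returns one (reward, next_state) pair and is counted,
    since the number of such calls is the cost that sample complexity measures.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.calls = 0

    def sample(self, state, action):
        self.calls += 1
        # Assumed dynamics: action 1 reaches state 1 with prob. 0.9, action 0 with prob. 0.1.
        p = 0.9 if action == 1 else 0.1
        next_state = 1 if self.rng.random() < p else 0
        reward = 1.0 if next_state == 1 else 0.0
        return reward, next_state


def rollout_value(model, state, action, gamma, depth, rng):
    """Discounted return of one rollout: take `action`, then act uniformly at random."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        reward, state = model.sample(state, action)
        total += discount * reward
        discount *= gamma
        action = rng.choice([0, 1])
    return total


def naive_plan(model, state, gamma=0.9, depth=10, rollouts=200, seed=1):
    """Pick the first action with the highest average rollout return (naive baseline)."""
    rng = random.Random(seed)
    estimates = {}
    for action in (0, 1):
        returns = [rollout_value(model, state, action, gamma, depth, rng)
                   for _ in range(rollouts)]
        estimates[action] = sum(returns) / rollouts
    best = max(estimates, key=estimates.get)
    return best, estimates


if __name__ == "__main__":
    model = CountingGenerativeModel(seed=0)
    best, estimates = naive_plan(model, state=0)
    print("chosen action:", best, "generative-model calls:", model.calls)
```

This baseline spends the same fixed budget (actions × rollouts × depth calls) on every action regardless of how the MDP looks near the initial state. The abstract's claim is that StOP's budget instead depends on the local structure of the MDP.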

Contributor : Balazs Szorenyi
Submitted on : Saturday, November 1, 2014 - 11:24:24 AM
Last modification on : Thursday, January 20, 2022 - 4:16:45 PM
Long-term archiving on : Monday, February 2, 2015 - 4:51:50 PM


Files produced by the author(s)


  • HAL Id : hal-01079366, version 1


Balázs Szörényi, Gunnar Kedenburg, Rémi Munos. Optimistic planning in Markov decision processes using a generative model. Advances in Neural Information Processing Systems 27, Dec 2014, Montréal, Canada. ⟨hal-01079366⟩


