
Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill (1, 2), Omar Domingues (1), Pierre Ménard (1), Rémi Munos (2, 1), Michal Valko (2, 1)
(1) SEQUEL - Sequential Learning, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O(1/ε^4) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
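
As a rough illustration of the smoothness the abstract refers to, the sketch below compares a hard maximum with its entropy-regularized counterpart. It is not SmoothCruiser itself and is not taken from the paper: it assumes the standard log-sum-exp form of the regularized maximum, and the names `soft_max`, `soft_bellman_backup`, `lam` (regularization strength) and `gamma` (discount factor), as well as the deterministic one-step transitions, are hypothetical choices made for this example.

```python
import math


def soft_max(values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(v / lam))).

    Assumed standard form, not quoted from the paper. The largest value is
    subtracted before exponentiating to keep the computation stable.
    """
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def soft_bellman_backup(rewards, next_values, gamma, lam):
    """One regularized backup for a single state.

    Hypothetical setup: action a yields reward rewards[a] and moves
    deterministically to a state whose current value estimate is
    next_values[a].
    """
    q_values = [r + gamma * v for r, v in zip(rewards, next_values)]
    return soft_max(q_values, lam)


if __name__ == "__main__":
    q = [1.0, 0.5, -2.0]
    for lam in (1.0, 0.1, 0.01):
        # The regularized value lies between max(q) and max(q) + lam * log(len(q)).
        print(lam, max(q), soft_max(q, lam))
```

Unlike the hard maximum, the log-sum-exp is differentiable everywhere in its arguments, and it approaches the hard maximum as `lam` shrinks toward zero.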
Document type: Conference papers

https://hal.inria.fr/hal-02387515
Contributor: Michal Valko
Submitted on: Friday, November 29, 2019 - 6:46:38 PM
Last modification on: Friday, January 24, 2020 - 2:34:28 PM

File

smoothcruiser2019.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02387515, version 1

Citation

Jean-Bastien Grill, Omar Domingues, Pierre Ménard, Rémi Munos, Michal Valko. Planning in entropy-regularized Markov decision processes and games. Neural Information Processing Systems, 2019, Vancouver, Canada. ⟨hal-02387515⟩
