
Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill (1, 2), Omar Domingues (1), Pierre Ménard (1), Rémi Munos (2, 1), Michal Valko (2, 1)
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O(1/ε^4) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
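
As an illustration of the smoothness the abstract refers to, the sketch below compares the usual hard maximum over action values with its entropy-regularized counterpart, which has the standard closed form λ·log Σ_a exp(q(a)/λ). This is not the SmoothCruiser algorithm itself. The function names and the regularization weight `lam` (λ) are assumptions made for this example and do not come from the paper.

```python
import math

def hard_max_value(q_values):
    """Non-regularized backup: the plain maximum over action values."""
    return max(q_values)

def soft_policy(q_values, lam):
    """Policy proportional to exp(q / lam); lam is an assumed regularization weight."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def soft_max_value(q_values, lam):
    """Entropy-regularized backup: lam * log(sum(exp(q / lam))), computed stably."""
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def regularized_objective(policy, q_values, lam):
    """Expected action value plus lam times the Shannon entropy of the policy."""
    expected = sum(p * q for p, q in zip(policy, q_values))
    entropy = -sum(p * math.log(p) for p in policy if p > 0.0)
    return expected + lam * entropy

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action values
    for lam in (1.0, 0.1, 0.001):
        pi = soft_policy(q, lam)
        print(lam, soft_max_value(q, lam), regularized_objective(pi, q, lam))
```

The soft value is differentiable in the action values, and its gradient is the policy returned by `soft_policy`, whereas the hard maximum has kinks wherever two actions tie. As `lam` shrinks, the soft value approaches the hard maximum.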
Document type : Conference papers

Contributor : Michal Valko
Submitted on : Friday, November 29, 2019 - 6:46:38 PM
Last modification on : Tuesday, January 4, 2022 - 6:14:18 AM




  • HAL Id : hal-02387515, version 1


Jean-Bastien Grill, Omar Domingues, Pierre Ménard, Rémi Munos, Michal Valko. Planning in entropy-regularized Markov decision processes and games. Neural Information Processing Systems, 2019, Vancouver, Canada. ⟨hal-02387515⟩


