
A Theory of Regularized Markov Decision Processes

Abstract: Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or on the Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes, of which classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic, or Dynamic Policy Programming (or variants thereof) are special cases. It also draws connections to proximal convex optimization, especially to Mirror Descent.
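As a minimal illustrative sketch of the entropy-regularized special case mentioned in the abstract (not code from the paper): with a negative-entropy regularizer, the Legendre-Fenchel transform of the regularizer is a log-sum-exp, so the regularized Bellman backup becomes a "soft" maximization, as in Soft Q-learning. The toy MDP, the function name soft_value_iteration, and the temperature parameter tau below are assumptions made for illustration only.

```python
# Sketch of entropy-regularized (soft) value iteration on a small tabular MDP.
# With negative-entropy regularization, the regularized Bellman backup is a
# log-sum-exp over actions instead of a hard max. Illustrative only.

import numpy as np

def soft_value_iteration(P, R, gamma=0.9, tau=0.1, n_iters=500):
    """Entropy-regularized value iteration.

    P : array of shape (S, A, S), transition probabilities P[s, a, s'].
    R : array of shape (S, A), expected immediate rewards.
    tau : regularization temperature (tau -> 0 recovers the standard max).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = R + gamma * P @ V                          # Q[s, a] = r(s, a) + gamma * E[V(s')]
        V = tau * np.log(np.exp(Q / tau).sum(axis=1))  # soft (log-sum-exp) backup
    # Regularized greedy policy: softmax of Q at temperature tau
    pi = np.exp((Q - V[:, None]) / tau)
    return V, Q, pi / pi.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 2
    P = rng.random((S, A, S))
    P /= P.sum(axis=-1, keepdims=True)   # random row-stochastic transitions
    R = rng.random((S, A))
    V, Q, pi = soft_value_iteration(P, R)
    print("soft values:", V)
    print("softmax policy:\n", pi)
```

Letting tau tend to zero recovers standard value iteration, while larger tau yields a more stochastic (softmax) policy; a numerically stable log-sum-exp could be substituted for the explicit exponentials.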

https://hal.inria.fr/hal-02273741
Contributor: Bruno Scherrer
Submitted on: Thursday, August 29, 2019 - 11:10:49 AM
Last modification on: Tuesday, March 2, 2021 - 5:12:06 PM

Identifiers

  • HAL Id: hal-02273741, version 1
  • arXiv: 1901.11275

Citation

Matthieu Geist, Bruno Scherrer, Olivier Pietquin. A Theory of Regularized Markov Decision Processes. ICML 2019 - Thirty-sixth International Conference on Machine Learning, Jun 2019, Long Beach, United States. ⟨hal-02273741⟩
