Skip to Main content Skip to Navigation
Conference papers

Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration

Marianne Akian 1, 2 Stéphane Gaubert 1, 2 Zheng Qu 3 Omar Saadi 1, 2
1 TROPICAL - TROPICAL
CMAP - Centre de Mathématiques Appliquées - Ecole Polytechnique, Inria Saclay - Ile de France
Abstract : Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue).
Complete list of metadata

https://hal.inria.fr/hal-02423846
Contributor : Stephane Gaubert <>
Submitted on : Thursday, December 26, 2019 - 10:24:49 AM
Last modification on : Friday, April 30, 2021 - 10:00:26 AM

Links full text

Identifiers

  • HAL Id : hal-02423846, version 1
  • ARXIV : 1909.06185

Citation

Marianne Akian, Stéphane Gaubert, Zheng Qu, Omar Saadi. Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration. CDC 2019 - 58th IEEE Conference on Decision and Control, Dec 2019, Nice, France. ⟨hal-02423846⟩

Share

Metrics

Record views

176