
Stochastic Variance-Reduced Policy Gradient

Matteo Papini ¹, Damiano Binaghi ¹, Giuseppe Canonaco ¹, Matteo Pirotta ², Marcello Restelli ¹
² SEQUEL - Sequential Learning, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: In this paper, we propose a novel reinforcement-learning algorithm: a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and must account for (I) a non-concave objective function; (II) approximations in the full-gradient computation; and (III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG, with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG and evaluate them empirically on continuous MDPs.
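The abstract's core idea — a large-batch "snapshot" gradient corrected by small-batch updates, with importance weights restoring unbiasedness because trajectories are drawn from the current (non-stationary) policy rather than the snapshot policy — can be sketched on a toy problem. The following is an illustrative sketch only, not the authors' implementation: it uses a one-step Gaussian-policy "MDP" with a hypothetical quadratic reward, and all names (`svrpg`, `TARGET`, `SIGMA`, batch sizes, learning rate) are assumptions chosen for the demo.

```python
import math
import random

random.seed(0)

SIGMA = 1.0    # fixed policy std (assumption for this toy example)
TARGET = 2.0   # reward peak: r(a) = -(a - TARGET)^2 (hypothetical one-step MDP)

def sample_action(theta):
    # "Trajectory" of the toy MDP: a single Gaussian action
    return random.gauss(theta, SIGMA)

def reward(a):
    return -(a - TARGET) ** 2

def grad_estimate(a, theta):
    # REINFORCE-style estimate: r(a) * grad_theta log N(a; theta, sigma^2)
    return reward(a) * (a - theta) / SIGMA ** 2

def importance_weight(a, snapshot, theta):
    # omega(a) = pi(a | snapshot) / pi(a | theta) for Gaussians sharing sigma;
    # this reweights the snapshot-gradient term, which is evaluated on
    # actions sampled from the *current* policy
    log_w = (-(a - snapshot) ** 2 + (a - theta) ** 2) / (2 * SIGMA ** 2)
    return math.exp(log_w)

def svrpg(theta, epochs=20, N=500, B=20, m=5, lr=0.02):
    for _ in range(epochs):
        snapshot = theta
        # large-batch ("full") gradient at the snapshot policy
        full_grad = sum(grad_estimate(sample_action(snapshot), snapshot)
                        for _ in range(N)) / N
        for _ in range(m):
            actions = [sample_action(theta) for _ in range(B)]
            # variance-reduced correction; importance weights keep the
            # estimate unbiased for the gradient at the current theta
            corr = sum(grad_estimate(a, theta)
                       - importance_weight(a, snapshot, theta)
                         * grad_estimate(a, snapshot)
                       for a in actions) / B
            theta += lr * (full_grad + corr)  # gradient *ascent* on reward
    return theta
```

Because the snapshot term `omega(a) * grad_estimate(a, snapshot)` has expectation equal to the snapshot gradient under the current sampling distribution, `full_grad + corr` remains an (approximately) unbiased estimate of the gradient at the current parameters while its variance shrinks as the iterate approaches the snapshot.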
Document type: Conference papers
Contributor: Matteo Pirotta
Submitted on: Friday, November 30, 2018 - 11:31:35 AM
Last modification on: Tuesday, January 4, 2022 - 5:32:48 AM
Long-term archiving on: Friday, March 1, 2019 - 2:18:15 PM




  • HAL Id: hal-01940394, version 1



Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli. Stochastic Variance-Reduced Policy Gradient. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.4026-4035. ⟨hal-01940394⟩


