Stochastic Variance-Reduced Policy Gradient

Matteo Papini 1 Damiano Binaghi 1 Giuseppe Canonaco 1 Matteo Pirotta 2 Marcello Restelli 1
2 SEQUEL - Sequential Learning, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (UMR 9189)
Abstract: In this paper, we propose a novel reinforcement-learning algorithm: a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
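
The abstract describes an SVRG-style gradient corrected with importance weights to remain unbiased under a non-stationary sampling process. Below is a minimal sketch of such a semi-stochastic direction, assuming a per-trajectory policy-gradient estimator g(tau, theta) (e.g. REINFORCE/GPOMDP) and a trajectory log-likelihood log_prob(tau, theta); the function name and signature are illustrative and not taken from the paper itself.

import numpy as np

def svrpg_direction(trajs, theta_t, theta_snap, grad_snapshot, g, log_prob):
    """Semi-stochastic direction: snapshot gradient plus an importance-weighted correction.

    trajs          -- mini-batch of trajectories sampled under the current policy theta_t
    grad_snapshot  -- (approximate) full-gradient estimate computed at the snapshot theta_snap
    g(tau, theta)  -- per-trajectory policy-gradient estimate (e.g. REINFORCE/GPOMDP)
    log_prob(tau, theta) -- log-likelihood of trajectory tau under the policy with parameters theta
    """
    correction = np.zeros_like(grad_snapshot)
    for tau in trajs:
        # The importance weight re-weights the snapshot term so that, in expectation
        # under the current sampling distribution, it matches the snapshot gradient,
        # keeping the overall estimate unbiased despite the shifted sampling policy.
        w = np.exp(log_prob(tau, theta_snap) - log_prob(tau, theta_t))
        correction += g(tau, theta_t) - w * g(tau, theta_snap)
    return grad_snapshot + correction / len(trajs)

Within each epoch, such a direction v would drive an ascent step theta_{t+1} = theta_t + alpha * v, with grad_snapshot recomputed from a large batch at the start of the next epoch.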
Document type: Conference papers

https://hal.inria.fr/hal-01940394
Contributor: Matteo Pirotta
Submitted on: Friday, November 30, 2018 - 11:31:35 AM
Last modification on: Friday, March 22, 2019 - 1:37:11 AM
Long-term archiving on: Friday, March 1, 2019 - 2:18:15 PM

File

  • supplementary.pdf (file produced by the author(s))

Identifiers

  • HAL Id: hal-01940394, version 1

Citation

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli. Stochastic Variance-Reduced Policy Gradient. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.4026-4035. ⟨hal-01940394⟩
