Conference Papers Year : 2018

Stochastic Variance-Reduced Policy Gradient

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

Abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for (i) a non-concave objective function; (ii) approximations in the full gradient computation; and (iii) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
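
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-step Gaussian-policy problem, the reward function, the hyperparameters, and all function names (`reward`, `log_density`, `grad_estimate`, `svrpg_sketch`) are assumptions made for illustration only. The sketch shows an approximate full gradient computed at a snapshot of the parameters, followed by mini-batch updates in which the snapshot gradient term is re-weighted by an importance weight, so that samples drawn from the current policy can stand in for samples from the snapshot policy.

```python
import math
import random


def reward(a):
    # Hypothetical reward for a one-step problem; it peaks at a = 3.
    return -(a - 3.0) ** 2


def log_density(a, theta, sigma):
    # Log-density of the Gaussian policy N(theta, sigma^2) at action a.
    return -0.5 * ((a - theta) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def grad_estimate(a, theta, sigma):
    # Single-sample policy gradient estimate: reward times the score function.
    return reward(a) * (a - theta) / sigma ** 2


def svrpg_sketch(theta0=0.0, sigma=1.0, epochs=30, inner_steps=5,
                 snapshot_batch=200, mini_batch=20, step=0.05, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(epochs):
        snapshot = theta
        # Approximate "full" gradient at the snapshot, from a large batch.
        mu = sum(grad_estimate(rng.gauss(snapshot, sigma), snapshot, sigma)
                 for _ in range(snapshot_batch)) / snapshot_batch
        for _ in range(inner_steps):
            correction = 0.0
            for _ in range(mini_batch):
                a = rng.gauss(theta, sigma)  # sampled from the current policy
                # Importance weight: snapshot policy density over current policy density.
                w = math.exp(log_density(a, snapshot, sigma) - log_density(a, theta, sigma))
                correction += (grad_estimate(a, theta, sigma)
                               - w * grad_estimate(a, snapshot, sigma))
            # Variance-reduced gradient estimate, then a gradient ascent step.
            theta += step * (mu + correction / mini_batch)
    return theta


if __name__ == "__main__":
    print(svrpg_sketch())
```

In this toy setting the expected reward is maximized at a policy mean of 3, so the returned parameter should end up near that value. The importance weight is what keeps the correction term centered on the true gradient difference even though the sampling distribution changes at every inner step.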
Main file
supplementary.pdf (550.75 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-01940394, version 1 (30-11-2018)

Identifiers

  • HAL Id : hal-01940394, version 1

Cite

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli. Stochastic Variance-Reduced Policy Gradient. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.4026-4035. ⟨hal-01940394⟩