Conference paper, Year: 2018

Stochastic Variance-Reduced Policy Gradient

Abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
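The sketch below is a minimal illustration of the SVRG-style correction with importance weights described in the abstract, not the paper's algorithm or experimental setup: the one-step Gaussian-policy "bandit" environment, the function names (sample_actions, pg_estimate, importance_weights, svrpg), and all hyper-parameters are assumptions chosen only to keep the example self-contained. It shows a large-batch snapshot gradient, small-batch inner updates, and the likelihood-ratio reweighting of the snapshot term that keeps the update direction unbiased under the current policy.

```python
import numpy as np

# Illustrative SVRPG-style sketch on a one-step Gaussian-policy "bandit".
# Environment, policy, and hyper-parameters are placeholders, not the
# paper's setup.

rng = np.random.default_rng(0)
SIGMA = 1.0    # fixed policy standard deviation
TARGET = 2.0   # optimal mean action

def sample_actions(theta, n):
    """Draw n actions from the Gaussian policy N(theta, SIGMA^2)."""
    return rng.normal(theta, SIGMA, size=n)

def reward(actions):
    return -(actions - TARGET) ** 2

def grad_log_pi(theta, actions):
    """Score function of the Gaussian policy w.r.t. its mean."""
    return (actions - theta) / SIGMA ** 2

def pg_estimate(theta, actions):
    """REINFORCE-style per-sample gradient estimate."""
    return grad_log_pi(theta, actions) * reward(actions)

def importance_weights(theta_snap, theta_cur, actions):
    """Likelihood ratio pi_{theta_snap}(a) / pi_{theta_cur}(a)."""
    log_w = (-(actions - theta_snap) ** 2
             + (actions - theta_cur) ** 2) / (2 * SIGMA ** 2)
    return np.exp(log_w)

def svrpg(theta0, epochs=20, N=500, B=10, m=10, lr=0.05):
    theta = theta0
    for _ in range(epochs):
        # Snapshot: large-batch ("full") gradient estimate.
        theta_snap = theta
        mu = pg_estimate(theta_snap, sample_actions(theta_snap, N)).mean()

        # Inner loop: small batches with variance-reduced corrections.
        for _ in range(m):
            a = sample_actions(theta, B)
            g_cur = pg_estimate(theta, a)
            # Re-evaluate the snapshot gradient on the same actions,
            # reweighted so the correction stays unbiased under the
            # current (non-stationary) sampling distribution.
            w = importance_weights(theta_snap, theta, a)
            g_snap = w * pg_estimate(theta_snap, a)
            v = (g_cur - g_snap).mean() + mu
            theta = theta + lr * v  # gradient ascent on expected return
    return theta

print(svrpg(theta0=0.0))  # should approach TARGET = 2.0
```

Running the sketch, the policy mean moves toward the optimal action; the snapshot term mu anchors the small-batch updates, while the importance-weighted difference removes most of their variance.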
Main file
supplementary.pdf (550.75 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01940394, version 1 (30-11-2018)

Identifiers

  • HAL Id: hal-01940394, version 1

Cite

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli. Stochastic Variance-Reduced Policy Gradient. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.4026-4035. ⟨hal-01940394⟩
