Stochastic Variance-Reduced Policy Gradient

Matteo Papini; Damiano Binaghi; Giuseppe Canonaco; Matteo Pirotta; Marcello Restelli

Conference paper, Year: 2018

Matteo Papini

Role: Author
PersonId : 1024224

Department of Electronics, Information, and Bioengineering [Milano]

Damiano Binaghi

Role: Author

Department of Electronics, Information, and Bioengineering [Milano]

Giuseppe Canonaco

Role: Author

Department of Electronics, Information, and Bioengineering [Milano]

Matteo Pirotta

Role: Author
PersonId : 1023840

Sequential Learning

Marcello Restelli

Role: Author

Department of Electronics, Information, and Bioengineering [Milano]

Abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for (i) a non-concave objective function; (ii) approximations in the full gradient computation; and (iii) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
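
The abstract does not spell out the update rule, so the following is only a minimal sketch of how an SVRG-style correction with importance weights might look, not the authors' implementation. It assumes a hypothetical one-step problem (horizon 1) with a Gaussian policy of fixed standard deviation and a made-up quadratic reward; the names `reward`, `svrg_direction`, and `run_svrpg_sketch`, the batch sizes, and the step size are all illustrative assumptions. A snapshot gradient is estimated on a large batch, and each inner step adds a mini-batch correction in which the snapshot gradient is reweighted by the density ratio, since the samples come from the current policy rather than the snapshot.

```python
import math
import random

SIGMA = 1.0  # fixed policy standard deviation (assumption for this toy problem)

def reward(a):
    # Hypothetical one-step reward, maximized at a = 2.
    return -(a - 2.0) ** 2

def log_prob(a, theta):
    # Log-density of the Gaussian policy N(theta, SIGMA^2) at action a.
    z = (a - theta) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))

def grad_estimate(a, theta):
    # Score-function (REINFORCE-style) gradient estimate from one sample.
    return reward(a) * (a - theta) / SIGMA ** 2

def svrg_direction(theta, theta_snap, mu_snap, actions):
    # `actions` must be sampled from the CURRENT policy (theta).
    # The snapshot term is multiplied by p(a | theta_snap) / p(a | theta)
    # so that its expectation equals the snapshot policy's gradient.
    total = 0.0
    for a in actions:
        weight = math.exp(log_prob(a, theta_snap) - log_prob(a, theta))
        total += grad_estimate(a, theta) - weight * grad_estimate(a, theta_snap)
    return mu_snap + total / len(actions)

def run_svrpg_sketch(theta0=0.0, epochs=40, inner_steps=5,
                     n_snap=200, n_mini=10, step=0.02, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(epochs):
        # Snapshot: approximate the full gradient with a large batch.
        theta_snap = theta
        snap_actions = [rng.gauss(theta_snap, SIGMA) for _ in range(n_snap)]
        mu_snap = sum(grad_estimate(a, theta_snap) for a in snap_actions) / n_snap
        for _ in range(inner_steps):
            # Mini-batch drawn from the current policy, then gradient ascent.
            actions = [rng.gauss(theta, SIGMA) for _ in range(n_mini)]
            theta += step * svrg_direction(theta, theta_snap, mu_snap, actions)
    return theta

if __name__ == "__main__":
    print(run_svrpg_sketch())
```

When the current parameter equals the snapshot, every importance weight is 1 and the correction vanishes, so the direction reduces to the snapshot gradient. As the two drift apart, the weights can become highly variable, which is one reason this sketch keeps the inner loop short.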

Domains

Machine Learning [stat.ML]

Main file

supplementary.pdf (550.75 KB)

Origin: Files produced by the author(s)

Contributor: Matteo Pirotta

https://inria.hal.science/hal-01940394

Submitted on: Friday, 30 November 2018, 11:31:35

Last modified on: Wednesday, 24 January 2024, 09:54:23

Long-term archiving on: Friday, 1 March 2019, 14:18:15

Dates and versions

hal-01940394, version 1 (30-11-2018)

Identifiers

HAL Id: hal-01940394, version 1

Cite

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli. Stochastic Variance-Reduced Policy Gradient. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. pp.4026-4035. ⟨hal-01940394⟩

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE ANR
