Samples Are Useful? Not Always: denoising policy gradient updates using variance explained - Inria - National Institute for Research in Digital Science and Technology
Preprint, Working Paper Year: 2019

Samples Are Useful? Not Always: denoising policy gradient updates using variance explained

Abstract

Policy gradient algorithms in reinforcement learning optimize the policy directly and rely on efficiently sampling an environment. Most sampling procedures are based solely on sampling the agent's policy. However, other measures directly available through these algorithms could be used to improve sampling before each policy update. Following this line of thought, we propose SAUNA, a method in which a transition is rejected from the gradient update if it does not meet a particular criterion, and kept otherwise. This criterion is the fraction of variance explained $\mathcal{V}^{ex}$, a measure of the discrepancy between a model and actual samples. $\mathcal{V}^{ex}$ can be used to evaluate the impact each transition will have on learning. This criterion refines sampling and improves the policy gradient algorithm. In this paper: (a) we introduce and explore $\mathcal{V}^{ex}$, the selection criterion used to improve the sampling procedure; (b) we conduct experiments across a variety of benchmark environments, including standard continuous control problems, and our results show better performance than when the $\mathcal{V}^{ex}$ criterion is not used for the policy gradient update; (c) we investigate why $\mathcal{V}^{ex}$ provides a reliable assessment for selecting samples that will have a positive impact on learning; (d) we show how to interpret this criterion as a dynamic way to adjust the ratio between exploration and exploitation.
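
As a rough illustration of the idea in the abstract, the sketch below computes a fraction of variance explained and uses a per-transition score to mask samples out of a loss average. It is not the authors' implementation. The abstract does not say how $\mathcal{V}^{ex}$ is obtained per transition, nor which threshold is used, so the function names (`explained_variance`, `keep_mask`, `masked_mean`), the per-transition `scores`, the threshold value, and the direction of the comparison are all assumptions made for this example. The formula used is the common definition 1 - Var(target - prediction) / Var(target).

```python
import numpy as np


def explained_variance(returns, value_preds):
    """Fraction of the variance of `returns` explained by `value_preds`.

    Common definition: 1 - Var(returns - value_preds) / Var(returns).
    Assumed here for illustration; the paper may define it differently.
    """
    returns = np.asarray(returns, dtype=float)
    value_preds = np.asarray(value_preds, dtype=float)
    var_returns = returns.var()
    if var_returns == 0.0:
        return float("nan")  # undefined when the targets are constant
    return 1.0 - (returns - value_preds).var() / var_returns


def keep_mask(scores, threshold):
    """Boolean mask of transitions whose score meets the threshold.

    `scores` is a hypothetical per-transition criterion value; the
    comparison direction (>=) is an assumption of this sketch.
    """
    return np.asarray(scores, dtype=float) >= threshold


def masked_mean(per_transition_loss, mask):
    """Average the loss over kept transitions only (0.0 if none are kept)."""
    losses = np.asarray(per_transition_loss, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0
    return float(losses[mask].mean())


# Batch-level criterion on toy data.
returns = np.array([1.0, 2.0, 3.0, 4.0])
perfect = explained_variance(returns, returns)              # predictions match targets
constant = explained_variance(returns, np.full(4, 2.5))     # predictions ignore targets

# Per-transition filtering with made-up scores and an arbitrary threshold.
scores = np.array([0.9, 0.1, 0.4, 0.7])
per_loss = np.array([0.5, 1.0, 1.5, 2.0])
mask = keep_mask(scores, threshold=0.3)
filtered_loss = masked_mean(per_loss, mask)
```

In this toy batch the second transition falls below the threshold and is excluded, so the loss is averaged over the remaining three. In a real policy gradient setting the same mask would be applied to the per-transition surrogate loss before backpropagation.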
Main file
main.pdf (4.97 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02091547, version 1 (08-04-2019)
hal-02091547, version 2 (10-04-2019)
hal-02091547, version 3 (24-09-2019)
hal-02091547, version 4 (25-09-2019)
hal-02091547, version 5 (13-05-2020)
hal-02091547, version 6 (20-11-2020)

Identifiers

  • HAL Id: hal-02091547, version 2

Cite

Yannis Flet-Berliac, Philippe Preux. Samples Are Useful? Not Always: denoising policy gradient updates using variance explained. 2019. ⟨hal-02091547v2⟩