Conference paper, Year: 2021

Taylor expansion of discount factors

Yunhao Tang
  • Role: Author
Mark Rowland
  • Role: Author
Rémi Munos
  • Role: Author
Michal Valko
  • Role: Author

Abstract

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
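
As a rough illustration of how value functions for two discount factors can be connected, consider a finite Markov reward process with transition matrix P and reward vector r (an assumed setting, chosen for simplicity). The standard resolvent identity gives V_γ' = Σ_k ((γ' − γ)(I − γP)^(-1) P)^k V_γ, and the series converges whenever |γ' − γ| < 1 − γ. Truncating it at order K yields a quantity that sits between V_γ (K = 0) and V_γ' (K → ∞). The sketch below is a minimal pure-Python check of that identity. The function names and the two-state example are hypothetical, and the code is not claimed to reproduce the paper's exact objectives or algorithms.

```python
def matvec(P, v):
    """Multiply a row-stochastic matrix P (list of rows) by a vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def resolvent(P, b, gamma, iters=2000):
    """Approximate (I - gamma * P)^{-1} b with the fixed-point iteration
    x <- b + gamma * P x, which converges for 0 <= gamma < 1."""
    x = [0.0] * len(b)
    for _ in range(iters):
        Px = matvec(P, x)
        x = [bi + gamma * pxi for bi, pxi in zip(b, Px)]
    return x


def value(P, r, gamma):
    """Value function V_gamma = (I - gamma * P)^{-1} r."""
    return resolvent(P, r, gamma)


def taylor_value(P, r, gamma, gamma_prime, order):
    """Order-K truncation of the series for V_{gamma'} expanded around gamma.

    Term k is ((gamma' - gamma) (I - gamma P)^{-1} P)^k V_gamma.
    """
    term = value(P, r, gamma)  # zeroth-order term: V_gamma itself
    total = list(term)
    for _ in range(order):
        shifted = resolvent(P, matvec(P, term), gamma)
        term = [(gamma_prime - gamma) * t for t in shifted]
        total = [a + b for a, b in zip(total, term)]
    return total


def max_abs_diff(u, v):
    """Largest coordinate-wise absolute difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical two-state example: reward only in state 0.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
gamma, gamma_prime = 0.9, 0.95

target = value(P, r, gamma_prime)
errors = {
    k: max_abs_diff(taylor_value(P, r, gamma, gamma_prime, k), target)
    for k in (0, 1, 5, 30)
}
for k, err in errors.items():
    print(f"order {k:2d}: max error vs V_gamma' = {err:.3e}")
```

In this example (γ' − γ) / (1 − γ) = 0.5, so each additional order should shrink the gap to V_γ' by roughly a factor of two. Low orders stay close to V_γ, while high orders approach V_γ'.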
Main file
tang2021taylor.pdf (2.45 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03289295, version 1 (16-07-2021)

Identifiers

  • HAL Id: hal-03289295, version 1

Cite

Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko. Taylor expansion of discount factors. International Conference on Machine Learning, Jul 2021, Vienna / Virtual, Austria. ⟨hal-03289295⟩
