Conference papers

Taylor expansion of discount factors

Abstract : In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
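A minimal sketch of what "interpolating value functions of two distinct discount factors" can look like. It assumes a tabular Markov reward process with a hypothetical transition matrix P and reward vector r, neither of which comes from this record. It relies on the standard resolvent identity V_{γ'} = Σ_{k≥0} ((γ' − γ)(I − γP)^{-1} P)^k V_γ, which converges when |γ' − γ| < 1 − γ. Truncating the sum at order K gives V_γ for K = 0 and approaches V_{γ'} as K grows. Whether this matches the paper's exact construction should be checked against the PDF.

```python
import numpy as np


def value(P, r, gamma):
    """Exact value function V_gamma = (I - gamma P)^{-1} r for a tabular chain."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def taylor_value(P, r, gamma, gamma_prime, order):
    """Order-K truncation of the expansion of V_{gamma'} around V_gamma."""
    n = P.shape[0]
    term = value(P, r, gamma)  # zeroth-order term is V_gamma itself
    total = term.copy()
    for _ in range(order):
        # Apply (gamma' - gamma) (I - gamma P)^{-1} P to the previous term.
        term = (gamma_prime - gamma) * np.linalg.solve(
            np.eye(n) - gamma * P, P @ term
        )
        total = total + term
    return total


# Hypothetical 4-state chain, for illustration only.
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)  # normalize rows into a stochastic matrix
r = rng.random(4)

gamma, gamma_prime = 0.9, 0.95
target = value(P, r, gamma_prime)
errors = [
    np.max(np.abs(taylor_value(P, r, gamma, gamma_prime, k) - target))
    for k in (0, 1, 5, 50)
]
print(errors)
```

With these settings the ratio (γ' − γ)/(1 − γ) is 0.5, so each additional order at least halves the worst-case gap to V_{γ'}.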
Document type : Conference papers

https://hal.inria.fr/hal-03289295
Contributor : Michal Valko
Submitted on : Friday, July 16, 2021 - 5:55:16 PM
Last modification on : Tuesday, February 15, 2022 - 11:02:04 AM
Long-term archiving on : Sunday, October 17, 2021 - 7:47:28 PM

File

tang2021taylor.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03289295, version 1

Citation

Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko. Taylor expansion of discount factors. International Conference on Machine Learning, Jul 2021, Vienna / Virtual, Austria. ⟨hal-03289295⟩
