Beyond the one-step greedy approach in reinforcement learning

Abstract: The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms fit well into our unified framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
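To make the contrast between one-step and multiple-step greedy improvement concrete, below is a minimal tabular sketch of Policy Iteration in which the usual one-step greedy step is replaced by an h-step lookahead (h = 1 recovers the classic algorithm). This is an illustrative assumption on our part, not the paper's algorithm or analysis; the function name `h_step_greedy_policy_iteration` and the tensor layout of `P` and `R` are hypothetical.

```python
import numpy as np

def h_step_greedy_policy_iteration(P, R, gamma, h=3, iters=50):
    """Tabular Policy Iteration with an h-step greedy improvement step (sketch).

    P: transition tensor of shape (A, S, S); R: reward matrix of shape (A, S).
    With h = 1 this reduces to the standard one-step greedy Policy Iteration.
    """
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)  # initial deterministic policy

    for _ in range(iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[pi, np.arange(S), :]
        r_pi = R[pi, np.arange(S)]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

        # h-step greedy improvement: depth-h dynamic programming using v as
        # the terminal value, then act greedily at the root.
        w = v.copy()
        for _ in range(h - 1):
            w = np.max(R + gamma * P @ w, axis=0)  # optimal Bellman backup
        q = R + gamma * P @ w                      # root Q-values at depth h
        new_pi = np.argmax(q, axis=0)

        if np.array_equal(new_pi, pi):
            break
        pi = new_pi

    return pi, v
```

In this sketch the deeper lookahead is realized by h optimal Bellman backups before acting greedily; the paper studies such multiple-step improvement operators in generality and proves convergence of the resulting schemes.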

https://hal.inria.fr/hal-01927939
Contributor: Bruno Scherrer
Submitted on: Tuesday, November 20, 2018 - 11:37:59 AM
Last modification on: Monday, November 26, 2018 - 2:03:56 PM

Identifiers

  • HAL Id: hal-01927939, version 1
  • arXiv: 1802.03654

Citation

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Beyond the one-step greedy approach in reinforcement learning. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. ⟨hal-01927939⟩
