Conference papers

Multiple-step greedy policies in online and approximate reinforcement learning

Abstract: Multiple-step lookahead policies have demonstrated strong empirical performance in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work [5], multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care to address this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.
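To make the central object concrete, the following is a minimal sketch (the function name, the toy MDP, and the tensor layout are illustrative assumptions, not taken from the paper) of computing an h-step greedy policy on a tabular MDP: for each state, the first action of an h-horizon optimal plan whose tail is evaluated by a value estimate V. With h = 1 this reduces to the standard 1-step greedy policy.

```python
import numpy as np

def h_step_greedy(P, R, V, gamma, h):
    """h-step greedy policy w.r.t. a value estimate V (illustrative sketch).

    P: (A, S, S) transition probabilities, R: (A, S) expected rewards,
    V: (S,) value estimate, gamma: discount, h >= 1 lookahead depth.
    Returns: (S,) array of greedy first actions.
    """
    W = V.copy()
    for _ in range(h - 1):
        Q = R + gamma * (P @ W)   # (A, S): one-step backup of W
        W = Q.max(axis=0)         # Bellman optimality backup
    Q = R + gamma * (P @ W)       # value of each first action
    return Q.argmax(axis=0)

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: switch
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])
pi = h_step_greedy(P, R, np.zeros(2), 0.9, h=3)
```

Here deeper lookahead (larger h) trades extra planning computation per step for a stronger greedy improvement, which is the trade-off the paper's algorithms exploit.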

Contributor: Bruno Scherrer
Submitted on: Tuesday, November 20, 2018 - 11:27:29 AM
Last modification on: Friday, January 21, 2022 - 3:13:31 AM


Files produced by the author(s)


  • HAL Id: hal-01927962, version 1
  • arXiv: 1805.07956



Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Multiple-step greedy policies in online and approximate reinforcement learning. NeurIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. ⟨hal-01927962⟩


