Conference papers

Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration

Abstract: Anderson (1965) acceleration is an old and simple method for accelerating the computation of a fixed point. However, to the best of our knowledge and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration, supported by preliminary experiments showing a significant speed-up of convergence, which we critically discuss. We also discuss how this idea could be applied more generally to (deep) reinforcement learning.
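The idea in the abstract can be sketched concretely. A minimal illustration, assuming a small tabular MDP (the transition tensor `P`, rewards `r`, and the memory size `m` are illustrative choices, not from the paper): at each step we apply the Bellman optimality operator, keep the last few iterates and residuals, and replace the plain update with a linear combination of past operator applications whose coefficients (summing to one) minimize the norm of the combined residual — the standard Anderson mixing scheme.

```python
import numpy as np

def bellman(V, P, r, gamma):
    # Bellman optimality operator: (TV)(s) = max_a [ r(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
    return np.max(r + gamma * (P @ V), axis=1)

def anderson_vi(P, r, gamma, m=3, iters=200, tol=1e-10, reg=1e-10):
    """Value iteration with Anderson acceleration (memory m); a sketch, not the paper's exact algorithm."""
    n = P.shape[0]
    V = np.zeros(n)
    Ts, Fs = [], []  # histories of T(V_k) and of residuals f_k = T(V_k) - V_k
    for _ in range(iters):
        TV = bellman(V, P, r, gamma)
        f = TV - V
        Ts.append(TV); Fs.append(f)
        Ts, Fs = Ts[-m:], Fs[-m:]          # keep at most m past iterates
        F = np.stack(Fs, axis=1)           # n x k matrix of recent residuals
        # Coefficients alpha minimizing ||F alpha|| subject to sum(alpha) = 1,
        # via the regularized normal equations (F'F + reg I) z = 1, alpha = z / sum(z).
        G = F.T @ F + reg * np.eye(F.shape[1])
        z = np.linalg.solve(G, np.ones(F.shape[1]))
        alpha = z / z.sum()
        V = np.stack(Ts, axis=1) @ alpha   # mixed update instead of plain V <- T(V)
        if np.linalg.norm(f, np.inf) < tol:
            break
    return V
```

With `m=1` the scheme reduces to plain value iteration; larger memories extrapolate from recent residuals and, on contractive fixed-point problems, typically cut the number of iterations substantially.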
Contributor: Bruno Scherrer
Submitted on: Tuesday, November 20, 2018 - 11:32:43 AM
Last modification on: Friday, January 21, 2022 - 3:13:30 AM




  • HAL Id: hal-01927977, version 1
  • arXiv: 1809.09501



Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration. EWRL 2018 - 14th European workshop on Reinforcement Learning, Oct 2018, Lille, France. ⟨hal-01927977⟩


