Conference paper, 2018

Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration

Abstract

Anderson (1965) acceleration is an old and simple method for accelerating the computation of a fixed point. However, as far as we know, and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration, supported by preliminary experiments showing a significant speed-up of convergence, which we critically discuss. We also discuss how this idea could be applied more generally to (deep) reinforcement learning.
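
To make the idea concrete, the following is a minimal sketch of Anderson-accelerated value iteration for a tabular MDP in NumPy. It is an illustration under our own assumptions, not code from the paper: the function names (anderson_vi, bellman_operator), the memory size m, and the Gram-matrix regularization are all hypothetical choices. At each step it applies the Bellman optimality operator, keeps the last m iterates and residuals, solves a small constrained least-squares problem for the mixing weights, and combines the operator images accordingly.

```python
import numpy as np

def bellman_operator(V, P, R, gamma):
    """Optimal Bellman operator for a tabular MDP.

    P: (A, S, S) transition probabilities, R: (A, S) expected rewards.
    Returns (T V)[s] = max_a ( R[a, s] + gamma * sum_s' P[a, s, s'] * V[s'] ).
    """
    return np.max(R + gamma * (P @ V), axis=0)

def anderson_vi(P, R, gamma, m=5, max_iters=500, tol=1e-8):
    """Value iteration with Anderson mixing over the last m iterates (a sketch)."""
    V = np.zeros(P.shape[1])
    images, residuals = [], []  # T(V_i) and the residuals T(V_i) - V_i
    for k in range(max_iters):
        TV = bellman_operator(V, P, R, gamma)
        f = TV - V  # fixed-point residual; zero at the optimal value function
        if np.max(np.abs(f)) < tol:
            return V, k
        images.append(TV)
        residuals.append(f)
        images, residuals = images[-m:], residuals[-m:]
        # Anderson step: choose weights alpha with sum(alpha) = 1 minimizing
        # ||sum_i alpha_i f_i||, then mix the images: V <- sum_i alpha_i T(V_i).
        F = np.stack(residuals, axis=1)               # (S, m_k) residual matrix
        G = F.T @ F + 1e-10 * np.eye(F.shape[1])      # regularized Gram matrix
        z = np.linalg.solve(G, np.ones(F.shape[1]))   # alpha proportional to G^{-1} 1
        V = np.stack(images, axis=1) @ (z / z.sum())  # normalized mixing weights
    return V, max_iters

# Usage on a random MDP (illustrative data, not an experiment from the paper):
rng = np.random.default_rng(0)
A, S = 4, 50
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # make each row a probability distribution
R = rng.random((A, S))
V_star, n_iters = anderson_vi(P, R, gamma=0.95)
```

One caveat, consistent with the critical discussion the abstract announces: the Bellman optimality operator is nonsmooth (because of the max), and the mixed iterate is not guaranteed to improve monotonically, so the acceleration should be read as a heuristic rather than a guaranteed speed-up.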

hal-01927977, version 1 (20-11-2018)

Cite

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration. EWRL 2018 - 14th European Workshop on Reinforcement Learning, Oct 2018, Lille, France. ⟨hal-01927977⟩