Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration

Yonathan Efroni; Gal Dalal; Bruno Scherrer; Shie Mannor

Conference paper. Year: 2018

Yonathan Efroni

Role: Author
PersonId: 1039311

Department of Electrical Engineering - Technion [Haïfa]

Gal Dalal

Role: Author
PersonId: 1039312

Department of Electrical Engineering - Technion [Haïfa]

Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Biology, genetics and statistics

Institut Élie Cartan de Lorraine

Shie Mannor

Role: Author
PersonId: 837619

Department of Electrical Engineering - Technion [Haïfa]

Abstract

Anderson (1965) acceleration is an old and simple method for accelerating the computation of a fixed point. Surprisingly, to the best of our knowledge, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration, and we support this with preliminary experiments that show a significant speed-up of convergence, which we discuss critically. We also discuss how the idea could be applied more generally to (deep) reinforcement learning.

Keywords

Accelerated fixed point, Reinforcement learning

Domains

Optimization and Control [math.OC], Operations Research [math.OC], Machine Learning [cs.LG], Statistics [math.ST]

Main file

ewrl_approx_cr_final.pdf (427.91 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01927977

Submitted on: Tuesday, 20 November 2018, 11:32:43

Last modified on: Wednesday, 24 April 2024, 13:16:20

Dates and versions

hal-01927977, version 1 (20-11-2018)

Identifiers

HAL Id: hal-01927977, version 1
arXiv: 1809.09501

Cite

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration. EWRL 2018 - 14th European workshop on Reinforcement Learning, Oct 2018, Lille, France. ⟨hal-01927977⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA IECN UNIV-LORRAINE INRIA2 TDS-MACS UR1-MATH-STIC UR1-UFR-ISTIC IECLPS UNIV-RENNES UR1-MATH-NUM
