Beyond the one-step greedy approach in reinforcement learning

Yonathan Efroni; Gal Dalal; Bruno Scherrer; Shie Mannor

Conference paper, Year: 2018

Yonathan Efroni

Role: Author

Department of Electrical Engineering - Technion [Haïfa]

Gal Dalal

Role: Author

Department of Electrical Engineering - Technion [Haïfa]

Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Biology, genetics and statistics

Institut Élie Cartan de Lorraine

Shie Mannor

Role: Author
PersonId: 950750

Department of Electrical Engineering - Technion [Haïfa]

Abstract

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not yet been carefully analyzed. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions, and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms fit well into our unified framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
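To make the contrast between one-step and multiple-step improvement concrete, the following is a minimal tabular sketch of one lookahead variant: the improved policy is greedy with respect to the value function after h-1 applications of the optimal Bellman operator, so h = 1 recovers standard Policy Iteration. It is an illustration under stated assumptions, not the authors' implementation. The three-state chain MDP, the iterative evaluation routine, and all function names are hypothetical choices made for this example.

```python
# Hypothetical toy MDP: a 3-state chain. Action 0 = stay, action 1 = move right.
# P[s][a] is a list of (probability, next_state); R[s][a] is the reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
R = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]  # reward only in the last state


def q_values(P, R, gamma, v, s):
    """One-step action values at state s for a given value function v."""
    return [
        R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
        for a in range(len(P[s]))
    ]


def optimal_backup(P, R, gamma, v):
    """One application of the optimal Bellman operator to v."""
    return [max(q_values(P, R, gamma, v, s)) for s in range(len(P))]


def h_greedy_policy(P, R, gamma, v, h):
    """Greedy policy w.r.t. v after h-1 optimal backups (h >= 1)."""
    for _ in range(h - 1):
        v = optimal_backup(P, R, gamma, v)
    policy = []
    for s in range(len(P)):
        q = q_values(P, R, gamma, v, s)
        policy.append(max(range(len(q)), key=q.__getitem__))  # ties -> lowest index
    return policy


def evaluate_policy(P, R, gamma, policy, sweeps=500):
    """Approximate the policy's value by repeated fixed-policy backups."""
    v = [0.0] * len(P)
    for _ in range(sweeps):
        v = [q_values(P, R, gamma, v, s)[policy[s]] for s in range(len(P))]
    return v


def h_policy_iteration(P, R, gamma, h, max_iters=100):
    """Alternate evaluation and h-step lookahead improvement until stable."""
    policy = [0] * len(P)  # start from the "always stay" policy
    changes = 0            # number of improvement steps that changed the policy
    for _ in range(max_iters):
        v = evaluate_policy(P, R, gamma, policy)
        new_policy = h_greedy_policy(P, R, gamma, v, h)
        if new_policy == policy:
            break
        policy = new_policy
        changes += 1
    return policy, v, changes


policy1, v1, changes1 = h_policy_iteration(P, R, gamma=0.9, h=1)
policy2, v2, changes2 = h_policy_iteration(P, R, gamma=0.9, h=2)
print("h=1:", policy1, "policy changes:", changes1)
print("h=2:", policy2, "policy changes:", changes2)
```

On this toy chain, both settings reach the same move-right policy. The deeper lookahead propagates the distant reward to the first state within a single improvement step, at the cost of more computation per step. This is a property of the example only and says nothing about the results reported in the paper.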

Domains

Optimization and Control [math.OC]; Operations Research [math.OC]; Computational Complexity [cs.CC]; Machine Learning [cs.LG]; Statistics [math.ST]

Main file

beyond_final_camera_ready.pdf (2.36 MB)

all_results_final.pdf (333.78 KB)

all_results_final.pdf_tex (11.56 KB)

all_results_final.svg (821.15 KB)

hPI_res.pdf (698.08 KB)

kappaPI_res.pdf (699.85 KB)

lambdaPI_res.pdf (697.5 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01927939

Submitted on: Tuesday, November 20, 2018, 11:37:59

Last modified on: Wednesday, April 17, 2024, 11:24:43

Dates and versions

hal-01927939, version 1 (20-11-2018)

Identifiers

HAL Id: hal-01927939, version 1
arXiv: 1802.03654

Cite

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Beyond the one-step greedy approach in reinforcement learning. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. ⟨hal-01927939⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA IECN UNIV-LORRAINE INRIA2 TDS-MACS UR1-MATH-STIC UR1-UFR-ISTIC IECLPS UNIV-RENNES UR1-MATH-NUM
