Conference paper — Year: 2018

Beyond the one-step greedy approach in reinforcement learning

Abstract

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous work. However, the case of multiple-step lookahead policy improvement, despite growing empirical evidence of its strength, has, to our knowledge, not yet been carefully analyzed. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms from these definitions, and prove their convergence. Moreover, we show that recent prominent reinforcement learning algorithms fit well into our unified framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
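The multiple-step improvement idea described above can be sketched on a toy problem: instead of acting greedily with respect to a 1-step Bellman backup of the current policy's value, each improvement step takes the first action of an h-step lookahead that terminates with that value. The following is a minimal illustrative sketch, not the paper's exact algorithms; the toy 2-state MDP, the function names, and the exact-evaluation scheme are all assumptions made for illustration.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative, not from the paper).
# P[a, s, s'] = transition probability, R[a, s] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9
nA, nS = R.shape

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
    P_pi = P[pi, np.arange(nS)]          # (nS, nS) rows for chosen actions
    R_pi = R[pi, np.arange(nS)]          # (nS,)
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)

def h_greedy(V, h):
    """First action of an h-step optimal lookahead ending with value V."""
    W = V.copy()
    for _ in range(h - 1):               # h-1 Bellman optimality backups
        W = (R + gamma * P @ W).max(axis=0)
    Q = R + gamma * P @ W                # last backup keeps per-action values
    return Q.argmax(axis=0)

def h_policy_iteration(h, iters=50):
    """Policy iteration with h-step greedy improvement; h=1 is standard PI."""
    pi = np.zeros(nS, dtype=int)
    for _ in range(iters):
        V = evaluate(pi)
        new_pi = h_greedy(V, h)
        if np.array_equal(new_pi, pi):   # stable policy => converged
            break
        pi = new_pi
    return pi, evaluate(pi)
```

With h = 1 this reduces to classical Policy Iteration; larger h performs a deeper greedy improvement per iteration, and on this toy MDP both converge to the same optimal policy and value.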

Dates and versions

hal-01927939 , version 1 (20-11-2018)

Cite

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Beyond the one-step greedy approach in reinforcement learning. ICML 2018 - 35th International Conference on Machine Learning, Jul 2018, Stockholm, Sweden. ⟨hal-01927939⟩