Multiple-step greedy policies in online and approximate reinforcement learning

Yonathan Efroni; Gal Dalal; Bruno Scherrer; Shie Mannor

Conference paper, Year: 2018



Yonathan Efroni

Role: Author
PersonId : 1039311

Department of Electrical Engineering - Technion [Haïfa]

Gal Dalal

Role: Author
PersonId : 1039312

Department of Electrical Engineering - Technion [Haïfa]

Bruno Scherrer

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Biology, genetics and statistics

Institut Élie Cartan de Lorraine

Shie Mannor

Role: Author
PersonId : 837619

Department of Electrical Engineering - Technion [Haïfa]

Abstract

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work [5], multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty, arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care about this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.
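To make the two notions in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a small tabular MDP with known transitions and rewards, interprets "multiple-step greedy" as acting greedily with respect to an h-step lookahead from a value estimate, and interprets a "soft-policy update" as a convex mixture of the old policy and the new greedy policy with stepsize `alpha`. The function names, the helper `make_chain_mdp`, and the toy 3-state chain are all hypothetical and chosen only for illustration.

```python
import numpy as np


def h_step_greedy_policy(P, R, v, gamma, h):
    """Return the deterministic policy that is greedy w.r.t. an h-step lookahead.

    P: transition probabilities, shape (A, S, S), P[a, s, t] = Pr(t | s, a)
    R: rewards, shape (S, A)
    v: value estimate used at the leaves of the lookahead, shape (S,)
    gamma: discount factor; h: lookahead depth (h >= 1)
    """
    for _ in range(h - 1):
        # One application of the Bellman optimality operator to v.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        v = q.max(axis=1)
    # Final step: choose the first action of the best h-step plan.
    q = R + gamma * np.einsum("ast,t->sa", P, v)
    return q.argmax(axis=1)


def soft_update(pi_old, pi_greedy, alpha):
    """Mix two stochastic policies (rows = states, columns = actions)."""
    return (1.0 - alpha) * pi_old + alpha * pi_greedy


def make_chain_mdp():
    """Hypothetical 3-state chain: action 0 stays, action 1 moves right."""
    stay = np.eye(3)
    right = np.array([[0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [0.0, 0.0, 1.0]])
    P = np.stack([stay, right])          # shape (A=2, S=3, S=3)
    R = np.array([[0.1, 0.0],            # state 0: small reward for staying
                  [0.0, 1.0],            # state 1: large reward for moving
                  [0.0, 0.0]])           # state 2: absorbing, no reward
    return P, R


P, R = make_chain_mdp()
v0 = np.zeros(3)
print(h_step_greedy_policy(P, R, v0, gamma=0.9, h=1).tolist())  # [0, 1, 0]
print(h_step_greedy_policy(P, R, v0, gamma=0.9, h=2).tolist())  # [1, 1, 0]
```

In this toy chain the 1-step greedy policy stays in state 0 to collect the small immediate reward, while the 2-step lookahead moves right toward the larger reward. The sketch only illustrates the operators; it does not reproduce the paper's result on when soft updates preserve monotonic improvement.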

Subject areas

Optimization and Control [math.OC]; Operations Research [math.OC]; Computational Complexity [cs.CC]; Machine Learning [cs.LG]; Statistics [math.ST]

Main file

approximate_online_cr_final.pdf (343.92 KB)

Origin: Files produced by the author(s)

Contributor: Bruno Scherrer

https://inria.hal.science/hal-01927962

Submitted on: Tuesday, November 20, 2018, 11:27:29

Last modified on: Thursday, April 18, 2024, 17:00:21

Dates and versions

hal-01927962, version 1 (20-11-2018)

Identifiers

HAL Id: hal-01927962, version 1
arXiv: 1805.07956

Cite

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Multiple-step greedy policies in online and approximate reinforcement learning. NeurIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. ⟨hal-01927962⟩


Collections

UNIV-RENNES1 CNRS INRIA IRISA IECN UNIV-LORRAINE INRIA2 TDS-MACS UR1-MATH-STIC UR1-UFR-ISTIC IECLPS UNIV-RENNES UR1-MATH-NUM

81 views

124 downloads
