Off-Policy Actor-Critic

Thomas Degris 1 Martha White 2 Richard Sutton 2
1 Flowers - Flowing Epigenetic Robots and Systems
Inria Bordeaux - Sud-Ouest, U2IS - Unité d'Informatique et d'Ingénierie des Systèmes
2 RLAI
Department of Computing Science [Edmonton]
Abstract: This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more practical than action-value methods (like Greedy-GQ) because they explicitly represent the policy; consequently, the policy can be stochastic and can use a large action space. In this paper, we illustrate how to practically combine the generality and learning potential of off-policy learning with the flexibility in action selection given by actor-critic methods. We derive an incremental algorithm with linear time and space complexity that includes eligibility traces, prove convergence under assumptions similar to those of previous off-policy algorithms, and empirically show better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.
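The abstract's central idea is learning about a target policy π from actions chosen by a different behavior policy b. This can be illustrated with importance sampling, where each transition is reweighted by ρ = π(a|s)/b(a|s).

The Python sketch below is a minimal, hypothetical illustration under stated assumptions. It is not the paper's algorithm and differs from it in three ways:
- It uses a linear-softmax target policy.
- It uses a one-step, importance-weighted TD(0) critic with no eligibility traces, instead of the gradient temporal-difference critic the abstract refers to.
- It runs on a made-up single-state, two-action problem with a uniform behavior policy.

The names `softmax` and `off_policy_ac_step`, the reward values, and the step sizes are all invented for this example. The per-step cost is proportional to the number of weights.

```python
import numpy as np


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    z = prefs - prefs.max()
    e = np.exp(z)
    return e / e.sum()


def off_policy_ac_step(u, v, x, a, r, x_next, b_prob, gamma,
                       alpha_u, alpha_v, done):
    """One importance-weighted actor-critic update (hypothetical sketch).

    u: actor weights, shape (n_features, n_actions); pi(.|s) = softmax(x @ u)
    v: linear critic weights, shape (n_features,)
    b_prob: probability b(a|s) with which the behavior policy chose action a
    Assumption: plain TD(0) critic, not the gradient-TD critic of the paper.
    """
    pi = softmax(x @ u)
    rho = pi[a] / b_prob                       # importance ratio pi(a|s) / b(a|s)
    v_next = 0.0 if done else v @ x_next
    delta = r + gamma * v_next - v @ x         # one-step TD error
    # Gradient of log pi(a|s) w.r.t. u: column c equals x * (1[c == a] - pi[c]).
    grad_log_pi = np.outer(x, -pi)
    grad_log_pi[:, a] += x
    v = v + alpha_v * rho * delta * x            # importance-weighted critic update
    u = u + alpha_u * rho * delta * grad_log_pi  # actor: rho * delta * grad log pi
    return u, v


# Hypothetical problem: one state, two actions; action 1 pays 1, action 0 pays 0.
rng = np.random.default_rng(0)
n_actions = 2
x = np.array([1.0])                            # single constant feature
b = np.full(n_actions, 1.0 / n_actions)        # uniform behavior policy
rewards = np.array([0.0, 1.0])
u = np.zeros((1, n_actions))
v = np.zeros(1)

for _ in range(2000):
    a = rng.choice(n_actions, p=b)             # act with the behavior policy only
    u, v = off_policy_ac_step(u, v, x, a, rewards[a], x, b[a], gamma=0.9,
                              alpha_u=0.1, alpha_v=0.1, done=True)

pi = softmax(x @ u)                            # learned target policy
print("target policy:", pi, "critic value:", v[0])
```

Even though every action is drawn uniformly by b, the reweighted updates move the target policy toward action 1. The critic moves toward the expected reward under π rather than under b.

Plain importance-weighted TD(0) can diverge off-policy with general linear features. This sketch is therefore only safe for effectively tabular cases like the single constant feature used here. The general case is what the gradient temporal-difference methods mentioned in the abstract are designed to handle.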
Document type:
Conference paper
International Conference on Machine Learning, June 2012, Edinburgh, United Kingdom.

https://hal.inria.fr/hal-00764021
Contributor: Thomas Degris
Submitted on: Wednesday, December 12, 2012 - 11:02:17
Last modified on: Thursday, November 16, 2017 - 17:12:03
Document(s) archived on: Sunday, December 18, 2016 - 00:11:54

File

Degris_Offpac.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00764021, version 1

Citation

Thomas Degris, Martha White, Richard Sutton. Off-Policy Actor-Critic. International Conference on Machine Learning, Jun 2012, Edinburgh, United Kingdom. 2012. 〈hal-00764021〉