Natural Actor-Critic Algorithms

Shalabh Bhatnagar; Richard Sutton; Mohammad Ghavamzadeh; Mark Lee

doi:10.1016/j.automatica.2009.07.008

Journal article, Automatica, Year: 2009


Shalabh Bhatnagar

Role: Author

Department of Computer Science and Automation [Bangalore]

Richard Sutton

Role: Author

Department of Computing Science [Edmonton]

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Mark Lee

Role: Author

Department of Computing Science [Edmonton]

Abstract

We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. We present empirical results verifying the convergence of our algorithms.
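To make the ingredients named in the abstract concrete, the following is a minimal sketch of an incremental actor-critic loop with a temporal difference critic and a natural-gradient actor. It is an illustration under stated assumptions, not the paper's exact algorithms: the two-state toy MDP, the tabular critic, the average-reward formulation, the constant step sizes `alpha`, `beta`, `xi`, and every function name are hypothetical choices made for this example. The actor step size is kept smaller than the critic step size to mimic the two-timescale idea.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2


def step(state, action):
    """Hypothetical toy MDP: action 1 pays reward 1 and switches state,
    action 0 pays reward 0 and stays in the same state."""
    if action == 1:
        return 1 - state, 1.0
    return state, 0.0


def policy(theta, state):
    """Gibbs (softmax) policy over the tabular preferences theta[state]."""
    m = max(theta[state])  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in theta[state]]
    z = sum(exps)
    return [e / z for e in exps]


def run(steps=5000, alpha=0.1, beta=0.01, xi=0.05, seed=0):
    """Sketch of an incremental natural actor-critic (assumed constant steps)."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    w = [[0.0] * N_ACTIONS for _ in range(N_STATES)]      # natural-gradient estimate
    v = [0.0] * N_STATES                                  # tabular critic
    avg_reward = 0.0
    state = 0
    for _ in range(steps):
        probs = policy(theta, state)
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = step(state, action)

        # Critic: running average reward and temporal difference error.
        avg_reward += xi * (reward - avg_reward)
        delta = reward - avg_reward + v[next_state] - v[state]
        v[state] += alpha * delta

        # Compatible features: gradient of log pi(action | state) with
        # respect to theta[state]; entries for other states are zero.
        psi = [(1.0 if a == action else 0.0) - probs[a] for a in range(N_ACTIONS)]

        # Regress the TD error on the compatible features; the weights w
        # then serve as the natural-gradient direction for the actor.
        psi_dot_w = sum(p * x for p, x in zip(psi, w[state]))
        for a in range(N_ACTIONS):
            w[state][a] += alpha * (delta - psi_dot_w) * psi[a]

        # Actor: slower step along the estimated natural gradient.
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                theta[s][a] += beta * w[s][a]

        state = next_state
    return theta, avg_reward
```

Calling `run()` and then `policy(theta, s)` for each state shows how strongly the learned policy prefers the rewarding action in this toy problem. Replacing the actor step with `theta[state][a] += beta * delta * psi[a]` would give a regular (non-natural) policy gradient variant of the same sketch.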

Domains

Computer Science

Main file

tr-final.pdf (352.59 KB)

Origin: Files produced by the author(s)

Contributor: Mohammad Ghavamzadeh

https://inria.hal.science/hal-00840470

Submitted on: Tuesday, July 2, 2013, 15:26:26

Last modified on: Monday, April 24, 2023, 11:02:26

Long-term archiving on: Thursday, October 3, 2013, 10:40:06

Dates and versions

hal-00840470, version 1 (02-07-2013)

Identifiers

HAL Id: hal-00840470, version 1
DOI: 10.1016/j.automatica.2009.07.008

Cite

Shalabh Bhatnagar, Richard Sutton, Mohammad Ghavamzadeh, Mark Lee. Natural Actor-Critic Algorithms. Automatica, 2009, 45 (11), ⟨10.1016/j.automatica.2009.07.008⟩. ⟨hal-00840470⟩


Collections

UNIV-LILLE3 CNRS INRIA LAGIS INRIA2

10422 views

16230 downloads
