Deterministic Policy Gradient Algorithms

David Silver; Guy Lever; Nicolas Heess; Thomas Degris; Daan Wierstra; Martin Riedmiller

Conference paper. Year: 2014

David Silver

Role: Author

DeepMind [London]

Guy Lever

Role: Author

University College of London [London]

Nicolas Heess

Role: Author

DeepMind [London]

Thomas Degris

Role: Author
PersonId: 934007

Flowing Epigenetic Robots and Systems

Daan Wierstra

Role: Author

DeepMind Technologies

Martin Riedmiller

Role: Author

DeepMind Technologies

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
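
The phrase "the expected gradient of the action-value function" can be written out as a formula. The block below is a sketch in commonly used notation, none of which is defined in the abstract itself: $\mu_\theta$ is assumed to denote the deterministic policy with parameters $\theta$, $\pi_\theta$ a stochastic policy, $Q$ the corresponding action-value function, $\rho$ the discounted state distribution, and $J$ the performance objective. The first line is the deterministic form; the second is the usual stochastic policy gradient, shown for comparison.

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)}
    \right]

\nabla_\theta J(\pi_\theta)
  = \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a)
    \right]
```

Under these assumptions, the deterministic expectation is taken over states only, whereas the stochastic one is taken over both states and actions. This is one way to read the abstract's efficiency claim, particularly when the action space is high-dimensional.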

Domains

Machine Learning [cs.LG]

Main file

dpg-icml2014.pdf (335.61 KB)

Origin: Files produced by the author(s)

Contributor: Thomas Degris

https://inria.hal.science/hal-00938992

Submitted on: Wednesday, 29 January 2014, 18:07:21

Last modified on: Wednesday, 15 March 2023, 08:50:07

Long-term archiving on: Sunday, 9 April 2017, 02:40:14

Dates and versions

hal-00938992, version 1 (29-01-2014)

Identifiers

HAL Id: hal-00938992, version 1

Cite

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, et al. Deterministic Policy Gradient Algorithms. ICML, Jun 2014, Beijing, China. ⟨hal-00938992⟩

Collections

ENSTA INRIA PARISTECH INRIA2

6513 views

11743 downloads
