Deterministic Policy Gradient Algorithms

Abstract: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
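For reference, the "expected gradient of the action-value function" form mentioned in the abstract is the paper's deterministic policy gradient theorem: for a deterministic policy \mu_\theta with performance objective J and discounted state distribution \rho^\mu,

    \nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^\mu}\!\left[ \nabla_\theta \mu_\theta(s) \, \nabla_a Q^\mu(s,a)\big|_{a=\mu_\theta(s)} \right]

Because the expectation is over states only, with no integral over actions, the gradient can be estimated more efficiently than the stochastic policy gradient, particularly in high-dimensional action spaces.

The sketch below illustrates a single update of an off-policy deterministic actor-critic in the spirit of the paper's compatible off-policy algorithm (COPDAC-Q), using a linear policy mu_theta(s) = theta . phi(s) and a compatible critic Q_w(s,a) = (a - mu_theta(s)) (phi(s) . w) + phi(s) . v. The feature map phi, the scalar action, and the step sizes are illustrative assumptions, not the paper's experimental settings.

    import numpy as np

    def copdac_q_step(theta, w, v, phi_s, a, r, phi_s_next, gamma,
                      alpha_theta=1e-3, alpha_w=1e-2, alpha_v=1e-2):
        """One update from a transition (s, a, r, s') collected by an
        exploratory behaviour policy; the learned target policy is
        deterministic. phi_s, phi_s_next, theta, w, v are 1-D arrays
        of equal length; a and r are scalars."""
        mu_s = theta @ phi_s            # deterministic action at s
        mu_s_next = theta @ phi_s_next  # deterministic action at s'

        # Compatible critic: Q_w(s, a) = (a - mu(s)) * (phi(s) . w) + phi(s) . v,
        # since grad_theta mu_theta(s) = phi(s) for a linear policy.
        def q(phi, act, mu):
            return (act - mu) * (phi @ w) + phi @ v

        # Q-learning-style TD error: bootstrap with the target policy's action,
        # so Q(s', mu(s')) reduces to the state-value part phi(s') . v.
        delta = r + gamma * q(phi_s_next, mu_s_next, mu_s_next) - q(phi_s, a, mu_s)

        # Actor ascends the deterministic policy gradient:
        # grad_theta mu(s) * grad_a Q(s, a)|_{a = mu(s)} = phi(s) * (phi(s) . w)
        theta = theta + alpha_theta * phi_s * (phi_s @ w)
        w = w + alpha_w * delta * (a - mu_s) * phi_s
        v = v + alpha_v * delta * phi_s
        return theta, w, v

In practice the transition (s, a, r, s') would come from a stochastic behaviour policy (for example, the deterministic action plus exploration noise), while theta parameterises the deterministic target policy being improved.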
Document type: Conference papers

Cited literature: 21 references

https://hal.inria.fr/hal-00938992
Contributor: Thomas Degris
Submitted on: Wednesday, January 29, 2014 - 6:07:21 PM
Last modification on: Wednesday, July 3, 2019 - 10:48:04 AM
Long-term archiving on: Sunday, April 9, 2017 - 2:40:14 AM

File

dpg-icml2014.pdf (files produced by the author(s))

Identifiers

  • HAL Id: hal-00938992, version 1

Citation

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, et al. Deterministic Policy Gradient Algorithms. ICML, Jun 2014, Beijing, China. ⟨hal-00938992⟩

Metrics

Record views: 4577
File downloads: 5828