Model-Free Reinforcement Learning with Continuous Action in Practice

Thomas Degris 1, Patrick Pilarski 2, Richard Sutton 2
1 Flowers - Flowing Epigenetic Robots and Systems
Inria Bordeaux - Sud-Ouest, U2IS - Unité d'Informatique et d'Ingénierie des Systèmes
2 RLAI
Department of Computing Science [Edmonton]
Abstract: Reinforcement learning methods are often considered as a potential solution to enable a robot to adapt to changes in real time to an unpredictable environment. However, with continuous action, only a few existing algorithms are practical for real-time learning. In such a setting, most effective methods have used a parameterized policy structure, often with a separate parameterized value function. The goal of this paper is to assess such actor-critic methods to form a fully specified practical algorithm. Our specific contributions include 1) developing the extension of existing incremental policy-gradient algorithms to use eligibility traces, 2) an empirical comparison of the resulting algorithms using continuous actions, 3) the evaluation of a gradient-scaling technique that can significantly improve performance. Finally, we apply our actor-critic algorithm to learn on a robotic platform with a fast sensorimotor cycle (10 ms). Overall, these results constitute an important step towards practical real-time learning control with continuous action.
Document type:
Conference paper
American Control Conference, Jun 2012, Montreal, Canada. 2012
Cited literature: 17 references

https://hal.inria.fr/hal-00764281
Contributor: Thomas Degris
Submitted on: Wednesday, December 12, 2012 - 16:32:05
Last modified on: Thursday, November 16, 2017 - 17:12:03
Document(s) archived on: Sunday, December 18, 2016 - 00:21:21

File

DegrisACC2012.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00764281, version 1

Citation

Thomas Degris, Patrick Pilarski, Richard Sutton. Model-Free Reinforcement Learning with Continuous Action in Practice. American Control Conference, Jun 2012, Montreal, Canada. 2012. 〈hal-00764281〉

Metrics

Record views: 183
File downloads: 980