Adaptive Batch Size for Safe Policy Gradients

Matteo Papini 1, Matteo Pirotta 2, Marcello Restelli 1
1 Politecnico di Milano
2 SEQUEL - Sequential Learning, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille, UMR 9189
Abstract: Policy gradient methods are among the best Reinforcement Learning (RL) techniques for solving complex control problems. In real-world RL applications, it is common to start from a good initial policy whose performance needs to be improved, and it may not be acceptable to try bad policies during the learning process. Although several methods for choosing the step size exist, far less attention has been paid to determining the batch size, that is, the number of samples used to estimate the gradient direction for each update of the policy parameters. In this paper, we propose a set of methods to jointly optimize the step and the batch sizes so as to guarantee (with high probability) an improvement of the policy performance after each update. Besides providing theoretical guarantees, we present numerical simulations to analyse the behaviour of our methods.
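The abstract describes, at a high level, the joint selection of step size and batch size so that each update improves performance with high probability. The snippet below is only a minimal illustrative sketch of that general idea, not the paper's actual bounds or formulas: a REINFORCE-style estimator on a hypothetical one-step Gaussian-policy toy problem, where the batch size is grown until a crude confidence interval on the gradient excludes zero, and the step size is scaled down by the remaining uncertainty. The toy reward, the concentration bound, and all function names are assumptions made for illustration.

```python
import numpy as np

def sample_trajectory_grads(theta, n, rng):
    """Collect n per-trajectory REINFORCE gradient estimates on a toy
    one-step problem: a ~ N(theta, 1), reward = -(a - 2)**2 (illustrative only)."""
    actions = rng.normal(theta, 1.0, size=n)
    rewards = -(actions - 2.0) ** 2
    # Score function of N(theta, 1): d/dtheta log p(a) = (a - theta)
    return (actions - theta) * rewards

def adaptive_batch_policy_gradient(theta0=0.0, delta=0.05, n_min=16, n_max=4096,
                                    iters=30, seed=0):
    """Sketch of a 'safe' update loop: grow the batch until the gradient
    estimate dominates its estimation error, then take a conservative step.
    The bound below is a generic Chebyshev/Hoeffding-style heuristic, not the
    bound derived in the paper."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for it in range(iters):
        n = n_min
        while True:
            grads = sample_trajectory_grads(theta, n, rng)
            g_hat = grads.mean()
            # Half-width of an illustrative (1 - delta) confidence interval.
            eps = np.std(grads, ddof=1) / np.sqrt(n) * np.sqrt(2.0 * np.log(2.0 / delta))
            if abs(g_hat) > eps or n >= n_max:
                break
            n *= 2  # double the batch until the estimate is trustworthy
        # Conservative step size: scale by the trusted fraction of the estimate.
        alpha = 0.05 * max(abs(g_hat) - eps, 0.0) / (abs(g_hat) + 1e-12)
        theta += alpha * g_hat
        print(f"iter {it:02d}  batch {n:4d}  theta {theta:+.3f}")
    return theta

if __name__ == "__main__":
    adaptive_batch_policy_gradient()
```

In this sketch the batch size shrinks back to the minimum at every iteration and grows only when the gradient signal is weak relative to its noise, which mimics (in spirit) the trade-off the abstract refers to between sample cost and guaranteed improvement.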
Document type:
Conference paper
The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, United States

https://hal.inria.fr/hal-01653330
Contributor: Alessandro Lazaric
Submitted on: Friday, December 1, 2017 - 12:19:12
Last modified on: Tuesday, July 3, 2018 - 11:34:55

File

6950-adaptive-batch-size-for-s...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01653330, version 1

Citation

Matteo Papini, Matteo Pirotta, Marcello Restelli. Adaptive Batch Size for Safe Policy Gradients. The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, United States. 〈hal-01653330〉

Metrics

Record views: 191
File downloads: 31