Adaptive Batch Size for Safe Policy Gradients

Matteo Papini; Matteo Pirotta; Marcello Restelli

Conference paper, Year: 2017

Matteo Papini

Role: Author
PersonId : 1024224

Department of Electronics, Information, and Bioengineering [Milano]

Matteo Pirotta

Role: Author
PersonId : 1023840

Sequential Learning

Marcello Restelli

Role: Author
PersonId : 960707

Department of Electronics, Information, and Bioengineering [Milano]

Abstract

Policy gradient methods are among the best Reinforcement Learning (RL) techniques for solving complex control problems. In real-world RL applications, it is common to have a good initial policy whose performance needs to be improved, and it may not be acceptable to try bad policies during the learning process. Although several methods for choosing the step size exist, research has paid less attention to determining the batch size, that is, the number of samples used to estimate the gradient direction for each update of the policy parameters. In this paper, we propose a set of methods to jointly optimize the step and the batch sizes that guarantee (with high probability) to improve the policy performance after each update. Besides providing theoretical guarantees, we show numerical simulations to analyse the behaviour of our methods.
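
To make the role of the batch size concrete, here is a minimal sketch. It is not the method proposed in the paper; it only illustrates how the number of samples affects the noise of a gradient estimate. Everything in it is an assumption made for illustration: a one-step toy problem, a Gaussian policy a ~ N(theta, sigma^2), a hypothetical reward r(a) = -(a - 1)^2, a plain REINFORCE-style estimator, and the function names themselves.

```python
import random
import statistics


def sample_gradient(theta, sigma, rng):
    """One-sample score-function gradient estimate on a toy one-step problem.

    Assumed (hypothetical) setup: action a ~ N(theta, sigma^2),
    reward r(a) = -(a - 1)^2, and
    d/dtheta log pi(a) = (a - theta) / sigma^2.
    """
    a = rng.gauss(theta, sigma)
    reward = -(a - 1.0) ** 2
    return reward * (a - theta) / sigma ** 2


def batch_gradient(theta, sigma, batch_size, rng):
    """Average of batch_size independent one-sample estimates."""
    total = sum(sample_gradient(theta, sigma, rng) for _ in range(batch_size))
    return total / batch_size


def estimator_spread(theta, sigma, batch_size, repeats, seed=0):
    """Mean and standard deviation of the batch estimate over many repeats."""
    rng = random.Random(seed)
    estimates = [
        batch_gradient(theta, sigma, batch_size, rng) for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        mean, std = estimator_spread(theta=0.0, sigma=1.0, batch_size=n, repeats=200)
        print(f"batch_size={n:5d}  mean={mean:.3f}  std={std:.3f}")
```

In this toy setting the exact gradient at theta = 0 is -2(theta - 1) = 2, and the spread of the estimate should shrink roughly as one over the square root of the batch size. A larger batch therefore gives a more reliable update direction at the cost of more samples, which is the trade-off that motivates choosing the batch size together with the step size.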

Domains

Machine Learning [stat.ML]

Main file

6950-adaptive-batch-size-for-safe-policy-gradients.pdf (434.19 KB)

Origin: Files produced by the author(s)

Contributor: Alessandro Lazaric

https://inria.hal.science/hal-01653330

Submitted on: Friday, December 1, 2017, 12:19:12

Last modified on: Wednesday, January 24, 2024, 09:54:23

Dates and versions

hal-01653330 , version 1 (01-12-2017)

Identifiers

HAL Id: hal-01653330, version 1

Cite

Matteo Papini, Matteo Pirotta, Marcello Restelli. Adaptive Batch Size for Safe Policy Gradients. The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, United States. ⟨hal-01653330⟩

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE

219 views

117 downloads