
Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Abstract: Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
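The abstract's central idea, marginalising over an unobservable environment variable with quadrature rather than random sampling so that significant rare events receive deterministic weight, can be illustrated with a toy sketch. This is not the authors' ALOQ: the paper uses Gaussian-process-based Bayesian optimisation and Bayesian quadrature, whereas here a simple grid search stands in for the optimisation step and a fixed 5-point Gauss–Hermite rule stands in for the quadrature step. The reward function, policy parameter `theta`, and rare-event threshold are all invented for illustration.

```python
import math
import random

# Toy reward: policies near theta = 0.5 do well in ordinary conditions,
# but a rare environment event (z > 2, probability ~2.3% under N(0,1))
# heavily penalises larger theta. The rare event drives the robust optimum.
def reward(theta, z):
    base = -(theta - 0.5) ** 2
    rare_penalty = -5.0 * theta if z > 2.0 else 0.0
    return base + rare_penalty

# 5-point Gauss-Hermite nodes and weights (for integrals against exp(-x^2)).
GH = [(-2.0202, 0.01995), (-0.9586, 0.3936), (0.0, 0.9453),
      (0.9586, 0.3936), (2.0202, 0.01995)]

def expected_reward(theta):
    # E_{z ~ N(0,1)}[reward(theta, z)] via the substitution z = sqrt(2) * x;
    # the node at x ~ 2.02 maps to z ~ 2.86, so the rare-event region is
    # always evaluated, with a small deterministic weight.
    return sum(w * reward(theta, math.sqrt(2) * x) for x, w in GH) / math.sqrt(math.pi)

def naive_mc_reward(theta, n=20, seed=0):
    # Plain random sampling for contrast: with only n samples, the z > 2
    # region may never be drawn, hiding the rare event entirely.
    rng = random.Random(seed)
    return sum(reward(theta, rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# Stand-in for the Bayesian optimisation step: exhaustive search over a
# coarse theta grid, maximising the quadrature estimate of expected reward.
thetas = [t / 20 for t in range(21)]
robust_theta = max(thetas, key=expected_reward)
```

Because the quadrature estimate always accounts for the rare-event penalty, the selected `robust_theta` is pulled below the nominal optimum of 0.5, which is precisely the robustness effect the abstract describes.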


https://hal.inria.fr/hal-02943567
Contributor: Jean-Baptiste Mouret
Submitted on: Saturday, September 19, 2020 - 9:25:19 PM
Last modification on: Tuesday, September 22, 2020 - 3:57:10 AM

File

18-216.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-02943567, version 1

Citation

Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael Osborne, et al. Robust Reinforcement Learning with Bayesian Optimisation and Quadrature. Journal of Machine Learning Research, Microtome Publishing, 2020, 21, pp. 1-31. ⟨hal-02943567⟩
