Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Rituraj Kaushik; Konstantinos Chatzilygeroudis; Jean-Baptiste Mouret

Conference paper, Year: 2018


Rituraj Kaushik

Role: Author

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Konstantinos Chatzilygeroudis

Role: Author
PersonId: 10921
IdHAL: konstantinos-chatzilygeroudis
ORCID: 0000-0003-3585-1027
IdRef: 234845414

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Jean-Baptiste Mouret

Role: Author
PersonId: 1495
IdHAL: jb-mouret
ORCID: 0000-0002-2513-027X
IdRef: 137470002

Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment

Abstract

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
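The abstract does not spell out how the three objectives are combined beyond "Pareto-based". The following is a minimal Python sketch of the Pareto-dominance filter that any such optimizer relies on. It assumes each candidate policy has already been scored on the learned model with (novelty, expected cumulative reward, model confidence), with model uncertainty negated into a "confidence" so that all three objectives are maximized. The names `dominates` and `pareto_front` and the example scores are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector per candidate policy, all to be MAXIMIZED:
# (novelty, expected_cumulative_reward, model_confidence).
# Assumption: model_confidence is the negated model uncertainty.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (Pareto dominance)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Objectives]) -> List[int]:
    """Indices of candidates not dominated by any other candidate.

    Simple O(n^2) filter; a full optimizer such as NSGA-II would also
    rank dominated candidates into further fronts and keep diversity.
    """
    front = []
    for i, s in enumerate(scores):
        if not any(dominates(other, s) for j, other in enumerate(scores) if j != i):
            front.append(i)
    return front


if __name__ == "__main__":
    # Made-up scores for four candidate policies evaluated on a learned model.
    candidates = [
        (0.9, 0.0, 0.2),  # very novel, no reward yet, poorly modelled region
        (0.1, 0.0, 0.9),  # familiar, no reward, well-modelled region
        (0.5, 0.0, 0.5),  # a compromise between the two
        (0.4, 0.0, 0.4),  # dominated by the compromise above
    ]
    print(pareto_front(candidates))  # → [0, 1, 2]
```

Note that in this toy example every candidate has zero reward, as in a sparse-reward task before the first success, yet the front still contains distinct trade-offs between novelty and model confidence. Keeping the whole front rather than a single weighted sum is one plausible reason a Pareto-based formulation can keep exploring when the reward signal gives no gradient.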

Keywords

Model-based Policy Search; Exploration; Sparse Reward; Reinforcement learning; RL

Domains

Automatic control / Robotics; Robotics [cs.RO]; Artificial intelligence [cs.AI]

Main file

mops (1).pdf (1.13 MB)

Origin: Files produced by the author(s)

Contributor: Jean-Baptiste Mouret

https://inria.hal.science/hal-01884294

Submitted on: Sunday, 30 September 2018, 16:44:09

Last modified on: Thursday, 1 February 2024, 10:04:39

Long-term archiving on: Monday, 31 December 2018, 12:59:26

Dates and versions

hal-01884294, version 1 (30-09-2018)

Identifiers

HAL Id: hal-01884294, version 1
arXiv: 1806.09351

Cite

Rituraj Kaushik, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret. Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. CoRL 2018 - Conference on Robot Learning, Oct 2018, Zurich, Switzerland. ⟨hal-01884294⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA UNIV-LORRAINE INRIA2 TDS-MACS LORIA LORIA-AIS UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
