Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Rituraj Kaushik¹, Konstantinos Chatzilygeroudis¹, Jean-Baptiste Mouret¹
¹ LARSEN - Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame policy search as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward, and (3) keep the system in state-space regions where the model is as accurate as possible. We then optimize these objectives with a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX solves sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
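To make the "three objectives, optimized in a Pareto sense" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each candidate policy has already been scored, for example by rolling it out in the learned model, on three objectives that are all maximized: novelty, expected reward and model confidence. The novelty score uses the mean distance to the k nearest states in an archive, as in novelty search. That choice, and the names `novelty`, `dominates` and `pareto_front`, are illustrative assumptions. The Pareto step is the generic non-dominated filtering that Pareto-based optimizers such as NSGA-II build on.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical encoding: (novelty, expected_reward, model_confidence), all maximized.
Scores = Tuple[float, float, float]


def novelty(state: Sequence[float], archive: List[Sequence[float]], k: int = 3) -> float:
    """Assumed novelty measure: mean Euclidean distance from `state`
    to its k nearest neighbours among previously reached states."""
    if not archive:
        return float("inf")  # anything is novel when nothing has been seen yet
    dists = sorted(math.dist(state, a) for a in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b`: no worse on every objective
    and strictly better on at least one (maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Scores]) -> List[int]:
    """Indices of the non-dominated candidates (the first Pareto front)."""
    front = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            front.append(i)
    return front


if __name__ == "__main__":
    candidates = [
        (0.9, 0.0, 0.2),  # very novel, no reward, model uncertain
        (0.1, 0.0, 0.9),  # familiar region, model confident
        (0.5, 1.0, 0.5),  # found some reward
        (0.4, 0.0, 0.4),  # dominated by candidate 2
    ]
    print(pareto_front(candidates))  # [0, 1, 2]
```

Keeping the whole front, rather than a weighted sum, leaves trade-offs open. A very novel but uncertain policy and a rewarding one can both survive a generation, which is what lets exploration continue while no reward has been found.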

Cited literature: 56 references

https://hal.inria.fr/hal-01884294
Contributor : Jean-Baptiste Mouret
Submitted on : Sunday, September 30, 2018 - 4:44:09 PM
Last modification on : Tuesday, December 18, 2018 - 4:40:22 PM
Long-term archiving on : Monday, December 31, 2018 - 12:59:26 PM

Identifiers

  • HAL Id : hal-01884294, version 1
  • ARXIV : 1806.09351

Citation

Rituraj Kaushik, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret. Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. CoRL 2018 - Conference on Robot Learning, Oct 2018, Zurich, Switzerland. ⟨hal-01884294⟩