Preference-Based Policy Learning

Riad Akrour (1), Marc Schoenauer (1, 2), Michèle Sebag (1, 3)
1 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: Many machine learning approaches in robotics, based on reinforcement learning, inverse optimal control or direct policy learning, critically rely on robot simulators. This paper investigates a simulator-free direct policy learning approach, called Preference-based Policy Learning (PPL). PPL iterates a four-step process: the robot demonstrates a candidate policy; the expert ranks this policy comparatively to other ones according to her preferences; these preferences are used to learn a policy return estimate; the robot uses the policy return estimate to build new candidate policies, and the process is iterated until the desired behavior is obtained. PPL requires that a good representation of the policy search space be available, enabling one to learn accurate policy return estimates and limiting the human ranking effort needed to yield a good policy. Furthermore, this representation cannot use informed features (e.g., how far the robot is from any target) due to the simulator-free setting. As a second contribution, this paper proposes a representation based on the agnostic exploitation of the robotic log. The convergence of PPL is analytically studied and its experimental validation on two problems, involving a single robot in a maze and two interacting robots, is presented.
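The four-step loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: policies are plain parameter vectors, the human expert is replaced by a hidden utility function (`hidden_expert_utility`), the policy return estimate is a linear model over simple hand-picked features trained with a perceptron-style pairwise ranking update, and new candidates are drawn by Gaussian perturbation of the current best policy. The paper's actual representation, learner and exploration scheme may differ.

```python
import random


def hidden_expert_utility(policy):
    # Hypothetical stand-in for the human expert: prefers policies
    # close to a hidden target vector. Not part of the paper.
    target = [0.5, -0.3, 0.8]
    return -sum((p - t) ** 2 for p, t in zip(policy, target))


def expert_prefers(a, b):
    # Step 2: the "expert" ranks policy a against policy b.
    return hidden_expert_utility(a) > hidden_expert_utility(b)


def features(policy):
    # Assumed representation: linear and squared terms, so that a
    # linear estimate can capture a quadratic utility.
    return list(policy) + [p * p for p in policy]


def estimate(weights, policy):
    # Current policy return estimate (linear in the features).
    return sum(w * f for w, f in zip(weights, features(policy)))


def fit_return_estimate(preferences, dim, epochs=50, lr=0.1):
    # Step 3: learn the estimate from (better, worse) pairs using a
    # perceptron-style update whenever the ranking margin is violated.
    weights = [0.0] * (2 * dim)
    for _ in range(epochs):
        for better, worse in preferences:
            if estimate(weights, better) - estimate(weights, worse) < 1.0:
                fb, fw = features(better), features(worse)
                weights = [w + lr * (x - y)
                           for w, x, y in zip(weights, fb, fw)]
    return weights


def ppl(iterations=20, dim=3, n_candidates=30, sigma=0.3, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(dim)]
    preferences = []
    weights = [0.0] * (2 * dim)
    for _ in range(iterations):
        # Step 4: build a new candidate by maximizing the estimate
        # over random perturbations of the current best policy.
        candidates = [[p + rng.gauss(0, sigma) for p in best]
                      for _ in range(n_candidates)]
        candidate = max(candidates, key=lambda c: estimate(weights, c))
        # Steps 1 and 2: the candidate is "demonstrated" and ranked
        # against the current best policy.
        if expert_prefers(candidate, best):
            preferences.append((candidate, best))
            best = candidate
        else:
            preferences.append((best, candidate))
        # Step 3: refresh the estimate from all preferences so far.
        weights = fit_return_estimate(preferences, dim)
    return best, preferences
```

Calling `ppl()` returns the last policy the simulated expert preferred together with the accumulated preference pairs. In this sketch the expert is queried exactly once per iteration, which mirrors the abstract's concern with limiting the human ranking effort.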

https://hal.inria.fr/inria-00625001
Contributor : Riad Akrour
Submitted on : Tuesday, September 20, 2011 - 12:41:44 PM
Last modification on : Thursday, April 5, 2018 - 12:30:12 PM
Long-term archiving on : Wednesday, December 21, 2011 - 2:22:09 AM

File

Preference-based_Policy_Learni...
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00625001, version 1

Citation

Riad Akrour, Marc Schoenauer, Michèle Sebag. Preference-Based Policy Learning. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2011, Athens, Greece. pp.12-27. ⟨inria-00625001⟩

Metrics

Record views: 743
File downloads: 1486