Preference-Based Policy Learning

Riad Akrour 1 Marc Schoenauer 1, 2 Michèle Sebag 1, 3
1 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: Many machine learning approaches in robotics, based on reinforcement learning, inverse optimal control or direct policy learning, critically rely on robot simulators. This paper investigates a simulator-free approach to direct policy learning, called Preference-based Policy Learning (PPL). PPL iterates a four-step process: the robot demonstrates a candidate policy; the expert ranks this policy against other ones according to her preferences; these preferences are used to learn a policy return estimate; the robot uses the policy return estimate to build new candidate policies. The process is iterated until the desired behavior is obtained. PPL requires that a good representation of the policy search space be available, enabling one to learn accurate policy return estimates and limiting the human ranking effort needed to yield a good policy. Furthermore, this representation cannot use informed features (e.g., how far the robot is from any target) due to the simulator-free setting. As a second contribution, this paper proposes a representation based on the agnostic exploitation of the robotic log. The convergence of PPL is analytically studied, and its experimental validation on two problems, involving a single robot in a maze and two interacting robots, is presented.
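The four-step loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: policies are plain parameter vectors that double as their own features, the human expert is replaced by a simulated one that prefers policies closer to a hidden target, the policy return estimate is a linear model fitted with a perceptron-style pairwise ranking update, and new candidates come from Gaussian perturbation of the current best policy. All function names are hypothetical.

```python
import random


def simulated_expert_prefers(a, b, hidden_target):
    """Stand-in for the human expert (assumption): prefers the policy
    whose parameters are closer to a target the learner never sees."""
    def dist(p):
        return sum((x - t) ** 2 for x, t in zip(p, hidden_target))
    return dist(a) < dist(b)


def estimate(weights, policy):
    """Linear policy return estimate (assumption, not the paper's model)."""
    return sum(w * x for w, x in zip(weights, policy))


def fit_estimate(preferences, dim, epochs=50, lr=0.1):
    """Pairwise ranking fit: push estimate(better) above estimate(worse)
    by a margin of 1, using a perceptron-style update."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            if estimate(weights, better) - estimate(weights, worse) < 1.0:
                weights = [w + lr * (b - c)
                           for w, b, c in zip(weights, better, worse)]
    return weights


def ppl_sketch(dim=3, iterations=20, candidates=10, seed=0):
    rng = random.Random(seed)
    hidden_target = [1.0] * dim      # known only to the simulated expert
    best = [0.0] * dim               # current best policy
    preferences = []                 # list of (preferred, other) pairs
    for _ in range(iterations):
        # Step 3: learn a policy return estimate from the preferences so far.
        weights = fit_estimate(preferences, dim)
        # Step 4: build candidates and keep the one the estimate rates highest.
        pool = [[x + rng.gauss(0, 0.3) for x in best]
                for _ in range(candidates)]
        candidate = max(pool, key=lambda p: estimate(weights, p))
        # Steps 1-2: the candidate is "demonstrated" and the expert ranks it
        # against the current best policy.
        if simulated_expert_prefers(candidate, best, hidden_target):
            preferences.append((candidate, best))
            best = candidate
        else:
            preferences.append((best, candidate))
    return best, preferences


if __name__ == "__main__":
    best, prefs = ppl_sketch()
    print(len(prefs), best)
```

In this sketch the learner never reads `hidden_target` directly; it only sees the pairwise preferences, which mirrors the simulator-free setting where the expert's ranking is the only feedback.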
Document type:
Conference paper
Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2011, Athens, Greece. Springer Verlag, 6911, pp.12-27, 2011, LNAI

https://hal.inria.fr/inria-00625001
Contributor: Riad Akrour
Submitted on: Tuesday, 20 September 2011 - 12:41:44
Last modified on: Thursday, 5 April 2018 - 12:30:12
Document(s) archived on: Wednesday, 21 December 2011 - 02:22:09

File

Preference-based_Policy_Learni...
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00625001, version 1

Citation

Riad Akrour, Marc Schoenauer, Michèle Sebag. Preference-Based Policy Learning. Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2011, Athens, Greece. Springer Verlag, 6911, pp.12-27, 2011, LNAI. 〈inria-00625001〉
