Preference-Based Policy Learning

Riad Akrour; Marc Schoenauer; Michèle Sebag

Conference paper, Year: 2011



Riad Akrour

Role: Author
PersonId : 910562

Machine Learning and Optimisation

Marc Schoenauer

Role: Author
PersonId : 739309
IdHAL : evomarc
ORCID : 0000-0003-1450-6830
IdRef : 057775575

Machine Learning and Optimisation

Microsoft Research - Inria Joint Centre

Michèle Sebag

Role: Author
PersonId : 836537

Machine Learning and Optimisation

Laboratoire de Recherche en Informatique

Abstract

Many machine learning approaches in robotics, based on reinforcement learning, inverse optimal control or direct policy learning, critically rely on robot simulators. This paper investigates a simulator-free direct policy learning approach, called Preference-based Policy Learning (PPL). PPL iterates a four-step process: the robot demonstrates a candidate policy; the expert ranks this policy against other ones according to her preferences; these preferences are used to learn a policy return estimate; the robot uses the policy return estimate to build new candidate policies. The process is iterated until the desired behavior is obtained. PPL requires that a good representation of the policy search space be available, enabling one to learn accurate policy return estimates and limiting the human ranking effort needed to yield a good policy. Furthermore, this representation cannot use informed features (e.g., how far the robot is from any target) due to the simulator-free setting. As a second contribution, this paper proposes a representation based on the agnostic exploitation of the robotic log. The convergence of PPL is analytically studied, and its experimental validation on two problems, involving a single robot in a maze and two interacting robots, is presented.
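The four-step loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the expert is simulated by a hidden target vector (`TARGET`), a policy's demonstrated "behavior" is simply its parameter vector, the policy return estimate is a linear model fitted with a perceptron-style pairwise ranking update, and new candidates are Gaussian perturbations of the best policy so far. All function names are hypothetical.

```python
import random

# Hypothetical hidden preference of the simulated expert (assumption).
TARGET = [1.0, -0.5]


def distance_to_target(behavior):
    """Squared distance to the expert's hidden target (used only by the fake expert)."""
    return sum((x - t) ** 2 for x, t in zip(behavior, TARGET))


def demonstrate(policy):
    """Step 1 stand-in: the observed behavior is the parameter vector itself."""
    return list(policy)


def expert_prefers(behavior_a, behavior_b):
    """Step 2 stand-in: the simulated expert prefers the behavior closer to TARGET."""
    return distance_to_target(behavior_a) < distance_to_target(behavior_b)


def fit_return_estimate(preferences, dim, epochs=50, lr=0.1):
    """Step 3 stand-in: linear ranking model trained so that w.better > w.worse."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [b - c for b, c in zip(better, worse)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def estimated_return(w, behavior):
    return sum(wi * bi for wi, bi in zip(w, behavior))


def ppl(iterations=30, dim=2, seed=0):
    rng = random.Random(seed)
    best = [0.0] * dim
    preferences = []  # pairs (preferred behavior, less preferred behavior)
    w = [0.0] * dim
    for _ in range(iterations):
        # Step 4: build candidates and keep the one with the highest estimated return.
        candidates = [[p + rng.gauss(0, 0.3) for p in best] for _ in range(10)]
        candidate = max(candidates, key=lambda c: estimated_return(w, demonstrate(c)))
        # Step 1: the robot demonstrates the candidate policy.
        shown = demonstrate(candidate)
        reference = demonstrate(best)
        # Step 2: the expert compares it with the best policy so far.
        if expert_prefers(shown, reference):
            preferences.append((shown, reference))
            best = candidate
        else:
            preferences.append((reference, shown))
        # Step 3: refit the policy return estimate on all preferences collected so far.
        w = fit_return_estimate(preferences, dim)
    return best, preferences


if __name__ == "__main__":
    best, preferences = ppl()
    print("best policy:", best)
    print("squared distance to hidden target:", distance_to_target(best))
```

In this sketch the best policy changes only when the simulated expert prefers the new demonstration, so it never moves further from the hidden target. A real setting would replace `demonstrate` with execution on the robot and `expert_prefers` with a human judgment.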

Domains

Machine Learning [cs.LG], Robotics [cs.RO]

Main file

Preference-based_Policy_Learning.pdf (248.74 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/inria-00625001

Submitted on: Tuesday, September 20, 2011, 12:41:44

Last modified on: Monday, February 12, 2024, 09:48:04

Long-term archiving on: Wednesday, December 21, 2011, 02:22:09

Dates and versions

inria-00625001, version 1 (20-09-2011)

Identifiers

HAL Id: inria-00625001, version 1

Cite

Riad Akrour, Marc Schoenauer, Michèle Sebag. Preference-Based Policy Learning. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2011, Athens, Greece. pp.12-27. ⟨inria-00625001⟩


Collections

EC-PARIS CNRS INRIA UMR8623 INRIA2 LRI-AO UNIV-PARIS-SACLAY

340 views

1964 downloads
