APRIL: Active Preference-learning based Reinforcement Learning

Riad Akrour (1, 2), Marc Schoenauer (1, 2), Michèle Sebag (2)
1 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: This paper focuses on reinforcement learning (RL) with limited prior knowledge. In swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, however, the human expert is often still able to express preferences and rank the agent's demonstrations. Earlier work presented an iterative preference-based RL framework in which expert preferences are exploited to learn an approximate policy return, thus enabling the agent to perform direct policy search. At each iteration, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration against the previous best one; and the expert's ranking feedback enables the agent to refine the approximate policy return. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.
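The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ranking learner or the active selection criterion, so this sketch substitutes a perceptron-style ranking update and a greedy pick under the current surrogate return. Policies are reduced to hypothetical feature vectors, and the expert is simulated by a hidden linear utility (`hidden_w`) that the agent never reads directly; all function names and parameters are assumptions made for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def expert_prefers(a, b, hidden_w):
    """Simulated expert (hypothetical): True if demonstration a is preferred to b."""
    return dot(hidden_w, a) > dot(hidden_w, b)

def fit_surrogate(prefs, dim, epochs=50):
    """Perceptron-style ranking update (a stand-in learner, not the paper's):
    look for w such that w.better > w.worse for every stored preference pair."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in prefs:
            diff = [x - y for x, y in zip(better, worse)]
            if dot(w, diff) <= 0:  # pair is violated: move w toward the difference
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def preference_based_search(hidden_w, dim=3, n_queries=20, n_candidates=30, seed=0):
    rng = random.Random(seed)
    sample = lambda: [rng.uniform(-1, 1) for _ in range(dim)]
    best = sample()   # initial policy, described by its feature vector
    prefs = []        # archive of (preferred, not preferred) pairs
    w = [0.0] * dim   # surrogate policy return, learned from preferences only
    for _ in range(n_queries):
        # Greedy stand-in for active selection: best candidate under the surrogate.
        candidates = [sample() for _ in range(n_candidates)]
        cand = max(candidates, key=lambda c: dot(w, c))
        # One ranking query: the expert compares the new demonstration to the best so far.
        if expert_prefers(cand, best, hidden_w):
            prefs.append((cand, best))
            best = cand
        else:
            prefs.append((best, cand))
        w = fit_surrogate(prefs, dim)  # refine the surrogate with the new feedback
    return best, w, len(prefs)
```

Calling `preference_based_search([1.0, -0.5, 2.0])` returns the best feature vector found, the learned surrogate weights, and the number of ranking queries issued. Because the incumbent is replaced only when the simulated expert prefers the new candidate, its hidden utility never decreases across iterations.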
Document type:
Conference paper
P. Flach et al. ECML PKDD 2012, Sep 2012, Bristol, United Kingdom. Springer Verlag, 7524, pp.116-131, 2012, LNCS


Contributor: Marc Schoenauer
Submitted on: Friday, 3 August 2012 - 17:50:20
Last modified on: Thursday, 5 April 2018 - 12:30:12
Document(s) archived on: Friday, 16 December 2016 - 05:15:39


Files produced by the author(s)


  • HAL Id : hal-00722744, version 1
  • ARXIV : 1208.0984



Riad Akrour, Marc Schoenauer, Michèle Sebag. APRIL: Active Preference-learning based Reinforcement Learning. P. Flach et al. ECML PKDD 2012, Sep 2012, Bristol, United Kingdom. Springer Verlag, 7524, pp.116-131, 2012, LNCS. 〈hal-00722744〉


