hal-00722744, version 1

APRIL: Active Preference-learning based Reinforcement Learning

Riad Akrour [1, 2], Marc Schoenauer (http://www.lri.fr/~marc) [1, 2], Michèle Sebag [2]

ECML PKDD 2012, vol. 7524 (2012), pp. 116-131

Abstract: This paper focuses on reinforcement learning (RL) with limited prior knowledge. In swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, however, the human expert is often still able to express preferences and rank the agent's demonstrations. Earlier work presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, which enables the agent to perform direct policy search. At each iteration, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration relative to the previous best one; and the expert's ranking feedback enables the agent to refine the approximate policy return. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries the expert must answer before a satisfactory policy is obtained. Experiments on the mountain car and cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.
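
The iterative loop described in the abstract (select a candidate, demonstrate it, let the expert rank it against the previous best, refine the approximate policy return) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: policies are plain parameter vectors, the expert is simulated by a hidden utility (`true_utility`), the approximate policy return is a linear model fitted with perceptron-style updates, and the candidate is chosen greedily under that model rather than with the paper's active ranking criterion.

```python
import random


def true_utility(theta):
    # Hypothetical hidden preference of the simulated expert (toy assumption):
    # policies closer to an arbitrary target vector are preferred.
    target = [0.5, -0.3, 0.8]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def expert_prefers(new, best):
    # Simulated ranking query: True if the new demonstration beats the previous best.
    return true_utility(new) > true_utility(best)


def surrogate(w, theta):
    # Approximate policy return: a linear model over the policy parameters.
    return sum(wi * ti for wi, ti in zip(w, theta))


def fit_surrogate(prefs, dim, epochs=50, lr=0.1):
    # Perceptron-style fit: push the surrogate to score each preferred
    # policy above the rejected one by a margin of 1.
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in prefs:
            diff = [b - c for b, c in zip(better, worse)]
            if surrogate(w, diff) <= 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def preference_based_search(n_queries=30, dim=3, n_candidates=50, step=0.3, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(dim)]
    prefs = []          # list of (preferred, rejected) policy pairs
    w = [0.0] * dim     # surrogate weights, refined after every ranking
    for _ in range(n_queries):
        # Propose perturbations of the current best policy and pick the one
        # the surrogate rates highest (greedy selection, an assumption).
        candidates = [[b + rng.gauss(0, step) for b in best]
                      for _ in range(n_candidates)]
        new = max(candidates, key=lambda c: surrogate(w, c))
        # One ranking query to the (simulated) expert.
        if expert_prefers(new, best):
            prefs.append((new, best))
            best = new
        else:
            prefs.append((best, new))
        w = fit_surrogate(prefs, dim)
    return best, len(prefs)


if __name__ == "__main__":
    best, n_queries_used = preference_based_search()
    print("queries:", n_queries_used, "best policy:", best)
```

Because the best policy is replaced only when the simulated expert prefers the new one, its hidden utility never decreases across iterations. The number of ranking queries is the budget that an active selection criterion would aim to reduce.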

  • 1:  TAO (INRIA Saclay - Ile de France)
  • INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
  • 2:  Laboratoire de Recherche en Informatique (LRI)
  • CNRS : UMR8623 – Université Paris XI - Paris Sud
  • Domain : Computer Science/Artificial Intelligence
  • Keywords : reinforcement learning – preference learning – interactive optimization – robotics
 
  • oai:hal.inria.fr:hal-00722744
  • Submitted on: Friday, 3 August 2012 17:50:20
  • Updated on: Sunday, 5 August 2012 08:35:38