Conference paper, Year: 2012

APRIL: Active Preference-learning based Reinforcement Learning

Abstract

This paper focuses on reinforcement learning (RL) with limited prior knowledge. In swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, however, the human expert is often still able to express preferences and rank the agent's demonstrations. Earlier work presented an iterative preference-based RL framework in which expert preferences are exploited to learn an approximate policy return, thus enabling the agent to perform direct policy search. At each iteration, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration relative to the previous best one; and this ranking feedback enables the agent to refine the approximate policy return. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.
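The iterative loop described in the abstract (select a candidate, demonstrate it, have the expert rank it against the previous best, refine the approximate policy return) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-dimensional policy parameters, the simulated expert with a hidden target, the hand-picked `features`, the perceptron-style update standing in for the ranking learner, and the greedy candidate choice standing in for the paper's active ranking criterion.

```python
import random

def true_return(theta, target=(0.7, -0.3)):
    """Hidden criterion of the simulated expert (hypothetical): closer to target is better."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def expert_prefers(new, best):
    """Simulated ranking query: True if the new demonstration is ranked above the best one."""
    return true_return(new) > true_return(best)

def features(theta):
    """Hypothetical description of a demonstration used by the learned utility."""
    x, y = theta
    return (x, y, x * x, y * y)

def utility(w, theta):
    """Approximate policy return: a linear function of the demonstration features."""
    return sum(wi * fi for wi, fi in zip(w, features(theta)))

def fit_utility(w, constraints, epochs=50, lr=0.1):
    """Perceptron-style updates pushing utility(winner) above utility(loser) by a margin."""
    w = list(w)
    for _ in range(epochs):
        for winner, loser in constraints:
            if utility(w, winner) - utility(w, loser) < 1.0:
                fw, fl = features(winner), features(loser)
                w = [wi + lr * (a - b) for wi, a, b in zip(w, fw, fl)]
    return w

def preference_based_search(n_queries=25, n_candidates=50, seed=0):
    rng = random.Random(seed)
    best = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    initial = best
    w = [0.0] * 4
    constraints = []  # list of (preferred, less preferred) pairs
    for _ in range(n_queries):
        # Sample candidates near the current best and pick the one the
        # learned utility likes most (a greedy stand-in for active ranking).
        candidates = [tuple(b + rng.gauss(0, 0.3) for b in best)
                      for _ in range(n_candidates)]
        candidate = max(candidates, key=lambda th: utility(w, th))
        # One ranking query to the expert per iteration.
        if expert_prefers(candidate, best):
            constraints.append((candidate, best))
            best = candidate
        else:
            constraints.append((best, candidate))
        # Refine the approximate policy return from all rankings so far.
        w = fit_utility(w, constraints)
    return initial, best, len(constraints)

if __name__ == "__main__":
    initial, best, n = preference_based_search()
    print("queries:", n)
    print("initial true return:", true_return(initial))
    print("final true return:", true_return(best))
```

Because the best policy is replaced only when the simulated expert prefers the new demonstration, its hidden return never decreases across iterations; how quickly it improves depends on the assumed features and candidate sampling.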
Main file
April_camera.pdf (330.79 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00722744, version 1 (03-08-2012)

Cite

Riad Akrour, Marc Schoenauer, Michèle Sebag. APRIL: Active Preference-learning based Reinforcement Learning. ECML PKDD 2012, Sep 2012, Bristol, United Kingdom. pp.116-131. ⟨hal-00722744⟩