Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Article Dans Une Revue Machine Learning Année : 2014

Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm

Résumé

We introduce a novel approach to preference-based reinforcement learn-ing, namely a preference-based variant of a direct policy search method based on evolutionary optimization. The core of our approach is a preference-based racing algorithm that selects the best among a given set of candidate policies with high probability. To this end, the algorithm operates on a suitable ordinal preference structure and only uses pairwise comparisons between sample rollouts of the policies. Embedding the racing algorithm in a rank-based evolutionary search procedure, we show that approxima-tions of the so-called Smith set of optimal policies can be produced with certain theoretical guarantees. Apart from a formal performance and complexity analysis, we present first experimental studies showing that our approach performs well in practice.
Fichier principal
Vignette du fichier
revised_1_1.pdf (556.46 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01079370 , version 1 (01-11-2014)

Identifiants

Citer

Róbert Busa-Fekete, Balázs Szörényi, Paul Weng, Weiwei Cheng, Eyke Hüllermeier. Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm. Machine Learning, 2014, 97 (3), pp.327-351. ⟨10.1007/s10994-014-5458-8⟩. ⟨hal-01079370⟩
422 Consultations
409 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More