Direct Value Learning: a Preference-based Approach to Reinforcement Learning

David Meunier 1, 2 Yutaka Deguchi 3 Riad Akrour 1, 2 Enoshin Suzuki 3 Marc Schoenauer 1, 2 Michèle Sebag 1, 2
2 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: Learning by imitation, among the most promising techniques for reinforcement learning in complex domains, critically depends on the human designer's ability to provide sufficiently many demonstrations of satisfactory quality. The approach presented in this paper, referred to as DIVA (Direct Value Learning for Reinforcement Learning), aims at addressing both limitations (the number and the quality of demonstrations) by exploiting simple experiments. The approach stems from a straightforward remark: while it is rather easy to set a robot in a target situation, the quality of its situation naturally deteriorates under the actions of naive controllers. Demonstrations produced by such naive controllers can thus be used to learn a value function directly, through a preference learning approach. Under some conditions on the transition model, this value function suffices to define an optimal controller. The DIVA approach is experimentally demonstrated by teaching a robot to follow another robot. Importantly, the approach requires neither a robotic simulator nor any pattern-recognition primitive (e.g., seeing the other robot).
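The core idea of the abstract can be illustrated with a short sketch (not the authors' implementation; the trajectory data, the linear value model, and the Bradley-Terry-style ranking loss below are illustrative assumptions): along a trajectory started in the target situation and driven by a naive controller, each earlier state is preferred to each later one, and these preference pairs train a value function directly.

```python
import numpy as np

# Hedged sketch of preference-based value learning in the spirit of DIVA
# (illustrative assumptions, not the paper's actual setup): a trajectory
# starts near the target situation and deteriorates under a naive
# controller, so state s_i is preferred to s_j whenever i < j.

rng = np.random.default_rng(0)

# Synthetic deteriorating trajectory: the state drifts away from the target.
T, dim = 50, 4
traj = np.cumsum(rng.normal(0.1, 0.05, size=(T, dim)), axis=0)

# Preference pairs (earlier state preferred), with a small time gap.
pairs = [(traj[i], traj[j]) for i in range(T) for j in range(i + 5, T)]

# Linear value function V(s) = w . s, fitted by maximizing the
# log-likelihood of the preferences under a logistic (ranking) model.
w = np.zeros(dim)
lr = 0.05
for _ in range(200):
    grad = np.zeros(dim)
    for s_good, s_bad in pairs:
        d = s_good - s_bad
        p = 1.0 / (1.0 + np.exp(w @ d))  # probability of mis-ranking the pair
        grad += p * d                    # gradient of the log-likelihood
    w += lr * grad / len(pairs)

values = traj @ w
# The learned value function should decrease along the trajectory,
# i.e. rank the initial (target) state above the final one.
print(bool(values[0] > values[-1]))
```

A greedy controller that picks the action whose predicted next state maximizes such a learned V is then, under the transition-model conditions mentioned in the abstract, the optimal controller the paper targets.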
Document type:
Conference paper
Johannes Fürnkranz and Eyke Hüllermeier. ECAI-12 Workshop on Preference Learning: Problems and Applications in AI, Aug 2012, Montpellier, France. pp.42-47, 2012, 〈www2.lirmm.fr/ecai2012/images/stories/ecai_doc/pdf/workshop/W30_PL12-Proceedings.pdf〉

Cited literature [19 references]

https://hal.inria.fr/hal-00932976
Contributor: Marc Schoenauer
Submitted on: Sunday, January 19, 2014 - 07:24:50
Last modified on: Thursday, April 5, 2018 - 12:30:12
Long-term archiving on: Saturday, April 19, 2014 - 22:06:25

File

06-sebag.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00932976, version 1

Collections

Citation

David Meunier, Yutaka Deguchi, Riad Akrour, Enoshin Suzuki, Marc Schoenauer, Michèle Sebag. Direct Value Learning: a Preference-based Approach to Reinforcement Learning. In Johannes Fürnkranz and Eyke Hüllermeier (Eds.), ECAI-12 Workshop on Preference Learning: Problems and Applications in AI, Aug 2012, Montpellier, France. pp.42-47, 2012, 〈www2.lirmm.fr/ecai2012/images/stories/ecai_doc/pdf/workshop/W30_PL12-Proceedings.pdf〉. 〈hal-00932976〉


Metrics

Record views

329

File downloads

546