hal-00747921, version 2
Semi-Supervised Apprenticeship Learning
Michal Valko 1, Mohammad Ghavamzadeh 1, Alessandro Lazaric 1
Journal of Machine Learning Research: Workshop and Conference Proceedings 24 (2012) 131-141
Abstract: In apprenticeship learning we aim to learn a good policy by observing the behavior of an expert or a set of experts. In particular, we consider the case where the expert acts so as to maximize an unknown reward function defined as a linear combination of a set of state features. In this paper, we consider the setting where we observe many sample trajectories (i.e., sequences of states) but only one or a few of them are labeled as experts' trajectories. We investigate the conditions under which the remaining unlabeled trajectories can help in learning a policy with good performance. In particular, we define an extension to the max-margin inverse reinforcement learning proposed by Abbeel and Ng (2004) where, at each iteration, the max-margin optimization step is replaced by a semi-supervised optimization problem which favors classifiers separating clusters of trajectories. Finally, we report empirical results on two grid-world domains showing that the semi-supervised algorithm is able to output a better policy in fewer iterations than the related algorithm that does not take the unlabeled trajectories into account.
- 1 : SEQUEL (INRIA Lille - Nord Europe)
- INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
- Domain: Statistics/Machine Learning
- Available versions: v1 (03-11-2012), v2 (16-01-2013)
- http://hal.inria.fr/hal-00747921
- oai:hal.inria.fr:hal-00747921
- Contributor: Michal Valko
- Submitted on: Wednesday 16 January 2013, 15:21:17
- Last modified on: Wednesday 16 January 2013, 21:12:23





