Semi-Supervised Apprenticeship Learning

Michal Valko (1), Mohammad Ghavamzadeh (1), Alessandro Lazaric (1)
(1) SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : In apprenticeship learning we aim to learn a good policy by observing the behavior of an expert or a set of experts. We consider the case where the expert acts so as to maximize an unknown reward function defined as a linear combination of a set of state features. Specifically, we study the setting where we observe many sample trajectories (i.e., sequences of states) but only one or a few of them are labeled as expert trajectories. We investigate the conditions under which the remaining unlabeled trajectories can help in learning a policy with good performance. To this end, we define an extension of the max-margin inverse reinforcement learning algorithm proposed by Abbeel and Ng (2004) in which, at each iteration, the max-margin optimization step is replaced by a semi-supervised optimization problem that favors classifiers separating clusters of trajectories. Finally, we report empirical results on two grid-world domains showing that the semi-supervised algorithm outputs a better policy in fewer iterations than the related algorithm that does not take the unlabeled trajectories into account.
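The linear-reward assumption in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-state toy problem, the one-hot feature map `phi`, the weight vector `w`, and the discount `gamma` are all hypothetical values chosen for illustration. The sketch only shows that, when the reward is a linear combination of state features, the discounted return of a trajectory equals the weight vector times the trajectory's discounted feature sum, which is the quantity that max-margin methods in the style of Abbeel and Ng (2004) compare across trajectories.

```python
import numpy as np

def feature_expectations(trajectory, phi, gamma=0.9):
    """Discounted sum of feature vectors along one trajectory of states."""
    return sum((gamma ** t) * phi(s) for t, s in enumerate(trajectory))

# Hypothetical 3-state toy problem with one-hot state features.
def phi(state):
    return np.eye(3)[state]

w = np.array([0.0, 0.5, 1.0])   # hypothetical reward weights: R(s) = w . phi(s)
trajectory = [0, 1, 2]          # a sequence of visited states

mu = feature_expectations(trajectory, phi, gamma=0.5)

# Because the reward is linear in the features, the discounted return of the
# trajectory is simply the dot product of w with its feature sum.
value = float(w @ mu)

# Same quantity computed directly from per-step rewards, as a sanity check.
direct = sum((0.5 ** t) * float(w @ phi(s)) for t, s in enumerate(trajectory))
```

Under these assumed values, `value` and `direct` agree, which is why an apprenticeship learner can work with feature sums of trajectories without knowing the reward weights in advance.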
Document type : Journal articles
Contributor : Michal Valko
Submitted on : Friday, November 2, 2012 - 7:00:48 PM
Last modification on : Thursday, June 27, 2019 - 1:36:43 PM
Long-term archiving on : Sunday, February 3, 2013 - 3:37:31 AM
  • HAL Id : hal-00747921, version 1


Michal Valko, Mohammad Ghavamzadeh, Alessandro Lazaric. Semi-Supervised Apprenticeship Learning. Journal of Machine Learning Research, Microtome Publishing, 2012, The 10th European Workshop on Reinforcement Learning, 24. ⟨hal-00747921v1⟩