Abstract: We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to perform a seamless integration of interaction and expert data, thus obtaining an algorithm that can benefit from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the performance across iterations, providing insight into how the algorithm works. Finally, we report an empirical evaluation of the algorithm and a comparison with state-of-the-art algorithms.
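The abstract describes folding expert demonstrations into classification-based (direct) policy iteration: at each iteration, greedy action labels estimated from rollouts and demonstrated (state, action) pairs are merged into one training set for the next policy. The sketch below illustrates that idea on a toy chain MDP; the environment, the tabular majority-vote "classifier", and the doubled weight on expert labels are illustrative assumptions, not the paper's actual algorithm or analysis.

```python
# Illustrative sketch of classification-based policy iteration with
# demonstrations on a toy 6-state chain MDP (reward for reaching the
# rightmost state). All design choices here are assumptions for the demo.
import random

N_STATES, ACTIONS, GAMMA, HORIZON = 6, (-1, +1), 0.9, 20

def step(s, a):
    """Deterministic chain: move left/right, reward 1 at the right end."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def rollout_return(s, a, policy):
    """Estimate Q(s, a): take action a, then follow the current policy."""
    s, total = step(s, a)[0], step(s, a)[1]
    disc = GAMMA
    for _ in range(HORIZON):
        s, r = step(s, policy[s])
        total += disc * r
        disc *= GAMMA
    return total

def dpi_with_demos(expert_pairs, n_iters=5, n_rollouts=5, seed=0):
    random.seed(seed)
    policy = [random.choice(ACTIONS) for _ in range(N_STATES)]
    for _ in range(n_iters):
        dataset = []
        # Interaction data: greedy labels from rollout Q-estimates
        # (averaging matters for stochastic MDPs; this chain is deterministic).
        for s in range(N_STATES):
            q = {a: sum(rollout_return(s, a, policy)
                        for _ in range(n_rollouts)) / n_rollouts
                 for a in ACTIONS}
            dataset.append((s, max(q, key=q.get)))
        # Expert data: demonstrated (state, action) pairs join the same
        # training set; doubling them is an illustrative weighting choice.
        dataset += expert_pairs * 2
        # "Classifier": tabular majority vote per state.
        new_policy = list(policy)
        for s in range(N_STATES):
            labels = [a for (x, a) in dataset if x == s]
            if labels:
                new_policy[s] = max(set(labels), key=labels.count)
        policy = new_policy
    return policy

# Expert demonstrates the optimal action (move right) in the leftmost states,
# where random interaction rarely sees reward.
print(dpi_with_demos(expert_pairs=[(0, +1), (1, +1)]))
```

Because the demonstrated labels enter the same classification problem as the rollout-derived labels, states covered by the expert converge immediately while the remaining states are corrected by interaction data over the iterations.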
https://hal.inria.fr/hal-01237659
Contributor: Alessandro Lazaric
Submitted on: Thursday, December 3, 2015 - 3:46:20 PM
Last modification on: Friday, December 11, 2020 - 6:44:05 PM
Long-term archiving on: Saturday, April 29, 2017 - 7:49:10 AM
Jessica Chemali, Alessandro Lazaric. Direct Policy Iteration with Demonstrations. IJCAI - 24th International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina. ⟨hal-01237659⟩