
Direct Policy Iteration with Demonstrations

Abstract : We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to integrate interaction and expert data seamlessly, obtaining an algorithm that benefits from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the performance across iterations, which gives insight into how the algorithm works. Finally, we report an empirical evaluation of the algorithm and a comparison with state-of-the-art algorithms.
Document type : Conference papers

Contributor : Alessandro Lazaric
Submitted on : Thursday, December 3, 2015 - 3:46:20 PM
Last modification on : Thursday, January 20, 2022 - 4:12:31 PM
Long-term archiving on : Saturday, April 29, 2017 - 7:49:10 AM


Files produced by the author(s)


  • HAL Id : hal-01237659, version 1



Jessica Chemali, Alessandro Lazaric. Direct Policy Iteration with Demonstrations. IJCAI - 24th International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina. ⟨hal-01237659⟩


