Direct Policy Iteration with Demonstrations

Abstract: We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to seamlessly integrate interaction and expert data, thus obtaining an algorithm that can benefit from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the algorithm's performance across iterations, which gives insight into how the algorithm works. Finally, we report an empirical evaluation of the algorithm and a comparison with state-of-the-art algorithms.
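The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of the general idea of classification-based policy iteration augmented with expert data. At each iteration, rollout estimates of Q on interaction states define a cost-sensitive classification loss (the regret of each action with respect to the greedy one), and expert state-action pairs add a penalty, weighted by a hypothetical trade-off parameter `alpha`, for every disagreement with the expert. The chain MDP, the tabular policy class (which lets the loss be minimized state by state), and all names (`step`, `rollout_q`, `fit_policy`, `dpi_with_demos`) are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

N_STATES = 6        # hypothetical chain length; state N_STATES - 1 is an absorbing goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor (assumed)
HORIZON = 20        # rollout length after the first action (assumed)


def step(s, a):
    """Deterministic chain dynamics: reward 1 when entering the goal."""
    if s == N_STATES - 1:
        return s, 0.0  # the goal is absorbing and yields no further reward
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)


def rollout_q(policy, s, a):
    """Rollout estimate of Q^policy(s, a): take a, then follow policy.

    The dynamics are deterministic, so a single rollout is exact here.
    """
    s, total = step(s, a)
    discount = 1.0
    for _ in range(HORIZON):
        discount *= GAMMA
        s, r = step(s, policy[s])
        total += discount * r
    return total


def fit_policy(q_hat, expert_pairs, alpha, current):
    """Minimize regret on rollout states + alpha * disagreements with the expert.

    With a tabular policy class the loss decomposes per state, so the
    minimizer is found by checking every action at every covered state.
    """
    demo_counts = Counter(expert_pairs)  # (state, expert_action) -> count
    new_policy = list(current)  # states with no data keep their current action
    for s in set(q_hat) | {se for se, _ in expert_pairs}:
        def loss(a, s=s):
            # cost-sensitive regret of a w.r.t. the greedy action on rollout states
            regret = max(q_hat[s].values()) - q_hat[s][a] if s in q_hat else 0.0
            # number of demonstrations at s whose expert action differs from a
            disagreements = sum(c for (se, ae), c in demo_counts.items()
                                if se == s and ae != a)
            return regret + alpha * disagreements
        new_policy[s] = min(ACTIONS, key=loss)  # ties go to the first action
    return new_policy


def dpi_with_demos(expert_pairs, rollout_states, alpha=1.0, iterations=5):
    policy = [0] * N_STATES  # start from the "always left" policy
    for _ in range(iterations):
        q_hat = {s: {a: rollout_q(policy, s, a) for a in ACTIONS}
                 for s in rollout_states}
        policy = fit_policy(q_hat, expert_pairs, alpha, policy)
    return policy


# Interaction data only near the goal; demonstrations only near the start.
print(dpi_with_demos([(0, 1), (1, 1), (2, 1)], rollout_states=[3, 4]))  # → [1, 1, 1, 1, 1, 0]
print(dpi_with_demos([], rollout_states=[3, 4]))                        # → [0, 0, 0, 1, 1, 0]
```

In this toy run the rollouts alone only correct the two states they visit, while the demonstrations supply the right action at states the rollouts never reach; combining both yields the "always right" policy. The actual loss, algorithm, and guarantees are those given in the paper.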
Document type: Conference paper

Cited literature: 13 references

https://hal.inria.fr/hal-01237659
Contributor: Alessandro Lazaric
Submitted on: Thursday, December 3, 2015 - 3:46:20 PM
Last modification on: Thursday, August 22, 2019 - 12:10:38 PM
Long-term archiving on: Saturday, April 29, 2017 - 7:49:10 AM

File

DPID_CameraReady.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01237659, version 1

Citation

Jessica Chemali, Alessandro Lazaric. Direct Policy Iteration with Demonstrations. IJCAI - 24th International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina. ⟨hal-01237659⟩

Metrics

Record views: 344
File downloads: 160