Conference paper, Year: 2015

Direct Policy Iteration with Demonstrations

Jessica Chemali, Alessandro Lazaric


We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to integrate interaction and expert data seamlessly, obtaining an algorithm that benefits from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the algorithm's performance across iterations, which offers insight into how it works. Finally, we report an empirical evaluation of the algorithm and compare it with state-of-the-art algorithms.
Main file: DPID_CameraReady.pdf (337.12 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01237659, version 1 (03-12-2015)


Jessica Chemali, Alessandro Lazaric. Direct Policy Iteration with Demonstrations. IJCAI - 24th International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina. ⟨hal-01237659⟩