Direct Policy Iteration with Demonstrations

Jessica Chemali 1 Alessandro Lazaric 2, 3
3 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to integrate interaction and expert data seamlessly, obtaining an algorithm that benefits from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the performance across iterations, which gives insight into how the algorithm works. Finally, we report an empirical evaluation of the algorithm and a comparison with state-of-the-art algorithms.
Document type:
Conference paper
IJCAI - 24th International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina. 2015

https://hal.inria.fr/hal-01237659
Contributor: Alessandro Lazaric
Submitted on: Thursday, December 3, 2015 - 15:46:20
Last modified on: Thursday, January 11, 2018 - 06:27:32
Document(s) archived on: Saturday, April 29, 2017 - 07:49:10

File

DPID_CameraReady.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01237659, version 1

Citation

Jessica Chemali, Alessandro Lazaric. Direct Policy Iteration with Demonstrations. IJCAI - 24th International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina. 2015. 〈hal-01237659〉
