Difference of Convex Functions Programming for Reinforcement Learning

Bilal Piot (1, 2); Matthieu Geist (1); Olivier Pietquin (3, 4, 2)
1 IMS - Equipe Information, Multimodalité et Signal
UMI2958 - Georgia Tech - CNRS [Metz], SUPELEC-Campus Metz
2 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) T*Q − Q, where T* is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program. This makes it possible to draw on the large related literature on DC Programming to address the Reinforcement Learning problem.
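To make the quantity in the abstract concrete, here is a minimal sketch of the Optimal Bellman Residual on a tiny tabular MDP. It is an illustration under stated assumptions, not the authors' method: the toy transition tensor P, reward table R, discount gamma, and the helper names optimal_bellman_operator and obr_norm are all hypothetical, and the norm is taken uniformly over every state-action pair using the exact model rather than over sampled transitions. It relies on the standard definition (T*Q)(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) max_b Q(s',b).

```python
import numpy as np

def optimal_bellman_operator(Q, P, R, gamma):
    """Apply T* to Q, with P of shape (S, A, S) and Q, R of shape (S, A)."""
    # Q.max(axis=1) is the greedy value max_b Q(s', b) for each next state s'.
    return R + gamma * P @ Q.max(axis=1)

def obr_norm(Q, P, R, gamma, p=1):
    """Uniformly weighted l_p norm of the residual T*Q - Q (an assumption)."""
    residual = optimal_bellman_operator(Q, P, R, gamma) - Q
    return float(np.mean(np.abs(residual) ** p) ** (1.0 / p))

# Hypothetical 2-state, 2-action MDP used only for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [0.5, 2.0]])
gamma = 0.9

# Value iteration converges to the fixed point Q* of T*, where the OBR vanishes.
Q_star = np.zeros((2, 2))
for _ in range(1000):
    Q_star = optimal_bellman_operator(Q_star, P, R, gamma)

print(obr_norm(np.zeros((2, 2)), P, R, gamma))  # residual of the all-zero Q
print(obr_norm(Q_star, P, R, gamma))            # numerically zero at Q*
```

The residual is zero exactly at the optimal action-value function, which is why driving a norm of T*Q − Q toward zero is a sensible surrogate for approaching it.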
Document type:
Conference paper
Advances in Neural Information Processing Systems (NIPS 2014), Dec 2014, Montreal, Canada. 〈http://nips.cc/Conferences/2014/〉

Cited literature [22 references]

https://hal.inria.fr/hal-01104419
Contributor: Olivier Pietquin
Submitted on: Friday, January 16, 2015 - 16:49:27
Last modified on: Tuesday, July 3, 2018 - 11:43:09
Document(s) archived on: Friday, September 11, 2015 - 06:59:32

File

5443-difference-of-convex-func...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01104419, version 1

Citation

Bilal Piot, Matthieu Geist, Olivier Pietquin. Difference of Convex Functions Programming for Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS 2014), Dec 2014, Montreal, Canada. 〈http://nips.cc/Conferences/2014/〉. 〈hal-01104419〉
