Conference Papers, Year: 2014

Difference of Convex Functions Programming for Reinforcement Learning

Bilal Piot, Matthieu Geist, Olivier Pietquin

Abstract

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) T*Q − Q, where T* is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program. This opens the door to applying the large literature on DC Programming to the Reinforcement Learning problem.
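To make the central quantity concrete, here is a minimal sketch of the Optimal Bellman Residual on a small tabular MDP. The MDP (states, actions, rewards, transitions) is a made-up toy example, not from the paper, and the code only illustrates that the residual T*Q − Q vanishes at the optimal Q (driven there by value iteration); it does not implement the paper's DC programming approach.

```python
import numpy as np

# Hypothetical toy MDP (illustrative only): 3 states, 2 actions.
np.random.seed(0)
nS, nA, gamma = 3, 2, 0.9
R = np.random.rand(nS, nA)              # rewards R(s, a)
P = np.random.rand(nS, nA, nS)
P /= P.sum(axis=2, keepdims=True)       # transition kernel P(s' | s, a)

def optimal_bellman_operator(Q):
    # (T* Q)(s, a) = R(s, a) + gamma * E_{s'}[ max_{a'} Q(s', a') ]
    return R + gamma * P @ Q.max(axis=1)

def obr_norm(Q, p=1):
    # L_p norm of the Optimal Bellman Residual T*Q - Q over all (s, a)
    return np.linalg.norm((optimal_bellman_operator(Q) - Q).ravel(), ord=p)

Q = np.zeros((nS, nA))
print(obr_norm(Q))                      # nonzero for an arbitrary Q
for _ in range(500):                    # T* is a gamma-contraction,
    Q = optimal_bellman_operator(Q)     # so iterating it drives the
print(obr_norm(Q))                      # residual toward 0
```

The paper's point is that, instead of iterating T* as above, one can directly minimize a norm of this residual, and that this minimization can be written as a difference of convex functions.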

Dates and versions

hal-01104419 , version 1 (16-01-2015)


Cite

Bilal Piot, Matthieu Geist, Olivier Pietquin. Difference of Convex Functions Programming for Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS 2014), Dec 2014, Montreal, Canada. ⟨hal-01104419⟩