
Difference of Convex Functions Programming for Reinforcement Learning

Bilal Piot, Matthieu Geist, Olivier Pietquin


Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) $T^* Q - Q$, where $T^*$ is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program. This makes it possible to draw on the large literature on DC Programming to address the Reinforcement Learning problem.
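To make the quantities in the abstract concrete, here is a minimal tabular sketch in Python with NumPy. It is not the authors' implementation. It assumes a small synthetic MDP with known transitions and rewards, and all function names (`optimal_bellman_operator`, `obr`, `dc_split_abs`, `make_random_mdp`, `value_iteration`) are hypothetical. It computes the OBR $T^* Q - Q$ and checks the standard contraction bound $\|Q^* - Q\|_\infty \le \|T^* Q - Q\|_\infty / (1 - \gamma)$. It also shows one elementary way to write $|T^* Q - Q|$ as a difference of two functions, using the identity $|f - g| = 2\max(f, g) - (f + g)$. When $Q$ is linear in its parameters (the tabular case included), $T^* Q$ is convex in those parameters and $Q$ is linear, so both terms of the split are convex.

```python
import numpy as np

def optimal_bellman_operator(Q, P, R, gamma):
    """Apply T* to a tabular Q of shape (S, A).

    P has shape (S, A, S) with P[s, a, s2] = Pr(s2 | s, a); R has shape (S, A).
    """
    greedy_values = Q.max(axis=1)          # max over a' of Q(s', a'), shape (S,)
    return R + gamma * P @ greedy_values   # shape (S, A)

def obr(Q, P, R, gamma):
    """Optimal Bellman Residual T*Q - Q."""
    return optimal_bellman_operator(Q, P, R, gamma) - Q

def dc_split_abs(f, g):
    """Return (u, v) with |f - g| = u - v, where u = 2*max(f, g) and v = f + g."""
    return 2.0 * np.maximum(f, g), f + g

def make_random_mdp(n_states, n_actions, seed=0):
    """Synthetic MDP for illustration only (an assumption, not data from the paper)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)      # normalize rows into probabilities
    R = rng.random((n_states, n_actions))
    return P, R

def value_iteration(P, R, gamma, n_iters=2000):
    """Reference Q*, obtained by iterating T* from zero."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        Q = optimal_bellman_operator(Q, P, R, gamma)
    return Q

gamma = 0.9
P, R = make_random_mdp(n_states=5, n_actions=3)
Q_star = value_iteration(P, R, gamma)

# Perturb Q* to obtain an imperfect candidate Q.
rng = np.random.default_rng(1)
Q = Q_star + 0.1 * rng.standard_normal(Q_star.shape)

residual = obr(Q, P, R, gamma)
sup_residual = np.abs(residual).max()
sup_error = np.abs(Q_star - Q).max()

# Pointwise split of |T*Q - Q| into a difference of two terms.
u_part, v_part = dc_split_abs(optimal_bellman_operator(Q, P, R, gamma), Q)

print("sup-norm OBR:          ", sup_residual)
print("sup-norm error to Q*:  ", sup_error)
print("bound OBR / (1 - gamma):", sup_residual / (1.0 - gamma))
```

The sketch uses the exact operator on a known model. In the setting described in the abstract, the residual norm would instead be estimated from sampled transitions, and the resulting objective would be handed to a DC programming solver.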
Origin : Files produced by the author(s)

Dates and versions

hal-01104419, version 1 (16-01-2015)




Bilal Piot, Matthieu Geist, Olivier Pietquin. Difference of Convex Functions Programming for Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS 2014), Dec 2014, Montreal, Canada. ⟨hal-01104419⟩

