Skip to Main content Skip to Navigation
New interface
Reports (Technical report)

Reinforcement Learning with a Near Optimal Rate of Convergence

Mohammad Gheshlaghi Azar 1 Rémi Munos 2 Mohammad Ghavamzadeh 2 Hilbert Kappen 1 
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the problem of model-free reinforcement learning in the Markovian decision processes (MDP) under the PAC ("probably approximately correct") model. We introduce a new variant of Q-learning, called speedy Q-learning (SQL), to address the problem of the slow convergence in the standard Q-learning algorithm, and prove PAC bounds on the performance of SQL. The bounds show that for any MDP with n state-action pairs and the discount factor \gamma \in [0, 1) a total of O(n \log(n/\delta)/((1 − \gamma)^4\epsilon^2)) step suffices for the SQL algorithm to converge to an \epsilon-optimal action-value function with probability 1 − \delta. We also establish a lower-bound of \Omega(n \log(1/\delta)/((1 − \gamma)^2\epsilon^2)) for all reinforcement learning algorithms, which matches the upper bound in terms of \epsilon, \delta and n (up to a logarithmic factor). Further, our results have better dependencies on \epsilon and 1 −\gamma, and thus, are tighter than the best available results for Q-learning. SQL also improves on existing results for the batch Q-value iteration, so far considered to be more efficient than the incremental methods like Q-learning.
Document type :
Reports (Technical report)
Complete list of metadata

Cited literature [24 references]  Display  Hide  Download
Contributor : Mohammad Ghavamzadeh Connect in order to contact the contributor
Submitted on : Tuesday, November 29, 2011 - 6:42:45 PM
Last modification on : Thursday, December 1, 2022 - 2:02:08 PM
Long-term archiving on: : Monday, December 5, 2016 - 9:36:39 AM


Files produced by the author(s)


  • HAL Id : inria-00636615, version 2



Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert Kappen. Reinforcement Learning with a Near Optimal Rate of Convergence. [Technical Report] 2011. ⟨inria-00636615v2⟩



Record views


Files downloads