Reinforcement Learning with a Near Optimal Rate of Convergence

Mohammad Gheshlaghi Azar 1 Rémi Munos 2 Mohammad Ghavamzadeh 2 Hilbert Kappen 1
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the problem of model-free reinforcement learning in the Markovian decision processes (MDP) under the PAC ("probably approximately correct") model. We introduce a new variant of Q-learning, called speedy Q-learning (SQL), to address the problem of the slow convergence in the standard Q-learning algorithm, and prove PAC bounds on the performance of SQL. The bounds show that for any MDP with n state-action pairs and the discount factor \gamma \in [0, 1) a total of O(n \log(n/\delta)/((1 − \gamma)^4\epsilon^2)) step suffices for the SQL algorithm to converge to an \epsilon-optimal action-value function with probability 1 − \delta. We also establish a lower-bound of \Omega(n \log(1/\delta)/((1 − \gamma)^2\epsilon^2)) for all reinforcement learning algorithms, which matches the upper bound in terms of \epsilon, \delta and n (up to a logarithmic factor). Further, our results have better dependencies on \epsilon and 1 −\gamma, and thus, are tighter than the best available results for Q-learning. SQL also improves on existing results for the batch Q-value iteration, so far considered to be more efficient than the incremental methods like Q-learning.
Document type :
Reports
Complete list of metadatas

Cited literature [24 references]  Display  Hide  Download

https://hal.inria.fr/inria-00636615
Contributor : Mohammad Ghavamzadeh <>
Submitted on : Tuesday, November 29, 2011 - 6:42:45 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Monday, December 5, 2016 - 9:36:39 AM

File

SQL_vNov19.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00636615, version 2

Collections

Citation

Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert Kappen. Reinforcement Learning with a Near Optimal Rate of Convergence. [Technical Report] 2011. ⟨inria-00636615v2⟩

Share

Metrics

Record views

529

Files downloads

928