Reinforcement Learning with a Near Optimal Rate of Convergence

Mohammad Gheshlaghi Azar 1 Rémi Munos 2 Mohammad Ghavamzadeh 2 Hilbert Kappen 1
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the problem of model-free reinforcement learning in the Markovian decision processes (MDP) under the PAC ("probably approximately correct") model. We introduce a new variant of Q-learning, called speedy Q-learning (SQL), to address the problem of the slow convergence in the standard Q-learning algorithm, and prove PAC bounds on the performance of SQL. The bounds show that for any MDP with n state-action pairs and the discount factor \gamma \in [0, 1) a total of O(n \log(n/\delta)/((1 − \gamma)^4\epsilon^2)) step suffices for the SQL algorithm to converge to an \epsilon-optimal action-value function with probability 1 − \delta. We also establish a lower-bound of \Omega(n \log(1/\delta)/((1 − \gamma)^2\epsilon^2)) for all reinforcement learning algorithms, which matches the upper bound in terms of \epsilon, \delta and n (up to a logarithmic factor). Further, our results have better dependencies on \epsilon and 1 −\gamma, and thus, are tighter than the best available results for Q-learning. SQL also improves on existing results for the batch Q-value iteration, so far considered to be more efficient than the incremental methods like Q-learning.
Type de document :
Rapport
[Technical Report] 2011
Liste complète des métadonnées

Littérature citée [24 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00636615
Contributeur : Mohammad Ghavamzadeh <>
Soumis le : mardi 29 novembre 2011 - 18:42:45
Dernière modification le : jeudi 11 janvier 2018 - 06:22:13
Document(s) archivé(s) le : lundi 5 décembre 2016 - 09:36:39

Fichier

SQL_vNov19.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00636615, version 2

Collections

Citation

Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert Kappen. Reinforcement Learning with a Near Optimal Rate of Convergence. [Technical Report] 2011. 〈inria-00636615v2〉

Partager

Métriques

Consultations de la notice

437

Téléchargements de fichiers

487