Zap Q-Learning for Optimal Stopping - Inria - Institut national de recherche en sciences et technologies du numérique
Conference Paper, Year: 2020

Zap Q-Learning for Optimal Stopping

Abstract

This paper concerns approximate solutions to the optimal stopping problem for a geometrically ergodic Markov chain on a continuous state space. The starting point is the Galerkin relaxation of the dynamic programming equations introduced by Tsitsiklis and Van Roy in the 1990s, which motivated their Q-learning algorithm for optimal stopping. It is known that the convergence rate of Q-learning is in many cases very slow. The reason for slow convergence is explained here, along with a variant of the Zap Q-learning algorithm designed to achieve the optimal rate of convergence. The main contribution is to establish consistency of the Zap Q-learning algorithm in a linear function approximation setting. The theoretical results are illustrated using an example from finance.
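To make the setting concrete, the following is a minimal sketch of a Zap-style matrix-gain Q-learning update for optimal stopping with linear function approximation. Everything here is illustrative and not taken from the paper: the AR(1) chain, the put-style payoff `h`, the Gaussian basis, the step sizes, and the clipping safeguard are all assumptions. The structural idea it shows is the two-time-scale update: a fast estimate of the mean linearization matrix, whose inverse is then used as a matrix gain for the parameter update.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 0.95                      # discount factor
d = 5                             # number of basis functions
centers = np.linspace(-2, 2, d)   # Gaussian basis centers (assumption)

def phi(x):
    """Gaussian radial-basis features for the scalar state x."""
    return np.exp(-0.5 * (x - centers) ** 2)

def h(x):
    """Payoff received upon stopping (illustrative put-style payoff)."""
    return max(1.0 - x, 0.0)

def step(x):
    """One transition of an ergodic Markov chain (AR(1), illustrative)."""
    return 0.8 * x + 0.5 * rng.standard_normal()

theta = np.zeros(d)               # Q(x) ~ theta @ phi(x): value of continuing
A_hat = -np.eye(d)                # matrix-gain estimate, kept invertible at start

x = 0.0
for n in range(1, 5001):
    alpha = 1.0 / (n + 10)            # slow step size for theta
    beta = 1.0 / (n + 10) ** 0.85     # faster step size for A_hat (two time scales)

    x_next = step(x)
    f, f_next = phi(x), phi(x_next)

    # Temporal-difference term: compare against the better of stopping or continuing
    q_next = theta @ f_next
    td = gamma * max(h(x_next), q_next) - theta @ f

    # psi: next-state feature if continuation is preferred, else zero (stopping)
    psi = f_next if q_next >= h(x_next) else np.zeros(d)
    A_n = np.outer(f, gamma * psi - f)

    # Zap step: track the running mean of A_n, use its inverse as a matrix gain
    A_hat += beta * (A_n - A_hat)
    theta -= alpha * np.linalg.solve(A_hat, f * td)
    theta = np.clip(theta, -10.0, 10.0)   # numerical safeguard for this toy run

    x = x_next

q0 = theta @ phi(0.0)
print(f"approximate continuation value at x=0: {q0:.3f}")
```

The ordinary Tsitsiklis–Van Roy Q-learning update would replace the matrix gain `A_hat` with a scalar step size; the matrix gain is what Zap adds to pursue the optimal asymptotic convergence rate.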

Dates and versions

hal-03094388 , version 1 (04-01-2021)

Identifiers

Cite

Shuhang Chen, Adithya Devraj, Ana Bušić, Sean Meyn. Zap Q-Learning for Optimal Stopping. ACC 2020 - American Control Conference, Jul 2020, Denver / Virtual, United States. pp.3920-3925, ⟨10.23919/ACC45564.2020.9147481⟩. ⟨hal-03094388⟩