Regret bounds for model-free linear quadratic control, 2018.

The generalization ability of online algorithms for dependent data, IEEE Transactions on Information Theory, vol.59, issue.1, pp.573-587, 2013.

Statistical guarantees for the EM algorithm: From population to sample-based analysis, The Annals of Statistics, vol.45, issue.1, pp.77-120, 2017.

DOI: 10.1214/16-aos1435

URL: http://arxiv.org/pdf/1408.2156

Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

DOI: 10.1613/jair.806

URL: https://jair.org/index.php/jair/article/download/10289/24545

Adaptive Algorithms and Stochastic Approximation.

A finite time analysis of temporal difference learning with linear function approximation, Conference On Learning Theory, pp.1691-1692, 2018.

Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.

Stochastic approximation: a dynamical systems viewpoint, vol.48, 2009.

Online learning and stochastic approximations, vol.17, p.142, 1998.

Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

On-line Expectation Maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.3, pp.593-613, 2009.

Stochastic Expectation Maximization with variance reduction, Advances in Neural Information Processing Systems, pp.7978-7988, 2018.

Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning, Conference On Learning Theory, 2018.

Finite sample analyses for TD(0) with function approximation, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

Note that an exact characterization for $C_2$ is also possible.