Note that an exact characterization for C_2 is also possible.