Y. Abbasi-Yadkori, N. Lazic, and C. Szepesvári, Regret bounds for model-free linear quadratic control, 2018.

A. Agarwal and J. C. Duchi, The generalization ability of online algorithms for dependent data, IEEE Transactions on Information Theory, vol.59, issue.1, pp.573-587, 2013.

S. Balakrishnan, M. J. Wainwright, and B. Yu, Statistical guarantees for the EM algorithm: From population to sample-based analysis, The Annals of Statistics, vol.45, issue.1, pp.77-120, 2017.
DOI : 10.1214/16-aos1435
URL : http://arxiv.org/pdf/1408.2156

J. Baxter and P. L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.
DOI : 10.1613/jair.806
URL : https://jair.org/index.php/jair/article/download/10289/24545

A. Benveniste, P. Priouret, and M. Métivier, Adaptive Algorithms and Stochastic Approximation

J. Bhandari, D. Russo, and R. Singal, A finite time analysis of temporal difference learning with linear function approximation, Conference On Learning Theory, pp.1691-1692, 2018.

V. S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.

V. S. Borkar, Stochastic approximation: a dynamical systems viewpoint, vol.48, 2009.

L. Bottou, Online learning and stochastic approximations, vol.17, p.142, 1998.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

O. Cappé and E. Moulines, On-line Expectation Maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.3, pp.593-613, 2009.

J. Chen, J. Zhu, Y. W. Teh, and T. Zhang, Stochastic Expectation Maximization with variance reduction, Advances in Neural Information Processing Systems, pp.7978-7988, 2018.

G. Dalal, B. Szörényi, G. Thoppe, and S. Mannor, Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning, Conference On Learning Theory, 2018.

G. Dalal, B. Szörényi, G. Thoppe, and S. Mannor, Finite sample analyses for TD(0) with function approximation, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

T. Degris, M. White, and R. Sutton, , 2012.

Note that an exact characterization for $C_2$ is also possible.