A. Agarwal and J. C. Duchi, Distributed delayed stochastic optimization, Advances in Neural Information Processing Systems, pp.873-881, 2011.

S. Arya and Y. Yang, Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards, 2019.

J. Audibert and S. Bubeck, Minimax policies for bandits games, 2009.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2-3, pp.235-256, 2002.

A. Carpentier and A. K. Kim, Adaptive and minimax optimal estimation of the tail coefficient, Statistica Sinica, pp.1133-1144, 2015.

N. Cesa-Bianchi, C. Gentile, and Y. Mansour, Nonstochastic bandits with composite anonymous feedback, Conference On Learning Theory, pp.750-773, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01916981

O. Chapelle, Modeling delayed feedback in display advertising, Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.1097-1105, 2014.

O. Chapelle and L. Li, An empirical evaluation of Thompson sampling, Advances in Neural Information Processing Systems, pp.2249-2257, 2011.

T. M. Cover, Universal portfolios, The Kelly Capital Growth Investment Criterion: Theory and Practice, pp.181-209, 2011.

L. de Haan and A. Ferreira, Extreme value theory: an introduction, 2007.

M. Dudik, D. Hsu, S. Kale, N. Karampatziakis, J. Langford et al., Efficient optimal learning for contextual bandits, 2011.

J. Garcia, F. R. Ervin, and R. A. Koelling, Learning with prolonged delay of reinforcement, Psychonomic Science, vol.5, issue.3, pp.121-122, 1966.

S. Garg and A. K. Akash, Stochastic bandits with delayed composite anonymous feedback, 2019.

P. Joulani, A. György, and C. Szepesvári, Online learning under delayed feedback, International Conference on Machine Learning, pp.1453-1461, 2013.

P. Joulani, A. György, and C. Szepesvári, Delay-tolerant online convex optimization: Unified analysis and adaptive-gradient algorithms, Thirtieth AAAI Conference on Artificial Intelligence, 2016.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.

J. Langford, A. J. Smola, and M. Zinkevich, Slow learners are fast, Proceedings of the 22nd International Conference on Neural Information Processing Systems, pp.2331-2339, 2009.

T. Lattimore and C. Szepesvári, Bandit algorithms, 2019.

T. Mandel, Y. Liu, E. Brunskill, and Z. Popović, The queue method: Handling delay, heuristics, prior data, and evaluation in bandits, Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

T. A. Mann, S. Gowal, R. Jiang, H. Hu, B. Lakshminarayanan et al., Learning from delayed outcomes with intermediate observations, 2018.

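B. McMahan and M. Streeter, Delay-tolerant algorithms for asynchronous distributed online learning, Advances in Neural Information Processing Systems, 2014.

C. Pike-Burke, S. Agrawal, C. Szepesvári, and S. Grünewälder, Bandits with delayed, aggregated anonymous feedback, International Conference on Machine Learning, pp.4102-4110, 2018.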

K. Quanrud and D. Khashabi, Online learning with adversarial delays, Advances in Neural Information Processing Systems, pp.1270-1278, 2015.

S. Sra, A. W. Yu, M. Li, and A. J. Smola, AdaDelay: Delay adaptive distributed stochastic convex optimization, 2015.

T. S. Thune, N. Cesa-Bianchi, and Y. Seldin, Nonstochastic multiarmed bandits with unrestricted delays, 2019.

C. Vernade, O. Cappé, and V. Perchet, Stochastic bandit models for delayed conversions, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01545667

C. Vernade, A. Carpentier, G. Zappella, B. Ermis, and M. Brueckner, Contextual bandits under delayed feedback, 2018.

M. J. Weinberger and E. Ordentlich, On delayed prediction of individual sequences, IEEE Transactions on Information Theory, vol.48, issue.7, pp.1959-1976, 2002.

Y. Yoshikawa and Y. Imai, A nonparametric delayed feedback model for conversion rate prediction, 2018.

Z. Zhou, R. Xu, and J. Blanchet, Learning in generalized linear contextual bandits with stochastic delays, Advances in Neural Information Processing Systems, vol.32, pp.5198-5209, 2019.