Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved algorithms for linear stochastic bandits, Advances in Neural Information Processing Systems (NIPS), 2011.

A. Agarwal, M. Dudík, S. Kale, J. Langford, and R. E. Schapire, Contextual bandit learning with predictable rewards, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li et al., Taming the monster: A fast and simple algorithm for contextual bandits, 2014.

A. Agarwal, S. Bird, M. Cozowicz, L. Hoang, J. Langford et al., A multiworld testing decision service, 2016.

A. Agarwal, A. Krishnamurthy, J. Langford, and H. Luo, Open problem: First-order regret bounds for contextual bandits, Conference on Learning Theory (COLT), 2017.

S. Agrawal and N. Goyal, Thompson sampling for contextual bandits with linear payoffs, Proceedings of the International Conference on Machine Learning (ICML), 2013.

H. Bastani, M. Bayati, and K. Khosravi, Mostly exploration-free algorithms for contextual bandits, 2017.

A. Blum, A. Kalai, and J. Langford, Beating the hold-out: Bounds for k-fold and progressive cross-validation, Conference on Learning Theory (COLT), 1999.

O. Chapelle and L. Li, An empirical evaluation of Thompson sampling, Advances in Neural Information Processing Systems (NIPS), 2011.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research (JMLR), vol.12, pp.2121-2159, 2011.

M. Dudík, D. Hsu, S. Kale, N. Karampatziakis, J. Langford et al., Efficient optimal learning for contextual bandits, Conference on Uncertainty in Artificial Intelligence (UAI), 2011.

M. Dudík, J. Langford, and L. Li, Doubly robust policy evaluation and learning, Proceedings of the International Conference on Machine Learning (ICML), 2011.

D. Eckles and M. Kaptein, Thompson sampling with the online bootstrap, 2014.

D. J. Foster, A. Agarwal, M. Dudík, H. Luo, and R. E. Schapire, Practical contextual bandits with regression oracles, Proceedings of the International Conference on Machine Learning (ICML), 2018.

S. Hanneke, Theory of disagreement-based active learning, Foundations and Trends in Machine Learning, vol.7, 2014.

X. He, J. Pan, O. Jin, T. Xu, B. Liu et al., Practical lessons from predicting clicks on ads at Facebook, Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, 2014.

D. J. Hsu, Algorithms for active learning, 2010.

T. Huang, A. Agarwal, D. J. Hsu, J. Langford, and R. E. Schapire, Efficient and parsimonious agnostic active learning, Advances in Neural Information Processing Systems (NIPS), 2015.

K. G. Jamieson, L. Jain, C. Fernandez, N. J. Glattard, and R. Nowak, NEXT: A system for real-world development, evaluation, and application of active learning, Advances in Neural Information Processing Systems (NIPS), 2015.

S. M. Kakade and A. Tewari, On the generalization ability of online strongly convex programming algorithms, Advances in Neural Information Processing Systems (NIPS), 2009.

S. Kannan, J. Morgenstern, A. Roth, B. Waggoner, and Z. S. Wu, A smoothed analysis of the greedy algorithm for the linear contextual bandit problem, Advances in Neural Information Processing Systems (NIPS), 2018.

N. Karampatziakis and J. Langford, Online importance weight aware updates, Conference on Uncertainty in Artificial Intelligence (UAI), 2011.

A. Krishnamurthy, A. Agarwal, T. Huang, H. Daumé III et al., Active learning for cost-sensitive classification, 2017.

J. Langford and T. Zhang, The epoch-greedy algorithm for multi-armed bandits with side information, Advances in Neural Information Processing Systems (NIPS), 2008.

L. Li, W. Chu, J. Langford, and R. E. Schapire, A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th International Conference on World Wide Web, 2010.

P. Massart and É. Nédélec, Risk bounds for statistical learning, The Annals of Statistics, vol.34, issue.5, 2006.

H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner et al., Ad click prediction: a view from the trenches, Proceedings of the 19th ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2013.

I. Osband and B. Van Roy, Bootstrapped Thompson sampling and deep exploration, 2015.

N. C. Oza and S. Russell, Online bagging and boosting, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2001.

Z. Qin, V. Petricek, N. Karampatziakis, L. Li, and J. Langford, Efficient online bootstrapping for large scale learning, Workshop on Parallel and Large-scale Machine Learning (BigLearning@NIPS), 2013.

S. Ross, P. Mineiro, and J. Langford, Normalized online learning, Conference on Uncertainty in Artificial Intelligence (UAI), 2013.

D. Russo, B. Van Roy, A. Kazerouni, and I. Osband, A tutorial on Thompson sampling, 2017.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3/4, 1933.