Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved algorithms for linear stochastic bandits, Advances in Neural Information Processing Systems (NIPS), 2011.

A. Agarwal, S. Bird, M. Cozowicz, L. Hoang, J. Langford et al., A multiworld testing decision service. arXiv preprint, 2016.

A. Agarwal, M. Dudík, S. Kale, J. Langford, and R. E. Schapire, Contextual bandit learning with predictable rewards, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li et al., Taming the monster: A fast and simple algorithm for contextual bandits. arXiv preprint, 2014.

A. Agarwal, A. Krishnamurthy, J. Langford, and H. Luo, Open problem: First-order regret bounds for contextual bandits, Conference on Learning Theory (COLT), 2017.

S. Agrawal and N. Goyal, Thompson sampling for contextual bandits with linear payoffs, Proceedings of the International Conference on Machine Learning (ICML), 2013.

H. Bastani, M. Bayati, and K. Khosravi, Exploiting the natural exploration in contextual bandits, 2017.

A. Blum, A. Kalai, and J. Langford, Beating the hold-out, Proceedings of the twelfth annual conference on Computational learning theory, COLT '99, 1999.
DOI : 10.1145/307400.307439

O. Chapelle and L. Li, An empirical evaluation of Thompson sampling, Advances in Neural Information Processing Systems (NIPS), 2011.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

M. Dudík, D. Hsu, S. Kale, N. Karampatziakis, J. Langford et al., Efficient optimal learning for contextual bandits, Conference on Uncertainty in Artificial Intelligence (UAI), 2011.

M. Dudík, J. Langford, and L. Li, Doubly robust policy evaluation and learning, Proceedings of the International Conference on Machine Learning (ICML), 2011.

D. Eckles and M. Kaptein, Thompson sampling with the online bootstrap. arXiv preprint, 2014.

D. J. Foster, A. Agarwal, M. Dudík, H. Luo, and R. E. Schapire, Practical contextual bandits with regression oracles, Proceedings of the International Conference on Machine Learning (ICML), 2018.

S. Hanneke, Theory of Disagreement-Based Active Learning, Foundations and Trends in Machine Learning, 2014.
DOI : 10.1561/2200000037

X. He, J. Pan, O. Jin, T. Xu, B. Liu et al., Practical Lessons from Predicting Clicks on Ads at Facebook, Proceedings of 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ADKDD'14, 2014.
DOI : 10.1145/2648584.2648589

D. J. Hsu, Algorithms for active learning, 2010.

T. Huang, A. Agarwal, D. J. Hsu, J. Langford, and R. E. Schapire, Efficient and parsimonious agnostic active learning, Advances in Neural Information Processing Systems (NIPS), 2015.

K. G. Jamieson, L. Jain, C. Fernandez, N. J. Glattard, and R. Nowak, NEXT: A system for real-world development, evaluation, and application of active learning, Advances in Neural Information Processing Systems (NIPS), pp.2656-2664, 2015.

S. M. Kakade and A. Tewari, On the generalization ability of online strongly convex programming algorithms, Advances in Neural Information Processing Systems (NIPS), 2009.

S. Kannan, J. Morgenstern, A. Roth, B. Waggoner, and Z. S. Wu, A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. arXiv preprint, 2018.

N. Karampatziakis and J. Langford, Online importance weight aware updates, Conference on Uncertainty in Artificial Intelligence (UAI), 2011.

A. Krishnamurthy, A. Agarwal, T. Huang, H. Daumé III et al., Active learning for cost-sensitive classification. arXiv preprint, 2017.

J. Langford and T. Zhang, The epoch-greedy algorithm for multi-armed bandits with side information, Advances in Neural Information Processing Systems (NIPS), 2008.

L. Li, W. Chu, J. Langford, and R. E. Schapire, A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th international conference on World wide web, WWW '10, 2010.
DOI : 10.1145/1772690.1772758
URL : http://www.cs.rutgers.edu/~lihong/pub/Li10Contextual.pdf

P. Massart and É. Nédélec, Risk bounds for statistical learning. The Annals of Statistics, 2006.
DOI : 10.1214/009053606000000786
URL : http://doi.org/10.1214/009053606000000786

H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner et al., Ad click prediction, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '13, 2013.
DOI : 10.1145/2487575.2488200

I. Osband and B. Van Roy, Bootstrapped Thompson sampling and deep exploration. arXiv preprint, 2015.

N. C. Oza and S. Russell, Online Bagging and Boosting, 2005 IEEE International Conference on Systems, Man and Cybernetics, 2001.
DOI : 10.1109/ICSMC.2005.1571498
URL : http://www.cs.berkeley.edu/~oza/papers/aistats01.ps

Z. Qin, V. Petricek, N. Karampatziakis, L. Li, and J. Langford, Efficient online bootstrapping for large scale learning, Workshop on Parallel and Large-scale Machine Learning (BigLearning@NIPS), 2013.

S. Ross, P. Mineiro, and J. Langford, Normalized online learning, Conference on Uncertainty in Artificial Intelligence (UAI), 2013.

D. Russo, B. Van Roy, A. Kazerouni, and I. Osband, A tutorial on Thompson sampling, 2017.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3, 1933.