Improved algorithms for linear stochastic bandits, Advances in Neural Information Processing Systems (NIPS), 2011.
A multiworld testing decision service. arXiv preprint, 2016.
Contextual bandit learning with predictable rewards, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
Taming the monster: A fast and simple algorithm for contextual bandits. arXiv preprint, 2014.
Open problem: First-order regret bounds for contextual bandits, Conference on Learning Theory (COLT), 2017.
Thompson sampling for contextual bandits with linear payoffs, Proceedings of the International Conference on Machine Learning (ICML), 2013.
Exploiting the natural exploration in contextual bandits, 2017.
Beating the hold-out, Proceedings of the twelfth annual conference on Computational learning theory, COLT '99, 1999.
DOI : 10.1145/307400.307439
An empirical evaluation of Thompson sampling, Advances in Neural Information Processing Systems (NIPS), 2011.
Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.
Efficient optimal learning for contextual bandits, Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
Doubly robust policy evaluation and learning, Proceedings of the International Conference on Machine Learning (ICML), 2011.
Thompson sampling with the online bootstrap. arXiv preprint, 2014.
Practical contextual bandits with regression oracles, Proceedings of the International Conference on Machine Learning (ICML), 2018.
Theory of Disagreement-Based Active Learning, Foundations and Trends in Machine Learning, 2014.
DOI : 10.1561/2200000037
Practical Lessons from Predicting Clicks on Ads at Facebook, Proceedings of 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ADKDD'14, 2014.
DOI : 10.1145/2648584.2648589
Algorithms for active learning, 2010.
Efficient and parsimonious agnostic active learning, Advances in Neural Information Processing Systems (NIPS), 2015.
Next: A system for real-world development, evaluation, and application of active learning, Advances in Neural Information Processing Systems, pp.2656-2664, 2015.
On the generalization ability of online strongly convex programming algorithms, Advances in Neural Information Processing Systems (NIPS), 2009.
A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. arXiv preprint, 2018.
Online importance weight aware updates, Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
Active learning for cost-sensitive classification. arXiv preprint, 2017.
The epoch-greedy algorithm for multi-armed bandits with side information, Advances in Neural Information Processing Systems (NIPS), 2008.
A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th international conference on World wide web, WWW '10, 2010.
DOI : 10.1145/1772690.1772758
URL : http://www.cs.rutgers.edu/~lihong/pub/Li10Contextual.pdf
Risk bounds for statistical learning, The Annals of Statistics, 2006.
DOI : 10.1214/009053606000000786
URL : http://doi.org/10.1214/009053606000000786
Ad click prediction, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '13, 2013.
DOI : 10.1145/2487575.2488200
Bootstrapped Thompson sampling and deep exploration. arXiv preprint, 2015.
Online Bagging and Boosting, 2005 IEEE International Conference on Systems, Man and Cybernetics, 2001.
DOI : 10.1109/ICSMC.2005.1571498
URL : http://www.cs.berkeley.edu/~oza/papers/aistats01.ps
Efficient online bootstrapping for large scale learning, Workshop on Parallel and Large-scale Machine Learning (BigLearning@NIPS), 2013.
Normalized online learning, Conference on Uncertainty in Artificial Intelligence (UAI), 2013.
A tutorial on Thompson sampling, 2017.
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3, 1933.