Improved algorithms for linear stochastic bandits, Advances in Neural Information Processing Systems, vol.24, pp.2312-2320, 2011.
Using confidence bounds for exploitation-exploration trade-offs, J. Mach. Learn. Res., vol.3, pp.397-422, 2003.
Near-optimal regret bounds for reinforcement learning, Advances in Neural Information Processing Systems, vol.21, pp.89-96, 2009.
Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends® in Machine Learning, vol.5, pp.1-122, 2012.
Efficient algorithms for optimum cycle mean and optimum cost to time ratio problems, Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, DAC '99, pp.37-42, 1999.
Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.1, pp.68-76, 2010. URL: https://hal.archives-ouvertes.fr/hal-00408867
Tight policy regret bounds for improving and decaying bandits, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, pp.1562-1570, 2016.
An algorithmic framework for performing collaborative filtering, Proceedings of the 1999 Conference on Research and Development in Information Retrieval, 1999.
Near-optimal regret bounds for reinforcement learning, J. Mach. Learn. Res., vol.11, pp.1563-1600, 2010.
Non-stochastic best arm identification and hyperparameter optimization, AISTATS, 2016.
Just in time recommendations: Modeling the dynamics of boredom in activity streams, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pp.233-242, 2015.
A characterization of the minimum cycle mean in a digraph, vol.23, pp.309-311, 1978.
Time-Decaying Bandits for Non-stationary Systems, pp.460-466, 2014.
Collaborative filtering with temporal dynamics, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pp.447-456, 2009.
Matrix factorization techniques for recommender systems, Computer, vol.42, issue.8, pp.30-37, 2009.
A contextual-bandit approach to personalized news article recommendation, pp.661-670, 2010.
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM '11, pp.297-306, 2011.
Online regret bounds for Markov decision processes with deterministic transitions, Algorithmic Learning Theory, pp.123-137, 2008.
Regret bounds for restless Markov bandits, Theor. Comput. Sci., vol.558, pp.62-76, 2014. URL: https://hal.archives-ouvertes.fr/hal-00765450
An MDP-based recommender system, J. Mach. Learn. Res., vol.6, pp.1265-1295, 2005.
Best-Arm Identification in Linear Bandits, Advances in Neural Information Processing Systems, vol.27, 2014. URL: https://hal.archives-ouvertes.fr/hal-01075701
Introduction to Reinforcement Learning, 1998.
Batch learning from logged bandit feedback through counterfactual risk minimization, J. Mach. Learn. Res., vol.16, issue.1, pp.1731-1755, 2015.
Online Learning of Rested and Restless Bandits, IEEE Transactions on Information Theory, vol.58, issue.8, 2012.