A. Agarwal, M. Dudík, S. Kale, J. Langford, and R. E. Schapire, Contextual bandit learning with predictable rewards, Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS'12), 2012.

A. Anandkumar, D. P. Foster, D. Hsu, S. Kakade, and Y. Liu, A Spectral Algorithm for Latent Dirichlet Allocation, Proceedings of Advances in Neural Information Processing Systems 25 (NIPS'12), pp.926-934, 2012.

A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade, A tensor spectral approach to learning mixed membership community models, Proceedings of the 26th Annual Conference on Learning Theory (COLT'13), 2013.

A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, Tensor decompositions for learning latent variable models, arXiv preprint arXiv:1210.7559, 2012.

A. Anandkumar, D. Hsu, and S. M. Kakade, A method of moments for mixture models and hidden Markov models, Proceedings of the 25th Annual Conference on Learning Theory (COLT'12), pp.33.1-33.34, 2012.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

M. G. Azar, A. Lazaric, and E. Brunskill, Sequential transfer in multi-armed bandit with finite set of models, Proceedings of Advances in Neural Information Processing Systems 26 (NIPS'13), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00924025

G. Cavallanti, N. Cesa-Bianchi, and C. Gentile, Linear algorithms for online multitask classification, Journal of Machine Learning Research, vol.11, pp.2901-2934, 2010.

O. Dekel, P. M. Long, and Y. Singer, Online Multitask Learning, Proceedings of the 19th Annual Conference on Learning Theory (COLT'06), pp.453-467, 2006.
DOI : 10.1007/11776420_34

A. Garivier and E. Moulines, On Upper-Confidence Bound Policies for Switching Bandit Problems, Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT'11), pp.174-188, 2011.
DOI : 10.1007/978-3-642-24412-4_16

F. Kleibergen and R. Paap, Generalized reduced rank tests using the singular value decomposition, Journal of Econometrics, vol.133, issue.1, pp.97-126, 2006.
DOI : 10.1016/j.jeconom.2005.02.011

J. Langford and T. Zhang, The epoch-greedy algorithm for multi-armed bandits with side information, Proceedings of Advances in Neural Information Processing Systems 20 (NIPS'07), 2007.

A. Lazaric, Transfer in Reinforcement Learning: A Framework and a Survey, Reinforcement Learning: State of the Art, 2012.
DOI : 10.1007/978-3-642-27645-3_5
URL : https://hal.archives-ouvertes.fr/hal-00772626

G. Lugosi, O. Papaspiliopoulos, and G. Stoltz, Online multi-task learning with hard constraints, Proceedings of the 22nd Annual Conference on Learning Theory (COLT'09), 2009.
URL : https://hal.archives-ouvertes.fr/hal-00362643

T. A. Mann and Y. Choe, Directed exploration in reinforcement learning with transferred knowledge, Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012.

S. J. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, pp.1345-1359, 2010.
DOI : 10.1109/TKDE.2009.191

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

A. Saha, P. Rai, H. Daumé III, and S. Venkatasubramanian, Online learning of multiple tasks and their relationships, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS'11), 2011.

G. W. Stewart and J. Sun, Matrix Perturbation Theory, Academic Press, 1990.

M. E. Taylor, Transfer in Reinforcement Learning Domains, Springer, 2009.
DOI : 10.1007/978-3-642-01882-4

P. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, vol.12, pp.99-111, 1972.
DOI : 10.1007/BF01932678