M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM Journal on Mathematical Analysis, vol.43, issue.2, pp.904-924, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00637399

S. I. Amari and H. Nagaoka, Methods of information geometry, Translations of mathematical monographs, vol.191, 2007.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2-3, pp.235-256, 2002.

F. Barbaresco, Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Fréchet median, Matrix Information Geometry, pp.199-255, 2013.

R. Bellman, A problem in the sequential design of experiments, Sankhyā: The Indian Journal of Statistics, vol.16, issue.3/4, pp.221-229, 1956.

L. D. Brown, Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory, 1986.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandits problems, pp.23-37, 2009.

O. Cappé, A. Garivier, and É. Kaufmann, , 2012.

I. Csiszár, Sanov property, generalized I-projection and a conditional limit theorem, The Annals of Probability, vol.12, issue.3, pp.768-793, 1984.

M. H. DeGroot, Optimal statistical decisions, Wiley Classics Library, vol.82, 2005.

R. Durrett, Probability: theory and examples, 2010.

M. Faheem and P. Senellart, Adaptive web crawling through structure-based link classification, Proc. ICADL, pp.39-51, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01261960

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, COLT, pp.359-376, 2011.

A. Garivier, T. Lattimore, and E. Kaufmann, On explore-then-commit strategies, Advances in Neural Information Processing Systems, vol.29, pp.784-792, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01322906

J. Y. Audibert and S. Bubeck, Best arm identification in multi-armed bandits, COLT, pp.41-53, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

J. M. Bernardo, Algorithm AS 103: Psi (digamma) function, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol.25, issue.3, pp.315-317, 1976.

S. Bubeck, T. Wang, and N. Viswanathan, Multiple identifications in multi-armed bandits, ICML, pp.258-265, 2013.

T. M. Cover and J. A. Thomas, Elements of information theory, 2012.

C. Dann and E. Brunskill, Sample complexity of episodic fixed-horizon reinforcement learning, NIPS, pp.2818-2826, 2015.

G. Darmois, Sur les lois de probabilités à estimation exhaustive, C. R. Acad. Sci, pp.1265-1266, 1935.

J. C. Gittins, Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society. Series B (Methodological), vol.41, issue.2, pp.148-177, 1979.

J. Honda and A. Takemura, An asymptotically optimal policy for finite support models in the multiarmed bandit problem, Machine Learning, vol.85, issue.3, pp.361-391, 2011.

E. Kaufmann, On Bayesian index policies for sequential resource allocation, Annals of Statistics, vol.46, issue.2, pp.842-865, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01251606

E. Kaufmann, O. Cappé, and A. Garivier, On Bayesian upper confidence bounds for bandit problems, AISTATS, pp.592-600, 2012.

E. Kaufmann and S. Kalyanakrishnan, Information complexity in bandit subset selection, COLT, pp.228-251, 2013.

B. O. Koopman, On distributions admitting a sufficient statistic, Transactions of the American Mathematical Society, vol.39, issue.3, pp.399-409, 1936.

S. Kullback, Information theory and statistics, Courier Corporation, 1997.

T. L. Lai, Asymptotic solutions of bandit problems, Stochastic differential systems, stochastic control theory and applications, pp.275-292, 1988.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Adv. Appl. Math, vol.6, issue.1, pp.4-22, 1985.

J. Niño-Mora, Computing a classic index for finite-horizon bandits, INFORMS Journal on Computing, vol.23, issue.2, pp.254-267, 2011.

I. Osband, D. Russo, and B. Van Roy, (More) efficient reinforcement learning via posterior sampling, NIPS, pp.3003-3011, 2013.

S. R. Putta and T. Tulabandhula, Pure exploration in episodic fixed-horizon Markov decision processes, AAMAS, pp.1703-1704, 2017.

H. Robbins, Some aspects of the sequential design of experiments, Bull. Amer. Math. Soc, vol.58, issue.5, pp.527-535, 1952.

S. L. Scott, A modern Bayesian look at the multi-armed bandit, Applied Stochastic Models in Business and Industry, vol.26, issue.6, pp.639-658, 2010.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, p.285, 1933.