Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999.
DOI : 10.1016/S0004-3702(99)00052-1
Automatic discovery of subgoals in reinforcement learning using diverse density, Proceedings of the Eighteenth International Conference on Machine Learning, pp.361-368, 2001.
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, Proceedings of the 13th European Conference on Machine Learning, pp.295-306, 2002.
DOI : 10.1007/3-540-36755-1_25
Using relative novelty to identify useful temporal abstractions in reinforcement learning, Proceedings of the Twenty-first International Conference on Machine Learning, 2004.
Automatic construction of temporally extended actions for MDPs using bisimulation metrics, Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, pp.140-152, 2012.
Unified inter and intra options learning using policy gradient methods, EWRL, pp.153-164
Options with Exceptions, Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, pp.165-176, 2012.
DOI : 10.1007/978-3-642-29946-9_18
Time-regularized interrupting options (TRIO), Proceedings of the 31st International Conference on Machine Learning, ICML 2014 Conference Proceedings, pp.1350-1358, 2014.
A deep hierarchical approach to lifelong learning in Minecraft, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp.1553-1561, 2017.
Learning Options in Reinforcement Learning, SARA, volume 2371 of Lecture Notes in Computer Science, pp.212-223, 2002.
DOI : 10.1007/3-540-45622-8_16
URL : http://rl.cs.mcgill.ca/~mstoll/stolle-precup.pdf
Scaling up approximate value iteration with options: Better policies with fewer iterations, Proceedings of the 31st International Conference on Machine Learning, ICML 2014 Conference Proceedings, pp.127-135, 2014.
The utility of temporal abstraction in reinforcement learning, The Seventh International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
PAC-inspired Option Discovery in Lifelong Reinforcement Learning, Proceedings of the 31st International Conference on Machine Learning, ICML 2014 JMLR Proceedings, pp.316-324, 2014.
Exploration-exploitation in MDPs with options, Proceedings of Machine Learning Research, pp.576-584, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01493567
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards, Mathematics of Operations Research, vol.8, issue.2, pp.298-313, 1983.
DOI : 10.1287/moor.8.2.298
An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences, vol.74, issue.8, pp.1309-1331, 2008.
Mixing time estimation in reversible Markov chains from a single sample path, Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS 15, pp.1459-1467, 2015.
Sample complexity of episodic fixed-horizon reinforcement learning, Proceedings of the 28th International Conference on Neural Information Processing Systems, pp.2818-2826, 2015.
Comparison of perturbation bounds for the stationary distribution of a Markov chain, Linear Algebra and its Applications, vol.335, issue.1, pp.137-150, 2001.
On optimal condition numbers for Markov chains, Numerische Mathematik, vol.66, issue.4, pp.521-537, 2008.
DOI : 10.1007/978-3-642-05156-2
URL : http://myweb.polyu.edu.hk/~marsze/PDF/condition.numbers.pdf
Sensitivity of finite Markov chains under perturbation, Statistics & Probability Letters, vol.17, issue.2, pp.163-168, 1993.
DOI : 10.1016/0167-7152(93)90011-7
Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research, vol.13, pp.227-303, 2000.
Optimism in the face of uncertainty should be refutable, Minds and Machines, pp.521-526, 2008.
Applied Probability Models with Optimization Applications, chapter 3: Recurrence and Ergodicity, 1999.
Applied Probability Models with Optimization Applications, chapter 2: Discrete-Time Markov Models, 1999.
Markov Decision Processes: Discrete Stochastic Dynamic Programming
REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pp.35-42, 2009.
Concentration inequalities for Markov chains by Marton couplings and spectral methods, Electronic Journal of Probability, vol.20, 2015.
DOI : 10.1214/EJP.v20-4039
URL : http://doi.org/10.1214/ejp.v20-4039
Basic tail and concentration bounds, Course on Mathematical Statistics, chapter 2, 2015.