Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999.

DOI : 10.1016/S0004-3702(99)00052-1

Automatic discovery of subgoals in reinforcement learning using diverse density, Proceedings of the Eighteenth International Conference on Machine Learning, pp.361-368, 2001.

Q-Cut: Dynamic Discovery of Sub-goals in Reinforcement Learning, Proceedings of the 13th European Conference on Machine Learning, pp.295-306, 2002.

DOI : 10.1007/3-540-36755-1_25

Using relative novelty to identify useful temporal abstractions in reinforcement learning, Proceedings of the Twenty-first International Conference on Machine Learning, 2004.

Automatic construction of temporally extended actions for MDPs using bisimulation metrics, Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, pp.140-152, 2012.

Unified inter and intra options learning using policy gradient methods, EWRL, pp.153-164

Options with Exceptions, Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, pp.165-176, 2012.

DOI : 10.1007/978-3-642-29946-9_18

Time-regularized interrupting options (TRIO), Proceedings of the 31st International Conference on Machine Learning, ICML 2014 Conference Proceedings, pp.1350-1358, 2014.

A deep hierarchical approach to lifelong learning in Minecraft, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp.1553-1561, 2017.

Learning Options in Reinforcement Learning, SARA, volume 2371 of Lecture Notes in Computer Science, pp.212-223, 2002.

DOI : 10.1007/3-540-45622-8_16

URL : http://rl.cs.mcgill.ca/~mstoll/stolle-precup.pdf

Scaling up approximate value iteration with options: Better policies with fewer iterations, Proceedings of the 31st International Conference on Machine Learning, ICML 2014 Conference Proceedings, pp.127-135, 2014.

The utility of temporal abstraction in reinforcement learning, The Seventh International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.

PAC-inspired Option Discovery in Lifelong Reinforcement Learning, Proceedings of the 31st International Conference on Machine Learning, ICML 2014 JMLR Proceedings, pp.316-324, 2014.

Exploration-exploitation in MDPs with options, Proceedings of Machine Learning Research, pp.576-584, 2017.

URL : https://hal.archives-ouvertes.fr/hal-01493567

Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards, Mathematics of Operations Research, vol.8, issue.2, pp.298-313, 1983.

DOI : 10.1287/moor.8.2.298

An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences, vol.74, issue.8, pp.1309-1331, 2008.

Mixing time estimation in reversible Markov chains from a single sample path, Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS 15, pp.1459-1467, 2015.

Sample complexity of episodic fixed-horizon reinforcement learning, Proceedings of the 28th International Conference on Neural Information Processing Systems, pp.2818-2826, 2015.

Comparison of perturbation bounds for the stationary distribution of a Markov chain, Linear Algebra and its Applications, vol.335, issue.1, pp.137-150, 2001.

On optimal condition numbers for Markov chains, Numerische Mathematik, vol.66, issue.4, pp.521-537, 2008.

DOI : 10.1007/978-3-642-05156-2

URL : http://myweb.polyu.edu.hk/~marsze/PDF/condition.numbers.pdf

Sensitivity of finite Markov chains under perturbation, Statistics & Probability Letters, vol.17, issue.2, pp.163-168, 1993.

DOI : 10.1016/0167-7152(93)90011-7

Hierarchical reinforcement learning with the MaxQ value function decomposition, Journal of Artificial Intelligence Research, vol.13, pp.227-303, 2000.

Optimism in the face of uncertainty should be refutable, Minds and Machines, pp.521-526, 2008.

Applied Probability Models with Optimization Applications, chapter 3: Recurrence and Ergodicity, 1999.

Applied Probability Models with Optimization Applications, chapter 2: Discrete-Time Markov Models, 1999.

Markov Decision Processes: Discrete Stochastic Dynamic Programming

REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pp.35-42, 2009.

Concentration inequalities for Markov chains by Marton couplings and spectral methods, Electronic Journal of Probability, vol.20, issue.0, 2015.

DOI : 10.1214/EJP.v20-4039

URL : http://doi.org/10.1214/ejp.v20-4039

Basic tail and concentration bounds, Course on Mathematical Statistics, issue.2, 2015.