Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999.
DOI : 10.1016/S0004-3702(99)00052-1
Automatic discovery of subgoals in reinforcement learning using diverse density, Proceedings of the Eighteenth International Conference on Machine Learning, pp.361-368, 2001.
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, Proceedings of the 13th European Conference on Machine Learning, pp.295-306, 2002.
DOI : 10.1007/3-540-36755-1_25
Using relative novelty to identify useful temporal abstractions in reinforcement learning, Proceedings of the Twenty-first International Conference on Machine Learning, 2004.
Automatic construction of temporally extended actions for MDPs using bisimulation metrics, Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, pp.140-152, 2012.
Unified inter and intra options learning using policy gradient methods, EWRL, pp.153-164
Options with Exceptions, Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, pp.165-176, 2012.
DOI : 10.1007/978-3-642-29946-9_18
Time-regularized interrupting options (TRIO), Proceedings of the 31st International Conference on Machine Learning, ICML 2014 Conference Proceedings, pp.1350-1358, 2014.
A deep hierarchical approach to lifelong learning in Minecraft, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp.1553-1561, 2017.
Learning Options in Reinforcement Learning, SARA, volume 2371 of Lecture Notes in Computer Science, pp.212-223, 2002.
DOI : 10.1007/3-540-45622-8_16
URL : http://rl.cs.mcgill.ca/~mstoll/stolle-precup.pdf
Scaling up approximate value iteration with options: Better policies with fewer iterations, Proceedings of the 31st International Conference on Machine Learning, ICML 2014 Conference Proceedings, pp.127-135, 2014.
The utility of temporal abstraction in reinforcement learning, The Seventh International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
PAC-inspired Option Discovery in Lifelong Reinforcement Learning, Proceedings of the 31st International Conference on Machine Learning, ICML 2014 JMLR Proceedings, pp.316-324, 2014.
Exploration-exploitation in MDPs with options, Proceedings of Machine Learning Research, pp.576-584, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01493567
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards, Mathematics of Operations Research, vol.8, issue.2, pp.298-313, 1983.
DOI : 10.1287/moor.8.2.298
An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences, vol.74, issue.8, pp.1309-1331, 2008.
Mixing time estimation in reversible Markov chains from a single sample path, Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS 15, pp.1459-1467, 2015.
Sample complexity of episodic fixed-horizon reinforcement learning, Proceedings of the 28th International Conference on Neural Information Processing Systems, pp.2818-2826, 2015.
Comparison of perturbation bounds for the stationary distribution of a Markov chain, Linear Algebra and its Applications, vol.335, issue.1, pp.137-150, 2001.
On optimal condition numbers for Markov chains, Numerische Mathematik, vol.66, issue.4, pp.521-537, 2008.
DOI : 10.1007/978-3-642-05156-2
URL : http://myweb.polyu.edu.hk/~marsze/PDF/condition.numbers.pdf
Sensitivity of finite Markov chains under perturbation, Statistics & Probability Letters, vol.17, issue.2, pp.163-168, 1993.
DOI : 10.1016/0167-7152(93)90011-7
Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research, vol.13, pp.227-303, 2000.
Optimism in the face of uncertainty should be refutable, Minds and Machines, pp.521-526, 2008.
Applied Probability Models with Optimization Applications, chapter 3: Recurrence and Ergodicity, 1999.
Applied Probability Models with Optimization Applications, chapter 2: Discrete-Time Markov Models, 1999.
Markov Decision Processes: Discrete Stochastic Dynamic Programming
REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pp.35-42, 2009.
Concentration inequalities for Markov chains by Marton couplings and spectral methods, Electronic Journal of Probability, vol.20, 2015.
DOI : 10.1214/EJP.v20-4039
URL : http://doi.org/10.1214/ejp.v20-4039
Basic tail and concentration bounds, Course on Mathematical Statistics, chapter 2, 2015.