Policy-gradient learning of controllers with internal state, 2001.
Stochastic optimization, Engineering Cybernetics, vol.5, pp.11-16, 1968.
Covariant policy search, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 2003.
Advantage updating, 1993.
Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol.13, pp.835-846, 1983.
Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.
Experiments with infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.351-381, 2001.
The Likelihood Principle, Institute of Mathematical Statistics, 1984.
Neuro-Dynamic Programming, Athena Scientific, 1996.
DOI : 10.1007/0-306-48332-7_333
Incremental natural actor-critic algorithms, Proceedings of Advances in Neural Information Processing Systems, pp.105-112, 2007.
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.2177
Natural actor–critic algorithms, Automatica, vol.45, issue.11, pp.2471-2482, 2009.
DOI : 10.1016/j.automatica.2009.07.008
URL : https://hal.archives-ouvertes.fr/hal-00840470
Sparse On-Line Gaussian Processes, Neural Computation, vol.14, issue.3, pp.641-668, 2002.
DOI : 10.1162/089976602317250933
Algorithms and Representations for Reinforcement Learning, 2005.
Sparse Online Greedy Support Vector Regression, Proceedings of the Thirteenth European Conference on Machine Learning, pp.84-96, 2002.
DOI : 10.1007/3-540-36755-1_8
Bayes meets Bellman: The Gaussian process approach to temporal difference learning, Proceedings of the Twentieth International Conference on Machine Learning, pp.154-161, 2003.
Reinforcement learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.201-208, 2005.
DOI : 10.1145/1102351.1102377
Bayesian policy gradient algorithms, Proceedings of Advances in Neural Information Processing Systems 19, pp.457-464, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00776608
Bayesian actor-critic algorithms, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.297-304, 2007.
DOI : 10.1145/1273496.1273534
Stochastic approximation for Monte Carlo optimization, Proceedings of the 18th Conference on Winter Simulation, WSC '86, pp.356-365, 1986.
DOI : 10.1145/318242.318459
Likelihood ratio gradient estimation for stochastic systems, Communications of the ACM, vol.33, issue.10, pp.75-84, 1990.
DOI : 10.1145/84537.84552
Likelihood ratio gradient estimation for stochastic recursions, Advances in Applied Probability, vol.27, issue.4, pp.1019-1053, 1995.
DOI : 10.2307/3213735
Variance reduction techniques for gradient estimates in reinforcement learning, Journal of Machine Learning Research, vol.5, pp.1471-1530, 2004.
Exploiting generative models in discriminative classifiers, Proceedings of Advances in Neural Information Processing Systems 11, 1999.
A natural policy gradient, Proceedings of Advances in Neural Information Processing Systems 14, 2002.
Reinforcement learning by stochastic hillclimbing on discounted reward, Proceedings of the Twelfth International Conference on Machine Learning, pp.295-303, 1995.
Actor-critic algorithms, Proceedings of Advances in Neural Information Processing Systems 12, pp.1008-1014, 2000.
Simulation-Based Methods for Markov Decision Processes, 1998.
Neural Networks for Control, 1990.
Monte-Carlo is fundamentally unsound, The Statistician, pp.247-249, 1987.
Bayes–Hermite quadrature, Journal of Statistical Planning and Inference, vol.29, issue.3, pp.245-260, 1991.
DOI : 10.1016/0378-3758(91)90002-V
Reinforcement learning of motor skills with policy gradients, Neural Networks, vol.21, issue.4, pp.682-697, 2008.
DOI : 10.1016/j.neunet.2008.02.003
Reinforcement learning for humanoid robotics, Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, 2003.
Natural actor-critic, Proceedings of the Sixteenth European Conference on Machine Learning, pp.280-291, 2005.
Calcul des Probabilités, Georges Carré, 1896.
Markov Decision Processes, 1994.
DOI : 10.1002/9780470316887
Bayesian Monte Carlo, Proceedings of Advances in Neural Information Processing Systems 15, pp.489-496, 2003.
Gaussian Processes in Machine Learning, 2006.
Sensitivity analysis via likelihood ratios, Proceedings of the 18th Conference on Winter Simulation, WSC '86, 1986.
DOI : 10.1145/318242.318450
Sensitivity Analysis for Simulations via Likelihood Ratios, Operations Research, vol.37, issue.5, 1989.
DOI : 10.1287/opre.37.5.830
Some Problems in Monte Carlo Optimization, 1969.
On-line Q-learning using Connectionist Systems, 1994.
Kernel Methods for Pattern Analysis, 2004.
DOI : 10.1017/CBO9780511809682
Temporal credit assignment in reinforcement learning, 1984.
Reinforcement Learning: An Introduction, MIT Press, 1998.
Policy gradient methods for reinforcement learning with function approximation, Proceedings of Advances in Neural Information Processing Systems 12, pp.1057-1063, 2000.
Hessian matrix distribution for Bayesian policy gradient reinforcement learning, Information Sciences, vol.181, issue.9, pp.1671-1685, 2011.
DOI : 10.1016/j.ins.2011.01.001
The optimal reward baseline for gradient-based reinforcement learning, Proceedings of the Seventeenth International Conference on Uncertainty in Artificial Intelligence, pp.538-545, 2001.
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, pp.229-256, 1992.
DOI : 10.1007/978-1-4615-3618-5_2
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.8871