Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes, Discrete Event Dynamic Systems: Theory and Applications, pp.23-52, 2007.
DOI : 10.1007/s10626-006-0003-y
Learning Algorithms for Markov Decision Processes with Average Cost, SIAM Journal on Control and Optimization, vol.40, issue.3, pp.681-698, 2001.
DOI : 10.1137/S0363012999361974
Stochastic Optimization, Engineering Cybernetics, vol.5, pp.11-16, 1968.
A Simulated Annealing Algorithm with Constant Temperature for Discrete Stochastic Optimization, Management Science, vol.45, issue.5, pp.748-764, 1999.
DOI : 10.1287/mnsc.45.5.748
Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998.
DOI : 10.1162/089976698300017746
On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50
Advantage Updating, Technical Report WL-TR-93-1146, Wright Laboratory, Wright-Patterson Air Force Base, OH 45433-7301, 1993.
Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the Twelfth International Conference on Machine Learning, pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X
Covariant policy search, Proceedings of International Joint Conference on Artificial Intelligence, 2003.
Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, vol.13, pp.834-846, 1983.
DOI : 10.1109/tsmc.1983.6313077
Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.
Experiments with infinite-horizon, policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.351-381, 2001.
KnightCap: A Chess Program that Learns by Combining TD(λ) with Game-Tree Search, Proceedings of the Fifteenth International Conference on Machine Learning, pp.28-36, 1998.
Functional approximations and dynamic programming, Mathematical Tables and Other Aids to Computation, pp.247-251, 1959.
DOI : 10.2307/2002797
URL : http://www.dtic.mil/get-tr-doc/pdf?AD=AD0606538
Adaptive Algorithms and Stochastic Approximations, 1990.
DOI : 10.1007/978-3-642-75894-2
Dynamic Programming and Optimal Control, Athena Scientific, 1995.
Parallel and Distributed Computation, 1989.
Improved temporal difference methods with linear function approximation, 2003.
A Simultaneous Perturbation Stochastic Approximation-Based Actor-Critic Algorithm for Markov Decision Processes, IEEE Transactions on Automatic Control, vol.49, issue.4, pp.592-598, 2004.
DOI : 10.1109/TAC.2004.825622
Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization, ACM Transactions on Modeling and Computer Simulation, vol.15, issue.1, pp.74-107, 2005.
DOI : 10.1145/1044322.1044326
Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization, ACM Transactions on Modeling and Computer Simulation, vol.18, issue.1, 2007.
DOI : 10.1145/1315575.1315577
Incremental Natural Actor-Critic Algorithms, Advances in Neural Information Processing Systems, pp.105-112, 2008.
DOI : 10.1016/j.automatica.2009.07.008
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.2177
Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.
DOI : 10.1016/S0167-6911(97)90015-3
Reinforcement Learning – A Bridge Between Numerical Methods and Monte Carlo, 2008.
DOI : 10.1142/9789814273633_0004
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, SIAM Journal on Control and Optimization, vol.38, issue.2, pp.447-469, 2000.
DOI : 10.1137/S0363012997331639
Least-squares temporal difference learning, Proceedings of the Sixteenth International Conference on Machine Learning, pp.49-56, 1999.
Generalization in reinforcement learning: Safely approximating the value function, Advances in Neural Information Processing Systems: Proceedings of the 1994 Conference, pp.369-376, 1995.
Some Pathological Traps for Stochastic Approximation, SIAM Journal on Control and Optimization, vol.36, issue.4, pp.1293-1314, 1998.
DOI : 10.1137/S036301299630759X
URL : https://hal.archives-ouvertes.fr/hal-00694262
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
Perturbation realization, potentials and sensitivity analysis of Markov processes, IEEE Transactions on Automatic Control, vol.42, pp.1382-1393, 1997.
An optimal one-way multigrid algorithm for discrete-time stochastic control, IEEE Transactions on Automatic Control, vol.36, issue.8, pp.898-914, 1991.
DOI : 10.1109/9.133184
Elevator Group Control using Multiple Reinforcement Learning Agents, Machine Learning, pp.235-262, 1998.
Splines and efficiency in dynamic programming, Journal of Mathematical Analysis and Applications, vol.54, issue.2, pp.402-407, 1976.
DOI : 10.1016/0022-247X(76)90209-2
Information theoretic justification of Boltzmann selection and its generalization to Tsallis case, 2005 IEEE Congress on Evolutionary Computation, pp.1667-1674, 2005.
DOI : 10.1109/CEC.2005.1554889
Bayesian Policy Gradient Algorithms, Advances in Neural Information Processing Systems, vol.19, pp.457-464, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00776608
Bayesian actor-critic algorithms, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.297-304, 2007.
DOI : 10.1145/1273496.1273534
URL : https://hal.archives-ouvertes.fr/hal-00776608
Likelihood ratio gradient estimation for stochastic systems, Communications of the ACM, vol.33, issue.10, pp.75-84, 1990.
DOI : 10.1145/84537.84552
Stable function approximation in dynamic programming, Proceedings of the Twelfth International Conference on Machine Learning, pp.261-268, 1995; an expanded version was published as Technical Report CMU-CS-95-103.
Variance reduction techniques for gradient estimates in reinforcement learning, Journal of Machine Learning Research, vol.5, pp.1471-1530, 2004.
Convergent activation dynamics in continuous time networks, Neural Networks, vol.2, issue.5, pp.331-349, 1989.
DOI : 10.1016/0893-6080(89)90018-X
A Natural Policy Gradient, Advances in Neural Information Processing Systems, vol.14, 2002.
Policy gradient reinforcement learning for fast quadrupedal locomotion, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04), pp.2619-2624, 2004.
DOI : 10.1109/ROBOT.2004.1307456
Actor-Critic-Type Learning Algorithms for Markov Decision Processes, SIAM Journal on Control and Optimization, vol.38, issue.1, pp.94-123, 1999.
DOI : 10.1137/S036301299731669X
On Actor-Critic Algorithms, SIAM Journal on Control and Optimization, vol.42, issue.4, pp.1143-1166, 2003.
DOI : 10.1137/S0363012901385691
Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.
DOI : 10.1007/978-1-4684-9352-8
Stochastic Approximation Algorithms and Applications, 1997.
DOI : 10.1007/978-1-4899-2696-8
Least-Squares Policy Iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Stability by Lyapunov's Direct Method with Applications, 1961.
Garnet Natural Actor-Critic Project, 2006.
Simulation-based optimization of Markov reward processes, IEEE Transactions on Automatic Control, vol.46, issue.2, pp.191-209, 2001.
DOI : 10.1109/9.905687
Control Techniques for Complex Networks, 2007.
Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations, The Annals of Probability, vol.18, issue.2, pp.698-712, 1990.
DOI : 10.1214/aop/1176990853
Reinforcement learning for humanoid robotics, Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, 2003.
Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.
DOI : 10.1016/j.neucom.2007.11.026
Reinforcement learning of motor skills with policy gradients, Neural Networks, vol.21, issue.4, 2008.
DOI : 10.1016/j.neunet.2008.02.003
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887
Natural Actor-Critic for Road Traffic Optimization, Advances in Neural Information Processing Systems, vol.19, pp.1169-1176, 2007.
On-line Q-learning using Connectionist Systems, 1994.
Numerical dynamic programming in economics, Handbook of Computational Economics, pp.614-722, 1996.
Analytical Mean Squared Error Curves for Temporal Difference Learning, Machine Learning, vol.32, issue.1, pp.5-40, 1998.
DOI : 10.1023/A:1007495401240
Temporal credit assignment in reinforcement learning. Doctoral dissertation, 1984.
Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988.
DOI : 10.1007/BF00115009
Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp.1038-1044, 1996.
Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems, pp.1057-1063, 2000.
Reinforcement Learning: An Introduction, MIT Press, 1998.
On the Convergence of Temporal Difference Learning with Linear Function Approximation, Machine Learning, vol.42, issue.3, pp.241-267, 2001.
DOI : 10.1023/A:1007609817671
Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, pp.58-68, 1995.
DOI : 10.1145/203330.203343
Asynchronous Stochastic Approximation and Q-learning, Machine Learning, pp.185-202, 1994.
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874
Average cost temporal-difference learning, Automatica, vol.35, issue.11, pp.1799-1808, 1999.
DOI : 10.1016/S0005-1098(99)00099-0
A Survey of Applications of Markov Decision Processes, Journal of the Operational Research Society, vol.44, issue.11, pp.1073-1096, 1993.
DOI : 10.1057/jors.1993.181
Adaptive Signal Processing, 1985.
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, pp.229-256, 1992.