Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
Dynamic programming and Markov processes, 1960. ,
Markov Decision Processes ? Discrete Stochastic Dynamic Programming . Probability and mathematical statistics, 1994. ,
Policy gradient methods for reinforcement learning with function approximation, Neural Information Processing Systems (NIPS), pp.1057-1063, 1999. ,
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, pp.229-256, 1992. ,
OnActor-Critic Algorithms, SIAM Journal on Control and Optimization, vol.42, issue.4, pp.1143-1166, 2003. ,
DOI : 10.1137/S0363012901385691
Policy gradient methods for robotics. Intelligent Robots and Systems, IEEE/RSJ International Conference on, pp.2219-2225, 2006. ,
Natural Gradient Works Efficiently in Learning, Neural Computation, vol.37, issue.2, pp.251-276, 1998. ,
DOI : 10.1103/PhysRevLett.76.2188
Natural actor-critic, Neurocomput, vol.717, issue.9, pp.1180-1190, 2008. ,
Incremental natural actorcritic algorithms, Advances in Neural Information Processing Systems, pp.105-112, 2008. ,
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.254-261, 2007. ,
DOI : 10.1109/ADPRL.2007.368196
The cascade-correlation learning architecture, Advances in Neural Information Processing Systems, pp.524-532, 1989. ,
A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE International Conference on Neural Networks, pp.586-591, 1993. ,
DOI : 10.1109/ICNN.1993.298623
An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003. ,
Basis Function Adaptation in Temporal Difference Reinforcement Learning, Annals of Operations Research, vol.34, issue.1/2/3, pp.215-238, 2005. ,
DOI : 10.1007/s10479-005-5732-z
Automatic basis function construction for approximate dynamic programming and reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.449-456, 2006. ,
DOI : 10.1145/1143844.1143901
Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.737-744, 2007. ,
DOI : 10.1145/1273496.1273589
Representation policy iteration, In: UAI, pp.372-379, 2005. ,
Constructing basis functions from directed graphs for value function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.385-392, 2007. ,
DOI : 10.1145/1273496.1273545
Proto-value functions, Proceedings of the 22nd international conference on Machine learning , ICML '05, pp.2169-2231, 2007. ,
DOI : 10.1145/1102351.1102421
Combining td-learning with cascade-correlation networks, pp.632-639, 2003. ,