Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Policy search by dynamic programming, Proceedings of Advances in Neural Information Processing Systems 16, 2003.
Characterizations of learnability for classes of {0, ..., n}-valued functions, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.74-86, 1995.
DOI : 10.1145/130385.130423
Error limiting reductions between classification tasks, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.49-56, 2005.
DOI : 10.1145/1102351.1102358
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.4721
Error-correcting tournaments, CoRR, abs/0902.3176, 2009.
DOI : 10.1007/978-3-642-04414-4_22
URL : http://arxiv.org/abs/0902.3176
Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.
Fast boosting using adversarial bandits, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.143-150, 2010.
URL : https://hal.archives-ouvertes.fr/in2p3-00614564
A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5
Algorithms and bounds for sampling-based approximate policy iteration, Recent Advances in Reinforcement Learning, 2008.
Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027
Error propagation for approximate policy and value iteration, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154
Approximate policy iteration with a policy language bias, Proceedings of Advances in Neural Information Processing Systems 16, 2004.
Approximate policy iteration with a policy language bias: Solving relational Markov decision processes, Journal of Artificial Intelligence Research, vol.25, pp.85-118, 2006.
Rollout allocation strategies for classification-based policy iteration, ICML 2010 Workshop on Reinforcement Learning and Search in Very Large Spaces, 2010.
Dynamic Programming and Markov Processes, 1960.
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.
Relating reinforcement learning performance to classification performance, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.473-480, 2005.
DOI : 10.1145/1102351.1102411
Focus of attention in reinforcement learning, Journal of Universal Computer Science, vol.13, issue.9, pp.1246-1269, 2007.
Multi-class cost-sensitive boosting with p-norm loss functions, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '08, pp.506-514, 2008.
DOI : 10.1145/1401890.1401953
Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685
Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882
Convergence of Stochastic Processes, 1984.
DOI : 10.1007/978-1-4612-5254-2
One-sided support vector regression for multiclass cost-sensitive classification, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.1095-1102, 2010.
Cost-sensitive learning by cost-proportionate example weighting, Third IEEE International Conference on Data Mining, p.435, 2003.
DOI : 10.1109/ICDM.2003.1250950
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.9874