Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Cost-sensitive multiclass classification risk bounds, Proceedings of the Thirtieth International Conference on Machine Learning, pp.1391-1399, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00840485
Policy search by dynamic programming, Proceedings of Advances in Neural Information Processing Systems 16, 2003.
Characterizations of learnability for classes of {0, ..., n}-valued functions, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pp.74-86, 1995.
DOI : 10.1145/130385.130423
Error limiting reductions between classification tasks, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.49-56, 2005.
DOI : 10.1145/1102351.1102358
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.4721
Error-Correcting Tournaments, Proceedings of the 20th International Conference on Algorithmic Learning Theory, pp.247-262, 2009.
DOI : 10.1137/0214009
URL : http://arxiv.org/abs/0902.3176
Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857
Fast boosting using adversarial bandits, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.49-56, 2010.
URL : https://hal.archives-ouvertes.fr/in2p3-00614564
A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5
Algorithms and bounds for sampling-based approximate policy iteration, Recent Advances in Reinforcement Learning, 2008.
Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027
Error propagation for approximate policy and value iteration, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154
CAPI: Generalized classification-based approximate policy iteration, Proceedings of the Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013.
DOI : 10.1109/tac.2015.2418411
Approximate policy iteration with a policy language bias, Proceedings of Advances in Neural Information Processing Systems 16, 2004.
Approximate policy iteration with a policy language bias: Solving relational Markov decision processes, Journal of Artificial Intelligence Research, vol.25, pp.85-118, 2006.
Rollout allocation strategies for classification-based policy iteration, ICML Workshop on Reinforcement Learning and Search in Very Large Spaces, 2010.
Classification-based policy iteration with a critic, Proceedings of the Twenty-Eighth International Conference on Machine Learning, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972
Approximate dynamic programming finally performs well in the game of Tetris, Proceedings of Advances in Neural Information Processing Systems 26, pp.1754-1762, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00921250
Conservative and greedy approaches to classification-based policy iteration, Proceedings of the Twenty-Sixth Conference on Artificial Intelligence, pp.914-920, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772610
A distribution-free theory of nonparametric regression, 2002.
DOI : 10.1007/b97848
Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension, Journal of Combinatorial Theory, Series A, vol.69, issue.2, pp.217-232, 1995.
DOI : 10.1016/0097-3165(95)90052-7
Dynamic Programming and Markov Processes, 1960.
On the Sample Complexity of Reinforcement Learning, 2003.
Approximately optimal approximate reinforcement learning, Proceedings of the Nineteenth International Conference on Machine Learning, pp.267-274, 2002.
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.
Relating reinforcement learning performance to classification performance, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.473-480, 2005.
DOI : 10.1145/1102351.1102411
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.408.1329
Analysis of a classification-based policy iteration algorithm, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065
Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, pp.3041-3074, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596
Focus of attention in reinforcement learning, Journal of Universal Computer Science, vol.13, issue.9, pp.1246-1269, 2007.
Multi-class cost-sensitive boosting with p-norm loss functions, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pp.506-514, 2008.
DOI : 10.1145/1401890.1401953
Error and regret bounds for cost-sensitive multi-class classification reduction to regression, 2010.
Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007.
DOI : 10.1137/040614384
URL : https://hal.archives-ouvertes.fr/inria-00124685
Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882
Convergence of Stochastic Processes, 1984.
DOI : 10.1007/978-1-4612-5254-2
Approximate modified policy iteration, Proceedings of the Twenty-Ninth International Conference on Machine Learning, pp.1207-1214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882