inria-00185311, version 2
We consider continuous-state, continuous-action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration in which the greedy action selection is replaced by a search, within a restricted set of candidate policies, for the policy that maximizes the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function-based algorithms for continuous-state and continuous-action problems.
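The abstract describes the algorithm only at a high level. Below is a minimal Python sketch of the idea, built on several assumptions that do not come from the paper: a hypothetical one-dimensional toy problem, an action-value class that is linear in quadratic features and fitted by least squares, and a restricted policy class of clipped linear feedback policies a = clip(k·x) indexed by a grid of gains k. Each iteration first selects the candidate policy with the largest average fitted action value over the sampled next states. It then regresses the targets r + γ·Q(x', π(x')) onto the features. The sketch illustrates the structure only and should not be read as the authors' implementation.

```python
import random

# Hypothetical toy problem (an assumption, not from the paper):
# state x and action a lie in [-1, 1]; x' = clip(x + 0.5*a + noise); reward = -x'^2.

def clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

def features(x, a):
    # Quadratic features; the action-value estimate is Q(x, a) = w . phi(x, a).
    return [1.0, x, a, x * a, x * x, a * a]

def q_value(w, x, a):
    return sum(wi * fi for wi, fi in zip(w, features(x, a)))

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A w = b.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def least_squares(rows, targets, ridge=1e-6):
    # Regression step: solve (Phi^T Phi + ridge*I) w = Phi^T y.
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve(A, b)

# Restricted policy class (assumed): linear feedback pi_k(x) = clip(k * x).
POLICY_GAINS = [k / 4.0 for k in range(-8, 9)]  # k in {-2.0, -1.75, ..., 2.0}

def select_policy(w, states):
    # Instead of a per-state greedy argmax over a, pick the candidate policy
    # whose average action value over the sampled states is largest.
    def avg_q(k):
        return sum(q_value(w, x, clip(k * x)) for x in states) / len(states)
    return max(POLICY_GAINS, key=avg_q)

def fitted_policy_q_iteration(batch, gamma=0.9, iterations=20):
    rows = [features(x, a) for x, a, _, _ in batch]
    next_states = [xn for _, _, _, xn in batch]
    w, gain = [0.0] * len(rows[0]), 0.0  # start from Q_0 = 0
    for _ in range(iterations):
        gain = select_policy(w, next_states)
        # Bootstrapped targets use the selected policy, not a greedy max.
        targets = [r + gamma * q_value(w, xn, clip(gain * xn))
                   for _, _, r, xn in batch]
        w = least_squares(rows, targets)
    return w, gain

def make_trajectory(n, seed=0):
    # A single trajectory generated by a uniformly random behaviour policy.
    rng = random.Random(seed)
    batch, x = [], rng.uniform(-1.0, 1.0)
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        xn = clip(x + 0.5 * a + rng.gauss(0.0, 0.05))
        batch.append((x, a, -xn * xn, xn))
        x = xn
    return batch

if __name__ == "__main__":
    w, gain = fitted_policy_q_iteration(make_trajectory(300))
    print("selected gain:", gain)
```

In this toy problem the action a = -2x drives x' to 0 whenever |x| ≤ 0.5, so a negative selected gain is the qualitative outcome one would expect. The exact value depends on the seed, the feature set and the policy grid. Replacing the per-state argmax with a search over a small policy class means the algorithm never has to maximize over a continuous action set at every next state.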
1: SEQUEL (INRIA Futurs), INRIA – CNRS : UMR8022 – CNRS : UMR8146 – Université des Sciences et Technologies de Lille - Lille I – Université Charles de Gaulle - Lille III – Ecole Centrale de Lille
2: Computer and Automation Research Institute of the Hungarian Academy of Sciences (SZTAKI)
3: Department of Computing Science, University of Alberta
Domain: Computer Science/Learning
Available versions: v1 (2007-11-05), v2 (2008-01-08)
http://hal.inria.fr/inria-00185311/en/
oai:hal.inria.fr:inria-00185311_v2
From: Rémi Munos
Submitted on: Tuesday, 8 January 2008 16:52:29
Updated on: Tuesday, 8 January 2008 17:02:01