l1-penalized projected Bellman residual

MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists in combining the Least-Squares Temporal Difference (LSTD) algorithm with $\ell_1$-regularization, which has proven effective in the supervised learning community. This has been done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an $\ell_1$-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an $\ell_1$-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. However, this comes at the cost of a higher computational complexity if only a part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which lets us envision easy extensions to other penalties.
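As the abstract notes, penalizing the projected Bellman residual with an $\ell_1$ term reduces to a supervised (Lasso-type) problem: with linear features, the projected residual is linear in the weights, so the penalized objective is a standard Lasso. The following NumPy sketch illustrates this idea under those assumptions; the function name, the empirical projection, and the ISTA solver are illustrative choices, not the authors' implementation:

```python
import numpy as np

def l1_pbr(Phi, PhiNext, R, gamma=0.95, lam=0.1, n_iter=2000):
    """Illustrative sketch: l1-penalized projected Bellman residual.

    Phi     : (n, d) features of sampled states s_i
    PhiNext : (n, d) features of next states s'_i
    R       : (n,)   sampled rewards

    With the empirical projection Pi = Phi (Phi^T Phi)^-1 Phi^T onto
    span(Phi), the projected residual Pi(R + gamma*PhiNext@w) - Phi@w
    equals -(X@w - y) for X = Phi - gamma*Pi@PhiNext and y = Pi@R.
    Minimizing 0.5*||X@w - y||^2 + lam*||w||_1 is thus a plain Lasso,
    solved here by ISTA (proximal gradient descent).
    """
    n, d = Phi.shape
    # Empirical projection onto span(Phi); small ridge term for stability.
    G = Phi.T @ Phi + 1e-8 * np.eye(d)
    Pi = Phi @ np.linalg.solve(G, Phi.T)
    X = Phi - gamma * (Pi @ PhiNext)   # design matrix of the equivalent Lasso
    y = Pi @ R                         # target vector of the equivalent Lasso
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)           # gradient of 0.5*||X@w - y||^2
        z = w - grad / L                   # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w
```

Because the problem is an ordinary Lasso, any off-the-shelf $\ell_1$ solver (or other penalty) could be substituted for the ISTA loop; a large `lam` drives weights exactly to zero, which is what performs the feature selection.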
Document type:
Conference paper
European Workshop on Reinforcement Learning (EWRL 11), Sep 2011, Athens, Greece. 2011

Cited literature [25 references]

https://hal.inria.fr/hal-00644507
Contributor: Bruno Scherrer <>
Submitted on: Thursday, November 24, 2011 - 15:16:07
Last modified on: Thursday, January 11, 2018 - 06:19:51
Long-term archiving on: Friday, November 16, 2012 - 11:58:09

File

gs_ewrl_l1_cr.pdf
Files produced by the author(s)

Identifiers

• HAL Id : hal-00644507, version 1

Citation

Matthieu Geist, Bruno Scherrer. l1-penalized projected Bellman residual. European Workshop on Reinforcement Learning (EWRL 11), Sep 2011, Athens, Greece. 2011. 〈hal-00644507〉

Metrics

Record views

339

File downloads