l1-penalized projected Bellman residual

Matthieu Geist; Bruno Scherrer

Conference paper. Year: 2011


Matthieu Geist

Role: Author
PersonId: 6945
IdHAL: matthieu-geist

SUPELEC-Campus Metz

Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Autonomous intelligent machine

Abstract

We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach is to combine the Least-Squares Temporal Difference (LSTD) algorithm with $\ell_1$-regularization, which has proven effective in the supervised learning community. This was recently done with the LARS-TD algorithm, which replaces the projection operator of LSTD with an $\ell_1$-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an $\ell_1$-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. This comes at the cost of a higher computational complexity if only part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which suggests easy extensions to other penalties.
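The reduction to a supervised learning problem can be illustrated with a short NumPy sketch. This is not the authors' code; it assumes a batch of transitions given as a feature matrix `phi` (sampled states), `phi_next` (their successor states) and a reward vector, and it assumes the projection is the unpenalized least-squares projection Π = Φ(ΦᵀΦ)⁻¹Φᵀ. Because Φθ already lies in the span of Φ, the projected Bellman residual Φθ − Π(R + γΦ'θ) equals Π(Φ − γΦ')θ − ΠR, so adding λ‖θ‖₁ to its squared norm gives an ordinary Lasso with design matrix Π(Φ − γΦ') and target ΠR. The function names (`l1_pbr`, `lasso_cd`, `lstd`) are hypothetical, and the sketch solves for a single λ by cyclic coordinate descent rather than tracing the regularization path with a LARS-type method.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator: proximal map of t * |x|."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    The 0.5 scaling is an assumption of this sketch; other conventions rescale lam.
    """
    p = X.shape[1]
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # squared column norms ||X_j||^2
    r = y.astype(float).copy()     # residual y - X w (w starts at zero)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            old = w[j]
            # Correlation of column j with the residual that excludes coordinate j.
            rho = X[:, j] @ r + col_sq[j] * old
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            if w[j] != old:
                r -= X[:, j] * (w[j] - old)  # keep the residual in sync with w
                max_step = max(max_step, abs(w[j] - old))
        if max_step < tol:
            break
    return w


def l1_pbr(phi, phi_next, rewards, gamma, lam):
    """Hypothetical sketch: l1-penalized projected Bellman residual as a Lasso."""
    # Orthogonal projection onto span(phi); assumes phi has full column rank.
    proj = phi @ np.linalg.solve(phi.T @ phi, phi.T)
    design = proj @ (phi - gamma * phi_next)  # Pi (Phi - gamma Phi')
    target = proj @ rewards                   # Pi R
    return lasso_cd(design, target, lam)


def lstd(phi, phi_next, rewards, gamma):
    """Plain LSTD: solve Phi^T (Phi - gamma Phi') theta = Phi^T R."""
    A = phi.T @ (phi - gamma * phi_next)
    return np.linalg.solve(A, phi.T @ rewards)


# Usage on synthetic (made-up) transition data.
rng = np.random.default_rng(0)
n, p, gamma = 200, 10, 0.9
phi = rng.normal(size=(n, p))       # features of sampled states
phi_next = rng.normal(size=(n, p))  # features of their successor states
rewards = rng.normal(size=n)

theta_lstd = lstd(phi, phi_next, rewards, gamma)
theta_zero = l1_pbr(phi, phi_next, rewards, gamma, lam=0.0)
theta_mid = l1_pbr(phi, phi_next, rewards, gamma, lam=5.0)
theta_big = l1_pbr(phi, phi_next, rewards, gamma, lam=1e6)
print(np.count_nonzero(theta_mid), "of", p, "features kept")
```

With λ = 0 and Φᵀ(Φ − γΦ') invertible, the minimizer drives the projected residual to zero, so it coincides with the LSTD solution; a large enough λ sets every coefficient to zero, and intermediate values select a subset of features. Swapping `lasso_cd` for any other penalized least-squares solver is what makes extensions to other penalties straightforward in this formulation.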

Domains

Artificial Intelligence [cs.AI]

Main file

gs_ewrl_l1_cr.pdf (236.6 KB)

Origin: files produced by the author(s)


https://inria.hal.science/hal-00644507

Submitted on: Thursday, 24 November 2011, 15:16:07

Last modified on: Friday, 24 March 2023, 14:52:55

Long-term archiving on: Friday, 16 November 2012, 11:58:09

Dates and versions

hal-00644507, version 1 (24-11-2011)

Identifiers

HAL Id: hal-00644507, version 1

Cite

Matthieu Geist, Bruno Scherrer. l1-penalized projected Bellman residual. European Workshop on Reinforcement Learning (EWRL 11), Sep 2011, Athens, Greece. ⟨hal-00644507⟩


Collections

SUPELEC CNRS INRIA SUP_IMS UNIV-LORRAINE INRIA2 LORIA

195 views

470 downloads
