Biasing Approximate Dynamic Programming with a Lower Discount Factor

Marek Petrik; Bruno Scherrer

Conference paper, Year: 2008

Marek Petrik

Role: Author

Department of Computer Science [Amherst]

Bruno Scherrer

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Abstract

Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. It is generally assumed that using an artificially low discount factor improves the convergence rate at the cost of solution quality. We demonstrate, however, that an artificially low discount factor may significantly improve the solution quality when used in approximate dynamic programming. We propose two explanations of this phenomenon. The first follows directly from the standard approximation error bounds: using a lower discount factor may decrease these bounds. However, we also show that these bounds are loose, so their decrease does not entirely justify the improved solution quality. We therefore propose a second justification: when the rewards are received only sporadically (as in the case of Tetris), we can derive tighter bounds, which support a significant improvement in the solution quality with a decreased discount factor.
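The first explanation in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the paper; it combines two textbook-style bounds by the triangle inequality. The function names, the symbols `gamma` (true discount factor), `beta` (lowered discount factor), `eps` (per-iteration approximation error), and `r_max` (largest absolute reward) are all assumptions introduced here, and `eps` is assumed, for illustration only, not to change with the discount factor.

```python
def approximation_bound(beta, eps):
    """Classical approximate policy iteration bound: 2*beta*eps / (1 - beta)^2."""
    return 2.0 * beta * eps / (1.0 - beta) ** 2


def discount_bias_bound(gamma, beta, r_max):
    """Bound on the max-norm gap between the optimal value functions
    under discount factors gamma and beta (with beta <= gamma)."""
    return (gamma - beta) * r_max / ((1.0 - gamma) * (1.0 - beta))


def total_bound(gamma, beta, eps, r_max):
    """Triangle-inequality sum of the two bounds above (illustrative only)."""
    return discount_bias_bound(gamma, beta, r_max) + approximation_bound(beta, eps)


def best_discount(gamma, eps, r_max, steps=1000):
    """Grid search over beta in [0, gamma] for the smallest total bound."""
    candidates = [gamma * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda b: total_bound(gamma, b, eps, r_max))


if __name__ == "__main__":
    gamma, eps, r_max = 0.99, 1.0, 1.0
    for beta in (0.99, 0.95, 0.9, 0.8):
        print(beta, total_bound(gamma, beta, eps, r_max))
    print("grid minimizer:", best_discount(gamma, eps, r_max))
```

With a nonzero approximation error, the summed bound in this sketch is minimized at some `beta` strictly below `gamma`, which matches the qualitative claim that a lower discount factor can decrease the error bounds. As the abstract notes, such bounds are loose, so the sketch should not be read as a prediction of actual solution quality.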

Domains

Artificial Intelligence [cs.AI]

Main file

finaldiscount.pdf (150.18 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00337652

Submitted on: Friday, November 7, 2008, 15:41:37

Last modified on: Friday, March 24, 2023, 14:52:51

Long-term archiving on: Tuesday, October 9, 2012, 15:11:47

Dates and versions

inria-00337652 , version 1 (07-11-2008)

Identifiers

HAL Id : inria-00337652 , version 1

Cite

Marek Petrik, Bruno Scherrer. Biasing Approximate Dynamic Programming with a Lower Discount Factor. Twenty-Second Annual Conference on Neural Information Processing Systems - NIPS 2008, Dec 2008, Vancouver, Canada. ⟨inria-00337652⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA

191 Views

492 Downloads
