Biasing Approximate Dynamic Programming with a Lower Discount Factor

Marek Petrik (1), Bruno Scherrer (2)
(2) MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. It is generally assumed that using an artificially low discount factor will improve the convergence rate at the cost of solution quality. We demonstrate, however, that in approximate dynamic programming an artificially low discount factor may significantly improve the solution quality. We propose two explanations of this phenomenon. The first follows directly from the standard approximation error bounds: using a lower discount factor may decrease these bounds. However, we also show that these bounds are loose, so their decrease does not fully justify the improved solution quality. We therefore propose a second explanation: when rewards are received only sporadically (as in Tetris), we can derive tighter bounds, which support a significant improvement in solution quality with a decreased discount factor.
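To make the first explanation concrete, here is a minimal sketch. It assumes the "standard approximation error bounds" have the classic form known from approximate policy iteration analysis, in which a per-iteration approximation error epsilon is amplified by a factor 2*gamma / (1 - gamma)^2. Whether the paper uses exactly this form is an assumption, and the function name `error_amplification` is hypothetical. The sketch only shows how quickly this multiplier shrinks as the discount factor gamma is lowered; it says nothing about the bias introduced by solving the problem with a different discount factor.

```python
def error_amplification(gamma: float) -> float:
    """Return 2*gamma / (1 - gamma)**2, the assumed bound multiplier.

    This is the factor by which a per-iteration approximation error is
    scaled in the classic (assumed) error bound; it is not taken from
    the paper itself.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * gamma / (1.0 - gamma) ** 2


if __name__ == "__main__":
    # Compare the multiplier for a few illustrative discount factors.
    for gamma in (0.99, 0.95, 0.90, 0.80):
        print(f"gamma={gamma:.2f}  multiplier={error_amplification(gamma):.1f}")
```

Under this assumed form, lowering gamma from 0.99 to 0.9 reduces the multiplier by roughly two orders of magnitude, which is the sense in which a lower discount factor "may decrease the approximation error bounds".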
Document type:
Conference paper
Twenty-Second Annual Conference on Neural Information Processing Systems - NIPS 2008, Dec 2008, Vancouver, Canada. 2008

https://hal.inria.fr/inria-00337652
Contributor: Bruno Scherrer
Submitted on: Friday, November 7, 2008 - 15:41:37
Last modified on: Thursday, January 11, 2018 - 06:19:50
Document(s) archived on: Tuesday, October 9, 2012 - 15:11:47

File

finaldiscount.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00337652, version 1

Citation

Marek Petrik, Bruno Scherrer. Biasing Approximate Dynamic Programming with a Lower Discount Factor. Twenty-Second Annual Conference on Neural Information Processing Systems - NIPS 2008, Dec 2008, Vancouver, Canada. 2008. 〈inria-00337652〉
