All that Glitters is not Gold: Using Landmarks for Reward Shaping in FPG

Olivier Buffet (1), Joerg Hoffmann (1)
(1) MAIA - Autonomous intelligent machine, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Landmarks are facts that must be true at some point in every plan. In classical planning, it has recently been proposed to use landmarks for automatically generating heuristic functions. We apply this idea to probabilistic planning. We focus on the FPG tool, which derives a factored policy by learning from samples of the state space. The rationale is that FPG's performance can be improved significantly by a trivial heuristic that counts the number of goals that are still false; landmarks provide much better estimates at little overhead. We devise improved versions of the classical landmarks heuristic, including a Markovian one that, unlike previous ones, does not depend on the state history. As was done previously in FPG for goal counting, we use the heuristics for reward shaping: the planner receives a positive reward when it improves the heuristic value. Based on previous work, we argue that such shaping is policy invariant for Markovian heuristics. Our empirical results confirm that the landmarks heuristics are almost as fast as goal counting while delivering much more accurate estimates for initial states. In spite of this, overall planner performance is almost never improved. We discuss some intuitions as to why that is so.
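
The reward-shaping idea described in the abstract can be illustrated with a minimal sketch. It is not FPG's implementation: it assumes the standard potential-based form of shaping, with the potential taken as the negated heuristic value, and it models a state as a set of true facts. The names `goal_count` and `shaped_reward`, the `gamma` parameter, and the example facts are all hypothetical.

```python
from typing import FrozenSet

# Assumption: a state is modelled as the set of facts that hold in it.
State = FrozenSet[str]


def goal_count(state: State, goals: FrozenSet[str]) -> int:
    """Trivial heuristic: number of goal facts that are still false in `state`."""
    return len(goals - state)


def shaped_reward(reward: float, state: State, next_state: State,
                  goals: FrozenSet[str], gamma: float = 1.0) -> float:
    """Add a potential-based shaping term to the original reward.

    The potential is phi(s) = -h(s), so a transition that lowers the
    heuristic value earns a positive bonus. Because h depends only on
    the current state (it is Markovian), so does the potential.
    """
    phi = -goal_count(state, goals)
    phi_next = -goal_count(next_state, goals)
    return reward + gamma * phi_next - phi


# Hypothetical two-goal problem: moving from s0 to s1 achieves one goal.
goals = frozenset({"at-b", "has-key"})
s0 = frozenset({"at-a"})
s1 = frozenset({"at-b"})
print(shaped_reward(0.0, s0, s1, goals))  # prints 1.0
```

A landmarks heuristic would slot in by replacing `goal_count` with a function that counts landmarks still to be achieved; the shaping term itself would stay the same under these assumptions.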
Document type:
Conference paper
ICAPS-10 Workshop on Planning and Scheduling Under Uncertainty, May 2010, Toronto, Canada. 2010

https://hal.inria.fr/inria-00534375
Contributor: Joerg Hoffmann
Submitted on: Wednesday, November 10, 2010 - 09:28:59
Last modified on: Thursday, January 11, 2018 - 06:19:51
Document(s) archived on: Friday, October 26, 2012 - 15:21:14

File

icaps10-ws.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00534375, version 1

Citation

Olivier Buffet, Joerg Hoffmann. All that Glitters is not Gold: Using Landmarks for Reward Shaping in FPG. ICAPS-10 Workshop on Planning and Scheduling Under Uncertainty, May 2010, Toronto, Canada. 2010. 〈inria-00534375〉
