J. Baxter, P. Bartlett, and L. Weaver, Experiments with infinite-horizon, policy-gradient estimation, JAIR, vol.15, pp.351-381, 2001.

O. Buffet and D. Aberdeen, The factored policy-gradient planner, Artificial Intelligence, vol.173, issue.5-6, pp.722-747, 2009. DOI: 10.1016/j.artint.2008.11.008

M. Dorigo and M. Colombetti, Robot shaping: developing autonomous agents through learning, Artificial Intelligence, vol.71, issue.2, pp.321-370, 1994. DOI: 10.1016/0004-3702(94)90047-7

M. Helmert and C. Domshlak, Landmarks, critical paths and abstractions: What's the difference anyway?, Proc. ICAPS'09, 2009.

J. Hoffmann and B. Nebel, The FF planning system: Fast plan generation through heuristic search, JAIR, vol.14, pp.253-302, 2001.

J. Hoffmann, J. Porteous, and L. Sebastia, Ordered landmarks in planning, JAIR, vol.22, pp.215-278, 2004.

E. Karpas and C. Domshlak, Cost-optimal planning with landmarks, Proc. IJCAI'09, 2009.

M. Matarić, Reward functions for accelerated learning, Proc. ICML'94, 1994.

A. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, Proc. ICML'99, 1999.

J. Randløv, Shaping in reinforcement learning by changing the physics of the problem, Proc. ICML'00, 2000.

S. Richter, M. Helmert, and M. Westphal, Landmarks revisited, Proc. AAAI'08, pp.975-982, 2008.

C. Szepesvári, Reinforcement learning algorithms for MDPs, Wiley Encyclopedia of Operations Research and Management Science, 2009. DOI: 10.1002/9780470400531.eorms0714

S. Thiébaux, C. Gretton, J. Slaney, D. Price, and F. Kabanza, Decision-theoretic planning with non-Markovian rewards, JAIR, vol.25, pp.17-74, 2006.

E. Wiewiora, Potential-based shaping and Q-value initialization are equivalent, JAIR, vol.19, pp.205-208, 2003.

R. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, pp.229-256, 1992.

S. Yoon, A. Fern, and B. Givan, FF-Replan: a baseline for probabilistic planning, Proc. ICAPS'07, 2007.