M. Araya-López, O. Buffet, V. Thomas, and F. Charpillet, A POMDP extension with belief-dependent rewards, Advances in Neural Information Processing Systems 23 (NIPS-10), 2010.

K. J. Åström, Optimal control of Markov processes with incomplete state information, Journal of Mathematical Analysis and Applications, vol.10, issue.1, pp.174-205, 1965.

R. Bellman, A Markovian decision process, Journal of Mathematics and Mechanics, vol.6, issue.5, pp.679-684, 1957. DOI: 10.1512/iumj.1957.6.56038

J. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally solving Dec-POMDPs as continuous-state MDPs, Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI-13), 2013.

J. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, Optimally solving Dec-POMDPs as continuous-state MDPs, Journal of Artificial Intelligence Research, vol.55, pp.443-497, 2016. DOI: 10.1613/jair.4623

F. Dufour and T. Prieto-Rumeau, Approximation of Markov decision processes with general state space, Journal of Mathematical Analysis and Applications, vol.388, issue.2, 2012. DOI: 10.1016/j.jmaa.2011.11.015

M. Egorov, M. J. Kochenderfer, and J. J. Uudmae, Target surveillance in adversarial environments using POMDPs, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 2016.

R. Fonteneau, S. Murphy, L. Wehenkel, and D. Ernst, Inferring bounds on the performance of a control policy from a sample of trajectories, Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09), 2009.

D. Fox, W. Burgard, and S. Thrun, Active Markov localization for mobile robots, Robotics and Autonomous Systems, vol.25, pp.49-58, 1998. DOI: 10.1016/s0921-8890(98)00049-9

E. Hansen, D. Bernstein, and S. Zilberstein, Dynamic programming for partially observable stochastic games, Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), 2004.

K. Hinderer, Lipschitz continuity of value functions in Markovian decision processes, Mathematical Methods of Operations Research, vol.62, issue.1, pp.3-22, 2005. DOI: 10.1007/s00186-005-0438-1

S. Ieong, N. Lambert, Y. Shoham, and R. Brafman, Near-optimal search in continuous domains, Proceedings of the National Conference on Artificial Intelligence (AAAI-07), 2007.

H. Kurniawati, D. Hsu, and W. Lee, SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces, Robotics: Science and Systems IV, 2008. DOI: 10.15607/rss.2008.iv.009

R. Laraki and W. D. Sudderth, The preservation of continuity and Lipschitz continuity by optimal reward operators, Mathematics of Operations Research, vol.29, issue.3, 2004. DOI: 10.1287/moor.1030.0085

L. Mihaylova, T. Lefebvre, H. Bruyninckx, and J. De Schutter, NATO Science Series on Data Fusion for Situation Monitoring, Incident Detection, Alert and Response Management, vol.198, 2006.

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou et al., Playing Atari with deep reinforcement learning, NIPS Deep Learning Workshop, 2013.

J. Pineau, G. Gordon, and S. Thrun, Point-based value iteration: An anytime algorithm for POMDPs, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), 2003.

J. Pineau, G. Gordon, and S. Thrun, Anytime point-based approximations for large POMDPs, Journal of Artificial Intelligence Research, vol.27, pp.335-380, 2006.

L. K. Platzman, Finite Memory Estimation and Control of Finite Probabilistic Systems, PhD thesis, Massachusetts Institute of Technology, 1977.

P. Poupart, K. Kim, and D. Kim, Closing the gap: Improved bounds on optimal POMDP solutions, Proceedings of the Twenty-First International Conference on Automated Planning and Scheduling (ICAPS-11), 2011.

E. Rachelson and M. Lagoudakis, On the locality of action domination in sequential decision making, Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM-10), 2010.

Y. Satsangi, S. Whiteson, and M. T. Spaan, An analysis of piecewise-linear and convex value functions for active perception POMDPs, IAS, 2015.

R. Smallwood and E. Sondik, The optimal control of partially observable Markov processes over a finite horizon, Operations Research, vol.21, issue.5, pp.1071-1088, 1973.

T. Smith, Probabilistic Planning for Robotic Exploration, PhD thesis, Carnegie Mellon University, 2007.

T. Smith and R. Simmons, Heuristic search value iteration for POMDPs, Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI-04), 2004.

T. Smith and R. Simmons, Point-based POMDP algorithms: Improved analysis and implementation, Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI-05), 2005.

E. Sondik, The Optimal Control of Partially Observable Markov Decision Processes, PhD thesis, Stanford University, 1971.

M. T. Spaan, T. S. Veiga, and P. U. Lima, Decision-theoretic planning under uncertainty with information rewards for active cooperative perception, Autonomous Agents and Multi-Agent Systems, vol.29, issue.6, 2015.

N. L. Zhang and W. Zhang, Speeding up the convergence of value iteration in partially observable Markov decision processes, Journal of Artificial Intelligence Research, vol.14, 2001.

Z. Zhang, D. Hsu, and W. S. Lee, Covering number for efficient heuristic-based POMDP planning, Proceedings of the Thirty-First International Conference on Machine Learning (ICML-14), 2014.