M. Agarwal, V. S. Borkar, and A. Karandikar, Structural properties of optimal transmission policies over a randomly varying channel, IEEE Transactions on Automatic Control, vol.53, issue.6, pp.1476-1491, 2008.

K. Avrachenkov and V. S. Borkar, Whittle index policy for crawling ephemeral content, IEEE Transactions on Control of Network Systems, vol.5, issue.1, pp.446-455, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01258647

Y. Azar, E. Horvitz, R. Lubetzky, Y. Peres, and D. Shahaf, Tractable near-optimal policies for crawling, Proc. National Academy of Sciences, vol.115, issue.32, pp.8099-8103, 2018.

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, vol.II, 2012.

V. S. Borkar, A reinforcement learning algorithm for restless bandits, 4th Indian Control Conference, pp.89-94, 2018.

A. Goyal, F. Bonchi, and L. V. Lakshmanan, Learning influence probabilities in social networks, Proc. ACM WSDM, pp.241-250, 2010.

P. Jacko, Dynamic priority allocation in restless bandit models, 2010.

M. Larrañaga, U. Ayesta, and I. M. Verloop, Stochastic and fluid index policies for resource allocation problems, Proceedings of IEEE Conference on Computer Communications (INFOCOM), 2015.

K. Liu and Q. Zhao, Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access, IEEE Trans. Info. Theory, vol.56, issue.11, pp.5547-5567, 2010.

P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes, IEEE Transactions on Automatic Control, vol.46, issue.2, pp.191-209, 2001.

A. Massaro, F. De Pellegrini, and L. Maggi, Optimal trunk-reservation by policy learning, Proceedings of IEEE Conference on Computer Communications (INFOCOM), 2019.

T. Moon, W. Chu, L. Li, Z. Zheng, and Y. Chang, Refining recency search results with user click feedback, 2011.

J. Niño-Mora and S. S. Villar, Sensor scheduling for hunting elusive hiding targets via Whittle's restless bandit index policy, 5th International Conference on Network Games, Control and Optimization (NetGCooP), 2011.

J. Niño-Mora, A dynamic page-refresh index policy for web crawlers, Proceedings of International Conference on Analytical and Stochastic Modeling Techniques and Applications (ASMTA 2014), pp.44-60, 2014.

J. Le Ny, M. Dahleh, and E. Feron, Multi-UAV dynamic routing with partial observations using restless bandit allocation indices, Proceedings of American Control Conference, pp.4220-4225, 2008.

A. Roy, V. Borkar, P. Chaporkar, and A. Karandikar, A structure-aware online learning algorithm for Markov decision processes, Proceedings of VALUE-TOOLS 2019: The 12th EAI International Conference on Performance Evaluation Methodologies and Tools, pp.71-78, 2019.

V. Raghunathan, V. S. Borkar, M. Cao, and P. R. Kumar, Index policies for real-time multicast scheduling for wireless broadcast systems, Proceedings of the IEEE Conference on Computer Communications (INFOCOM), 2008.

D. Ruiz-Hernandez, Indexable Restless Bandits, 2008.

A. Roy, V. Borkar, P. Chaporkar, and A. Karandikar, Low complexity online radio access technology selection algorithm in LTE-WiFi HetNet, IEEE Trans. on Mobile Computing, 2019.

D. Lefortier, L. Ostroumova, E. Samosvat, and P. Serdyukov, Timely crawling of high-quality ephemeral new content, Proceedings of ACM Conference on Information and Knowledge Management (CIKM), pp.745-750, 2013.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2018.

P. Whittle, Restless bandits: activity allocation in a changing world, A Celebration of Applied Probability, vol.25, pp.287-298, 1988.

H. Yu and D. P. Bertsekas, Convergence results for some temporal difference methods based on least squares, IEEE Transactions on Automatic Control, vol.54, issue.7, pp.1515-1531, 2009.