Structural properties of optimal transmission policies over a randomly varying channel, IEEE Transactions on Automatic Control, vol.53, issue.6, pp.1476-1491, 2008. ,
Whittle index policy for crawling ephemeral content, IEEE Transactions on Control of Network Systems, vol.5, issue.1, pp.446-455, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01258647
Tractable near-optimal policies for crawling, Proc. National Academy of Sciences, vol.115, issue.32, pp.8099-8103, 2018. ,
Dynamic Programming and Optimal Control, Athena Scientific, vol.II, 2012. ,
A reinforcement learning algorithm for restless bandits, 4th Indian Control Conference, pp.89-94, 2018. ,
Learning influence probabilities in social networks, Proc. ACM WSDM, pp.241-250, 2010. ,
Dynamic priority allocation in restless bandit models, 2010. ,
Stochastic and fluid index policies for resource allocation problems, Proceedings of IEEE Conference on Computer Communications (INFOCOM), 2015. ,
Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access, IEEE Trans. Info. Theory, vol.56, issue.11, pp.5547-5567, 2010. ,
Simulation-based optimization of Markov reward processes, IEEE Transactions on Automatic Control, vol.46, issue.2, pp.191-209, 2001. ,
Optimal trunk-reservation by policy learning, Proceedings of IEEE Conference on Computer Communications (INFOCOM), 2019. ,
Refining recency search results with user click feedback, 2011. ,
Sensor scheduling for hunting elusive hiding targets via Whittle's restless bandit index policy, 5th International Conference on Network Games, Control and Optimization (NetGCooP), 2011. ,
A dynamic page-refresh index policy for web crawlers, Proceedings of International Conference on Analytical and Stochastic Modeling Techniques and Applications (ASMTA 2014), pp.44-60, 2014. ,
Multi-UAV dynamic routing with partial observations using restless bandit allocation indices, Proceedings of American Control Conference, pp.4220-4225, 2008. ,
A structure-aware online learning algorithm for Markov decision processes, Proceedings of VALUE-TOOLS 2019: The 12th EAI International Conference on Performance Evaluation Methodologies and Tools, pp.71-78, 2019. ,
Index policies for real-time multicast scheduling for wireless broadcast systems, Proceedings of the IEEE Conference on Computer Communications (INFOCOM), 2008. ,
Indexable Restless Bandits, 2008. ,
Low complexity online radio access technology selection algorithm in LTE-WiFi HetNet, IEEE Trans. on Mobile Computing, 2019. ,
Timely crawling of high-quality ephemeral new content, Proceedings of ACM Conference on Information and Knowledge Management (CIKM), pp.745-750, 2013. ,
Reinforcement Learning: An Introduction, 2018.
Restless bandits: activity allocation in a changing world, A Celebration of Applied Probability, vol.25, pp.287-298, 1988. ,
Convergence results for some temporal difference methods based on least squares, IEEE Transactions on Automatic Control, vol.54, issue.7, pp.1515-1531, 2009. ,