Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings

Abstract : Setting up the future Internet of Things (IoT) networks will require to support more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, to handle the decentralized decision-making of Spectrum Access, applied to IoT networks; as well as learning performance with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and are dynamically changing channel. In the studied scenario, stochastic MAB learning provides a up to 16% gain in term of successful transmission probabilities, and has near optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
Complete list of metadatas
Contributor : Lilian Besson <>
Submitted on : Monday, July 2, 2018 - 7:15:03 AM
Last modification on : Monday, June 3, 2019 - 11:16:57 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License



Rémi Bonnefoi, Lilian Besson, Christophe Moy, Emilie Kaufmann, Jacques Palicot. Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings. CROWNCOM 2017 - 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks, Sep 2017, Lisbon, Portugal. pp.173-185, ⟨10.1007/978-3-319-76207-4_15⟩. ⟨hal-01575419v2⟩



Record views


Files downloads