HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information

Rotting bandits are not harder than stochastic ones

1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever is selected, i.e., rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log{(KT)}),$ and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
Document type :
Conference papers

Cited literature [3 references]

https://hal.inria.fr/hal-01936894
Contributor : Michal Valko Connect in order to contact the contributor
Submitted on : Saturday, May 9, 2020 - 8:48:20 PM
Last modification on : Thursday, March 24, 2022 - 3:43:08 AM

File

seznec2019rotting.pdf
Files produced by the author(s)

Identifiers

• HAL Id : hal-01936894, version 2

Citation

Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko. Rotting bandits are not harder than stochastic ones. International Conference on Artificial Intelligence and Statistics, 2019, Naha, Japan. ⟨hal-01936894v2⟩

Record views