Conference papers

Weighted Linear Bandits for Non-Stationary Environments

Yoan Russac 1, 2 Claire Vernade 3 Olivier Cappé 1, 2
2 VALDA - Value from Data
DI-ENS - Département d'informatique - ENS Paris, Inria de Paris
Abstract : We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order $d^{2/3} B_T^{1/3} T^{2/3}$, where $B_T$ is a measure of non-stationarity ($d$ and $T$ being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.
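
The idea of discounted linear regression with an optimistic action choice can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `DiscountedLinUCBSketch`, the helper `track_abrupt_change`, the parameters `gamma`, `lam` and `beta`, and in particular the simplified exploration bonus `beta * sqrt(a^T V^{-1} a)` are assumptions made for illustration. The confidence set analysed in the paper is not reproduced here.

```python
import numpy as np


class DiscountedLinUCBSketch:
    """Illustrative optimistic bandit built on discounted ridge regression.

    Assumptions (not taken from the abstract): a constant exploration
    width `beta` and the bonus sqrt(a^T V^{-1} a).
    """

    def __init__(self, dim, gamma=0.95, lam=1.0, beta=1.0):
        self.gamma = gamma            # discount factor in (0, 1]
        self.lam = lam                # ridge regularisation
        self.beta = beta              # hypothetical exploration width
        self.V = lam * np.eye(dim)    # discounted design matrix
        self.b = np.zeros(dim)        # discounted reward-weighted actions

    def theta_hat(self):
        # Weighted least-squares estimate of the regression parameter.
        return np.linalg.solve(self.V, self.b)

    def select(self, actions):
        # Pick the action with the largest optimistic index.
        actions = np.asarray(actions, dtype=float)
        V_inv = np.linalg.inv(self.V)
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        return int(np.argmax(actions @ self.theta_hat() + self.beta * widths))

    def update(self, action, reward):
        # Exponential weights: past observations are scaled by gamma,
        # while the regularisation term is kept equal to lam * I.
        action = np.asarray(action, dtype=float)
        eye = np.eye(action.shape[0])
        self.V = (self.gamma * self.V + np.outer(action, action)
                  + (1.0 - self.gamma) * self.lam * eye)
        self.b = self.gamma * self.b + reward * action


def track_abrupt_change(steps_per_phase=200):
    """Noise-free toy run: the parameter jumps once, the estimate follows."""
    theta_before = np.array([1.0, 0.0])
    theta_after = np.array([-1.0, 0.5])
    agent = DiscountedLinUCBSketch(dim=2, gamma=0.9, lam=0.01)
    basis = np.eye(2)
    for t in range(2 * steps_per_phase):
        theta = theta_before if t < steps_per_phase else theta_after
        action = basis[t % 2]         # alternate between the two axes
        agent.update(action, float(action @ theta))
    return agent.theta_hat()
```

Calling `track_abrupt_change()` returns an estimate close to the post-change parameter, because observations from before the change carry a weight of order `gamma ** 200`. With `gamma = 1` the same class reduces to ordinary ridge regression, which never forgets the first phase.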

Contributor : Olivier Cappé
Submitted on : Thursday, March 19, 2020 - 9:43:35 PM
Last modification on : Wednesday, June 8, 2022 - 12:50:06 PM
Long-term archiving on : Saturday, June 20, 2020 - 4:45:36 PM


Files produced by the author(s)


  • HAL Id : hal-02291460, version 2
  • ARXIV : 1909.09146



Yoan Russac, Claire Vernade, Olivier Cappé. Weighted Linear Bandits for Non-Stationary Environments. NeurIPS 2019 - 33rd Conference on Neural Information Processing Systems, Dec 2019, Vancouver, Canada. ⟨hal-02291460v2⟩


