Skip to Main content Skip to Navigation
New interface
Conference papers

Secure Cumulative Reward Maximization in Linear Stochastic Bandits

Abstract : The linear stochastic multi-armed bandit is a sequential learning setting, where, at each round, a learner chooses an arm and receives a stochastic reward based on an unknown linear function of the chosen arm. The goal is to collect as much reward as possible. Linear bandits have popular applications such as online recommendation based on user preferences, where obtaining a high reward means recommending an item with high expected rating. We address the security concerns that occur when outsourcing the data and the cumulative reward maximization algorithm to an honest-but-curious cloud. We propose LinUCB-DS, a distributed and secure protocol that achieves the same cumulative reward as the standard LinUCB algorithm, without disclosing to the cloud the linear function used to draw arm rewards. We formally prove the complexity and security properties of LinUCB-DS. We also show that LinUCB-DS can be easily adapted to secure the SpectralUCB algorithm, which improves LinUCB for a class of linear bandits. We show the feasibility of our protocols via a proof-of-concept experimental study using the MovieLens movie recommendation dataset.
Complete list of metadata
Contributor : Radu Ciucanu Connect in order to contact the contributor
Submitted on : Friday, September 18, 2020 - 10:46:44 AM
Last modification on : Wednesday, March 16, 2022 - 3:54:06 AM



Radu Ciucanu, Anatole Delabrouille, Pascal Lafourcade, Marta Soare. Secure Cumulative Reward Maximization in Linear Stochastic Bandits. International Conference on Provable and Practical Security (ProvSec), Nov 2020, Conférence online, Singapore. pp.257-277, ⟨10.1007/978-3-030-62576-4_13⟩. ⟨hal-02942694⟩



Record views