Best of both worlds: Stochastic & adversarial best-arm identification

Abstract : We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
Document type :
Conference papers
Complete list of metadatas

https://hal.inria.fr/hal-01808948
Contributor : Victor Gabillon <>
Submitted on : Thursday, July 12, 2018 - 6:39:58 AM
Last modification on : Friday, June 28, 2019 - 3:01:15 PM
Long-term archiving on : Tuesday, October 16, 2018 - 12:41:23 AM

File

BOB-bai.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01808948, version 2

Citation

Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek, Michal Valko. Best of both worlds: Stochastic & adversarial best-arm identification. COLT 2018 - Conference on Learning Theory, Jul 2018, Stockholm, Sweden. pp.1 - 32. ⟨hal-01808948v2⟩

Share

Metrics

Record views

66

Files downloads

15