Finding the bandit in a graph: Sequential search-and-stop

Pierre Perrault 1, 2 Vianney Perchet 2 Michal Valko 1, 3
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are both to the search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.
Document type :
Conference papers
Complete list of metadatas

https://hal.inria.fr/hal-02387465
Contributor : Michal Valko <>
Submitted on : Friday, November 29, 2019 - 5:58:10 PM
Last modification on : Friday, January 24, 2020 - 2:34:28 PM

File

perrault2019finding.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02387465, version 1

Citation

Pierre Perrault, Vianney Perchet, Michal Valko. Finding the bandit in a graph: Sequential search-and-stop. International Conference on Artificial Intelligence and Statistics, 2019, Okinawa, Japan. ⟨hal-02387465⟩

Share

Metrics

Record views

61

Files downloads

140