Bandits Manchots sur Flux de Données Non Stationnaires (Multi-Armed Bandits on Non-Stationary Data Streams)

Abstract : The multi-armed bandit is a framework for studying the trade-off between exploration and exploitation under partial feedback. At each turn t ∈ [1,T] of the game, a player chooses an arm k_t from a set of K arms and receives a reward y_{k_t} drawn from a reward distribution D(µ_{k_t}) with mean µ_{k_t} and support [0,1]. The problem is challenging because the player only observes the reward of the played arm and does not know what the reward would have been had she played another arm. Before each play, she faces the dilemma between exploration and exploitation: exploring increases the confidence of the reward estimators, while exploiting increases the cumulative reward by playing the empirically best arm (under the assumption that the empirically best arm is indeed the actual best arm).

In the first part of the thesis, we tackle the multi-armed bandit problem when reward distributions are non-stationary. First, we study the case where, even though the reward distributions change during the game, the best arm stays the same. Second, we study the case where the best arm changes during the game. The second part of the thesis tackles the contextual bandit problem, where the means of the reward distributions now depend on the environment's current state. We study the use of neural networks and random forests for contextual bandits. We then propose a meta-bandit-based approach for selecting online the best-performing expert during its learning.
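To make the protocol above concrete, here is a minimal sketch of an agent for the non-stationary setting the abstract describes. This is not the algorithm proposed in the thesis: it is a generic discounted ε-greedy player (the function name `eps_greedy_nonstationary`, the discount factor `gamma`, and the `reward_fns` interface are all illustrative assumptions). Discounting old observations lets the mean estimates track reward distributions that drift, including the case where the best arm changes mid-game.

```python
import random

def eps_greedy_nonstationary(reward_fns, T, eps=0.1, gamma=0.99, seed=0):
    """Play T rounds of a K-armed bandit with drifting reward means.

    reward_fns: list of K callables t -> reward in [0,1] (one per arm).
    eps:        probability of exploring with a uniformly random arm.
    gamma:      discount applied to past observations each round, so the
                empirical means forget stale data and can follow a change
                of best arm.
    Returns the cumulative reward over the T rounds.
    """
    rng = random.Random(seed)
    K = len(reward_fns)
    s = [0.0] * K  # discounted sum of observed rewards per arm
    n = [0.0] * K  # discounted pull count per arm
    total = 0.0
    for t in range(T):
        if rng.random() < eps or min(n) == 0.0:
            k = rng.randrange(K)  # explore (or finish the initial sweep)
        else:
            # exploit: play the arm with the best discounted empirical mean
            k = max(range(K), key=lambda i: s[i] / n[i])
        y = reward_fns[k](t)
        # age every arm's statistics, then credit the played arm
        for i in range(K):
            s[i] *= gamma
            n[i] *= gamma
        s[k] += y
        n[k] += 1.0
        total += y
    return total
```

For instance, with two arms whose means swap halfway through the game (the "best arm changes" case), the discounted estimates let the player abandon the formerly best arm once its recent rewards fall, whereas an undiscounted average would react far more slowly.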
Complete list of metadata

https://hal.inria.fr/tel-01420663
Contributor : Abes Star
Submitted on : Wednesday, May 17, 2017 - 6:45:08 PM
Last modification on : Thursday, July 8, 2021 - 3:46:18 AM
Long-term archiving on : Monday, August 21, 2017 - 12:31:09 AM

File

75550_ALLESIARDO_2016_diffusio...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01420663, version 3

Citation

Robin Allesiardo. Bandits Manchots sur Flux de Données Non Stationnaires. Intelligence artificielle [cs.AI]. Université Paris Saclay (COmUE), 2016. Français. ⟨NNT : 2016SACLS334⟩. ⟨tel-01420663v3⟩

Metrics

Record views : 408
File downloads : 1114