Linear Bandits on Uniformly Convex Sets

Thomas Kerdreux; Christophe Roux; Alexandre d'Aspremont; Sebastian Pokutta

Preprint, working paper. Year: 2021

Thomas Kerdreux

Role: Author

Département d'informatique - ENS Paris

Christophe Roux

Role: Author

Zuse Institute Berlin

Alexandre d'Aspremont

Role: Author
PersonId : 10163
IdHAL : aspremon
ORCID : 0000-0003-3851-216X
IdRef : 157968219

Laboratoire d'informatique de l'école normale supérieure

Centre National de la Recherche Scientifique

Statistical Machine Learning and Parsimony

Université Paris sciences et lettres

Sebastian Pokutta

Role: Author

Zuse Institute Berlin

Abstract

Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$, and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in(1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, \S 5.5]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$. However, this comes at the expense of asymptotic rates in $T$ varying between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$.
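
For orientation, the two notions the abstract relies on can be written out. These are the definitions as commonly stated in the bandit and convex-geometry literature, not text taken from this record; the symbols $a_t$ (action played at round $t$), $\ell_t$ (loss vector at round $t$), $\alpha>0$, $q\geq 2$, and the norm $\|\cdot\|$ are notation assumed here for illustration.

Pseudo-regret after $T$ rounds:

$$\bar{R}_T \;=\; \mathbb{E}\sum_{t=1}^{T}\langle a_t,\ell_t\rangle \;-\; \min_{a\in\mathcal{K}}\,\mathbb{E}\sum_{t=1}^{T}\langle a,\ell_t\rangle .$$

A set $\mathcal{K}$ is $(\alpha,q)$-uniformly convex with respect to $\|\cdot\|$ if

$$\eta x + (1-\eta)\,y + \alpha\,\eta(1-\eta)\,\|x-y\|^{q}\,z \;\in\; \mathcal{K} \qquad \text{for all } x,y\in\mathcal{K},\ \eta\in[0,1],\ \|z\|\leq 1 .$$

Under this convention, $q=2$ corresponds to strong convexity, while $q>2$ covers sets that are uniformly convex but not strongly convex, such as $\ell_p$ balls with $p>2$ (for which $q=p$).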

Domains

Machine Learning [stat.ML]; Operations Research [math.OC]

Contributor: Alexandre d'Aspremont

https://hal.science/hal-03379835

Submitted on: Friday, October 15, 2021, 11:15:35

Last modified on: Monday, December 11, 2023, 11:30:53

Dates and versions

hal-03379835, version 1 (15-10-2021)

Identifiers

HAL Id: hal-03379835, version 1
arXiv: 2103.05907

Cite

Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta. Linear Bandits on Uniformly Convex Sets. 2021. ⟨hal-03379835⟩

Collections

ENS-PARIS CNRS INRIA INRIA-CHILE INRIA2 PSL ANR PRAIRIE-IA
