Nested bandits - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, 2022

Nested bandits

Abstract

In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities; in turn, these similarities imply closely correlated losses that may confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems where the learner seeks to minimize their regret in the presence of a large number of distinct alternatives with a hierarchy of embedded (non-combinatorial) similarities. In this setting, optimal algorithms based on the exponential weights blueprint (like Hedge, EXP3, and their variants) may incur significant regret because they tend to spend excessive amounts of time exploring irrelevant alternatives with similar, suboptimal costs. To account for this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives based on a nested, step-by-step selection method. In so doing, we obtain a series of tight bounds for the learner's regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently, without a red bus / blue bus paradox occurring.
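As context for the exponential-weights blueprint the abstract refers to (Hedge, EXP3, and their variants), here is a minimal full-information Hedge sketch. This is not the paper's NEW algorithm: the function name, parameters, and loss setup below are illustrative assumptions, and NEW replaces this flat distribution over alternatives with a layered, nested selection.

```python
import numpy as np

def hedge(loss_matrix, eta):
    """Minimal Hedge (exponential weights) over K alternatives,
    full-information feedback.

    loss_matrix: (T, K) array of per-round losses in [0, 1].
    eta: learning rate.
    Returns the sequence of mixed strategies played and the
    learner's cumulative expected loss.
    """
    T, K = loss_matrix.shape
    cum_losses = np.zeros(K)   # cumulative loss of each alternative
    total_loss = 0.0
    strategies = []
    for t in range(T):
        # Exponential weights: mass on an alternative decays with
        # its accumulated loss.
        w = np.exp(-eta * cum_losses)
        p = w / w.sum()
        strategies.append(p)
        total_loss += p @ loss_matrix[t]
        cum_losses += loss_matrix[t]
    return np.array(strategies), total_loss
```

Note how the weight vector treats all K alternatives symmetrically: when many suboptimal alternatives have near-identical losses, a flat scheme like this spreads its exploration mass across all of them, which is the inefficiency (the red bus / blue bus issue) that the nested, layer-by-layer selection of NEW is designed to avoid.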
Main file: NestedBandits-ICML.pdf (970.04 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03874048 , version 1 (27-11-2022)

Identifiers

Cite

Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier, Houssam Zenati. Nested bandits. ICML 2022 - 39th International Conference on Machine Learning, Jul 2022, Baltimore, United States. ⟨hal-03874048⟩
