Conference paper - Year: 2023

Constant regret for sequence prediction with limited advice

Abstract

We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K, under limited access to information. We assume that in each round the learner can predict using a convex combination of at most p experts, and can then observe a posteriori the losses of at most m experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandit setting, where the learner is allowed to play only one expert per round and observe only its feedback, the known optimal regret bounds are of order O(√(KT)). We show that allowing the learner to play one additional expert per round and observe one additional feedback substantially improves the regret guarantees. We provide a strategy that combines only p = 2 experts per round for prediction and observes the losses of m ≥ 2 experts. Its randomized regret (with respect to the internal randomization of the learner's strategy) is of order O((K/m) log(Kδ⁻¹)) with probability 1 − δ, i.e., independent of the horizon T ("constant" or "fast rate" regret), provided p ≥ 2 and m ≥ 3. We prove that this rate is optimal up to a logarithmic factor in K. In the case p = m = 2, we provide an upper bound of order O(K² log(Kδ⁻¹)), with probability 1 − δ. Our strategies do not require any prior knowledge of the horizon T or of the confidence parameter δ. Finally, we show that if the learner is constrained to observe only one expert's feedback per round, the worst-case regret is the "slow rate" Ω(√(KT)), suggesting that synchronous observation of at least two experts per round is necessary to achieve constant regret.
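To make the protocol described above concrete, here is a minimal Python sketch of the round structure: in each round the learner combines p = 2 experts for prediction and afterwards observes the losses of m = 2 experts. This is only an illustration of the setting, not the paper's strategy: the data model (uniform targets and expert predictions), the squared loss, the learning rate eta, and the simplified exponential-weights update are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eta = 10, 1000, 1.0      # number of experts, horizon, learning rate (all illustrative)
weights = np.ones(K)           # exponential weights maintained over the K experts

total_loss = 0.0
cumulative_expert_losses = np.zeros(K)   # kept only to report regret at the end

for t in range(T):
    y = rng.uniform()                    # target in [0, 1]    (assumed data model)
    advice = rng.uniform(size=K)         # experts' predictions (assumed data model)

    probs = weights / weights.sum()
    # p = 2: sample two distinct experts, favoring those with larger weights.
    chosen = rng.choice(K, size=2, replace=False, p=probs)

    # Predict with a convex combination of the two chosen experts.
    mix = probs[chosen] / probs[chosen].sum()
    prediction = float(mix @ advice[chosen])
    total_loss += (prediction - y) ** 2  # squared loss: bounded and exp-concave on [0, 1]

    # m = 2: observe a posteriori the losses of the two played experts only.
    observed_losses = (advice[chosen] - y) ** 2
    # Simplified exponential-weights update on the observed coordinates
    # (assumed update rule, not the paper's strategy).
    weights[chosen] *= np.exp(-eta * observed_losses)

    cumulative_expert_losses += (advice - y) ** 2   # full information used for reporting only

regret = total_loss - cumulative_expert_losses.min()
print(f"cumulative regret vs. best expert over T={T} rounds: {regret:.2f}")
```

The sketch only illustrates the limited-information constraint (two experts combined, two losses revealed per round); the actual sampling distribution and weight update achieving the constant-regret guarantee are those given in the paper.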
Main file: cst_reg_arxiv.pdf (527.57 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03797597, version 1 (04-10-2022)


Cite

El Mehdi Saad, Gilles Blanchard. Constant regret for sequence prediction with limited advice. Algorithmic Learning Theory (ALT 2023), Feb 2023, Singapore, Singapore. pp.1343-1386. ⟨hal-03797597⟩