Optimistic PAC Reinforcement Learning: the Instance-Dependent View

Andrea Tirinzoni; Aymen Al-Marjani; Emilie Kaufmann

Conference paper, Year: 2023

Andrea Tirinzoni

Role: Author
PersonId: 1286511

Meta AI Research [Paris]

Aymen Al-Marjani

Role: Author
PersonId: 1118574

Unité de Mathématiques Pures et Appliquées

Emilie Kaufmann

Role: Author
PersonId: 10422
IdHAL: emilie-kaufmann
ORCID: 0000-0002-5496-824X
IdRef: 197040810

Centre National de la Recherche Scientifique

Scool

Abstract

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular Markov Decision Processes (MDPs), both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2022) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near instance-optimal (up to a factor of the horizon). On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.

Domains

Other [stat.ML]

Main file

TAMK23.pdf (339.5 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-04306228

Submitted on: Friday, 24 November 2023, 19:25:46

Last modified on: Friday, 3 May 2024, 13:45:32

Dates and versions

hal-04306228, version 1 (24-11-2023)

Identifiers

HAL Id: hal-04306228, version 1
arXiv: 2210.00974

Cite

Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann. Optimistic PAC Reinforcement Learning: the Instance-Dependent View. Algorithmic Learning Theory (ALT), Feb 2023, Singapore (SG), Singapore. ⟨hal-04306228⟩

Collections

ENS-LYON CNRS INRIA INSMI CRISTAL INRIA2 UNIV-LILLE UDL CRISTAL-SCOOL ANR
