Optimistic PAC Reinforcement Learning: the Instance-Dependent View - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2023

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

Andrea Tirinzoni
  • Role: Author
  • PersonId : 1286511

Abstract

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular Markov Decision Processes (MDPs), from both a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2022) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near instance-optimal (up to a factor of the horizon). On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.

Domains

Other [stat.ML]
Main file
TAMK23.pdf (339.5 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04306228 , version 1 (24-11-2023)

Cite

Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann. Optimistic PAC Reinforcement Learning: the Instance-Dependent View. Algorithmic Learning Theory (ALT), Feb 2023, Singapore (SG), Singapore. ⟨hal-04306228⟩