
Episodic reinforcement learning in finite MDPs: Minimax lower bounds revisited

Omar Darwiche Domingues ¹, Pierre Ménard ², Emilie Kaufmann ¹, Michal Valko ³
¹ Scool, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of Ω((H³SA/ε²) log(1/δ)) on the sample complexity of an (ε, δ)-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the Ω(√(H³SAT)) regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
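For concreteness, the (ε, δ)-PAC criterion behind the sample-complexity bound can be sketched as follows. This is a standard formulation (not quoted from the paper itself); S, A, H denote the numbers of states, actions, and the horizon, s₁ the initial state, and τ the number of episodes collected before the algorithm stops:

```latex
% An algorithm for best policy identification is (\epsilon,\delta)-PAC if the
% policy \hat\pi it returns after \tau episodes satisfies
\[
  \mathbb{P}\Bigl( V^{\star}_1(s_1) - V^{\hat\pi}_1(s_1) \le \epsilon \Bigr)
  \;\ge\; 1 - \delta .
\]
% The lower bound stated in the abstract then says that on some hard MDP,
% any such algorithm must use
\[
  \mathbb{E}[\tau] \;=\; \Omega\!\left( \frac{H^{3} S A}{\epsilon^{2}}
  \,\log\frac{1}{\delta} \right)
\]
% episodes, and the matching minimax regret bound over T steps reads
% \mathbb{E}[\mathrm{Regret}(T)] = \Omega\bigl(\sqrt{H^{3} S A T}\bigr).
```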
Document type :
Conference papers
Contributor: Michal Valko
Submitted on : Friday, July 16, 2021 - 4:05:28 PM
Last modification on : Tuesday, June 14, 2022 - 11:58:48 AM
Long-term archiving on: Sunday, October 17, 2021 - 7:05:39 PM




HAL Id: hal-03289004, version 1



Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko. Episodic reinforcement learning in finite MDPs: Minimax lower bounds revisited. Algorithmic Learning Theory, Mar 2021, Paris / Virtual, France. ⟨hal-03289004⟩


