Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning

Odalric-Ambrym Maillard; Phuong Nguyen; Ronald Ortner; Daniil Ryabko

Conference paper. Year: 2013


Author details

Odalric-Ambrym Maillard (1)
PersonId: 5563; IdHAL: odalric-ambrym-maillard; ORCID: 0000-0001-7935-7026; IdRef: 158055594

Phuong Nguyen (2)

Ronald Ortner (1)

Daniil Ryabko (3)
PersonId: 848126

Affiliations

(1) Montanuniversität Leoben
(2) National ICT Australia [Sydney]
(3) Sequential Learning

Abstract

We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rather, the agent has several representations of the environment (each mapping histories of past interactions to a discrete state space), whose dynamics are unknown and only some of which result in an MDP. The goal is to minimize the average regret against an agent that knows an MDP representation giving the highest optimal reward and acts optimally in it. Recent regret bounds for this setting are of order $O(T^{2/3})$, with an additive term that is constant in $T$ but exponential in some characteristics of the optimal MDP. We propose an algorithm whose regret after $T$ time steps is $O(\sqrt{T})$, with all constants reasonably small. This is optimal in $T$, since $\Omega(\sqrt{T})$ is a lower bound on the regret of learning even in a single discrete MDP.
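To make the setting concrete, here is a minimal Python sketch, under stated assumptions, of two ingredients named in the abstract: a state representation, meaning a function from the interaction history to a discrete state, and regret, tallied against a reference agent that earns an optimal average reward $\rho^*$ per step. The history encoding, the two example representations, and the names `last_observation`, `last_two_observations`, `regret` and `bound_ratio` are hypothetical illustrations. The regret formula $T\rho^* - \sum_t r_t$ is assumed here as the usual average-reward definition. None of this is the authors' algorithm, which the abstract does not describe.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical encoding of one interaction step: (action, observation, reward).
Step = Tuple[int, int, float]
History = Sequence[Step]
# A state representation maps the entire history to a discrete state.
Representation = Callable[[History], Hashable]


def last_observation(history: History) -> Hashable:
    """Hypothetical representation: the state is the most recent observation."""
    return history[-1][1] if history else None


def last_two_observations(history: History) -> Hashable:
    """Hypothetical representation: the state is the last two observations."""
    return tuple(step[1] for step in history[-2:])


def regret(rewards: Sequence[float], rho_star: float) -> float:
    """Regret after T = len(rewards) steps against a reference agent earning
    rho_star per step on average (assumed definition: T * rho_star - sum r_t)."""
    return len(rewards) * rho_star - sum(rewards)


def bound_ratio(T: int) -> float:
    """Ratio of the two regret rates, T^(2/3) / T^(1/2) = T^(1/6), ignoring constants."""
    return T ** (2 / 3) / T ** 0.5
```

Both example representations induce a discrete state space, yet at most some of them would make the process Markov; the learning problem is to act well without knowing which. `bound_ratio(10**6)` is about 10, so, constants aside, moving from $T^{2/3}$ to $\sqrt{T}$ regret gains a factor of $T^{1/6}$.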

Subjects

Artificial Intelligence [cs.AI]

Main file

icml1_iblb_cr-corrected.pdf (179.28 KB)

Origin: files produced by the author(s)

Daniil Ryabko : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-00778586

Submitted on: Wednesday, 20 March 2013, 10:52:14

Last modified on: Thursday, 15 February 2024, 03:31:45

Long-term archiving on: Friday, 21 June 2013, 04:12:20

Dates and versions

hal-00778586, version 1 (20-03-2013)

Identifiers

HAL Id: hal-00778586, version 1

Cite

Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko. Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning. ICML - 30th International Conference on Machine Learning, 2013, Atlanta, USA. pp. 543-551. ⟨hal-00778586⟩

Collections

UNIV-RENNES1 UNIV-LILLE3 CNRS INRIA IRISA LAGIS CRISTAL INRIA2 CRISTAL-SEQUEL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM