Conference paper, 2023

Reinforcement Learning with History-Dependent Dynamic Contexts

Abstract

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
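For intuition, here is a minimal, hypothetical sketch of the kind of logistic context-transition model the abstract describes: per-step history features are combined by an aggregation function (a plain sum here), and a multinomial logistic (softmax) model over the fixed-size aggregate gives the distribution of the next context. All names (context_transition_probs, W, phi) are illustrative assumptions, not the paper's actual notation or API.

```python
import numpy as np

# Hypothetical sketch of a logistic context transition in a DCMDP.
# Names and shapes are illustrative, not taken from the paper.

def context_transition_probs(agg_features, W):
    """Multinomial logistic (softmax) distribution over next contexts.

    agg_features: (d,) aggregation of per-step history features; a
        fixed-size summary is what avoids an exponential dependence
        on history length.
    W: (num_contexts, d) logistic weight matrix, one row per context.
    """
    logits = W @ agg_features
    logits = logits - logits.max()   # shift logits for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy usage: sum per-step features, then sample the next context.
rng = np.random.default_rng(0)
step_feats = rng.normal(size=(10, 4))   # 10 past steps, 4 features each
phi = step_feats.sum(axis=0)            # aggregation function: a sum
W = rng.normal(size=(3, 4))             # 3 latent contexts
probs = context_transition_probs(phi, W)
next_context = rng.choice(3, p=probs)
```

Because the aggregate has fixed dimension regardless of how many steps it summarizes, the transition model stays tractable where a general history-dependent context model would blow up exponentially in the history length, which is the structural property the abstract credits for the regret bounds.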
Main file: 2302.02061.pdf (1.24 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04420115, version 1 (26-01-2024)

Identifiers

Cite

Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier. Reinforcement Learning with History-Dependent Dynamic Contexts. ICML 2023, Honolulu, United States. ⟨hal-04420115⟩