Fighting Boredom in Recommender Systems with Linear Reinforcement Learning

Romain Warlop; Alessandro Lazaric; Jérémie Mary

doi:10.5555/3326943.3327105

Conference paper, Year: 2018


Romain Warlop

Role: Author
PersonId: 15817
IdHAL: romain-warlop
ORCID: 0000-0002-4432-9591
IdRef: 234172223

Sequential Learning

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Facebook

Sequential Learning

Jérémie Mary

Role: Author

Criteo [Paris]

Sequential Learning

Abstract

A common assumption in recommender systems (RS) is the existence of a best fixed recommendation strategy. Such a strategy may be simple and work at the item level (e.g., in multi-armed bandits it is assumed that a single best fixed arm/item exists) or implement a more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter). We argue that this assumption is rarely verified in practice, as the recommendation process itself may impact the user's preferences. For instance, a user may get bored with a strategy, yet regain interest in it if enough time has passed since it was last used. In this case, a better approach is to alternate between different solutions at the right frequency to fully exploit their potential. In this paper, we first cast the problem as a Markov decision process, where the rewards are a linear function of the recent history of actions, and we show that a policy considering the long-term influence of the recommendations may outperform both fixed-action and contextual greedy policies. We then introduce an extension of the UCRL algorithm (LINUCRL) to effectively balance exploration and exploitation in an unknown environment, and we derive a regret bound that is independent of the number of states. Finally, we empirically validate the model assumptions and the algorithm in a number of realistic scenarios.
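The abstract's central modeling claim, that rewards depend linearly on the recent history of actions and that a history-aware policy can beat both fixed-action and greedy policies, can be illustrated with a small toy instance. This is a minimal sketch under stated assumptions, not the authors' model and not LINUCRL: it assumes the reward parameters are known, uses two actions and a window of W = 2 past recommendations, and takes the reward of an action to be a base value plus a (negative) slope times the fraction of the window occupied by that action. The names K, W, THETA0, THETA1, the starting history, and all numeric values are hypothetical. Because transitions are deterministic on a finite state space, every stationary policy eventually revisits a state, and the average reward over that cycle is its long-run average reward; the best stationary policy is then found by brute force.

```python
from itertools import product

# Toy boredom model (hypothetical parameters, not taken from the paper):
# the reward of action a is THETA0[a] + THETA1[a] * f_a, where f_a is the
# fraction of the last W recommendations that used action a.
K = 2                  # number of actions (e.g., recommendation strategies)
W = 2                  # length of the remembered history window
THETA0 = [1.0, 0.3]    # base appeal of each action
THETA1 = [-0.6, -0.1]  # boredom: reward drops as the action is repeated


def reward(history, a):
    """Reward that is linear in the recent frequency of action a."""
    freq = history.count(a) / W
    return THETA0[a] + THETA1[a] * freq


def step(history, a):
    """Slide the window: drop the oldest action and append a."""
    return history[1:] + (a,)


def average_reward(policy, start):
    """Exact long-run average reward of a deterministic stationary policy.

    Transitions are deterministic, so the trajectory must revisit a state;
    the rewards collected on the resulting cycle give the long-run average.
    """
    first_visit = {}  # state -> index of the reward collected there
    rewards = []
    history = start
    while history not in first_visit:
        first_visit[history] = len(rewards)
        a = policy(history)
        rewards.append(reward(history, a))
        history = step(history, a)
    cycle = rewards[first_visit[history]:]
    return sum(cycle) / len(cycle)


states = list(product(range(K), repeat=W))  # all K**W histories
start = (1,) * W                            # hypothetical initial history

# Always play the same action.
fixed = {a: average_reward(lambda h, a=a: a, start) for a in range(K)}

# Myopic contextual greedy: maximize the immediate reward in each state.
greedy = average_reward(lambda h: max(range(K), key=lambda a: reward(h, a)), start)

# Best stationary policy by brute force over all K**(K**W) state->action tables.
best = max(
    average_reward(lambda h, p=dict(zip(states, table)): p[h], start)
    for table in product(range(K), repeat=len(states))
)

print("fixed:", {a: round(v, 3) for a, v in fixed.items()})
print("greedy:", round(greedy, 3))
print("best stationary:", round(best, 3))
```

With these made-up numbers, always recommending action 0 and the greedy policy both settle at an average of 0.4 per step (greedy never switches, because action 0 always looks better right now), while the best stationary policy repeats the pattern 0, 0, 1 and averages about 0.567. Brute force is only practical for tiny instances, since there are K**W states and K**(K**W) stationary policies; the abstract's LINUCRL targets the harder setting where the reward parameters are unknown and must be explored.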

Subjects

Artificial Intelligence [cs.AI]

Main file

WARLOP-NIPS18.pdf (453.58 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01915468

Submitted on: Wednesday, 7 November 2018, 16:11:55

Last modified on: Wednesday, 24 January 2024, 09:54:23

Long-term archiving on: Friday, 8 February 2019, 15:34:28

Dates and versions

hal-01915468 , version 1 (07-11-2018)

Identifiers

HAL Id : hal-01915468 , version 1
DOI : 10.5555/3326943.3327105

Cite

Romain Warlop, Alessandro Lazaric, Jérémie Mary. Fighting Boredom in Recommender Systems with Linear Reinforcement Learning. Neural Information Processing Systems, Dec 2018, Montreal, Canada. ⟨10.5555/3326943.3327105⟩. ⟨hal-01915468⟩

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE
