The formation of habits: a computational model mixing reinforcement and Hebbian learning

Although the basal ganglia are widely accepted to participate in the high-level cognitive function of decision-making, their role in the formation of habits is less clear. One of the biggest problems is to understand how goal-directed actions are transformed into habitual responses or, put differently, how an animal can shift from an action-outcome (A-O) system to a stimulus-response (S-R) one while keeping a consistent behaviour. We introduce a computational model (basal ganglia, thalamus and cortex) that can solve a simple two-armed bandit task using reinforcement learning and explicit valuation of the outcome (Guthrie et al., 2013). Hebbian learning has been added at the cortical level such that the model learns each time a move is issued, whether it is rewarded or not. Then, by inhibiting the output nuclei of the model (GPi), we show how learning has been transferred from the basal ganglia to the cortex, simply as a consequence of the statistics of the choices. Because the best (in the sense of most rewarded) actions are chosen more often, this directly impacts the amount of Hebbian learning and leads to the formation of habits within the cortex. These results have been confirmed in monkeys (unpublished data at the time of writing) performing the same task while the basal ganglia were inactivated using muscimol. This suggests that the basal ganglia implicitly teach the cortex so that it can learn the values of new options. In the end, the cortex is able to solve the task perfectly, although it exhibits slower reaction times.
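The transfer mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' model: it replaces the basal ganglia, thalamus and cortex network with two small tables, and every name and parameter in it (`simulate`, `cortical_choice`, `alpha`, `eta`, `epsilon`, the reward probabilities) is an assumption made for illustration. A reward-driven value table stands in for the basal ganglia, while a second table is updated by a reward-independent, bounded Hebbian-like rule each time an action is issued, standing in for the cortex.

```python
import random


def simulate(n_trials=2000, reward_probs=(0.9, 0.1),
             alpha=0.1, eta=0.01, epsilon=0.1, seed=0):
    """Toy two-armed bandit with two learners (illustrative assumptions only).

    values  : reward-driven estimates (stand-in for the basal ganglia).
    hebbian : weights strengthened whenever an action is issued,
              rewarded or not (stand-in for the cortex).
    """
    rng = random.Random(seed)
    values = [0.5, 0.5]
    hebbian = [0.0, 0.0]
    for _ in range(n_trials):
        # Epsilon-greedy choice driven by the reward-based values only.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if values[0] >= values[1] else 1
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Reinforcement learning: move the value toward the obtained reward.
        values[action] += alpha * (reward - values[action])
        # Hebbian-like update: depends on the choice, not on the reward,
        # and saturates at 1.
        hebbian[action] += eta * (1.0 - hebbian[action])
    return values, hebbian


def cortical_choice(hebbian):
    """Choice made from the Hebbian weights alone, mimicking a trial in
    which the reward-based pathway is silenced."""
    return max(range(len(hebbian)), key=lambda a: hebbian[a])


if __name__ == "__main__":
    values, hebbian = simulate()
    print("reward-based values:", values)
    print("Hebbian weights:", hebbian)
    print("choice from Hebbian weights alone:", cortical_choice(hebbian))
```

Because the better-rewarded arm is selected far more often, its Hebbian weight grows faster even though the Hebbian rule never sees the reward. This is only the statistical argument of the abstract in miniature; it says nothing about reaction times or about the dynamics of the actual network.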

Keywords

basal ganglia, decision making, habits, Hebbian learning, reinforcement learning

Domains

Computer Science [cs], Neuroscience [q-bio.NC]

Main file

Topalidou_RDLM2015 (1).pdf (568.84 KB)

Origin: Files produced by the author(s)

Contributor: Meropi Topalidou

https://inria.hal.science/hal-01252744

Submitted on: Friday, January 8, 2016, 10:22:48

Last modified on: Friday, March 24, 2023, 14:53:01

Long-term archiving on: Thursday, November 10, 2016, 21:40:11

Dates and versions

hal-01252744, version 1 (08-01-2016)

Identifiers

HAL Id: hal-01252744, version 1

Cite

Meropi Topalidou, Daisuke Kase, Thomas Boraud, Nicolas Rougier. The formation of habits: a computational model mixing reinforcement and Hebbian learning. The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2015), Jun 2015, Edmonton, Canada. ⟨hal-01252744⟩

Collections

INSERM CNRS INRIA INRIA2

286 views

127 downloads