Theses

A closed loop framework of decision-making and learning in primate prefrontal circuits using Computational Modeling and Virtual Experimentation

Bhargav Teja Nallapu 1
1 Mnemosyne - Mnemonic Synergy
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest, IMN - Institut des Maladies Neurodégénératives [Bordeaux]
Abstract : This thesis attempts to build a computational systems-level framework that helps develop an understanding of the organization of the prefrontal cortex (PFC) and basal ganglia (BG) systems and their functional interactions in decision-making and goal-directed behaviour in humans. A videogame environment with an artificial agent, Minecraft, is used to design experiments that test the framework in an environment that can be made more complex and realistic if necessary. Malmo, a platform developed by Microsoft, makes it possible to communicate with Minecraft in order to design scenarios in the environment and control the behavior of the agent. The framework, together with virtual experimentation, forms a closed-loop architecture for studying high-level animal behavior. It is pointed out that the generic principles behind flexible animal behaviors also give insights into developing artificial intelligence (AI) that is more general and autonomous in the nature of its learning, in addition to the current AI systems that are specialized in a particular task. Behavior, of a human or an animal, is a pattern of responses to a certain stimulus (physical or abstract). A response is essentially a choice among several possible options, or simply a choice of whether or not to choose from the available options. The neural correlates of decision-making in humans are an extensively pursued question across multiple fields, ranging from behavioural psychology and economics to neuroscience and AI. In neuroeconomics and AI especially, there is a strong drive to understand the underpinnings of decision-making in the brain. With rapidly growing interest in understanding the neural substrates of decision-making, learning and behaviour, at least in higher order mammals like rodents, non-human primates and humans, more research is leading to deeper questions about our understanding of decision-making itself.
This is not surprising, given that any species depends, to some degree, on mechanisms of action selection or decision-making for its survival in an uncertain environment. Humans are presumably the most flexible and adaptive decision-makers: they can learn the underlying structure of the world, even if the structure is hidden, and rapidly adapt their behaviour. The prefrontal cortex (PFC) has been at the forefront of this proposition and is believed to have facilitated this evolution towards a wider repertoire of behaviours that emerge from underlying primitive action selection mechanisms. It is highlighted that studying complex, realistic decision-making in ecological scenarios will require more sophisticated experimentation methods than the usual numerical simulations. The experiments designed in Minecraft can be used to test the framework in an environment that can be made more complex and realistic if necessary. A major added value of a virtual environment with an agent interacting in it is that the bodily characteristics of the agent (such as needs) can be emphasized and their role in value-based decision making can be discussed. The framework, together with virtual experimentation, thus forms a closed-loop architecture for studying high-level animal behavior. The neural systems framework in this work rests on the network dynamics between the subsystems of the PFC and the BG. The PFC is believed to play a crucial role in executive functions like planning, attention, goal-directed behavior, etc. The BG are a group of sub-cortical nuclei that have been extensively studied in the field of motor control and action selection. Different regions of the PFC and structures within the BG, together with a respective sensory cortical region, are anatomically organized in parallel and segregated loops (each referred to here as a CBG loop). At a high level, these loops can be divided into three kinds: limbic loops, associative loops and sensori-motor loops.
Imagine an animal interacting with stimuli in an environment. Some of the questions most pertinent to the current state of the animal with respect to the stimuli present are: (i) What is (the value of) this stimulus? (Preference) (ii) Why is this stimulus relevant to my current internal needs? (Need) (iii) Where is this stimulus located with respect to my reference in the current environment? (Orientation) and (iv) How do I reach the ’desired’ stimulus? (Approach). Limbic loops address the questions What? and Why?. Sensori-motor loops are concerned with the questions Where? and How?. Associative loops form a multi-modal association of the current state information, for instance which stimulus in the limbic loops is at which position represented in the motor loops. Furthermore, in each of these loops, as the subregion of PFC represents the chosen goal, the process of achieving the goal by sustained activation between the PFC subregion and the corresponding sensory cortical area is described. Virtual experimentation in particular helps highlight this phenomenon by demonstrating flexible adjustments to the action plan once the goal is selected. First, a comprehensive framework with the above-mentioned parallel loops is implemented. All four loops are implemented algorithmically, describing the mutual influences between the prefrontal sub-regions. It is important to note that, although no explicit hierarchy among the loops is built into the system, two levels of hierarchy could arise implicitly. First, although the motor loops are free to make decisions in the action space, with sufficient learning in the limbic space the decisions in any of the limbic loops could lead the decisions in the sensori-motor space through the associative loop. Secondly, it is assumed that the fundamental motivation of the animal is internal homeostasis, that is, to maintain its internal needs within acceptable bounds.
Thus, in certain situations, the internal motivation might lead the dynamics in the limbic loops, with the Why? loop for internal motivation biasing the What? loop, which might be more stimulus-driven when there is no pressing internal need. The inputs for the CBG loops are provided by the sensory perception of the framework, which translates the information provided by Malmo from the videogame environment into the corresponding representations in the framework. Similarly, the output of the framework is transformed into appropriate Malmo representations of action commands that drive the agent in the environment. Since the cognitive framework is bound by several biological constraints, several adaptations have been made in the way the Malmo platform is used, in terms of sensory perception of the environment and motor control of the agent. Next, we use this framework to study more closely the role of limbic loops in value-guided decision making and goal-directed behavior. The emphasis rests on the limbic loops. Therefore the associative and sensori-motor loops are modeled algorithmically, with the help of the experimentation platform for motor control. As for the limbic loops, the orbitofrontal cortex (OFC) is part of a loop for preferences, and the anterior cingulate cortex (ACC) of a loop for internal needs. These loops are formed through their limbic counterpart in the BG, the ventral striatum (VS). The VS has been widely studied and reported to encode various substrates of value, forming an integral part of value-based decision making. Simple scenarios are designed in the virtual environment using the agent, some objects and appetitive rewards. The limbic loops have been implemented according to existing computational models of decision making in the BG and amygdala. Thus the framework and the experimental platform stand as a testbed for computational models of specific processes that have to fit into a bigger picture.
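The way an internal need can bias a stimulus-driven preference, as described above for the Why? and What? loops, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the linear urgency measure, the lower bound of 0.3 and the additive gain are all assumptions chosen for illustration, and no Malmo call is involved.

```python
# Illustrative sketch only: all names and numeric choices are hypothetical.

def need_urgency(level, lower_bound=0.3):
    """Return 0 while the need level is within bounds, rising to 1 as it drops to 0."""
    if level >= lower_bound:
        return 0.0
    return (lower_bound - level) / lower_bound


def biased_values(preferences, satisfies, need_levels, gain=1.0):
    """Add a need-driven bonus ("Why?") to each learned preference ("What?")."""
    return {
        stimulus: preference + gain * need_urgency(need_levels[satisfies[stimulus]])
        for stimulus, preference in preferences.items()
    }


def select(values):
    """Pick the stimulus with the highest biased value."""
    return max(values, key=values.get)


preferences = {"fruit": 0.8, "water": 0.5}              # stimulus-driven values
satisfies = {"fruit": "energy", "water": "hydration"}   # which need each stimulus serves

# No pressing need: the preferred stimulus wins.
sated = select(biased_values(preferences, satisfies, {"energy": 0.9, "hydration": 0.9}))

# Hydration far below its bound: the need overrides the preference.
thirsty = select(biased_values(preferences, satisfies, {"energy": 0.9, "hydration": 0.03}))
```

With both needs within bounds the choice follows preference alone; once one need falls below its bound, the stimulus that serves it gains value and can win the selection.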
Of the limbic loops, the role of OFC has been closely studied. Across diverse studies spanning decades, OFC has been implicated in almost all aspects of decision-making: state representation, outcome prediction, action selection, outcome evaluation and, primarily, learning. Furthermore, deficits or lesions of OFC were argued to cause multiple behavioral impairments, for example in response inhibition for a stimulus that is no longer rewarding and in learning when reward contingencies are reversed. With more advanced lesion techniques and keener analysis, several such observations were overturned. Nevertheless, the role of OFC in value-based decision making and learning is underlined time and again, while the exact ways in which it affects the process are still unknown. As part of this thesis, several outstanding observations about the role of OFC in behavior have been summarized by consolidating numerous pieces of experimental evidence and reviews. To highlight a few, OFC is implicated in: perceptual decision making and value-based decision making; different kinds of involvement at different phases (option presentation, action selection, outcome delivery, etc.) within a single decision-making episode (trial); learning stimulus-outcome (Pavlovian) and action-outcome (instrumental) associations. The neurons in OFC were found to correlate strongly with the value of the outcomes and, more interestingly, to express a phenomenon of range adaptation, adapting to the changing ranges of values. OFC is believed to learn a state space representation of the task space in order to access partially observable information for a decision. The structural heterogeneity of OFC adds to the inherent complexity of studying its role in decision making, learning and goal-directed behavior. This has been studied in recent years, with studies focused on dissociating the roles of the lateral and medial subparts of OFC.
Often, ventromedial prefrontal cortex (vmPFC) is considered under medial OFC. Bouret et al. (2010), Noonan et al. (2010) and Rudebeck & Murray (2011) are among the few comprehensive studies that clearly argued for separate roles of lateral and medial OFC. Lastly, to explain the findings of different roles of the lateral and medial regions of OFC, an existing computational architecture of CBG loops, Pavlovian learning in the amygdala and multiple lines of evidence of amygdala-OFC-VS interactions are put together into a single model. The reinforcement learning rules have been adapted to accommodate appropriate credit assignment (correct outcome to correct chosen stimulus) and the value difference of the choice options. As a result, several findings from animal experiments studying the separable roles were replicated. Particularly in the context of the different roles of lateral and medial OFC in decision making as a function of the value difference between options, distinct and dissociable roles of lateral and medial OFC were observed. Medial OFC seemed to be more crucial for the choice between two options whose values are close to each other, whereas lesions to medial OFC did not seem to affect the animal’s performance when the values of the options were sufficiently far apart. On the contrary, and surprisingly, lateral OFC appeared to be crucial when the decisions are easy to make, whereas lesions to lateral OFC did not seem to affect the difficult choices where the values of the options are close to each other. Similar results were found in the performances of monkeys with lesions to lateral OFC and those with lesions to medial OFC. Dissociable roles in Pavlovian-Instrumental Transfer were also observed. Notwithstanding the detailed neural architectures and basic neuronal descriptions used in certain parts of this work, the neural mechanisms of all the behavioral paradigms were discussed at a very simplistic level.
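The two ingredients named above, credit assignment to the chosen stimulus and choice difficulty as a function of value difference, can be illustrated with a textbook sketch. This is a generic delta rule and a two-option softmax, not the adapted learning rules of the thesis; the function names, the learning rate `alpha` and the inverse temperature `beta` are assumptions.

```python
import math

# Illustrative sketch only: a generic delta rule and softmax, not the thesis's rules.

def update_chosen(values, chosen, reward, alpha=0.1):
    """Credit assignment: only the chosen stimulus moves toward the received reward."""
    updated = dict(values)
    updated[chosen] += alpha * (reward - updated[chosen])
    return updated


def p_choose_first(v_first, v_second, beta=5.0):
    """Two-option softmax: the choice probability depends only on the value difference."""
    return 1.0 / (1.0 + math.exp(-beta * (v_first - v_second)))


values = update_chosen({"A": 0.0, "B": 0.0}, chosen="A", reward=1.0)

easy = p_choose_first(0.9, 0.1)    # values far apart: the choice is nearly deterministic
hard = p_choose_first(0.55, 0.45)  # values close together: the choice is close to chance
```

In this reading, a "difficult" decision is one where the value difference is small and the choice probability stays near 0.5, which is the regime in which the abstract reports medial OFC to matter most.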
Throughout the work, only appetitive behavior has been described, whereas most of the processes described are also known to account for aversive behaviors like avoiding punishments. In addition, the role of dopamine as the neurotransmitter facilitating learning has been extremely simplified. Furthermore, with multiple systems of reinforcement learning involved in the framework, a detailed account is needed of how dopamine could have a differential effect on these systems. One of the most important elements of behavior that is not accounted for in the framework is memory. In fact, by complementing the framework with an existing computational account of a minimal working memory model, the mechanisms of sustained activity that maintain goals until they are achieved, and aspects like giving up if the goal hasn’t been reached for a long time, can be explored further. Adding an explicit memory to store minimal spatial and episodic information would allow the framework to explain more flexible behaviors such as purely goal-directed or opportunistic behaviors. However, that would require much more sophisticated implementations of the motor loops, in which the agent can navigate to a desired position. Nevertheless, the investigations into the observed evidence around OFC offer great insight into understanding the very process of decision-making and value computation in general. By venturing into the realm of bio-inspired adaptive learning in an embodied virtual agent, and describing the principles of motivation, goal-selection and self-evaluation, it is highlighted that the fields of reinforcement learning and artificial intelligence have a lot to gain from studying the role of prefrontal systems in decision-making.

Cited literature [349 references]

https://hal.inria.fr/tel-02431814
Contributor : Frédéric Alexandre
Submitted on : Wednesday, January 8, 2020 - 10:51:26 AM
Last modification on : Monday, October 19, 2020 - 11:05:55 AM
Long-term archiving on : Thursday, April 9, 2020 - 6:32:28 PM

File

NallapuBhargav19.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02431814, version 1

Citation

Bhargav Teja Nallapu. A closed loop framework of decision-making and learning in primate prefrontal circuits using Computational Modeling and Virtual Experimentation. Neural and Evolutionary Computing [cs.NE]. Université de Bordeaux, 2019. English. ⟨tel-02431814⟩

Metrics

Record views

323

Files downloads

521