The neural correlates of exploration

Date

2019-08-28

Authors

Hassall, Cameron Dale

Abstract

Like other animals, humans explore to learn about the world, and exploit what we have learned in order to maximize reward. The trade-off between exploration and exploitation is a widely studied topic that cuts across multiple domains, including animal ecology, economics, and computer science. This work approaches the explore-exploit dilemma from the perspective of cognitive neuroscience. In particular, how are our decisions to explore or exploit represented computationally? And how is that representation implemented in the brain? Experiment 1 examined neural signals following outcomes in a risk-taking task. Explorations – defined as slower responses – were preceded by an enhancement of the P300, a component of the human event-related brain potential thought to reflect a phasic release of norepinephrine from locus coeruleus. Experiment 2 revealed that the same neural signal precedes feedback in a learning task called a two-armed bandit. There, a reinforcement learning model was used to classify responses as either exploitations or explorations; exploitations were driven by previous rewards, and explorations were not. Experiments 3 and 4 extended these results in three important ways. First, evidence is presented that the neural signal observed in Experiments 1 and 2 was driven not only by the upcoming decision, but also by the preceding decision (perhaps even more so). Second, Experiments 3 and 4 involved increasingly large action spaces. Experiment 3 involved choosing from among either 4, 9, or 16 options. Experiment 4 involved searching for rewards in a continuous two-dimensional map. In both experiments, the feedback-locked P300 was enhanced following exploration. Third, exploitation was the more common strategy in Experiments 1 and 2. Thus, it was unclear whether the exploration-related P300 enhancement observed there was due to exploration per se, to exploration rate, or to the fact that exploration was rare compared to exploitation. Experiment 3 partially addressed this by eliciting different rates of exploration; the exploration-related P300 effect correlated with rate of exploration. In Experiment 4, exploration was more common than exploitation (in contrast to Experiments 1–3); even so, exploration was followed by a P300 enhancement. Together, Experiments 1–4 suggest the presence of a general neural system related to exploration that operates across multiple task types (discrete to continuous), regardless of whether exploration or exploitation is the more common task strategy. The proposed purpose of this neural signal is to interrupt one mode of decision-making (exploration) in favour of another (exploitation).

Keywords

P300, decision making, reinforcement learning, event-related potential, learning, reward positivity, N200, win-stay, lose-shift, computational modelling, P200, explore-exploit dilemma
