The Effects of Uncertainty on TD Learning
Substantial evidence suggests that the phasic activity of dopamine
(DA) neurons in the primate midbrain represents a temporal difference
(TD) error in predictions of future reward. TD learning offers a
computationally compelling account of the role of DA in appetitive
classical and instrumental conditioning, and a precise and
parsimonious theory of the generation of DA firing patterns. Recent
DA recordings (Fiorillo et al., Science, 2003) from an experiment
involving inherently stochastic reward delivery show activity ramping
up towards the time of the reward, with the height of the ramp
related to the degree of reward uncertainty. Prima facie, these data
present a crucial challenge, as both the classical and the
instrumental facets of TD theory would require such reliably
occurring ramp activity to be predicted away by earlier stimuli.
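For concreteness, the TD error referred to above takes its standard
form (the notation is generic rather than specific to any particular
model of the task):

  \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),

where r_t is the reward delivered at time t, V(s) is the learned
prediction of discounted future reward in state s, and \gamma \le 1
is a discount factor; phasic DA firing above and below its baseline
is taken to report positive and negative \delta_t respectively.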
We use analysis and simulations to show that the apparently anomalous
ramps are in fact to be expected under a standard TD account, if, as
suggested by the low baseline firing rates of the DA cells, positive
and negative prediction errors are differentially scaled. With a
simple tapped-delay-line representation of the time between the
stimulus and the reward (as commonly adopted in TD models) and a
fixed learning rate, a ramp in the simulated DA activity emerges
just as in the experimental data.
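The following is a minimal simulation sketch of this setup. The
asymmetry factor d, the learning rate, the number of taps and the
trial count are illustrative assumptions rather than values fitted to
the recordings; learning here uses the unscaled error, with the
asymmetry applied only to the represented (recorded) signal.

import numpy as np

# Tapped-delay-line TD(0) sketch of a stochastically rewarded trial.
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)

T = 20            # taps between stimulus onset (t = 1) and reward (t = T)
p_reward = 0.5    # per-trial reward probability (maximal uncertainty)
alpha = 0.1       # fixed learning rate, so prediction errors persist
d = 1.0 / 6.0     # assumed scaling of negative errors in the DA trace
n_trials = 5000

# Value weights: index 0 is the pre-stimulus state (held at 0, as the
# inter-trial interval is long and variable), indices 1..T are the
# delay-line taps, and index T+1 is a zero-valued post-reward state.
V = np.zeros(T + 2)
psth = np.zeros(T + 1)   # trial-averaged simulated DA activity, t = 0..T

for _ in range(n_trials):
    reward = 1.0 if rng.random() < p_reward else 0.0
    for t in range(T + 1):
        r_t = reward if t == T else 0.0        # reward only at t = T
        delta = r_t + V[t + 1] - V[t]          # TD error (gamma = 1)
        if t > 0:                              # pre-stimulus value stays 0
            V[t] += alpha * delta              # symmetric TD update
        # Asymmetric representation: decreases below the low DA baseline
        # are compressed by the factor d relative to increases.
        psth[t] += (delta if delta >= 0 else d * delta) / n_trials

# psth shows a phasic response at stimulus onset (t = 0), a ramp growing
# towards the time of the reward, and a response at t = T whose average
# is proportional to p_reward * (1 - p_reward).
print(np.round(psth, 3))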
By analytically deriving the average response at the time of the
reward from the TD learning rule, we show that the height of the
modelled ramps is indeed proportional to the variance of the rewards,
in accordance with the data (this calculation is sketched below).
There is, however, a key difference between
the uncertainty and TD accounts of the ramps. According to the former,
ramps are within-trial phenomena, coding uncertainty; by contrast, the
latter suggests they arise only through averaging across multiple
trials. Under the TD account, the non-stationarity engendered by
continual learning from prediction errors makes the peri-stimulus
time histogram (PSTH) traces potentially misleading, as they average
over different trial histories.
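To sketch the variance result in the simplest case, suppose a unit
reward is delivered with probability p, the value prediction at the
reward time has converged to p, and negative errors are represented
at a fraction d < 1 of their true size (the conventions of the code
above). The average represented error at the time of the reward is
then

  \mathbb{E}[\delta^{\mathrm{rep}}] = p\,(1 - p) + (1 - p)\,d\,(-p)
                                    = (1 - d)\,p\,(1 - p),

which is proportional to p(1 - p), the variance of the Bernoulli
reward, and which vanishes if the representation is symmetric (d = 1).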
Our study suggests that uncertainty need not play any explicit part
in determining these aspects of DA activity, a conclusion also
consistent with other suggestive results from Fiorillo and others.
We also examine the effects of other sources of representational and
learning noise on TD learning.