Asymmetric Coding of Temporal Difference Errors: Implications for Dopamine Firing Patterns
Substantial evidence suggests that the phasic firing of dopamine (DA) neurons in the primate
midbrain represents a temporal difference (TD) error in the predictions of future reward. TD
offers a precise and parsimonious computational theory for the role of DA in appetitive
classical and instrumental conditioning.
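To make the TD account concrete, the following minimal sketch (not part of the original abstract; the tapped-delay-line state representation, discount factor, and learning rate are illustrative assumptions) shows how the TD error delta(t) = r(t) + gamma*V(t+1) - V(t), the quantity the phasic DA response is proposed to report, drives learning of reward predictions:

```python
import numpy as np

# Minimal TD(0) sketch: a CS at t=0 reliably predicts a reward at t=T-1.
# The TD error delta(t) = r(t) + gamma*V(t+1) - V(t) is the putative
# phasic DA signal.  All parameter values are illustrative assumptions.
T = 10               # trial length; reward arrives at the last step
gamma = 1.0          # no temporal discounting, for simplicity
alpha = 0.1          # learning rate
V = np.zeros(T + 1)  # predicted future reward at each step; V[T] is terminal

for trial in range(500):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0
        delta = r + gamma * V[t + 1] - V[t]  # TD prediction error
        V[t] += alpha * delta                # learn from the error

# After learning, the prediction is present already at the CS, and the
# error at the (now fully predicted) reward time has shrunk toward zero.
print(np.round(V[:T], 3))
print(round(1.0 + gamma * V[T] - V[T - 1], 3))
```

As expected from the theory, training transfers the error signal from the time of the US back to the earliest reliable predictor.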
Recent experiments are exploring the envelope of this interpretation for both rewards
(involving DA) and punishments (presumably involving an opponent neurotransmitter such as
serotonin; Daw et al., Neural Networks, 2002). Notably, they examine the way that
information about unexpected unconditioned outcomes (USs) propagates back to the
conditioned stimuli (CSs) that predict them. Such experiments make use of stochastic
rewards (Fiorillo et al., Science, 2003) or punishments (Seymour et al., Nature, in
press), so that persistent prediction errors remain, over which statistics can be
accumulated even in well-learned tasks.
When explicit stimuli bridge the gap between the earliest reliable CS and the US, then, as
expected from the theory, prediction errors are evident at intermediate points (Seymour et
al.). The situation is less clear when there are only internal timing cues. We use a novel
theoretical analysis to show that the surprising across-trials ramping in DA activity that
was observed by Fiorillo et al. may be a signature of this same process. Our critical
innovations are to take into account the fact, for which there is ample independent
evidence, that positive and negative TD errors are asymmetrically coded in DA activity,
and to acknowledge the constant learning that these ongoing prediction errors should drive.
We suggest direct tests of our interpretation.
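The proposed mechanism can be sketched in simulation (an illustrative reconstruction under stated assumptions, not the analysis of the abstract itself: the delay-line representation, learning rate, reward probability, and the 1/6 scaling of negative errors are all assumed parameters). With a stochastic reward and learning that never switches off, the values keep fluctuating; rectifying negative TD errors before averaging across trials then produces a positive signal that grows toward the time of reward:

```python
import numpy as np

rng = np.random.default_rng(0)

# TD learning over a tapped delay line between CS (t=0) and a reward
# delivered at t=T-1 with probability p.  Ongoing learning keeps values
# fluctuating; negative TD errors are compressed (asymmetric coding of
# DA firing about its low baseline) before averaging across trials.
T, p = 10, 0.5
alpha, gamma = 0.1, 1.0
scale_neg = 1.0 / 6.0      # assumed asymmetry factor for negative errors
V = np.zeros(T + 1)        # V[T] is the post-outcome terminal value (0)

n_trials, burn_in = 20000, 2000
da_sum = np.zeros(T)
n_kept = 0
for trial in range(n_trials):
    r_final = 1.0 if rng.random() < p else 0.0
    deltas = np.empty(T)
    for t in range(T):
        r = r_final if t == T - 1 else 0.0
        delta = r + gamma * V[t + 1] - V[t]
        V[t] += alpha * delta              # learning never switches off
        deltas[t] = delta
    if trial >= burn_in:                   # average only after convergence
        da_sum += np.where(deltas >= 0, deltas, scale_neg * deltas)
        n_kept += 1

mean_da = da_sum / n_kept
# The across-trial average of the asymmetrically coded error ramps up
# toward the time of the stochastic reward, as in Fiorillo et al. (2003).
print(np.round(mean_da, 4))
```

Because the fluctuations in the values, and hence in the TD errors, are largest near the time of the uncertain outcome, the rectified average is largest there, yielding the across-trials ramp without any within-trial ramp in the underlying error signal.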