Asymmetric Coding of Temporal Difference Errors: Implications for Dopamine Firing Patterns


Yael Niv, Michael Duff and Peter Dayan


Substantial evidence suggests that the phasic firing of dopamine (DA) neurons in the primate midbrain represents a temporal difference (TD) error in the predictions of future reward. TD offers a precise and parsimonious computational theory for the role of DA in appetitive classical and instrumental conditioning.
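In the standard formulation (a reminder of the textbook definition, not a result of this work), the TD error at time t compares the reward actually received with the change in predicted future value:

\[
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\]

where \(V(s)\) is the learned prediction of future reward from state \(s\) and \(\gamma\) is a temporal discount factor. Phasic DA firing is hypothesized to report \(\delta_t\): bursts for positive errors, pauses below baseline for negative errors.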

Recent experiments are exploring the envelope of this interpretation for both rewards (involving DA) and punishments (presumably involving an opponent neurotransmitter such as serotonin; Daw et al., Neural Networks, 2002). Notable among these are experiments examining the way that information about unexpected unconditioned outcomes (USs) propagates back to the conditioned stimuli (CSs) predicting them. Such experiments make use of stochastic rewards (Fiorillo et al., Science, 2003) or punishments (Seymour et al., Nature, in press), so that there are persistent prediction errors over which to accumulate statistics even in well-learned tasks.

When explicit stimuli bridge the gap between the earliest reliable CS and the US, then, as expected from the theory, prediction errors are evident at intermediate points (Seymour et al.). The situation is less clear when there are only internal timing cues. We use a novel theoretical analysis to show that the surprising across-trials ramping in DA activity observed by Fiorillo et al. may be a signature of this same process. Our critical innovations are to take into account the fact (for which there is ample independent evidence) that positive and negative TD errors are asymmetrically coded in DA activity, and to acknowledge the constant learning that should result from the ongoing prediction errors. We suggest direct tests of our interpretation.
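The mechanism can be sketched in a few lines of simulation. The sketch below is illustrative only: the number of delay states, the learning rate, and the six-fold compression of negative errors are arbitrary choices, not the parameters of the analysis above. A tapped-delay-line TD(0) learner (discount factor 1) is trained on a task where reward arrives at trial end with probability 0.5, as in Fiorillo et al.; learning never stops, so prediction errors persist. Averaging the asymmetrically coded errors across trials, as a peri-stimulus time histogram effectively does, yields a ramp toward the time of reward.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states = 10       # tapped-delay-line states between CS and reward time (illustrative)
p_reward = 0.5      # probability of reward at trial end, as in Fiorillo et al.
alpha = 0.1         # learning rate; kept constant so learning is ongoing
scale = 6.0         # asymmetry: negative errors are compressed by this factor (illustrative)
n_trials = 20000

V = np.zeros(n_states + 1)            # V[n_states] is the terminal state, fixed at 0
deltas = np.zeros((n_trials, n_states))

for trial in range(n_trials):
    r = 1.0 if rng.random() < p_reward else 0.0
    for t in range(n_states):
        reward = r if t == n_states - 1 else 0.0
        delta = reward + V[t + 1] - V[t]   # TD error with discount factor 1
        deltas[trial, t] = delta
        V[t] += alpha * delta              # learning continues in the "well-learned" task

# Asymmetric coding: bursts are represented faithfully, dips are compressed.
da = np.where(deltas > 0, deltas, deltas / scale)

# Across-trial average, as in the recorded histograms: the true TD error
# averages to roughly zero at every time step, but the asymmetrically
# coded signal ramps up toward the time of potential reward.
ramp = da.mean(axis=0)
```

The ramp arises because the trial-to-trial variance of the prediction error grows toward the reward time, and rectifying (compressing the negative half of) a zero-mean signal leaves a positive residue proportional to its spread.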