Physical principles in neural coding and computation: Lessons from the fly visual system


William Bialek, Rob de Ruyter van Steveninck and collaborators


Last updated 2 May 2007


Successful organisms solve a myriad of problems in the face of profound physical constraints, and it is a challenge to quantify this functionality in a language that parallels our characterization of other physical systems. Strikingly, when we do this the performance of biological systems often approaches the limits set by basic physical principles. Here we describe our exploration of functionality and physical limits in the fly visual system.


An important technical aspect of the work described here has been the enormous stability of the fly as a system for recording the activity of neurons. In addition, rather than focusing on the response of the visual system to a discrete set of stimuli presented in isolation, we have developed ways of analyzing the responses to continuous, dynamic stimuli more like those encountered in nature; in fact considerable effort has gone into getting ever closer to truly natural stimuli. The result is that where classical experiments on neurons involved roughly one kilobit of data, roughly the size of a single gene, much of the work described here rests on the analysis of data sets of several megabits, closer in size to a whole genome. This combination of changing the scale of the data set and changing our theoretical outlook in the design and analysis has allowed us to uncover several new phenomena.


We have found that computation in the fly’s visual system works with such precision that it is limited by photon shot noise and diffraction, that information transmission across synapses between neurons operates near the limits imposed by the ‘quantization’ of signals into discrete packets of chemical transmitter, that deep inside the brain signals are represented by sequences of action potentials or ‘spikes’ with nearly optimal efficiency, and that these levels of performance are the result of dynamic adaptation processes that allow the system to adjust its strategies to the statistics of the current visual environment. These observations provide a glimpse of optimization principles that might organize and determine the brain’s choice of codes and computations. It remains to be seen whether these principles are applicable more generally, but we believe that our experience in the fly visual system has sharpened many of the questions in the field. The following paragraphs are meant as a guide through some of the papers taken in chronological (rather than logical) order; early parts of the work were reviewed, with pedagogical background, in Spikes: Exploring the Neural Code (MIT Press, 1997).


Numbers refer to a full list of publications for WB.



21. Real–time performance of a movement sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. R de Ruyter van Steveninck & W Bialek, Proc. R. Soc. London Ser. B 234, 379-414 (1988).


All of the sensory signals reaching the brain are encoded in sequences of identical, discrete pulses called action potentials or spikes. Spike trains in other regions of the brain can represent motor commands or, more subtly, plans and intentions, and there is no reason to doubt that even our private thoughts are represented in this way. The way in which spikes represent the external world has been studied since the 1920s. In this paper we argued that one should take a new point of view on this problem. Instead of asking how known signals in the outside world are encoded in the average properties of spike trains, we asked how the brain—which has only the spike trains to work with—could make inferences about unknown sensory stimuli. Specifically, we showed how to characterize the distribution of sensory inputs that are consistent with the neural response, thus quantifying the (un)certainty and information content of inferences from the spike train.  These ideas were used in the design and analysis of experiments on the fly visual system, specifically a neuron H1 that is responsible for extracting information about (horizontal) rigid body motion across the whole visual field. There are several specific points that have become important in recent work: The demonstration that short sequences of spikes are informative only about projections of the stimulus onto spaces of low dimensionality, that similar spike trains stand for similar stimulus waveforms, and that patterns of spikes can convey more information than expected by summing the contributions of individual spikes. In addition to these specific results, the point of view expressed in this paper set the agenda for much of our subsequent work on the neural code.




25. Coding and computation with neural spike trains. W Bialek & A Zee, J. Stat. Phys. 59, 103-115 (1990).


Inspired in part by the results in the fly, we set out to study the problem of coding and decoding in simple models of spiking neurons. Probably the most important result was that there is a large regime in which signals can be decoded by linear (perturbative) methods even though the encoding is strongly nonlinear. The small parameter that makes this work is the mean number of spikes per correlation time of the signal, suggesting that spike trains can be decoded linearly if they are sparse in the time domain.  In Spikes we discuss the evidence that many different neural systems make use of such a sparse representation, but of course one really wants a direct experimental answer: Can we decode the spike trains of real neurons using these theoretical ideas? As an aside, it is worth noting that identifying a regime where linear decoding can work is really much more general than the details of the model that we chose to investigate; this is important, since none of the models we write down are likely to be accurate in detail.


Rereading the original paper, it is perhaps not so clear that sparseness is the key idea. A somewhat more explicit discussion is given in later summer school lectures [29, 84].
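
As a rough illustration of what linear decoding means in practice, the sketch below builds a stimulus estimate by placing a copy of a fixed kernel at every spike and adding the copies, and it computes the small parameter mentioned above (mean number of spikes per correlation time). The function names, the binned 0/1 representation of the spike train, and the kernel values are assumptions made for the example; in a real analysis the kernel would be fit to data rather than chosen by hand.

```python
import numpy as np

def linear_reconstruction(spike_train, kernel):
    """Estimate the stimulus as a sum of kernels, one centered on each spike.

    spike_train: 1-D array of spike counts per time bin (assumed layout).
    kernel: 1-D decoding filter, centered on the spike (hypothetical values).
    """
    spikes = np.asarray(spike_train, dtype=float)
    return np.convolve(spikes, np.asarray(kernel, dtype=float), mode="same")

def spikes_per_correlation_time(spike_train, dt, tau_c):
    """Small parameter: mean spike rate times the signal correlation time."""
    spikes = np.asarray(spike_train, dtype=float)
    rate = spikes.sum() / (spikes.size * dt)  # spikes per second
    return rate * tau_c
```

When the second function returns a value of order one or less, the spike train is sparse in the sense described above, which is the regime where one might expect the linear estimate to be a reasonable first approximation.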





34. Reading a neural code. W Bialek, F Rieke, RR de Ruyter van Steveninck, & D Warland, Science  252, 1854-1857 (1991).


Returning to the experiments, we showed that it really is possible to decode the spike trains from H1 and thereby reconstruct the time dependent velocity of motion across the visual field. The coexistence of linear decoding with nonlinear encoding was implicit in this work, but made explicit in Chapter 2 of Spikes. This work was intended as a proof of principle, but we also found that the reconstructions were surprisingly precise: Errors corresponded to ~ 0.06 degrees over ~ 30 ms, which is about 20 times smaller than the lattice spacing of detectors in the compound eye or 10 times smaller than the nominal “diffraction limit” due to blur by the eye’s lenses. Resolution beyond the sampling and diffraction scales is also known in human vision, and the collection of perceptual phenomena in which this occurs is called hyperacuity. This led us to wonder about the physical limits to motion estimation—blur due to diffraction through the lenses of the compound eye and noise due to the random arrival of photons at the receptors. In fact the observed performance is very close to this limit, so that even four layers of neurons away from the receptors it is still the physics of the inputs that sets the precision of computation. The ideas of decoding and stimulus reconstruction have since been applied to systems ranging from motor control in crabs to visual motion perception in monkeys.


For a review of hyperacuity see Section 4.2 of Spikes. Perceptual hyperacuity usually is demonstrated in tasks that involve discrimination among discrete alternative signals; the reconstruction experiments allowed the demonstration of comparable precision in a more natural task of continuous estimation. Experiments that are more analogous to the discrimination experiments have also been done on the H1 neuron [57], and a preliminary account of these experiments (in 1984) may have been the first report of hyperacuity in the responses of a single neuron. For details of the limits to motion estimation, see [29, 42].





53. Statistical mechanics and visual signal processing. M Potters & W Bialek, J Phys I France 4, 1755-1775 (1994).


Inspired by the observation of near optimal performance in the fly's motion estimation system, we set out to understand the algorithmic requirements for optimal estimation. Conventional approaches involve searching a set of possible strategies for the best within the set, but we showed how one could map the problem of estimation in the presence of noise onto a statistical mechanics problem in which the data act as external fields and the estimator is the expectation value of some order parameter.  Estimation theory then is reduced to the computation of (perhaps strongly nonlinear) response functions, and standard approximations in statistical mechanics map to different regimes of the signal processing problem.  Applying the general framework to the problem of motion estimation in the fly, we showed that the optimal estimation strategy has very different behaviors in different sensory environments.  In particular, the optimal estimator interpolates between popular models for motion estimation, which arise as limiting cases of the full theory.  An inevitable prediction of the theory is that the optimal processor must change its strategy, or adapt to changes in the statistics of the input signals.  Preliminary experiments gave clear evidence of this “statistical adaptation” [54, 59, 61], and recent experiments provide a direct confirmation of the combination of nonlinear operations predicted by the theory [102].





The rate of information transfer at graded–potential synapses. RR de Ruyter van Steveninck & SB Laughlin, Nature  379, 642-645 (1996).


Information is passed from one neuron to another largely through chemical synapses. In the same way that electrical signaling in many cells is quantized into action potentials or spikes, chemical signaling is quantized into vesicles or packets of neurotransmitter molecules; this is true even at synapses like the first synapse in the retina, where both the pre– and post–synaptic cells generate graded voltage responses rather than spikes. Here we characterized the signal and noise properties of the photodetector cells and their synaptic target, the large monopolar cell (LMC), in the fly retina, and then used these measurements to infer the information capacity of the synapse. In characterizing photodetector noise we touch one of the fundamental facts about the visual system, namely that it is capable of counting single photons. Although evidence for this has been accumulating since the 1940s, it has been much less clear whether biological photodetectors can continue to operate near the photon shot noise limit at counting rates that are more typical of animal behavior. Here we showed how one can combine traditional measures of signal transfer and noise to characterize the equivalent contrast noise of photodetector cells; we exploit the extreme stability of the fly experiments to calibrate this noise against the limits set by photon counting. The result is that fly photodetector performance comes very close to the shot noise limit over a wide range of frequencies and counting rates, up to the point where pupil mechanisms begin to attenuate the light entering the eye. The excess noise beyond shot noise is well approximated as a limit to time resolution. 
Applying the same analysis to the LMCs we saw an effective six–fold increase in photon capture rate, as expected since six photodetectors converge on a single LMC, but this also means that over a considerable range of frequencies and counting rates the noise in the synapse is negligible and integration of the six signals is essentially perfect. Finally, these noise measurements were analyzed to show that the synapse can transmit more than 1500 bits/sec, by far the record for any single neuron. As discussed in Spikes, this information transmission rate is (given some uncertainties) close to the limit set by the quantization of the signal into vesicles combined with the system’s time resolution. Within some range of time scales, then, the photodetector is a near–ideal photon counter and the LMC is a near–ideal vesicle counter.


For a review and pedagogical discussion see [81].
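
To make the step from noise measurements to an information rate concrete, here is a minimal sketch that assumes the standard Gaussian-channel formula, in which the rate is the integral over frequency of log2(1 + SNR(f)). The function names and the frequency grid are hypothetical, and the Gaussian approximation is an assumption of the sketch rather than a statement about the exact analysis in the paper. The second function encodes the idealized case of six inputs with independent noise being averaged perfectly, which multiplies the power signal-to-noise ratio by six.

```python
import numpy as np

def gaussian_channel_rate_bits_per_s(freqs_hz, snr):
    """Integrate log2(1 + SNR(f)) over frequency with the trapezoidal rule.

    freqs_hz: increasing frequencies in Hz; snr: power signal-to-noise
    ratio at each frequency (assumed inputs, e.g. from measured spectra).
    """
    f = np.asarray(freqs_hz, dtype=float)
    integrand = np.log2(1.0 + np.asarray(snr, dtype=float))
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(f)))

def pooled_snr(snr_single, n_inputs=6):
    """Power SNR after perfectly averaging n inputs with independent noise."""
    return n_inputs * np.asarray(snr_single, dtype=float)
```

A flat SNR of 1 across a 100 Hz band gives 100 bits/sec under these assumptions, which gives a feeling for how much bandwidth and signal-to-noise a rate above 1500 bits/sec requires.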





66. Entropy and information in neural spike trains. SP Strong, R Koberle, RR de Ruyter van Steveninck & W Bialek, Phys Rev Lett  80, 197-200 (1998).


There have been fifty years of debate on the question of whether the detailed timing of spikes is important. With precise timing the system would have access to a much greater representational capacity: The entropy of neural responses is larger at higher time resolution, but is this capacity used efficiently? Here we showed how to measure—without reference to any models or assumptions—the information content of neural spike trains as they encode naturalistic, dynamic inputs. The result was that the motion sensitive neurons in the fly visual system use roughly half of the spike train entropy to carry visual information, and this efficiency is approximately constant from a time resolution of ~1 sec down to ~1 msec. The observation of high coding efficiency at msec time resolution is in agreement with earlier results from other systems using the linear decoding methods to estimate the information rates, but it is important that with the present approach we don’t even need to know what aspects of the stimulus are represented let alone the proper algorithm for decoding these features. In contrast with methods based on decoding, the tools developed in this work have come to be called ‘direct’ methods for analysis of information transmission, and are being applied to a variety of systems. Direct estimates of the information content of spike trains serve to make precise our impressions about the reproducibility and variability of neural responses, and an important observation in the fly as in other systems is that the statistical structure of the spike train in response to dynamic signals seems to be very different from that observed in response to static or slowly varying signals [63, 73]. This suggests that one might be able to observe even more informative and efficient responses under more natural conditions (see below!).


The central technical difficulty in using these methods is the problem of bias due to limited sample sizes. In this first paper we made an effort to collect a very large data set, rather than trying to be especially sophisticated in how we extract our estimates on entropy and information from the available samples. There are interesting theoretical questions concerning how much one can say about information theoretic quantities when the relevant probability distributions are undersampled. For our efforts in this direction see [83, 101].
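
The logic of a direct estimate can be sketched in a few lines, assuming responses to repeated presentations of the same stimulus are stored as a 0/1 array with one row per repeat and one column per time bin. The function names and this array layout are assumptions for the example. The entropy estimator used here is the naive plug-in estimate, which is exactly the kind of estimate that is biased when samples are limited, so this is an illustration of the bookkeeping and not of the careful extrapolations needed with real data.

```python
import numpy as np
from collections import Counter

def plugin_entropy_bits(words):
    """Naive (plug-in) entropy, in bits, of a list of hashable symbols."""
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def words_at(spikes, start, word_len):
    """One binary word per repeat, beginning at time bin `start`."""
    return [tuple(int(v) for v in row[start:start + word_len]) for row in spikes]

def direct_information_bits(spikes, word_len):
    """Total word entropy minus mean noise entropy, in bits per word.

    spikes: 2-D 0/1 array of shape (n_repeats, n_bins) (assumed layout).
    """
    spikes = np.asarray(spikes)
    starts = range(0, spikes.shape[1] - word_len + 1)
    # Total entropy: pool words over all times and all repeats.
    pooled = [w for s in starts for w in words_at(spikes, s, word_len)]
    h_total = plugin_entropy_bits(pooled)
    # Noise entropy: variability across repeats at a fixed time, averaged over time.
    h_noise = np.mean([plugin_entropy_bits(words_at(spikes, s, word_len)) for s in starts])
    return h_total - float(h_noise)
```

The ratio of the returned information to the total entropy is the coding efficiency in the sense used above; a perfectly reproducible response has zero noise entropy and an efficiency of one.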





The metabolic cost of neural information. SB Laughlin, RR de Ruyter van Steveninck & JC Anderson, Nature Neurosci  1, 36-41 (1998).


A striking fact about the brain is that very small groups of cells change their metabolic rate in relation to their activity: thinking harder really does cost energy, and the exquisite control of this energy balance ultimately forms the basis for signals that are detectable in functional imaging of the brain. The combination of our measurements on signals, noise and information transmission in the fly retina with fairly detailed mechanistic data on these cells allowed us to address the energetics of information transmission in a new way. We found that visual information is quite expensive, with a cost of ~ 10,000 ATP molecules (each worth ~0.5 eV) per bit. In such noise–limited signaling systems, transmission of multiple parallel signals at relatively poor signal–to–noise ratio is vastly more energetically efficient than transmitting a single high quality signal, perhaps providing a physical justification for the frequent occurrence of both multiple pathways and apparent unreliability of the individual pathways in biological systems. Several groups are actively investigating related issues, from the idea that neural codes are optimized for metabolic efficiency to the possibility of exploiting these conclusions in low power silicon devices.
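
The numbers quoted above can be put on a common scale with a few lines of arithmetic. The ATP count and the energy per ATP come from the text; the comparison with the thermal energy scale kT ln 2 at an assumed temperature of 300 K is added here only for orientation and is not a claim made in the paper.

```python
import math

ATP_PER_BIT = 1.0e4            # from the text: ~10,000 ATP molecules per bit
EV_PER_ATP = 0.5               # from the text: ~0.5 eV per ATP molecule
K_B_EV_PER_K = 8.617333e-5     # Boltzmann constant in eV per kelvin

def cost_per_bit_ev():
    """Energy spent per bit of visual information, in eV."""
    return ATP_PER_BIT * EV_PER_ATP

def thermal_scale_ev(temperature_k=300.0):
    """kT ln 2 in eV; the temperature is an assumption of this sketch."""
    return K_B_EV_PER_K * temperature_k * math.log(2.0)

def cost_in_thermal_units(temperature_k=300.0):
    """Cost per bit expressed in units of kT ln 2."""
    return cost_per_bit_ev() / thermal_scale_ev(temperature_k)
```

Under these assumptions the cost is about 5000 eV per bit, several hundred thousand times the thermal scale, which is one way to see why the text calls visual information expensive.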





71. Synergy in a neural code. N Brenner, SP Strong, R Koberle, W Bialek & RR de Ruyter van Steveninck, Neural Comp 12, 1531-1552 (2000).


Timing of spikes could be significant (as demonstrated in [66]) because each spike points precisely to an event in the outside world, or because the system really uses temporal patterns of spikes to convey something special. Here we gave this question an information theoretic formulation: Do patterns of spikes carry more information than expected by summing the contributions of individual spikes? To answer this requires measuring the information carried by particular candidate symbols in the code, and we show how this can be done with real data, independent of any model assumptions, making connections between the information theoretic quantities and the more familiar correlation functions of the spike train. Although we focused on patterns across time in a single cell, everything generalizes to patterns across a population of cells. For the fly’s motion sensitive neuron, we do observe synergistic coding, and this synergy is a significant component of the high coding efficiency seen in this system. The fact that we can measure objectively the information carried, for example, by a single spike, also means that we have a benchmark against which to test models of the code.


For a discussion of synergy and redundancy in populations see [91]. There is a big conceptual question about how one relates synergy or redundancy in populations to the synergy or redundancy that one can observe among pairs, triplets, ... of cells; see [90].
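
For the single-cell question posed above, one way to turn the comparison into a calculation is sketched below. It assumes that the information carried by an event (a single spike, or a pair of spikes with a given separation) is estimated from the time-dependent rate r(t) of that event across repeated presentations, as the time average of (r/r_mean) log2(r/r_mean). The function names and the binned rate arrays are assumptions of the example, and the synergy is taken to be the pair information minus twice the single-spike information.

```python
import numpy as np

def info_per_event_bits(rate):
    """Information per event, in bits, from its time-dependent rate r(t).

    rate: 1-D array of event rates in equal time bins, with nonzero mean.
    Bins with zero rate contribute nothing, since 0 * log(0) is taken as 0.
    """
    r = np.asarray(rate, dtype=float)
    x = r / r.mean()
    safe = np.where(x > 0, x, 1.0)        # avoid log2(0); those terms are zeroed
    terms = np.where(x > 0, x * np.log2(safe), 0.0)
    return float(terms.mean())

def synergy_bits(pair_rate, single_rate):
    """Pair information minus twice the single-spike information, in bits."""
    return info_per_event_bits(pair_rate) - 2.0 * info_per_event_bits(single_rate)
```

A positive value means the pattern says more than its parts taken independently, a negative value indicates redundancy, and a rate that does not vary in time carries no information at all.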





72. Adaptive rescaling optimizes information transmission. N Brenner, W Bialek & R de Ruyter van Steveninck, Neuron 26, 695-702 (2000).


The direct demonstration of high coding efficiency in neural spike trains [45, 66, 77] supports strongly the old idea that the construction of an efficient representation could be the goal of neural computation. Efficient representations must be matched to the statistical structure of the input signals, and it therefore is encouraging that we observe higher coding efficiency for more naturalistic signal ensembles [58, 63, 74], but it usually was assumed that such matching could occur only over the long time scales of development  or evolution.  In [53, 55] we proposed that adaptation to statistics would occur in real time, to exploit the intermittent structure of natural signals  (eg [52]), and in [54, 61, 62] we presented evidence that this occurs both in the fly and in the vertebrate retina.  Here we analyzed an example in detail, and found that adaptation to changes in the variance of the input has a striking form, rescaling the input/output relation of the neuron so that signals are coded in relative units.   Further, the precise choice of the scaling factor serves to optimize information transmission; as far as we know this is the first direct demonstration that an optimization principle is at work in the brain.


There are two notions of real–time adaptation to statistics at work in our thinking about the fly, and in the fly itself. First is the idea of adaptation of the computation that the fly does in estimating motion, as developed in [53]. Second is the idea that coding the output of these computations must be matched to the distribution of the signal we are trying to encode. In general, adaptation in real time works only if signals have a special statistical structure in which low order statistical properties (variance, correlation time, ... ) are constant across reasonable windows of space and time, and then these low order statistics drift. Under these conditions one can generate the sorts of long–tailed distributions seen for many different natural signals, and it is also true that optimal coding and computational strategies will involve adapting locally and tracking these drifting statistics. At least for us these ideas have their origin in observations on the statistical structure of natural images [52], where we first saw the “variance normalization” which has its direct echo in the present work on adaptation and scaling.
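
The idea of coding in relative units can be illustrated with a toy input/output relation that depends on the stimulus only through its ratio to the current standard deviation. The sigmoidal shape, the maximum rate, and the function name are all hypothetical choices made for this sketch; the measured relation in the fly need not have this form.

```python
import numpy as np

def adapted_rate(stimulus, sigma, r_max=100.0):
    """Toy firing rate that depends only on stimulus / sigma.

    stimulus: signal value(s); sigma: standard deviation of the current
    stimulus ensemble; r_max: saturating rate (hypothetical value).
    """
    z = np.asarray(stimulus, dtype=float) / sigma
    return r_max / (1.0 + np.exp(-z))
```

In this toy model a stimulus of 2 in an ensemble with standard deviation 2 produces the same rate as a stimulus of 10 in an ensemble with standard deviation 10, so input/output curves measured in different ensembles collapse onto one curve when the stimulus axis is expressed in units of the standard deviation.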





75. Universality and individuality in a neural code. E Schneidman, N Brenner, N Tishby, RR de Ruyter van Steveninck & W Bialek, in Advances in Neural Information Processing 13, TK Leen, TG Dietterich, & V Tresp, eds., pp. 159-165 (MIT Press, Cambridge, 2001).


One of the major challenges in thinking quantitatively about biological systems is the variability among individuals. In the context of the neural code, we can ask if different animals share similar neural representations of the same sensory inputs. The problem of comparing neural representations is similar in several ways to the problem of comparing DNA sequences, and we argue that rather than using conventional metric or string matching methods one should take a model independent information theoretic approach; we believe that this is a substantial conceptual advance that should have implications back to the bioinformatics problem. We then find that the fly’s visual system has a quantifiable mixture of universality and individuality: what is universal is the efficiency of the code, and what is individual is the precise way in which patterns of spikes are used to achieve this efficiency. Closely related to the problem of individuality is the problem of classifying neurons within one organism: Can we, for example, make precise the impression that our retina has a small number of classes of cells which serve to divide the incoming visual information into parallel and stereotyped channels, or might each neuron in fact have a unique view of the world? Building on the methods introduced here, it is possible to give this classification problem a similar purely information theoretic formulation [87].





77. Neural coding of naturalistic motion stimuli. GD Lewen, W Bialek, & RR de Ruyter van Steveninck, Network 12, 317-329 (2001).


Brains were selected by evolution for their performance in processing sensory signals of considerable complexity, far from the simple stimuli of the traditional experimentalist’s toolbox. One of the themes in our exploration of the fly’s visual system thus has been to provide methods for analyzing the responses to more complex and ultimately natural inputs. Any experiment with “natural” inputs, however, must make a compromise between emulating the richness of the real world and maintaining experimental control. Rather than trying to construct an ever more naturalistic world in the laboratory, here we took a different approach and recorded the responses of the motion sensitive neuron H1 with the fly outdoors and moving along angular trajectories taken from actual acrobatic flights. Even in response to constant velocity stimuli, the differences between outdoor and laboratory conditions are large enough to extend the dynamic range of these neurons by more than an order of magnitude in angular velocity. During motion along realistic flight trajectories, spike timing can be reproducible on the scale of 100 µsec; further, the ability of the photodetector cells to act as near–ideal photon counters at high counting rates is reflected in the fact that the information about motion continues to increase as the photon flux climbs toward its midday maximum. While much remains to be done, these initial results strongly support the conclusion that under natural conditions the nervous system can operate with a richness and precision far beyond that expected from experiments in more limited environments.


While attractive, the idea that natural stimuli are coded more efficiently (or are special in some other way) has been controversial. For a review that addresses the controversy and presents several new results, see [73].





78. Efficiency and ambiguity in an adaptive neural code.  AL Fairhall, GD Lewen, W Bialek & RR de Ruyter van Steveninck, Nature 412, 787-792 (2001).


Adaptation allows the nervous system to be better “matched” to the current sensory environment, but there are problems: adaptive codes are ambiguous, and matching takes time so one can fall behind. Here we take the observations on adaptation and optimization in [72] as a starting point and show that the dynamics of adaptation itself is optimal, so that the speed with which the system adjusts to a change in input distribution is close to the limit set by the need to gather statistics. Further, while the coding of short segments of the sensory stimulus in small numbers of spike trains is highly adaptive, the longer term statistics of the spike train contains enough information about the adaptation state of the cell to resolve potential ambiguities. Finally, there is no single time scale that characterizes the response to changes in input statistics; rather the system seems to have access to time scales ranging from less than 100 msec out to order one minute or more, which may make it possible to deal with the multiple time scales of variation in the real world.




103. Features and dimensions:  Motion estimation in fly vision.  W Bialek & RR de Ruyter van Steveninck, q-bio/0505003 (2005).


Here we build on the ideas of [21] and [72] to characterize the computation of motion in the fly visual system as a mapping from the high dimensional space of signals in the retinal photodetector array to the probability of generating an action potential in a motion sensitive neuron. We identify a low dimensional subspace of signals within which the neuron is most sensitive, and then sample this subspace to visualize the nonlinear structure of the mapping. The results illustrate the computational strategies predicted for a system that makes optimal motion estimates given the physical noise sources in the detector array. More generally, the hypothesis that neurons are sensitive to low dimensional subspaces of their inputs formalizes the intuitive notion of feature selectivity and suggests a strategy for characterizing the neural processing of complex, naturalistic sensory inputs. The same methods of analysis have been used to take a new look at the computations done in simple, biologically plausible model neurons [88], as well as other experimental systems from vertebrate retina to visual cortex. New, purely information theoretic methods should allow us to search for low dimensional relevant subspaces even when stimuli have all the complex correlation structure of fully natural signals [93].
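
One common way to search for a low dimensional subspace of this kind is to compare the covariance of the stimuli that precede spikes with the covariance of all stimuli, and to keep the directions along which the two differ most. The sketch below assumes this spike-triggered covariance approach, approximately Gaussian stimuli, and a data layout with one stimulus vector per time bin; the function name and the layout are hypothetical, and the sketch is not presented as the exact procedure used in the paper.

```python
import numpy as np

def relevant_subspace(stimuli, spikes, n_dims):
    """Spike-triggered average and leading covariance-difference directions.

    stimuli: array of shape (n_samples, n_features), one stimulus per bin.
    spikes: array of shape (n_samples,), spike count in each bin.
    Returns (sta, eigenvalues, eigenvectors) for the n_dims directions whose
    spike-triggered variance differs most from the prior variance.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    weights = np.asarray(spikes, dtype=float)
    weights = weights / weights.sum()
    prior_cov = np.cov(stimuli, rowvar=False)
    sta = weights @ stimuli                       # spike-triggered average
    centered = stimuli - sta
    stc = (centered * weights[:, None]).T @ centered
    evals, evecs = np.linalg.eigh(stc - prior_cov)
    order = np.argsort(-np.abs(evals))[:n_dims]   # largest changes in variance
    return sta, evals[order], evecs[:, order]
```

Once candidate directions are in hand, projecting the stimuli onto them and estimating the spike probability as a function of the projections is one way to sample the subspace and visualize the nonlinear structure of the mapping.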




115. Neural coding of a natural stimulus ensemble: Uncovering information at sub–millisecond resolution. I Nemenman, GD Lewen, W Bialek & RR de Ruyter van Steveninck, q-bio.NC/0612050 (2006).


Our knowledge of the sensory world is encoded by neurons in sequences of discrete, identical pulses termed action potentials or spikes.  There is persistent controversy about the extent to which the precise timing of these spikes is relevant to the function of the brain.  We revisit this issue, using the motion-sensitive neurons of the fly visual system as a test case.  New experimental methods (from [77]) allow us to deliver more nearly natural visual stimuli, comparable to those which flies encounter in free, acrobatic flight, and   new mathematical methods (from [83,99]) allow us to draw more reliable conclusions about the information content of neural responses even when the set of possible responses is very large.  We find that significant amounts of visual information are represented by details of the spike train at millisecond and sub-millisecond precision, even though the sensory input has a correlation time of ~ 60 ms; different patterns of spike timing represent distinct motion trajectories, and the absolute timing of spikes points to particular features of these trajectories with high precision.  Under these naturalistic conditions, the system continues to transmit more information at higher photon flux, even though individual photoreceptors are counting more than one million photons per second, and removes redundancy in the stimulus to generate a more efficient neural code.