
Optimization principles and the physical limits to biological function

William Bialek

Last updated 29 November 2004

Much of my work is motivated by the (perhaps dangerously mystical) idea that biological systems may be optimized for particular functions, operating at or near some fundamental physical limits to their performance. Often this idea stays comfortably in the background, but sometimes it’s been possible to confront the issue head on. This set of papers explores the physical limits to various biological processes, develops optimization principles that predict how biological systems must work if they are going to reach these limits, and assesses the experimental evidence for these predictions.

Papers are in chronological order, most recent papers at the bottom. Numbers refer to a full list of publications.

[19.] Physical limits to sensation and perception. W Bialek, Ann Rev Biophys Biophys Chem 16, 455-478 (1987).

From a physicist's point of view, some of the most remarkable facts about biological systems concern the absolute sensitivity of the sense organs: eyes that count single photons, ears that detect vibrations comparable in amplitude to Brownian motion, chemical sensors that count single molecules, thermal sensors with millikelvin sensitivity at room temperature, and so on. In this work I tried to bring these very different observations together, combining classic data with some of my own arguments about more recent experiments. The goal was to see performance near the physical limits as a general principle. I also wanted to address the issue of what happens after the detectors: while one might concede that physics is relevant for the design of receptor cells, surely once the resulting signals enter the brain one is doing biology (or even psychology) and not physics, or so go the prevailing prejudices. Against these prejudices I offered several examples where the nervous system does nontrivial computations and still reaches answers whose precision is set by physical principles. This broader notion of optimal performance, pushing the physical limits wherever possible, remains a radical view of biology in general and of neural computation in particular.

[27.] Temporal filtering in retinal bipolar cells: Elements of an optimal computation? W Bialek & WG Owen, Biophys J 58, 1227-1233 (1990).

[30.] Optimal filtering in the salamander retina. F Rieke, WG Owen, & W Bialek, in Advances in Neural Information Processing Systems 3, R Lippmann, J Moody & D Touretzky, eds, pp 377-383 (Morgan Kaufmann, San Mateo CA, 1991).

When looking for a weak signal in a background of noise, one of the standard strategies is “matched filtering,” in which the observable (signal plus noise) is passed through a filter with a frequency response that is designed (roughly) to maximize the amount of signal that we let pass while at the same time minimizing the noise. In this work we showed that when it’s very dark outside, and hence the signal-to-noise ratio in the photodetector cells of the retina is very low, all visual computations have the same first step, and this universal first step is a form of matched filtering. We know enough about the signal and noise properties of biological photodetectors to construct this filter explicitly, first by analytic approximations and then by direct numerical analysis of the data; there are no free parameters. The result is a filter which is in very good agreement with the filtering observed to occur in the first stage of visual processing, the transfer of signals from the rods (photodetector) to bipolar cells. This is a simple but nonetheless compelling success for the ideas of optimal processing. Note that we are not proposing a model of retinal signal processing, but rather making predictions from first principles about what the form of this processing should be in order to make maximal use of the available photons.
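As a generic illustration of matched filtering (not the retinal filter constructed in [27, 30]), the following Python sketch assumes a known pulse shape buried in white Gaussian noise. Under that assumption the matched filter reduces to correlating the data with the pulse template. The pulse shape, amplitude, noise level, and all function names are hypothetical choices made for this example.

```python
import math
import random


def gaussian_template(length=51, width=5.0):
    """Hypothetical pulse shape: a Gaussian bump with unit peak height."""
    center = (length - 1) / 2
    return [math.exp(-0.5 * ((k - center) / width) ** 2) for k in range(length)]


def matched_filter(x, template):
    """Correlate x with the template: y[i] = sum_k template[k] * x[i + k].

    For white noise this correlation is the matched filter.
    """
    n, m = len(x), len(template)
    return [sum(template[k] * x[i + k] for k in range(m)) for i in range(n - m + 1)]


def snr_gain(template):
    """Matched-filter SNR divided by the SNR of the single best sample.

    Both are amplitude ratios and assume white noise of equal variance per sample.
    """
    return math.sqrt(sum(h * h for h in template)) / max(template)


def simulate_detection(amplitude=3.0, onset=400, n=1000, seed=0):
    """Hide one pulse in unit-variance white noise and locate it with the filter."""
    rng = random.Random(seed)
    template = gaussian_template()
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for k, h in enumerate(template):
        x[onset + k] += amplitude * h
    y = matched_filter(x, template)
    estimated_onset = max(range(len(y)), key=y.__getitem__)
    return estimated_onset, template


if __name__ == "__main__":
    estimate, template = simulate_detection()
    print("true onset 400, estimated onset", estimate)
    print("SNR gain over the best single sample:", round(snr_gain(template), 2))
```

With these assumed numbers the pulse peak is only about three noise standard deviations high, so it is hard to pick out sample by sample, while the filtered output has roughly three times the signal-to-noise ratio. When the noise is not white, the filter must also weight each frequency inversely by the noise power, which is the more general situation the text describes.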

[34.] Reading a neural code. W Bialek, F Rieke, RR de Ruyter van Steveninck & D Warland, Science 252, 1854-1857 (1991).

[57.] Reliability and statistical efficiency of a blowfly movement-sensitive neuron. R de Ruyter van Steveninck & W Bialek, Phil Trans R Soc Lond Ser B 348, 321-340 (1995).

In these two papers we used the motion sensitive neurons in the fly visual system as a test case to probe the reliability of neural computation. Although it had been known for years that the visual system can count single photons when given the simple task of detecting a dim flash of light, is it possible that similarly fundamental physical considerations set the limits to reliability for more complex tasks under more natural conditions? In [34] we decoded the spike trains (as suggested in [25]) of the fly’s H1 neuron and thereby reconstructed the time dependent angular velocity; this allowed us to assess the reliability or precision of computation by measuring the accuracy of the reconstructions relative to the true motion trajectory. In [57] we did an experiment much more like a human psychophysical experiment, asking whether the spike train output of H1 is sufficient to allow reliable discrimination between very similar amplitudes of stepwise motion. In both experiments we found that the effective noise level corresponds to ~ 0.06 deg of motion over 30 msec, which is ~ 20 times smaller than the lattice spacing of detectors in the compound eye and ~ 10 times smaller than the nominal “diffraction limit” due to blur by the eye's lenses. Resolution beyond the sampling and diffraction scales is also known in human vision, and the collection of perceptual phenomena in which this occurs is called hyperacuity. This led us to wonder about the physical limits to motion estimation: blur due to diffraction in the lenses of the compound eye and noise due to the random arrival of photons at the receptors. In fact the observed performance is very close to this limit, so that even four layers of neurons away from the receptors it is still the physics of the inputs that sets the precision of computation.

For a review of hyperacuity see Section 4.2 of [65]. A preliminary account of the experiments in [57] (published in 1984) may have been the first report of hyperacuity in the responses of a single neuron. For details of the limits to motion estimation, see [29, 42]. The fly visual system proved to be a very fertile testing ground for the idea of optimization in neural coding and computation.

[38.] A new look at the primary charge separation in bacterial photosynthesis. SS Skourtis, AJR DaSilva, W Bialek & JN Onuchic, J Phys Chem 96, 8034-8041 (1992).

Photosynthesis begins with a photon-induced transfer of an electron from one large molecule to another, both held in a protein framework. This step is complete in ~ 3 psec, one of the fastest reactions of its type ever observed. Chemistry as we usually think about it operates in the limit where reactions are slow compared with internal molecular relaxation rates, and on these grounds alone it seemed unlikely that the initial event of photosynthesis could be thought of as a conventional chemical kinetic process. In more detail, if one tries to estimate the matrix elements among the relevant electronic states and use the golden rule to calculate the transfer rate, there are considerable uncertainties, but it seemed hard to get the right answer. In this work we showed that there is a surprisingly broad regime in which electronic matrix elements, vibrational level spacings and relaxation rates are all comparable, so that one can be poised in between the golden rule regime and coherent oscillation. Since irreversibility is possible only after the destruction of quantum coherence, this regime is one in which the reactions are as fast as possible, and we argued that predictions of the theory in this regime are consistent with various aspects of the phenomenology in photosynthesis. As we were completing the paper, a new set of ultrafast spectroscopy experiments at lower temperatures revealed the coherent oscillations that would occur in our scenario if relaxation rates were reduced. In a similar spirit we studied a simple model for the initial events in the visual pigments [16], combining intuition from condensed matter physics work on conjugated polymers with a (then) novel simulation technique that combined molecular dynamics with diagonalization of a model Hamiltonian for the electrons; again subsequent experiments detected the coherence effects expected from the theory, although it would be an overstatement to say that the theory was confirmed. These are among the few cases I know where quantum coherence really is important in a biological process.

In retrospect the simulation method in [16] is a sort of poor man's Car–Parrinello method (and done at the same time), using tight binding rather than density functionals. I think it is quite a powerful technique, and we should have made more of it at the time.

[53.] Statistical mechanics and visual signal processing. M Potters & W Bialek, J Phys I France 4, 1755-1775 (1994).

Inspired by the observation of near optimal performance in the fly's motion estimation system, we set out to understand the algorithmic requirements for optimal estimation. Conventional approaches involve searching a set of possible strategies for the best within the set, but we showed how one could map the problem of estimation in the presence of noise onto a statistical mechanics problem in which the data act as external fields and the estimator is the expectation value of some order parameter. Estimation theory then is reduced to the computation of (perhaps strongly nonlinear) response functions, and standard approximations in statistical mechanics map to different regimes of the signal processing problem. Applying the general framework to the problem of motion estimation in the fly, we showed that the optimal estimation strategy has very different behaviors in different sensory environments. In particular, the optimal estimator interpolates between popular models for motion estimation, which arise as limiting cases of the full theory. An inevitable prediction of the theory is that the optimal processor must change its strategy, or adapt to changes in the statistics of the input signals. Preliminary experiments gave clear evidence of this “statistical adaptation” [54, 59, 61], and recent experiments provide a direct confirmation of the combination of nonlinear operations predicted by the theory [102].
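To make the mapping concrete, consider the simplest possible case. This is an illustrative assumption, not the motion estimation problem treated in [53]: a scalar signal $s$ is drawn from a Gaussian prior with variance $\sigma_s^2$ and observed as data $d = s + n$, where the noise $n$ is Gaussian with variance $\sigma_n^2$. Bayes' rule then gives a Boltzmann-like distribution in which the data enter as an external field, and the estimator that minimizes the mean-square error is the expectation value of $s$ in that ensemble.

```latex
\begin{align}
P(s \mid d) &= \frac{P(d \mid s)\,P(s)}{P(d)} = \frac{1}{Z(d)}\, e^{-H(s;d)}, \\
H(s;d) &= \frac{s^2}{2\sigma_s^2} + \frac{(d-s)^2}{2\sigma_n^2}
        = \frac{s^2}{2}\left(\frac{1}{\sigma_s^2} + \frac{1}{\sigma_n^2}\right)
          - \frac{d}{\sigma_n^2}\, s + \frac{d^2}{2\sigma_n^2}, \\
\hat{s}(d) &= \langle s \rangle_{d} = \int ds\; s\, P(s \mid d)
        = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2}\, d .
\end{align}
```

In this toy case the response to the field $d/\sigma_n^2$ is linear. With non-Gaussian priors or noise the same expectation value becomes a nonlinear response function, which is the situation the paragraph above describes.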

[55.] Information flow in sensory neurons. M DeWeese & W Bialek, Il Nuovo Cimento 17D, 733-741 (1995).

This was a first step in the still incomplete project of constructing a theory for optimal coding by spiking neurons. Along the way we introduced some interesting technical tools, such as a perturbative expansion of the information transmission rate. In addition we took the opportunity to debunk some misconceptions that surrounded the idea of “stochastic resonance.” This might also be the first place where we stated explicitly that the prediction of an optimal coding theory will necessarily be a code that adapts to the statistics of the sensory inputs.

[56.] Random switching and optimal processing in the perception of ambiguous signals. W Bialek & M DeWeese, Phys Rev Lett 74, 3077-3080 (1995).

In the case of motion estimation [53] there is nothing deep about the statistical mechanics problems that we have to solve, but here we found that in cases where stimuli have ambiguous interpretations (as in the Necker cube) the estimation problem maps to a random field model. The nontrivial statistical mechanics of the random field problem really does seem to correlate with the phenomenology of multistable percepts. This is interesting as a very clear example of how the idea of optimal performance can generate striking and even counterintuitive predictions, in this case predicting fluctuations in perception even when inputs are constant and the signal to noise ratios are high.

[68.] Adaptation and optimal chemotactic strategy for E. coli. SP Strong, B Freedman, W Bialek & R Koberle, Phys Rev E 57, 4604-4617 (1998).

In 1977 Berg and Purcell wrote a classic paper analyzing the physical constraints on bacterial chemotaxis. Their discussion inspired many people, including me, but they also had a decidedly idiosyncratic style of argument that left many loose ends. This paper is one effort to make their arguments more systematic. Given that bacteria can measure the time derivative of concentration along their trajectories, that this measurement inevitably has a substantial noise level, and that the bacteria cannot steer but only go straight (up to the limits set by their own rotational diffusion) or tumble to choose a new direction at random, how can they use the results of their measurements to guide their behavior? In particular, is there a strategy that would maximize their progress along favorable concentration gradients? This problem of optimal behavioral strategy turns out to be quite subtle. I don’t think we have a complete solution, but we said some interesting things, in particular about the extent to which the decision to tumble can be nearly deterministic and yet still consistent with the apparently stochastic behavior observed for E. coli. I think that subsequent measurements on the nearly switch-like input/output relation of the bacterial motor are consistent with these results, but many questions remain open.

[69.] The information bottleneck method. N Tishby, FC Pereira, & W Bialek, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999).

When Shannon developed information theory he left open the problem of assigning relevance to a signal. Here we showed that if we observe one signal but are interested in another, then the statistical associations between these signals define what is relevant and one can (selectively) compress the observed signal to “squeeze out” the relevant bits by formulating the efficient representation of relevant information as an optimization principle. Crucially, this formulation does not require any assumptions about what it means for signals to be similar; indeed the various signals need not even live in a metric space. There are deep connections to clustering—especially to the statistical mechanics formulation in which separate clusters emerge through a series of phase transitions—and many different problems from signal processing to learning can be cast into this unified information theoretic framework. We believe that this is a fundamentally new and principled approach to a wide variety of problems of interest both as models of the problems solved by the brain and as practical problems in their own right.
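The optimization principle can be written compactly. In the sketch below the notation is a choice made here: $X$ is the observed signal, $Y$ is the signal of interest, $T$ is the compressed representation of $X$, and $\beta$ is a Lagrange multiplier that sets the tradeoff between compression and retained relevant information.

```latex
\begin{align}
\min_{p(t \mid x)} \; \mathcal{L}\left[p(t \mid x)\right] &= I(X;T) - \beta\, I(T;Y), \\
p(t \mid x) &= \frac{p(t)}{Z(x,\beta)}
   \exp\left[-\beta\, D_{\mathrm{KL}}\!\left(p(y \mid x)\,\|\,p(y \mid t)\right)\right], \\
p(t) &= \sum_x p(x)\, p(t \mid x), \qquad
p(y \mid t) = \frac{1}{p(t)} \sum_x p(y \mid x)\, p(t \mid x)\, p(x).
\end{align}
```

The second and third lines are the self-consistent conditions satisfied at the optimum. The Kullback-Leibler divergence between $p(y \mid x)$ and $p(y \mid t)$ plays the role of a distortion measure, but it emerges from the variational problem instead of being assumed, which is why no prior notion of similarity between signals is needed.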

[72.] Adaptive rescaling optimizes information transmission. N Brenner, W Bialek & R de Ruyter van Steveninck, Neuron 26, 695-702 (2000).

[78.] Efficiency and ambiguity in an adaptive neural code. AL Fairhall, GD Lewen, W Bialek & RR de Ruyter van Steveninck, Nature 412, 787-792 (2001).

The direct demonstration of high coding efficiency in neural spike trains [45, 66, 77] strongly supports the old idea that the construction of an efficient representation could be the goal of neural computation. Efficient representations must be matched to the statistical structure of the input signals, and it therefore is encouraging that we observe higher coding efficiency for more naturalistic signal ensembles [58, 63, 74], but it usually was assumed that such matching could occur only over the long time scales of development or evolution. In [53, 55] we proposed that adaptation to statistics would occur in real time, to exploit the intermittent structure of natural signals (e.g., [52]), and in [54, 61, 62] we presented evidence that this occurs both in the fly and in the vertebrate retina. In [72] we analyzed an example in detail, and found that adaptation to changes in the variance of the input has a striking form, rescaling the input/output relation of the neuron so that signals are coded in relative units. Further, the precise choice of the scaling factor serves to optimize information transmission; this is the first direct demonstration that an optimization principle is at work in the brain. In [78] we showed that the dynamics of this adaptation itself is optimal, so that the speed with which the system adjusts to a change in input distribution is close to the limit set by the need to gather statistics.
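The link between rescaling and information transmission can be illustrated with a toy model, which is an assumption made for this example and not the analysis of [72]. A hypothetical neuron responds as r = tanh(gain * s) plus Gaussian output noise, and the stimulus s is Gaussian with standard deviation sigma. The Python sketch below computes the mutual information between s and r on a grid and searches for the gain that maximizes it. All names and parameter values are illustrative.

```python
import math


def information_bits(gain, sigma, noise_sd=0.1, n_s=101, n_r=121):
    """Mutual information (bits) between a Gaussian stimulus and a noisy tanh response."""
    # Stimulus grid in units of sigma, weighted by a discretized Gaussian.
    zs = [-5.0 + 10.0 * i / (n_s - 1) for i in range(n_s)]
    weights = [math.exp(-0.5 * z * z) for z in zs]
    total = sum(weights)
    p_s = [w / total for w in weights]

    # Response grid covering the tanh range plus a margin for the output noise.
    rs = [-1.5 + 3.0 * j / (n_r - 1) for j in range(n_r)]
    cond = []  # cond[i][j] = p(r_j | s_i)
    for z in zs:
        mean_r = math.tanh(gain * sigma * z)
        row = [math.exp(-0.5 * ((r - mean_r) / noise_sd) ** 2) for r in rs]
        norm = sum(row)
        cond.append([v / norm for v in row])

    p_r = [sum(p_s[i] * cond[i][j] for i in range(n_s)) for j in range(n_r)]

    info = 0.0
    for i in range(n_s):
        for j in range(n_r):
            q = cond[i][j]
            if q > 0.0:
                info += p_s[i] * q * math.log2(q / p_r[j])
    return info


def best_gain(sigma, gains):
    """Gain on the search grid that transmits the most information."""
    return max(gains, key=lambda g: information_bits(g, sigma))


# Logarithmic grid of candidate gains (illustrative choice).
GAINS = [2.0 ** (k / 2) for k in range(-8, 9)]

if __name__ == "__main__":
    for sigma in (1.0, 2.0):
        g = best_gain(sigma, GAINS)
        print("sigma =", sigma, "best gain =", g, "gain * sigma =", g * sigma)
```

In this model the information depends on gain and sigma only through their product, so doubling the stimulus standard deviation halves the optimal gain. The input/output relation is then the same function of s/sigma in both conditions, which is what coding in relative units means here.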

[84.] Thinking about the brain. W Bialek, in Physics of Biomolecules and Cells: Les Houches Session LXXV, H Flyvbjerg, F Jülicher, P Ormos, & F David, eds, pp 485-577 (EDP Sciences, Les Ulis; Springer-Verlag, Berlin, 2002).

A second major effort to review and explain the ideas of optimization in neural coding and computation; in a sense this is an update of [19], with fifteen more years of experience. Although nominally a review, this is the only place (thus far) where I have tried to articulate how the ideas of optimization, especially in an information theoretic framework, could be used to think about problems of learning and cognition.

[99.] Physical limits to biochemical signaling. W Bialek & S Setayeshgar, physics/0301001.

Another paper that was inspired (in part) by Berg and Purcell. Many of their intuitive arguments can be made rigorous by noting that biochemical signaling depends on binding of signaling molecules to specific binding sites, and this interaction leads to equilibrium. This means that fluctuations in the occupancy of the binding sites are a form of thermal noise and can be treated using the standard methods of statistical mechanics. These tools allow us to compute not only the occupancy noise for a single site, but to understand how these fluctuations are changed by the diffusion of the signaling molecules and the resulting interactions with other sites. We find a minimum effective noise level that depends on physical parameters such as the diffusion constant and the size and geometrical arrangement of the relevant binding sites; this “noise floor” is independent of all the (often unknown) details of the chemical kinetics. Comparison with recent experiments on noise in the control of gene expression and the switching of the bacterial flagellar motor suggests that these intracellular signaling processes operate with a precision close to the fundamental physical limits.
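To get a feel for the size of such a noise floor, the Python sketch below evaluates the Berg and Purcell style scaling estimate, in which the fractional accuracy of a concentration measurement is roughly 1/sqrt(D a c tau) for diffusion constant D, sensor size a, concentration c, and averaging time tau. The order-unity prefactor is omitted, and the numerical values are illustrative assumptions, not numbers taken from [99].

```python
import math


def fractional_concentration_noise(diffusion_m2_per_s, size_m, conc_per_m3, averaging_time_s):
    """Scaling estimate delta_c / c ~ 1 / sqrt(D * a * c * tau).

    The order-unity prefactor is omitted, so this is a rough guide only.
    """
    n_effective = diffusion_m2_per_s * size_m * conc_per_m3 * averaging_time_s
    return 1.0 / math.sqrt(n_effective)


# Illustrative, assumed values (SI units):
D = 1e-12     # diffusion constant, 1 um^2/s
a = 3e-9      # linear size of the binding site, 3 nm
c = 50e18     # concentration, 50 molecules per um^3
tau = 60.0    # averaging time, one minute

if __name__ == "__main__":
    print("fractional noise:", fractional_concentration_noise(D, a, c, tau))
```

With these assumed numbers the product D a c tau is 9, so the estimated fractional noise is about one third. Because the estimate depends only on diffusion, geometry, concentration, and averaging time, it illustrates why such a floor does not depend on the details of the chemical kinetics.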

[102.] Features and dimensions: Motion estimation in fly vision. W Bialek & RR de Ruyter van Steveninck, submitted.

Here we build on the ideas of [21] and [72] to characterize the computation of motion in the fly visual system as a mapping from the high dimensional space of signals in the retinal photodetector array to the probability of generating an action potential in a motion sensitive neuron. We identify a low dimensional subspace of signals within which the neuron is most sensitive, and then sample this subspace to visualize the nonlinear structure of the mapping. The results illustrate the computational strategies predicted for a system that makes optimal motion estimates given the physical noise sources in the detector array. More generally, the hypothesis that neurons are sensitive to low dimensional subspaces of their inputs formalizes the intuitive notion of feature selectivity and suggests a strategy for characterizing the neural processing of complex, naturalistic sensory inputs. The same methods of analysis have been used to take a new look at the computations done in simple, biologically plausible model neurons [88], as well as other experimental systems from vertebrate retina to visual cortex. New, purely information theoretic methods should allow us to search for low dimensional relevant subspaces even when stimuli have all the complex correlation structure of fully natural signals [93].
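The simplest instance of searching for a relevant subspace is the one-dimensional case, where the spike-triggered average of the stimulus estimates a single relevant direction. The Python sketch below uses that simpler technique on a hypothetical model neuron and is not the analysis of [102]. The neuron, its hidden feature, the Gaussian white stimulus, and all parameter values are assumptions made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def simulate_sta(dim=10, n_samples=20000, seed=1):
    """Estimate one relevant stimulus direction from a model neuron's spikes."""
    rng = random.Random(seed)

    # Hypothetical hidden feature: a smooth bump, normalized to unit length.
    raw = [math.sin(math.pi * (k + 1) / (dim + 1)) for k in range(dim)]
    norm = math.sqrt(sum(v * v for v in raw))
    feature = [v / norm for v in raw]

    sta = [0.0] * dim
    n_spikes = 0
    for _ in range(n_samples):
        # Gaussian white stimulus in a dim-dimensional space.
        stim = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Spike probability depends on the stimulus only through one projection.
        proj = sum(f * x for f, x in zip(feature, stim))
        p_spike = 1.0 / (1.0 + math.exp(-(3.0 * proj - 2.0)))
        if rng.random() < p_spike:
            n_spikes += 1
            for k in range(dim):
                sta[k] += stim[k]

    # Average the stimuli that preceded spikes.
    sta = [v / n_spikes for v in sta]
    return feature, sta, n_spikes


if __name__ == "__main__":
    feature, sta, n_spikes = simulate_sta()
    print("spikes:", n_spikes)
    print("overlap between estimate and hidden feature:", round(cosine(feature, sta), 3))
```

For Gaussian white stimuli the spike-triggered average points along the hidden feature whenever the spike probability is an asymmetric function of the projection. Recovering more than one relevant dimension requires second-order statistics, such as the change in stimulus covariance conditional on a spike.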