**Optimization principles and the physical limits to biological function**


William Bialek

Last updated 29 November 2004

Much of my work is motivated by the (perhaps dangerously mystical) idea that biological systems may be optimized for particular functions, operating at or near some fundamental physical limits to their performance. Often this idea stays comfortably in the background, but sometimes it's been possible to confront the issue head on. This set of papers explores the physical limits to various biological processes, develops optimization principles that predict how biological systems must work if they are going to reach these limits, and assesses the experimental evidence for these predictions.

Papers are in chronological order, most recent papers at the
bottom. Numbers refer to a full list of publications.

[19.] Physical limits to sensation and perception.
W Bialek, *Ann. Rev Biophys Biophys Chem* **16,** 455-478 (1987).

From a physicist's point of view, some of the most remarkable facts about biological systems concern the absolute sensitivity of the sense organs: eyes that count single photons, ears that detect vibrations comparable in amplitude to Brownian motion, chemical sensors that count single molecules, thermal sensors with milliKelvin sensitivity at room temperature, ... . In this work I tried to bring these very different observations together, combining classic data with some of my own arguments about more recent experiments. The goal was to see performance near the physical limits as a general principle. I also wanted to address the issue of what happens after the detectors: while one might concede that physics is relevant for the design of receptor cells, surely once the resulting signals enter the brain one is doing biology (or even psychology) and not physics, or so go the prevailing prejudices. Against these prejudices I offered several examples where the nervous system does nontrivial computations and still reaches answers whose precision is set by physical principles. This broader notion of optimal performance, pushing the physical limits wherever possible, remains a radical view of biology in general and of neural computation in particular.

[27.] Temporal filtering in retinal bipolar
cells: Elements of an optimal computation? W Bialek & WG Owen, *Biophys
J* **58,** 1227-1233 (1990).

[30.] Optimal filtering in the salamander retina. F Rieke, WG
Owen, & W Bialek, in *Advances in Neural Information Processing Systems 3*, R Lippmann, J Moody & D
Touretzky, eds, pp 377-383 (Morgan Kaufmann, San Mateo CA, 1991).

When looking for a weak signal in a background of noise,
one of the standard strategies is "matched filtering," in which the observable
(signal plus noise) is passed through a filter with a frequency response that
is designed (roughly) to maximize
the amount of signal that we let pass while at the same time minimizing the
noise. In this work we showed that
when it's very dark outside, and hence the signal-to-noise ratio in the
photodetector cells of the retina is very low, **all** visual computations have the
same first step, and this universal first step is a form of matched
filtering. We know enough about
the signal and noise properties of biological photodetectors to construct this filter
explicitly, first by analytic approximations and then by direct numerical
analysis of the data; there are no free parameters. The result is a filter which is in very good agreement with
the filtering observed to occur in the first stage of visual processing, the
transfer of signals from the rods (photodetectors) to bipolar cells. This is a simple but nonetheless
compelling success for the ideas of optimal processing. Note that we are not proposing a model
of retinal signal processing, but rather making predictions from first
principles about what the form of this processing should be in order to make
maximal use of the available photons.
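
As a minimal sketch of the matched filtering idea, assume the simplest textbook case: a known pulse shape buried in white noise, where the best linear filter is the pulse shape itself. The pulse shape, the noise variance, and the function name `output_snr` are illustrative assumptions, not quantities from the papers. For non-white noise, as in real photoreceptors, the filter would instead weight each frequency by the ratio of signal to noise power.

```python
import math

def output_snr(filt, template, noise_var):
    """Signal-to-noise ratio at the output of a linear filter,
    for a known template in white noise of variance noise_var."""
    signal = sum(h * s for h, s in zip(filt, template))
    noise_power = noise_var * sum(h * h for h in filt)
    return signal ** 2 / noise_power

# Hypothetical pulse shape (a smooth, delayed bump), arbitrary units.
template = [(t / 5.0) ** 2 * math.exp(-t / 5.0) for t in range(60)]
noise_var = 4.0  # assumed white-noise variance

matched = list(template)         # matched filter for white noise
boxcar = [1.0] * len(template)   # naive alternative: plain averaging

snr_matched = output_snr(matched, template, noise_var)
snr_boxcar = output_snr(boxcar, template, noise_var)
print(f"matched: {snr_matched:.3f}  boxcar: {snr_boxcar:.3f}")
```

By the Cauchy-Schwarz inequality no other linear filter can exceed `snr_matched` in this white-noise setting, which is the sense in which the filter is optimal.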

[34.] Reading a neural code. W Bialek, F Rieke, RR de Ruyter van
Steveninck & D Warland, *Science* **252,** 1854-1857 (1991).

[57.] Reliability and statistical efficiency
of a blowfly movement-sensitive neuron. R de Ruyter van Steveninck & W Bialek, *Phil Trans R
Soc Lond Ser B* **348,** 321-340 (1995).

In these two papers we used the motion sensitive neurons in the fly visual system as a test case to probe the reliability of neural computation: Although it had been known for years that the visual system can count single photons when given the simple task of detecting a dim flash of light, is it possible that similarly fundamental physical considerations set the limits to reliability for more complex tasks under more natural conditions? In [34] we decode the spike trains (as suggested in [25]) of the fly's H1 neuron and thereby reconstruct the time dependent angular velocity; this allows us to assess the reliability or precision of computation by measuring the accuracy of the reconstructions relative to the true motion trajectory. In [57] we did an experiment much more like a human psychophysical experiment, asking whether the spike train output of H1 is sufficient to allow reliable discrimination between very similar amplitudes of stepwise motion. In both experiments we found that the effective noise level corresponds to ~ 0.06 deg of motion over 30 msec, which is ~ 20 times smaller than the lattice spacing of detectors in the compound eye or ~ 10 times smaller than the nominal "diffraction limit" due to blur by the eye's lenses. Resolution beyond the sampling and diffraction scales is also known in human vision, and the collection of perceptual phenomena in which this occurs is called hyperacuity. This led us to wonder about the physical limits to motion estimation: blur due to diffraction by the lenses of the compound eye and noise due to the random arrival of photons at the receptors. In fact the observed performance is very close to this limit, so that even four layers of neurons away from the receptors it is still the physics of the inputs that sets the precision of computation.

For a review of hyperacuity see Section 4.2 of
[65]. A preliminary account of the
experiments in [57]
(published in 1984) may have been the first report of hyperacuity in the
responses of a single neuron. For details of the limits to motion estimation,
see [29, 42]. The fly visual system proved to be a
very fertile testing ground for the idea of optimization in neural coding and
computation; look here for details.

[38.] A new look at the primary
charge separation in bacterial photosynthesis. SS Skourtis, AJR DaSilva, W Bialek & JN Onuchic, *J Phys Chem* **96,** 8034-8041 (1992).

Photosynthesis begins with a photon induced transfer of
an electron from one large molecule to another, both held in a protein
framework. This step is complete
in ~ 3 psec, one of the fastest reactions of its type ever observed. Chemistry as we usually think about it
operates in the limit where reactions are slow compared with internal molecular
relaxation rates, and on these grounds alone it seemed unlikely that the
initial event of photosynthesis could be thought of as a conventional chemical
kinetic process. In more detail,
if one tries to estimate the matrix elements among the relevant electronic
states and use the golden rule to calculate the transfer rate, there are
considerable uncertainties but it seemed hard to get the right answer. In this
work we showed that there is a surprisingly broad regime in which electronic
matrix elements, vibrational level spacings and relaxation rates are all
comparable, so that one can be poised in between the golden rule regime and
coherent oscillation. Since irreversibility is possible only after the
destruction of quantum coherence, this regime is one in which the reactions are
as fast as possible, and we argued that predictions of the theory in this
regime are consistent with various aspects of the phenomenology in
photosynthesis. As we were completing the paper, a new set of ultrafast
spectroscopy experiments at lower temperatures revealed the coherent
oscillations that would occur in our scenario if relaxation rates were
reduced. In a similar spirit we
studied a simple model for the initial events in the visual pigments [16],
combining intuition from condensed matter physics work on conjugated polymers
with a (then) novel simulation technique that combined molecular dynamics with
diagonalization of a model Hamiltonian for the electrons; again subsequent
experiments detected the coherence effects expected from the theory, although
it would be an overstatement to say that the theory was confirmed. These are some of the only cases I know
where quantum coherence really is important in a biological process.

In retrospect the simulation method in [16] is a sort of poor man's Car-Parrinello method (and done at the same time), using tight binding rather than density functionals. I think it is quite a powerful technique, and we should have made more of it at the time.

[53.] Statistical mechanics and visual signal
processing. M Potters & W
Bialek, *J Phys I France* **4,** 1755-1775 (1994).

Inspired by the observation of near optimal performance in
the fly's motion estimation system, we set out to understand the algorithmic
requirements for optimal estimation. Conventional approaches involve searching
a set of possible strategies for the best within the set, but we showed how one
could map the problem of estimation in the presence of noise onto a statistical
mechanics problem in which the data act as external fields and the estimator is
the expectation value of some order parameter. Estimation theory then is reduced to the computation of
(perhaps strongly nonlinear) response functions, and standard approximations in
statistical mechanics map to different regimes of the signal processing
problem. Applying the general
framework to the problem of motion estimation in the fly, we showed that the
optimal estimation strategy has very different behaviors in different sensory
environments. In particular, the
optimal estimator interpolates between popular models for motion estimation,
which arise as limiting cases of the full theory. An inevitable prediction of the theory is that the optimal
processor must change its strategy, or *adapt* to changes in the statistics of
the input signals. Preliminary
experiments gave clear evidence of this "statistical adaptation" [54, 59, 61], and recent experiments
provide a direct confirmation of the combination of nonlinear operations
predicted by the theory [102].
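
The mapping from estimation to statistical mechanics can be illustrated in a few lines. The notation is an assumption made for this sketch, not the paper's own: a scalar signal $s$, data $d = s + \eta$ with Gaussian noise $\eta$ of variance $\sigma^2$, and a prior $P(s)$.

```latex
% Sketch under stated assumptions: scalar signal s, data d = s + noise,
% Gaussian noise of variance \sigma^2, prior P(s).
% Terms that depend only on d are absorbed into the normalization Z(d).
\begin{align}
P(s \mid d) &= \frac{P(d \mid s)\, P(s)}{P(d)}
             = \frac{1}{Z(d)} \exp\!\left[ -H(s; d) \right], \\
H(s; d) &= \frac{s^2}{2\sigma^2} - \frac{d}{\sigma^2}\, s - \ln P(s), \\
\hat{s}(d) &= \int ds \; s \, P(s \mid d) = \langle s \rangle_d .
\end{align}
```

In this toy version the data enter $H$ only through the term linear in $s$, exactly as an external field would, and the estimate that minimizes the mean square error is the expectation value $\langle s \rangle_d$. When the prior is not Gaussian, $\hat{s}(d)$ is a nonlinear response function of the field.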

[55.] Information flow in sensory neurons. M DeWeese & W Bialek, *Il Nuovo Cimento* **17D,** 733-741 (1995).

This was a first step in the still incomplete project of
constructing a theory for optimal coding by spiking neurons. Along the way we introduced some
interesting technical tools, such as a perturbative expansion of the
information transmission rate. In
addition we took the opportunity to debunk some misconceptions that surrounded
the idea of "stochastic resonance."
This might also be the first place where we stated explicitly that the
prediction of an optimal coding theory will necessarily be a code that adapts
to the statistics of the sensory inputs.

[56.] Random switching and optimal processing
in the perception of ambiguous signals. W Bialek & M DeWeese, *Phys Rev Lett* **74,** 3077-3080 (1995).

In the case of motion estimation [53] there is nothing
deep about the statistical mechanics problems that we have to solve, but here
we found that in cases where stimuli have ambiguous interpretations (as in the
Necker cube) the estimation problem maps to a random field model. The
nontrivial statistical mechanics of the random field problem really does seem
to correlate with the phenomenology of multistable percepts. This is
interesting as a very clear example of how the idea of optimal performance can
generate striking and even counterintuitive predictions, in this case
predicting fluctuations in perception even when inputs are constant and the
signal to noise ratios are high.

[68.] Adaptation and optimal chemotactic strategy
for *E. coli*. SP Strong, B Freedman,
W Bialek & R Koberle, *Phys Rev E* **57,** 4604-4617 (1998).

In 1977 Berg and Purcell wrote a classic paper analyzing
the physical constraints on bacterial chemotaxis. Their discussion inspired many people, including me, but
they also had a decidedly idiosyncratic style of argument that left many loose
ends. This paper is one effort to
make their arguments more systematic.
Given that bacteria can measure the time derivative of concentration
along their trajectories, that this measurement inevitably has a substantial
noise level, and that the bacteria cannot steer but only go straight (up to the
limits set by their own rotational diffusion) or tumble to choose a new
direction at random, how can they use the results of their measurements to
guide their behavior? In
particular, is there a strategy that would maximize their progress along
favorable concentration gradients? This problem of optimal behavioral strategy turns out to be
quite subtle. I don't think we
have a complete solution, but we said some interesting things, in particular
about the extent to which the decision to tumble can be nearly deterministic
and yet still consistent with the apparently stochastic behavior observed for
*E. coli*. I think that subsequent measurements on
the nearly switch-like input/output relation of the bacterial motor are
consistent with these results, but many questions remain open.

[69.] The
information bottleneck method.
N Tishby, FC Pereira, & W Bialek, in *Proceedings of the 37th
Annual Allerton Conference on Communication, Control and Computing*, B Hajek & RS Sreenivas,
eds, pp 368-377 (University of Illinois, 1999).

When Shannon developed information theory he left open
the problem of assigning relevance to a signal. Here we showed that if we observe one signal but are interested
in another, then the statistical associations between these signals define what
is relevant and one can
(selectively) compress the observed signal to "squeeze out" the relevant
bits by formulating the efficient representation of relevant information as an
optimization principle. Crucially, this formulation does not require any
assumptions about what it means for signals to be similar; indeed the various
signals need not even live in a metric space. There are deep connections to
clustering (especially to the statistical mechanics formulation in which
separate clusters emerge through a series of phase transitions), and many
different problems from signal processing to learning can be cast into this
unified information theoretic framework. We believe that this is a
fundamentally new and principled approach to a wide variety of problems of
interest both as models of the problems solved by the brain and as
practical problems in their own
right.
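
The two quantities that the method trades off can be evaluated directly on a toy example. The sketch below does not solve the optimization problem. It only computes, for a hand-picked compression, how many bits are spent describing the observed signal and how many relevant bits survive. The joint distribution `p_xy`, the map `compress`, and the helper `mutual_information` are all hypothetical.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Hypothetical joint distribution of observed x and relevant y.
# x = 0, 1 predict y identically, and so do x = 2, 3.
p_xy = {
    (0, 0): 0.20, (0, 1): 0.05,
    (1, 0): 0.20, (1, 1): 0.05,
    (2, 0): 0.05, (2, 1): 0.20,
    (3, 0): 0.05, (3, 1): 0.20,
}

# Hand-picked hard compression t = f(x): merge x values with equal p(y|x).
compress = {0: "a", 1: "a", 2: "b", 3: "b"}

p_ty, p_tx = {}, {}
for (x, y), p in p_xy.items():
    t = compress[x]
    p_ty[(t, y)] = p_ty.get((t, y), 0.0) + p
    p_tx[(t, x)] = p_tx.get((t, x), 0.0) + p

i_xy = mutual_information(p_xy)  # relevant bits available in x
i_ty = mutual_information(p_ty)  # relevant bits kept by t
i_tx = mutual_information(p_tx)  # bits spent describing x through t
print(f"I(X;Y)={i_xy:.3f}  I(T;Y)={i_ty:.3f}  I(T;X)={i_tx:.3f}")
```

Here the compressed variable uses one bit instead of the two needed to specify `x`, yet it keeps all of the information about `y`. Note that nothing in the calculation refers to a distance between values of `x`, only to the joint statistics.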

[72.] Adaptive rescaling optimizes information
transmission. N Brenner, W Bialek & R de
Ruyter van Steveninck, *Neuron* **26,** 695-702 (2000).

[78.] Efficiency and ambiguity in an adaptive
neural code. AL Fairhall, GD Lewen, W Bialek & RR de Ruyter van
Steveninck, *Nature* **412,** 787-792 (2001).

The direct demonstration of high coding efficiency in neural spike trains [45, 66, 77] supports strongly the old idea that the construction of an efficient representation could be the goal of neural computation. Efficient representations must be matched to the statistical structure of the input signals, and it therefore is encouraging that we observe higher coding efficiency for more naturalistic signal ensembles [58, 63, 74], but it usually was assumed that such matching could occur only over the long time scales of development or evolution. In [53, 55] we proposed that adaptation to statistics would occur in real time, to exploit the intermittent structure of natural signals (eg [52]), and in [54, 61, 62] we presented evidence that this occurs both in the fly and in the vertebrate retina. In [72] we analyzed an example in detail, and found that adaptation to changes in the variance of the input has a striking form, rescaling the input/output relation of the neuron so that signals are coded in relative units. Further, the precise choice of the scaling factor serves to optimize information transmission; this is the first direct demonstration that an optimization principle is at work in the brain. In [78] we showed that the dynamics of this adaptation itself is optimal, so that the speed with which the system adjusts to a change in input distribution is close to the limit set by the need to gather statistics.
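
What coding "in relative units" means can be shown with a toy input/output relation that depends on the stimulus only through its ratio to the current standard deviation of the inputs. The sigmoidal shape, the maximum rate, and the name `firing_rate` are assumptions for illustration, not the measured relation from [72].

```python
import math

def firing_rate(stimulus, stimulus_std, max_rate=100.0):
    """Toy input/output relation: depends only on stimulus / stimulus_std."""
    z = stimulus / stimulus_std  # signal in relative (dimensionless) units
    return max_rate / (1.0 + math.exp(-z))

# The same relative signal presented in two ensembles whose standard
# deviations differ by a factor of ten.
low_variance = firing_rate(stimulus=1.0, stimulus_std=2.0)
high_variance = firing_rate(stimulus=10.0, stimulus_std=20.0)
print(low_variance, high_variance)
```

A neuron with a fixed input/output relation would instead saturate in the high variance ensemble or barely respond in the low variance one, and in either case would use its limited output range poorly.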

[84.] Thinking about the brain. W Bialek, in *Physics of Biomolecules and Cells:
Les Houches Session LXXV,* H Flyvbjerg, F Jülicher, P Ormos, & F David, eds, pp 485-577
(EDP Sciences, Les Ulis; Springer-Verlag, Berlin, 2002).

A second major effort to review and explain the ideas of optimization in neural coding and computation; in a sense this is an update of [19], with fifteen more years of experience. Although nominally a review, this is the only place (thus far) where I have tried to articulate how the ideas of optimization, especially in an information theoretic framework, could be used to think about problems of learning and cognition.

[99.] Physical limits to biochemical
signaling. W Bialek & S Setayeshgar, physics/0301001.

Another paper that was inspired (in part) by Berg and
Purcell. Many of their
intuitive arguments can be made rigorous by noting that biochemical signaling depends
on binding of signaling molecules to specific binding sites, and this
interaction leads to equilibrium.
This means that fluctuations in the occupancy of the binding sites are a
form of thermal noise and can be treated using the standard methods of
statistical mechanics. These tools
allow us to compute not only the occupancy noise for a single site, but to
understand how these fluctuations are changed by the diffusion of the signaling
molecules and the resulting interactions with other sites. We find a minimum effective noise level
that depends on physical parameters such as the diffusion constant and the size
and geometrical arrangement of the relevant binding sites; this "noise floor"
is independent of all the (often unknown) details of the chemical
kinetics. Comparison with recent
experiments on noise in the control of gene expression and the switching of the bacterial
flagellar motor suggests that these intracellular signaling processes operate
with a precision close to the fundamental physical limits.
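
For a sense of scale, the Berg and Purcell style estimate of the fractional accuracy with which a sensor of linear size a can measure a concentration c, given diffusion constant D and averaging time tau, scales as 1 / sqrt(D a c tau). The sketch below evaluates only this scaling, with factors of order one omitted. The numerical values are illustrative assumptions, not numbers taken from the paper.

```python
import math

def fractional_noise_floor(diffusion, size, concentration, averaging_time):
    """Order-of-magnitude estimate delta_c / c ~ 1 / sqrt(D * a * c * tau).
    Prefactors of order one are deliberately left out."""
    return 1.0 / math.sqrt(diffusion * size * concentration * averaging_time)

# Illustrative values only (assumed, not from the paper):
D = 1.0      # diffusion constant, um^2 / s
a = 0.01     # size of the binding site, um
c = 100.0    # concentration, molecules / um^3
tau = 100.0  # averaging time, s

noise = fractional_noise_floor(D, a, c, tau)
noise_longer = fractional_noise_floor(D, a, c, 4.0 * tau)
print(f"delta_c/c ~ {noise:.2f}; with 4x averaging ~ {noise_longer:.2f}")
```

The estimate involves only physical parameters, which is why it is insensitive to the kinetic details, and it improves only as the square root of the averaging time.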

[102.] Features and dimensions: Motion estimation in fly vision. W Bialek & RR de Ruyter van Steveninck, submitted.

Here we build on the ideas of [21] and [72] to characterize the computation of motion in the fly visual system as a mapping from the high dimensional space of signals in the retinal photodetector array to the probability of generating an action potential in a motion sensitive neuron. We identify a low dimensional subspace of signals within which the neuron is most sensitive, and then sample this subspace to visualize the nonlinear structure of the mapping. The results illustrate the computational strategies predicted for a system that makes optimal motion estimates given the physical noise sources in the detector array. More generally, the hypothesis that neurons are sensitive to low dimensional subspaces of their inputs formalizes the intuitive notion of feature selectivity and suggests a strategy for characterizing the neural processing of complex, naturalistic sensory inputs. The same methods of analysis have been used to take a new look at the computations done in simple, biologically plausible model neurons [88], as well as other experimental systems from vertebrate retina to visual cortex. New, purely information theoretic methods should allow us to search for low dimensional relevant subspaces even when stimuli have all the complex correlation structure of fully natural signals [93].