**Some current theoretical issues in biophysics: A short
course**

William Bialek

Lectures in Aula Rasetti

M 10 May (16-18:00h), W 12 & F 14 May
(15-17:00h)

M 17 May (16-18:00h), W 19 & F 21 May
(15-17:00h)

In this short course I try to give some
perspective on two very different approaches to the physics of biological
systems. One approach is
phenomenological, allowing ourselves to be driven by newly emerging data. The
complementary approach is more abstract, in which we explore candidate
principles from which the properties of these complex systems should be
derivable.

My plan is to write up my lectures in a form that
one could (hopefully) read even without having attended. For now, I'll keep just an outline, which I'll update as we
go along with pointers to references and exercises. *Last updated 15 May
2010.*

**Phenomenology**

Once upon a time, there were only a handful of
biological systems that could be explored through the kinds of quantitative
experiments that we are used to as physicists. Although one must be careful not to exaggerate, this
situation is changing dramatically, and we are seeing an explosion of
quantitative data emerging from experiments on many different systems, at many
different levels of organization, from single molecules to whole populations of
organisms. There is an
urgent practical problem of analyzing these data, and extracting something
meaningful. One might argue that
this is also a problem of principle: to claim that we "understand" what has
been learned from a particular set of experiments we must have some compact way
of summarizing the data. This
search for compact summaries is the essence of phenomenological analyses, and
it always involves the claim that the system is simpler than it could have
been. In this part of the course,
I'll discuss two ideas for how the seemingly complex behavior of biological
systems can be simplified, explicitly.

First is the idea of dimensionality
reduction. For example, when a
protein binds to DNA, it interacts with a small piece of the DNA sequence; the
space of sequences is large, but presumably only some limited features of the
sequence are relevant. Similarly, neurons in the visual system respond to
complex images and movies, but presumably each neuron is driven primarily by
some limited set of features such as edges of a particular orientation. In the classical experimental approach,
we impose this sort of simplification on the system from the outside, by
exploring only a limited set of dimensions—with a small set of targeted
mutations or simple patterns of light and dark across the visual field. The possibility of doing much bigger
experiments means that we can explore more fully, and then try to let the
resulting richer data tell us which are the relevant dimensions, if we have the
appropriate mathematical tools.
I'll discuss methods for discovering this low dimensional structure in
real data, using both protein/DNA interactions and neural coding as examples,
emphasizing the similarity of the underlying mathematics (and perhaps the
underlying physical intuition).

The idea
of dimensionality reduction has a long history, especially if you go back to
qualitative versions of the idea, prior to its formalization. Thus, the idea of receptive fields for
visual neurons goes back to work in the 1940s (on the retina of the horseshoe
crab) and 1950s (on the retina of frogs and cats), with a big stimulus coming
from the work of Hubel and Wiesel on visual cortex in the late 50s and early
60s. I am not sure at what point
people took the qualitative idea of receptive fields and wrote down an explicit
model which says that, in the space of images, all that matters for driving a
neuron is the projection of that image onto a template formed by the receptive
field. Certainly in studies of the
auditory system, by the mid 1960s there was a fairly explicit picture that
neural spiking was driven by linear filtering (or Euclidean projection in the
space of waveforms) followed by an instantaneous nonlinearity, and de Boer and
others introduced correlation methods (following Wiener) to separate these
components. For a review of
correlation methods in characterizing the responses of neurons see Rieke et al
(1997) and Schwartz et al (2006).
Correlation methods work only if one can control the distribution of
inputs to the system, but if you can do this then it is even possible to count
the number of relevant dimensions (Bialek & de Ruyter van Steveninck 2005).
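
A minimal sketch of the correlation idea, assuming a toy linear-nonlinear neuron driven by Gaussian white noise. The template, the sigmoidal nonlinearity and all names below are illustrative assumptions, not taken from the references. For Gaussian inputs, the average stimulus that elicits a spike (the spike-triggered average) should point along the template.

```python
import math
import random

def simulate_ln_neuron(template, n_samples, seed=0):
    """Draw Gaussian white-noise stimuli and spikes from a linear-nonlinear model."""
    rng = random.Random(seed)
    stimuli, spikes = [], []
    for _ in range(n_samples):
        s = [rng.gauss(0.0, 1.0) for _ in template]
        # Linear stage: projection of the stimulus onto the template.
        x = sum(si * ti for si, ti in zip(s, template))
        # Instantaneous nonlinearity: a sigmoidal spike probability (assumed form).
        p_spike = 1.0 / (1.0 + math.exp(-4.0 * (x - 1.0)))
        stimuli.append(s)
        spikes.append(1 if rng.random() < p_spike else 0)
    return stimuli, spikes

def spike_triggered_average(stimuli, spikes):
    """Average the stimuli that elicited a spike."""
    n_spikes = sum(spikes)
    dim = len(stimuli[0])
    sta = [0.0] * dim
    for s, spk in zip(stimuli, spikes):
        if spk:
            for i in range(dim):
                sta[i] += s[i]
    return [v / n_spikes for v in sta]

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Hypothetical receptive field, normalized to unit length.
template = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
norm = math.sqrt(sum(t * t for t in template))
template = [t / norm for t in template]

stimuli, spikes = simulate_ln_neuron(template, 20000)
sta = spike_triggered_average(stimuli, spikes)
overlap = cosine(sta, template)  # close to 1 when the STA recovers the template
```

The recovery relies on the inputs being Gaussian; with strongly non-Gaussian inputs the spike-triggered average is no longer guaranteed to align with the relevant direction.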

[Bialek & de Ruyter van Steveninck 2005] Features and dimensions. W Bialek & RR de Ruyter van Steveninck, arXiv.org:q-bio/0505003 (2005).

[Rieke et al 1997] *Spikes: Exploring the Neural Code.* F Rieke, D Warland, RR de Ruyter van Steveninck & W Bialek (MIT Press, Cambridge, 1997).

[Schwartz et al 2006] Spike triggered neural characterization. O Schwartz, JW Pillow, NC Rust & EP Simoncelli, *J Vision* **6,** 484-507 (2006).

If you
can't control the distribution of inputs (for example, if you want to
analyze the responses of visual neurons to fully natural images, which come
from a poorly characterized, but strongly non-Gaussian, distribution) then
you cannot rely on correlation approaches. Instead, we can ask for projections of the input which
capture the maximum amount of information about the output, and it is a theorem
that if the system really is described by a small number of projections then
this information theoretic optimization approach will find those dimensions
(Sharpee et al 2004). There is
often the suspicion that computing entropies or information requires enormous
amounts of data, but somewhat remarkably the search for "maximally informative
dimensions" does not seem to be more data hungry than methods based on correlations
(Sharpee 2008).
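
A sketch of the quantity being optimized, assuming a simple histogram estimator: for a candidate direction, compare the distribution of projections given a spike with the distribution over all stimuli, and measure the difference in bits. The toy stimuli (uniform, hence non-Gaussian), the toy neuron and the function name `info_per_spike` are assumptions made for illustration. The search over directions itself is not shown.

```python
import math
import random

def info_per_spike(stimuli, spikes, direction, n_bins=12):
    """Histogram estimate, in bits, of the KL divergence between
    P(x | spike) and P(x), where x is the projection onto `direction`."""
    xs = [sum(si * di for si, di in zip(s, direction)) for s in stimuli]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0
    prior = [0] * n_bins
    given_spike = [0] * n_bins
    for x, spk in zip(xs, spikes):
        b = min(int((x - lo) / width), n_bins - 1)
        prior[b] += 1
        given_spike[b] += spk
    n, n_spk = len(xs), sum(spikes)
    info = 0.0
    for p_count, s_count in zip(prior, given_spike):
        if s_count > 0:
            p_x = p_count / n
            p_x_spike = s_count / n_spk
            info += p_x_spike * math.log2(p_x_spike / p_x)
    return info

rng = random.Random(1)
# Toy non-Gaussian stimuli: each component uniform on [-1, 1] (an assumption).
stimuli = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20000)]
# Toy neuron: the spike probability depends only on the first component.
spikes = [1 if rng.random() < (0.8 if s[0] > 0.5 else 0.02) else 0
          for s in stimuli]

info_relevant = info_per_spike(stimuli, spikes, [1.0, 0.0, 0.0])
info_irrelevant = info_per_spike(stimuli, spikes, [0.0, 1.0, 0.0])
```

In this toy case the relevant direction carries far more information than an orthogonal one. A full analysis would maximize this objective over all directions, and would need to treat the small positive bias of the histogram estimator with more care.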

[Sharpee et al 2004] Analyzing neural responses to natural signals: Maximally informative dimensions. TO Sharpee, NC Rust & W Bialek, *Neural Comp* **16,** 223-250 (2004).

[Sharpee 2008] Comparison of objective functions for estimating linear-nonlinear models. TO Sharpee, in *Advances in Neural Information Processing Systems 20,* JC Platt, D Koller, Y Singer & S Roweis, eds, pp 1305-1312 (2008).

The idea
of dimensionality reduction—especially in its simplest form, a linear
projection—also has a long history in the description of protein-DNA
interactions. Here the idea is that the
binding energy of the protein to DNA can be described as a weighted sum of
terms, one from each base pair that the protein contacts. Thus, if the protein contacts K base pairs, the energy of binding to 4^K
different sequences is determined by 4×K parameters, which is a huge
simplification (Berg & von Hippel 1988). In the same way that one can collect thousands of action
potentials in response to complex sensory inputs, one can now survey the
relations between thousands of different sequences and the corresponding levels
of protein binding or gene expression.
In contrast to the neural case, however, the relationships that one
measures in these "high throughput" experiments are a bit messy and often
uncalibrated. Kinney et al (2007) showed
that if you try to do the usual statistical inference, maximizing the
likelihood of your data given the underlying model, but average over all the
uncalibrated parts of the model, then you can find the underlying linear
projection precisely by an appropriate adaptation of the maximally informative
dimension method. Remarkably, this
works on real data, and even allows independent experiments to be analyzed and
yield the same model for the sequence-specificity of protein binding, although
previous work had suggested that the experiments were in conflict. I thought
this was a great triumph of principled reasoning in a field dominated by highly
tuned and often arbitrary algorithms.
Kinney et al (2010) went on to design new experiments that exploited
this approach, directly sampling sequence space by genetic engineering. This work showed how to
measure an "informational footprint" of the proteins along the sequence, and
provided evidence that the linear projections in sequence space really are
binding energies, since the binding energies for two proteins along the same
sequence (the transcription factor and the RNA polymerase) combine according to
the appropriate statistical mechanical weights in determining the probability
that transcription can be initiated.
It is striking that one can do such indirect experiments and yet draw
such sharp inferences about the underlying physics.
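
To make the counting concrete, here is a minimal sketch of the linear energy model, together with one common way of combining two binding energies through Boltzmann weights. The matrix entries are made up, and the four-state model (nothing bound, transcription factor only, polymerase only, both bound with an interaction energy) is an assumed textbook form, not necessarily the one used by Kinney et al.

```python
import math
from itertools import product

BASES = "ACGT"

def binding_energy(sequence, energy_matrix):
    """Sum one contribution per position: E(seq) = sum_i eps[i][base_i]."""
    return sum(energy_matrix[i][BASES.index(b)] for i, b in enumerate(sequence))

def transcription_probability(e_tf, e_rnap, e_int):
    """Probability that the polymerase is bound in a four-state model.
    Energies are in units of k_B T and are assumed to absorb the
    chemical potentials (concentrations) of the two proteins."""
    w_tf = math.exp(-e_tf)
    w_rnap = math.exp(-e_rnap)
    w_both = math.exp(-(e_tf + e_rnap + e_int))
    z = 1.0 + w_tf + w_rnap + w_both
    return (w_rnap + w_both) / z

# Hypothetical energy matrix for a K = 3 site: one row per position,
# one column per base (A, C, G, T), in units of k_B T.
K = 3
energy_matrix = [
    [0.0, 1.2, 2.0, 1.5],
    [1.8, 0.0, 1.1, 2.3],
    [2.1, 1.7, 0.0, 0.9],
]

# 4 x K = 12 parameters fix the energies of all 4^K = 64 sequences.
energies = {"".join(seq): binding_energy(seq, energy_matrix)
            for seq in product(BASES, repeat=K)}
best_site = min(energies, key=energies.get)

# With an attractive interaction, stronger factor binding raises the
# probability that the polymerase is bound.
p_strong = transcription_probability(e_tf=-2.0, e_rnap=1.0, e_int=-2.0)
p_weak = transcription_probability(e_tf=2.0, e_rnap=1.0, e_int=-2.0)
```

In this sketch the sequence enters only through the two energies, which is the sense in which a linear projection in sequence space can be read as a binding energy.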

[Kinney et al 2007] Precise physical models of protein-DNA interaction from high-throughput data. JB Kinney, G Tkacik & CG Callan Jr, *Proc Nat'l Acad Sci (USA)* **104,** 501-506 (2007).

[Kinney et al 2010] Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. JB Kinney, A Murugan, CG Callan Jr & EC Cox, *Proc Nat'l Acad Sci (USA)* in press (2010). See also the supplementary material.

A second idea concerns the analysis of large
networks, again ranging from proteins to brains (and beyond). Here there is a long tradition of using
ideas from statistical physics to describe collective network behavior, but
these ideas often have been somewhat removed from experiment. Recently it has been emphasized that we
can construct statistical mechanics models directly from data using maximum
entropy methods, where we make the explicit simplification of constructing
models that match only low-order correlation functions. These simplified, data
driven models have proven to have quite interesting properties. I'll describe recent progress in this
effort, using examples drawn from both the molecular level (the diversity of
antibodies) and the brain (networks of real neurons). In both examples,
familiar physics ideas such as criticality and the proliferation of metastable
states seem to be crucial to biological function; we'll see how this emerges
from the data, and identify some theoretical questions raised by these results.

In the
lectures I tried to introduce the idea of maximum entropy models through a
discussion of protein sequences.
The initial work here is from Ranganathan and collaborators, who
developed methods to generate new protein sequences that conform to the
one-body and two-body statistics of amino acid substitutions in a large family
(Russ et al 2005, Socolich et al 2005).
The important result is that if you keep track of only one-body
statistics, most of the resulting proteins are junk. On the other hand, if you respect the two-point
correlations, then a macroscopic fraction of the new proteins seem to fold and
function. This is quite
surprising. This example of the "network" of amino acids is attractive for many
reasons, not least our physical intuition about the forces holding proteins
together. But it also has
problems, such as the fact that, until recently, the sample sizes (number of
sequences in a family) were quite small, and surely when one looks at sequences
from closely related organisms these are not independent samples. Thus, much of the early work is focused
on these issues of statistical significance and evolutionary history, rather
than the more abstract problem of writing down the distribution out of which
functional sequences are drawn.
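
For concreteness, a small sketch of what the one-body and two-body statistics of a sequence family are, computed from a toy alignment. The four sequences are invented, and the connected correlation used here is one standard definition, assumed for illustration.

```python
from collections import Counter
from itertools import combinations

def one_body(alignment):
    """Frequency of each amino acid at each position."""
    n = len(alignment)
    length = len(alignment[0])
    return [{aa: c / n for aa, c in Counter(seq[i] for seq in alignment).items()}
            for i in range(length)]

def two_body(alignment):
    """Connected pair correlations C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b),
    listed only for the pairs (a, b) that actually occur in the alignment."""
    n = len(alignment)
    f1 = one_body(alignment)
    corr = {}
    for i, j in combinations(range(len(alignment[0])), 2):
        pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
        corr[(i, j)] = {ab: c / n - f1[i][ab[0]] * f1[j][ab[1]]
                        for ab, c in pair_counts.items()}
    return corr

# Toy alignment (made up): positions 0 and 1 covary, position 2 is independent.
alignment = ["AKL", "AKV", "GEL", "GEV"]
f1 = one_body(alignment)
c2 = two_body(alignment)
```

Here `c2[(0, 1)][("A", "K")]` is 0.25 while `c2[(0, 2)][("A", "L")]` is zero, so sequences drawn independently at each site from `f1` would lose the coupling between the first two positions. Real alignments would also need corrections for small samples and for relatedness among sequences, as noted above.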

[Russ et al 2005] Natural-like function in artificial WW domains. WP Russ, DM Lowery, P Mishra, MB Yaffe & R Ranganathan, *Nature* **437,** 579-583 (2005).

[Socolich et al 2005] Evolutionary information for specifying a protein fold. M Socolich, SW Lockless, WP Russ, H Lee, KH Gardner & R Ranganathan, *Nature* **437,** 512-518 (2005).

It turns out that the methods of Socolich et al actually generate
samples out of the maximum entropy distribution, although they don't provide an
explicit construction of this distribution (Bialek & Ranganathan
2007). More explicit uses of the
maximum entropy idea to think about sequence ensembles are given by Weigt et al
(2009), Halabi et al (2009) and Mora et al (2009). In the last work, we took advantage of recent experiments
that characterize the ensemble of antibodies made by a single organism (in this
case, a zebrafish; see Weinstein et al 2009). In this system, an enormous amount of sequence variability
is concentrated in a small region, and we have a huge number of samples
(approaching 100,000), both of which make the problem of constructing a maximum
entropy model much more accessible.
We recall that maximum entropy models match exactly some set of
correlations, in this case up to pairs of amino acids. To test the model we can look at higher
order correlations (e.g., triplets), and these seem to come out right. The antibody example is in just the
right range where it is possible to estimate the probability distribution over
sequence space directly, however, so we can compare theory and experiment
more directly. If we make a
plot of the probability vs rank (the "Zipf plot") we see that probability is
inversely proportional to rank, which is called Zipf's law, first noticed for
the distribution of words in English.
From a statistical mechanics point of view, this is the same as having
the system be poised at a critical point, in fact a rather odd kind of critical
point.
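
A minimal sketch of how one might check for Zipf-like behavior in a set of samples: rank the distinct sequences by frequency and fit the slope of log probability against log rank. The toy data are constructed to follow the law, and the least-squares fit is a crude estimator chosen for brevity, not a recommended statistical test.

```python
import math
from collections import Counter

def zipf_slope(samples):
    """Least-squares slope of log(probability) versus log(rank)."""
    counts = sorted(Counter(samples).values(), reverse=True)
    total = sum(counts)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c / total) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Toy data built so that the r-th most common "sequence" appears
# about 1200 / r times (an assumption made to illustrate the law).
samples = [f"seq{r}" for r in range(1, 61) for _ in range(round(1200 / r))]
slope = zipf_slope(samples)  # close to -1 for Zipf's law
```

With real data the rare sequences are poorly sampled, so the tail of the rank plot needs more careful treatment than this fit provides.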

[Bialek & Ranganathan 2007] Rediscovering
the power of pairwise interactions. W Bialek & R Ranganathan,
arXiv.org:0712.4397 (2007).

[Halabi et al 2009] Protein sectors: Evolutionary units of three-dimensional structure. N Halabi, O Rivoire, S Leibler & R Ranganathan, *Cell* **138,** 774-786 (2009). See also the supplementary material.

[Weigt et al 2009] Identification of direct residue contacts in protein-protein interaction by message passing. M Weigt, RA White, H Szurmant, JA Hoch & T Hwa, *Proc Nat'l Acad Sci (USA)* **106,** 67-72 (2009).

[Weinstein et al 2009] High-throughput sequencing of the zebrafish antibody repertoire. JA Weinstein, N Jiang, RA White, DS Fisher & SR Quake, *Science* **324,** 807-810 (2009).

Independent of the work on protein sequences is an effort to construct statistical
mechanics models for networks of real neurons via the maximum entropy
principle. If we slice time into
brief bins, then each neuron either generates an action potential or remains silent,
so the state of one neuron is naturally binary, like an Ising spin. If we build a maximum entropy model
consistent with pairwise correlations among neurons (that is, the probabilities
of pairs of cells generating spikes in the same brief time bin), then this
model is *exactly* an Ising model with pairwise interactions. Because the correlations can be both
positive and negative, the spin-spin interactions also have both signs, and frustration
is almost unavoidable. Thus, we
end up deriving a spin-glass model of neural networks directly from data. This was first done for the population
of neurons in the retina that convey information about a small patch of the
visual world (Schneidman et al 2006), and it was found that the model "worked"
in that it could reproduce the higher order statistical structure of the
network states, and that (as a result) it captured ~90% of the "multi-information"
that measures the difference in entropy between the real system and a model of
independent neurons; these early, and very explicit tests were done on
relatively small systems, N=10-20 neurons. Since the initial work, the same general strategy has been
used on a wide variety of neural systems (Shlens et al 2006, Tang et al 2008, Yu
et al 2008, Shlens et al 2009, Marre et al 2009).
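
A sketch of the construction for a very small network, assuming spins s_i = ±1 (spike or silence in a time bin) and exact enumeration of all 2^N states, which is feasible only for small N. The spike words and their counts are invented, and plain gradient ascent is used here for simplicity; it is not claimed to be the fitting procedure of any of the papers cited.

```python
import math
from itertools import product, combinations

def model_moments(h, J, n):
    """Exact <s_i> and <s_i s_j> for
    P(s) ~ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j)."""
    pairs = list(combinations(range(n), 2))
    z = 0.0
    m = [0.0] * n
    c = {p: 0.0 for p in pairs}
    for s in product((-1, 1), repeat=n):
        w = math.exp(sum(h[i] * s[i] for i in range(n))
                     + sum(J[p] * s[p[0]] * s[p[1]] for p in pairs))
        z += w
        for i in range(n):
            m[i] += w * s[i]
        for p in pairs:
            c[p] += w * s[p[0]] * s[p[1]]
    return [v / z for v in m], {p: v / z for p, v in c.items()}

def fit_pairwise_maxent(data, steps=2000, rate=0.1):
    """Adjust fields h and couplings J until the model's means and
    pairwise correlations match those measured in the data."""
    n = len(data[0])
    pairs = list(combinations(range(n), 2))
    m_data = [sum(s[i] for s in data) / len(data) for i in range(n)]
    c_data = {p: sum(s[p[0]] * s[p[1]] for s in data) / len(data) for p in pairs}
    h = [0.0] * n
    J = {p: 0.0 for p in pairs}
    for _ in range(steps):
        m, c = model_moments(h, J, n)
        for i in range(n):
            h[i] += rate * (m_data[i] - m[i])
        for p in pairs:
            J[p] += rate * (c_data[p] - c[p])
    return h, J

# Toy "spike words" for N = 3 cells (made-up counts): +1 = spike, -1 = silence.
data = ([(-1, -1, -1)] * 55 + [(1, -1, -1)] * 10 + [(-1, 1, -1)] * 8
        + [(-1, -1, 1)] * 7 + [(1, 1, -1)] * 10 + [(1, -1, 1)] * 3
        + [(-1, 1, 1)] * 4 + [(1, 1, 1)] * 3)
h, J = fit_pairwise_maxent(data)
m_fit, c_fit = model_moments(h, J, 3)
```

By construction the fitted model reproduces the means and pairwise correlations. The test of the model is whether it also predicts quantities that were not fitted, such as the frequencies of the full three-cell words.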

[Marre et al 2009] Prediction of spatio-temporal
patterns of neural activity from pairwise correlations. O Marre, SE
Boustani, Y Fregnac & A Destexhe, *Phys
Rev Lett* **102,** 138101 (2009).

[Ohiorhenuan & Victor 2007] Maximum
entropy modeling of multi-neuron firing patterns in V1. IE Ohiorhenuan &
JD Victor, in *Proceedings of 2007 Cosyne
Conference* http://cosyne.org (2007).

[Schneidman et al 2006] Weak
pairwise correlations imply strongly correlated network states in a neural
population. E Schneidman, MJ Berry II, R Segev & W Bialek, *Nature* **440,** 1007-1012 (2006), arXiv:q-bio/0512013v1.

[Shlens et al 2006] The structure of
multi-neuron firing patterns in primate retina. J Shlens, GD Field, JL Gauthier, MI Grivich, D Petrusca, A
Sher, AM Litke & EJ Chichilnisky, *J
Neurosci* **26,** 8254-8266 (2006).

[Shlens et al 2009] The structure of
large-scale synchronized firing in primate retina. J Shlens, GD Field, JL
Gauthier, M Greschner, A Sher, AM Litke & EJ Chichilnisky, *J Neurosci* **29,** 5022-5031 (2009).

[Tang et al 2008] A maximum entropy model
applied to spatial and temporal correlations from cortical networks *in vitro.* A Tang, D Jackson, J
Hobbs, W Chen, JL Smith, H Patel, A Prieto, D Petrusca, MI Grivich, A Sher, P Hottowy,
W Dabrowski, AM Litke & JM Beggs, *J
Neurosci* **28,** 505-518 (2008).

[Yu et al 2008] A small world of neuronal synchrony. S Yu, D Huang, W Singer & D Nikolic,
*Cereb Cortex* **18,** 2891-2901 (2008).

Of course, if we expect interesting collective effects in these
networks, we must look at more than N=10 cells.

**Optimization principles**

Many biological systems operate very close to the
borders of what the laws of physics allow, so that they achieve the best
possible performance given the physical constraints; this idea has been
explored in examples ranging from photon counting in vision to bacterial
metabolism to human movement control.
The most exciting possibility is that we can use the idea of optimality
as a variational principle from which we could derive the properties of real
biological systems, in detail.
I'll review (quickly) some successes of this approach, and then dig into
two examples where we will soon have very quantitative confrontations between
theory and experiment.

In the first case, we consider the estimation of
motion in the visual system. The
general theory of optimal estimation shows that the best estimates can be
constructed as functional integrals over the joint distribution of input
(movies falling on the retina) and desired output (movement velocity); where
previous work has made models of these distributions, recent experiments make
it possible to evaluate the relevant integrals by Monte Carlo sampling from the
real distributions.
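
A toy sketch of the Monte Carlo idea, assuming that "best" means minimum mean-square error, so that the optimal estimate is the conditional mean of the velocity given the data. Here a scalar "velocity" is observed through additive Gaussian noise, and samples from a two-sided exponential distribution stand in for samples from the real distribution; both choices, and the function name, are illustrative assumptions far simpler than movies on a retina.

```python
import math
import random

def conditional_mean(observation, prior_samples, noise_sd):
    """Monte Carlo estimate of E[v | observation] when
    observation = v + Gaussian noise. Each sample of v is weighted by
    the likelihood of the observation given that sample."""
    weights = [math.exp(-0.5 * ((observation - v) / noise_sd) ** 2)
               for v in prior_samples]
    return sum(w * v for w, v in zip(weights, prior_samples)) / sum(weights)

rng = random.Random(2)
# Stand-in for samples from the real distribution: a two-sided exponential
# (non-Gaussian) distribution of velocities, chosen purely for illustration.
prior_samples = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(50000)]

estimate_small = conditional_mean(0.2, prior_samples, noise_sd=1.0)
estimate_large = conditional_mean(3.0, prior_samples, noise_sd=1.0)
```

Both estimates are pulled toward zero relative to the raw observations, by an amount set by the shape of the sampled distribution rather than by any assumed model of it.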

In the second case, we will look at the
transmission of information from the concentration of transcription factors to
the expression levels of the genes that they regulate, and ask how the
qualitative architecture and quantitative parameters of these networks can be
chosen to optimize information flow; here the connections are to emerging
experiments on the initial events in fruit fly development. In both cases, we have direct evidence
for performance near the physical limits, and the opportunity to test the
predictions of a theory which promotes this optimization to a theoretical
principle. We'll also try to see
to what extent such optimization is likely to be a general principle, as
opposed to a series of anecdotes about particular systems.