Some current theoretical issues in biophysics: A short course


William Bialek


Lectures in Aula Rasetti

M 10 May (16-18:00h), W 12 & F 14 May (15-17:00h)

M 17 May (16-18:00h), W 19 & F 21 May (15-17:00h)


In this short course I try to give some perspective on two very different approaches to the physics of biological systems.  One approach is phenomenological, allowing ourselves to be driven by newly emerging data. The complementary approach is more abstract, in which we explore candidate principles from which the properties of these complex systems should be derivable.


My plan is to write up my lectures in a form that one could (hopefully) read without having attended.  For now, I’ll keep just an outline, which I’ll update as we go along with pointers to references and exercises.  Last updated 15 May 2010.




Once upon a time, there were only a handful of biological systems that could be explored through the kinds of quantitative experiments that we are used to as physicists.  Although one must be careful not to exaggerate, this situation is changing dramatically, and we are seeing an explosion of quantitative data emerging from experiments on many different systems, at many different levels of organization, from single molecules to whole populations of organisms.  There is an urgent practical problem of analyzing these data, and extracting something meaningful.  One might argue that this is also a problem of principle: to claim that we “understand” what has been learned from a particular set of experiments we must have some compact way of summarizing the data.  This search for compact summaries is the essence of phenomenological analyses, and it always involves the claim that the system is simpler than it could have been.  In this part of the course, I'll discuss two ideas for how the seemingly complex behavior of biological systems can be simplified, explicitly.


First is the idea of dimensionality reduction.  For example, when a protein binds to DNA, it interacts with a small piece of the DNA sequence; the space of sequences is large, but presumably only some limited features of the sequence are relevant. Similarly, neurons in the visual system respond to complex images and movies, but presumably each neuron is driven primarily by some limited set of features such as edges of a particular orientation.  In the classical experimental approach, we impose this sort of simplification on the system from the outside, by exploring only a limited set of dimensions—with a small set of targeted mutations or simple patterns of light and dark across the visual field.  The possibility of doing much bigger experiments means that we can explore more fully, and then try to let the resulting richer data tell us which are the relevant dimensions, if we have the appropriate mathematical tools.  I'll discuss methods for discovering this low dimensional structure in real data, using both protein/DNA interactions and neural coding as examples, emphasizing the similarity of the underlying mathematics (and perhaps the underlying physical intuition).


The idea of dimensionality reduction has a long history, especially if you go back to qualitative versions of the idea, prior to its formalization.  Thus, the idea of receptive fields for visual neurons goes back to work in the 1940s (on the retina of the horseshoe crab) and 1950s (on the retina of frogs and cats), with a big stimulus coming from the work of Hubel and Wiesel on visual cortex in the late 50s and early 60s.  I am not sure at what point people took the qualitative idea of receptive fields and wrote down an explicit model which says that, in the space of images, all that matters for driving a neuron is the projection of that image onto a template formed by the receptive field.  Certainly in studies of the auditory system, by the mid 1960s there was a fairly explicit picture that neural spiking was driven by linear filtering (or Euclidean projection in the space of waveforms) followed by an instantaneous nonlinearity, and de Boer and others introduced correlation methods (following Wiener) to separate these components.  For a review of correlation methods in characterizing the responses of neurons see Rieke et al (1997) and Schwartz et al (2006).  Correlation methods work only if you can control the distribution of inputs to the system, but if you can do this then it is even possible to count the number of relevant dimensions (Bialek & de Ruyter van Steveninck 2005).


[Bialek & de Ruyter van Steveninck 2005] Features and dimensions.  W Bialek & RR de Ruyter van Steveninck, q-bio/0505003 (2005).


[Rieke et al 1997] Spikes: Exploring the Neural Code.  F Rieke, D Warland, RR de Ruyter van Steveninck & W Bialek (MIT Press, Cambridge, 1997).


[Schwartz et al 2006]  Spike triggered neural characterization.  O Schwartz, JW Pillow, NC Rust & EP Simoncelli, J Vision 6, 484-507 (2006).
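
As a rough illustration of the correlation approach, here is a minimal sketch in Python.  It assumes a toy model neuron (a single linear filter followed by a sigmoid, driven by Gaussian white noise); the filter shape, the gain and the function names are all invented for the example and are not taken from any of the references.  For Gaussian white noise inputs, the spike triggered average should point along the filter, which is what the last line checks.

```python
import numpy as np

def simulate_ln_neuron(stimulus, kernel, rng, gain=2.0):
    """Toy linear-nonlinear neuron (an assumption made for illustration)."""
    projection = stimulus @ kernel                       # linear stage
    p_spike = 1.0 / (1.0 + np.exp(-gain * (projection - 1.0)))  # static nonlinearity
    return rng.random(len(projection)) < p_spike         # one Bernoulli draw per time bin

def spike_triggered_average(stimulus, spikes):
    """Mean stimulus preceding a spike, relative to the mean over all stimuli."""
    return stimulus[spikes].mean(axis=0) - stimulus.mean(axis=0)

rng = np.random.default_rng(0)
dim = 20
t = np.arange(dim) - 10
kernel = np.exp(-0.5 * (t / 2.0) ** 2) * np.cos(t)       # hypothetical filter shape
kernel /= np.linalg.norm(kernel)

stimulus = rng.standard_normal((50_000, dim))            # Gaussian white noise inputs
spikes = simulate_ln_neuron(stimulus, kernel, rng)
sta = spike_triggered_average(stimulus, spikes)

# Cosine of the angle between the recovered and the true filter.
overlap = float(sta @ kernel / np.linalg.norm(sta))
print(f"overlap between STA and true filter: {overlap:.3f}")
```

With more than one relevant dimension the same logic is applied to the spike triggered covariance; that step is not shown here.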


If you can’t control the distribution of inputs (for example, if you want to analyze the responses of visual neurons to fully natural images, which come from a poorly characterized, but strongly non-Gaussian, distribution) then you cannot rely on correlation approaches.  Instead, we can ask for projections of the input which capture the maximum amount of information about the output, and it is a theorem that if the system really is described by a small number of projections then this information theoretic optimization approach will find those dimensions (Sharpee et al 2004).  There is often the suspicion that computing entropies or information requires enormous amounts of data but, somewhat remarkably, the search for “maximally informative dimensions” does not seem to be more data hungry than methods based on correlations (Sharpee 2008).


[Sharpee et al 2004]  Analyzing neural responses to natural signals: Maximally informative dimensions.  TO Sharpee, NC Rust & W Bialek, Neural Comp  16, 223-250 (2004).


[Sharpee 2008] Comparison of objective functions for estimating linear-nonlinear models.  TO Sharpee, in Advances in Neural Information Processing Systems 20, JC Platt, D Koller, Y Singer & S Roweis, eds, pp 1305-1312 (2008).
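
To make the objective concrete, the sketch below evaluates the information (in bits per spike) carried by a single projection of the input, estimated from histograms of the projection with and without conditioning on a spike.  It only evaluates the objective for two fixed directions; the search over directions, which is the heart of the method, is not shown.  The toy neuron, the Laplace distributed inputs and all names are assumptions made for the example.

```python
import numpy as np

def information_per_spike(projection, spikes, n_bins=20):
    """Estimate sum_x P(x|spike) log2[P(x|spike)/P(x)] for a 1D projection x."""
    # Quantile bins, so every bin holds roughly the same number of samples.
    edges = np.quantile(projection, np.linspace(0.0, 1.0, n_bins + 1))
    p_all = np.histogram(projection, bins=edges)[0].astype(float)
    p_spk = np.histogram(projection[spikes], bins=edges)[0].astype(float)
    p_all /= p_all.sum()
    p_spk /= p_spk.sum()
    keep = p_spk > 0                       # bins with no spikes contribute zero
    return float(np.sum(p_spk[keep] * np.log2(p_spk[keep] / p_all[keep])))

rng = np.random.default_rng(0)
n_samples, dim = 40_000, 10
stimulus = rng.laplace(size=(n_samples, dim))    # non-Gaussian inputs (toy choice)

true_direction = np.zeros(dim)
true_direction[:3] = [1.0, -2.0, 1.0]            # hypothetical relevant feature
true_direction /= np.linalg.norm(true_direction)

# Toy neuron: spikes when the noisy projection crosses a threshold.
drive = stimulus @ true_direction + 0.3 * rng.standard_normal(n_samples)
spikes = drive > 1.5

other_direction = rng.standard_normal(dim)       # an arbitrary comparison direction
other_direction /= np.linalg.norm(other_direction)

info_true = information_per_spike(stimulus @ true_direction, spikes)
info_other = information_per_spike(stimulus @ other_direction, spikes)
print(f"bits/spike along true direction:  {info_true:.2f}")
print(f"bits/spike along other direction: {info_other:.2f}")
```

The histogram estimate has a small positive bias at finite sample size, so comparisons between directions are more trustworthy than the absolute numbers.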


The idea of dimensionality reduction, especially in its simplest form of a linear projection, also has a long history in the description of protein-DNA interactions.  Here the idea is that the binding energy of the protein to DNA can be described as a weighted sum of terms, one from each base pair that the protein contacts.  Thus, the energy of binding to 4^K different sequences is determined by 4×K parameters, which is a huge simplification (Berg & von Hippel 1988).  In the same way that one can collect thousands of action potentials in response to complex sensory inputs, one can now survey the relations between thousands of different sequences and the corresponding levels of protein binding or gene expression.  In contrast to the neural case, however, the relationships that one measures in these “high throughput” experiments are a bit messy and often uncalibrated.  Kinney et al (2007) showed that if you try to do the usual statistical inference, maximizing the likelihood of your data given the underlying model, but average over all the uncalibrated parts of the model, then one can find the underlying linear projection precisely by an appropriate adaptation of the maximally informative dimension method.  Remarkably, this works on real data, and even allows independent experiments to be analyzed so that they yield the same model for the sequence-specificity of protein binding, although previous work had suggested that the experiments were in conflict.  I thought this was a great triumph of principled reasoning in a field dominated by highly tuned and often arbitrary algorithms.  Kinney et al (2010) went on to design new experiments that exploited this approach, directly sampling sequence space by genetic engineering.  This work showed how to measure an “informational footprint” of the proteins along the sequence, and provided evidence that the linear projections in sequence space really are binding energies, since the binding energies for two proteins along the same sequence (the transcription factor and the RNA polymerase) combine according to the appropriate statistical mechanical weights in determining the probability that transcription can be initiated.  It is striking that one can make such indirect experiments and yet draw such sharp inferences about the underlying physics.


[Kinney et al 2007] Precise physical models of protein-DNA interaction from high-throughput data. JB Kinney, G Tkacik & CG Callan Jr, Proc Nat’l Acad Sci (USA) 104, 501-506 (2007).


[Kinney et al 2010] Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence.  JB Kinney, A Murugan, CG Callan Jr & EC Cox, Proc Nat’l Acad Sci (USA) in press (2010). See also the supplementary material.
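
The counting of parameters in the additive model can be made explicit with a small sketch.  Here the binding energy is a sum of one term per position, read off from a K×4 “energy matrix”; the matrix entries are random numbers chosen purely for illustration (not fitted to any data), and the function names are my own.

```python
import numpy as np
from itertools import product

BASES = "ACGT"

def one_hot(sequence):
    """Encode a DNA sequence of length K as a K x 4 indicator matrix."""
    x = np.zeros((len(sequence), 4))
    for position, base in enumerate(sequence):
        x[position, BASES.index(base)] = 1.0
    return x

def binding_energy(sequence, energy_matrix):
    """Additive model: one energy contribution from each position."""
    return float(np.sum(one_hot(sequence) * energy_matrix))

rng = np.random.default_rng(1)
K = 4
energy_matrix = rng.normal(size=(K, 4))   # hypothetical values, in units of k_B T

# 4*K parameters determine the energies of all 4**K sequences.
all_sequences = ["".join(s) for s in product(BASES, repeat=K)]
energies = np.array([binding_energy(s, energy_matrix) for s in all_sequences])

# Because the model is additive, the best binder is found position by position.
best_sequence = all_sequences[int(np.argmin(energies))]
consensus = "".join(BASES[j] for j in energy_matrix.argmin(axis=1))
print(len(all_sequences), "sequences from", energy_matrix.size, "parameters")
print("lowest energy sequence:", best_sequence, "| position-wise optimum:", consensus)
```

In a one-hot representation the energy is literally a linear projection of the sequence onto the energy matrix, which is the sense in which this is the same mathematical problem as the neural receptive field.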


A second idea concerns the analysis of large networks, again ranging from proteins to brains (and beyond).  Here there is a long tradition of using ideas from statistical physics to describe collective network behavior, but these ideas often have been somewhat removed from experiment.  Recently it has been emphasized that we can construct statistical mechanics models directly from data using maximum entropy methods, where we make the explicit simplification of constructing models that match only low-order correlation functions. These simplified, data driven models have proven to have quite interesting properties.  I'll describe recent progress in this effort, using examples drawn from both the molecular level (the diversity of antibodies) and the brain (networks of real neurons). In both examples, familiar physics ideas such as criticality and the proliferation of metastable states seem to be crucial to biological function; we'll see how this emerges from the data, and identify some theoretical questions raised by these results.


In the lectures I tried to introduce the idea of maximum entropy models through a discussion of protein sequences.  The initial work here is from Ranganathan and collaborators, who developed methods to generate new protein sequences that conform to the one-body and two-body statistics of amino acid substitutions in a large family (Russ et al 2005, Socolich et al 2005).  The important result is that if you keep track of only one-body statistics, most of the resulting proteins are junk.  On the other hand, if you respect the two-point correlations, then a macroscopic fraction of the new proteins seem to fold and function.  This is quite surprising.  This example of the “network” of amino acids is attractive for many reasons, not least our physical intuition about the forces holding proteins together.  But it has other problems, such as the fact that, until recently, the sample sizes (number of sequences in a family) were quite small, and surely when one looks at sequences from closely related organisms these are not independent samples.  Thus, much of the early work is focused on these issues of statistical significance and evolutionary history, rather than the more abstract problem of writing down the distribution out of which functional sequences are drawn.


[Russ et al 2005] Natural-like function in artificial WW domains. WP Russ, DM Lowery, P Mishra, MB Yaffe & R Ranganathan, Nature 437, 579-583 (2005).


[Socolich et al 2005] Evolutionary information for specifying a protein fold. M Socolich, SW Lockless, WP Russ, H Lee, KH Gardner & R Ranganathan, Nature 437, 512-518 (2005).


It turns out that the methods of Socolich et al actually generate samples out of the maximum entropy distribution, although they don’t provide an explicit construction of this distribution (Bialek & Ranganathan 2007).  More explicit uses of the maximum entropy idea to think about sequence ensembles are given by Weigt et al (2009), Halabi et al (2009) and Mora et al (2009).  In the last work, we took advantage of recent experiments that characterize the ensemble of antibodies made by a single organism (in this case, a zebrafish; see Weinstein et al 2009).  In this system, an enormous amount of sequence variability is concentrated in a small region, and we have a huge number of samples (approaching 100,000), both of which make the problem of constructing a maximum entropy model much more accessible.  We recall that maximum entropy models match exactly some set of correlations, in this case up to pairs of amino acids.  To test the model we can look at higher order correlations (e.g., triplets), and this seems to come out right.  The antibody example is in just the right range where it is possible to estimate the probability distribution over sequence space directly, however, so we can compare theory and experiment more directly.  If we make a plot of the probability vs rank (the “Zipf plot”) we see that probability is inversely proportional to rank, which is called Zipf’s law, first noticed for the distribution of words in English.  From a statistical mechanics point of view, this is the same as having the system be poised at a critical point, in fact a rather odd kind of critical point.


[Bialek & Ranganathan 2007] Rediscovering the power of pairwise interactions. W Bialek & R Ranganathan, (2007).


[Halabi et al 2009] Protein sectors:  Evolutionary units of three-dimensional structure. N Halabi, O Rivoire, S Leibler & R Ranganathan, Cell 138, 774-786 (2009).  See also the supplementary material.


[Weigt et al 2009] Identification of direct residue contacts in protein-protein interaction by message passing. M Weigt, RA White, H Szurmant, JA Hoch & T Hwa, Proc Nat'l Acad Sci (USA) 106, 67-72 (2009).


[Weinstein et al 2009] High-throughput sequencing of the zebrafish antibody repertoire. JA Weinstein, N Jiang, RA White, DS Fisher & SR Quake, Science 324, 807-810 (2009).
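
One way to see the connection between Zipf’s law and criticality is the short argument sketched below.  The notation is mine: E(s) is the “energy” of a sequence s, defined by its log probability, r(s) is its rank, N(E) is the number of sequences with energy at most E, and A is a normalization constant.  I am also assuming that, for a large system, the logarithm of this cumulative count can stand in for the microcanonical entropy.

```latex
\begin{align}
E(s) &\equiv -\ln P(s), \qquad r(s) = N\big(E \le E(s)\big), \\
P(s) = \frac{A}{r(s)} \;&\Rightarrow\; \ln N(E) = E + \ln A, \\
S(E) \equiv \ln N(E) \;&\Rightarrow\; \frac{dS}{dE} = 1, \qquad \frac{d^2 S}{dE^2} = 0 .
\end{align}
```

In ordinary thermodynamics d^2S/dE^2 = -1/(T^2 C), where C is the heat capacity, so a vanishing second derivative corresponds to a diverging heat capacity, as at a critical point.  One reading of why this critical point is odd is that here the second derivative vanishes over the whole range of E where Zipf’s law holds, not just at one special energy.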


Independent of the work on protein sequences is an effort to construct statistical mechanics models for networks of real neurons via the maximum entropy principle.  If we slice time into brief bins, then each neuron either generates an action potential or remains silent, so the state of one neuron is naturally binary, like an Ising spin.  If we build a maximum entropy model consistent with pairwise correlations among neurons (that is, the probabilities of pairs of cells generating spikes in the same brief time bin), then this model is exactly an Ising model with pairwise interactions.  Because the correlations can be both positive and negative, the spin-spin interactions also have both signs, and frustration is almost unavoidable.  Thus, we end up deriving a spin-glass model of neural networks directly from data.  This was first done for the population of neurons in the retina that convey information about a small patch of the visual world (Schneidman et al 2006), and it was found that the model “worked” in that it could reproduce the higher order statistical structure of the network states, and that (as a result) it captured ~90% of the “multi-information” that measures the difference in entropy between the real system and a model of independent neurons; these early, and very explicit, tests were done on relatively small systems, N=10-20 neurons.  Since the initial work, the same general strategy has been used on a wide variety of neural systems (Shlens et al 2006, Tang et al 2008, Yu et al 2008, Shlens et al 2009, Marre et al 2009).


[Marre et al 2009] Prediction of spatio-temporal patterns of neural activity from pairwise correlations. O Marre, SE Boustani, Y Fregnac & A Destexhe, Phys Rev Lett 102, 138101 (2009).


[Ohiorhenuan & Victor 2007] Maximum entropy modeling of multi-neuron firing patterns in V1. IE Ohiorhenuan & JD Victor, in Proceedings of 2007 Cosyne Conference  (2007).


[Schneidman et al 2006] Weak pairwise correlations imply strongly correlated network states in a neural population. E Schneidman, MJ Berry II, R Segev & W Bialek, Nature 440, 1007-1012 (2006), arXiv:q-bio/0512013v1.


[Shlens et al 2006] The structure of multi-neuron firing patterns in primate retina.  J Shlens, GD Field, JL Gauthier, MI Grivich, D Petrusca, A Sher, AM Litke & EJ Chichilnisky, J Neurosci 26, 8254-8266 (2006).


[Shlens et al 2009] The structure of large-scale synchronized firing in primate retina. J Shlens, GD Field, JL Gauthier, M Greschner, A Sher, AM Litke & EJ Chichilnisky, J Neurosci 29, 5022-5031 (2009).


[Tang et al 2008] A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro.  A Tang, D Jackson, J Hobbs, W Chen, JL Smith, H Patel, A Prieto, D Petrusca, MI Grivich, A Sher, P Hottowy, W Dabrowski, AM Litke & JM Beggs, J Neurosci 28, 505-518 (2008).


[Yu et al 2008] A small world of neuronal synchrony.  S Yu, D Huang, W Singer & D Nikolic, Cereb Cortex 18, 2891-2901 (2008).
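
For a handful of cells the construction of the pairwise maximum entropy model can be done by brute force, which may help make the idea concrete.  The sketch below assumes synthetic “spike words” for five toy neurons that share a common input (not real data), enumerates all 2^N states, and adjusts the fields h and couplings J by simple gradient ascent until the model reproduces the measured means and pairwise correlations.  The learning rate, the number of steps and the function names are choices made for the example; this is not the fitting procedure of any of the papers above, and it will not scale beyond N of about 20.

```python
import numpy as np
from itertools import product

def all_states(n):
    """All 2**n configurations of n spins, each +1 (spike) or -1 (silence)."""
    return np.array(list(product([-1.0, 1.0], repeat=n)))

def model_distribution(h, J, states):
    """P(s) proportional to exp(sum_i h_i s_i + (1/2) sum_ij J_ij s_i s_j)."""
    log_w = states @ h + 0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    w = np.exp(log_w - log_w.max())        # subtract the max for numerical stability
    return w / w.sum()

def fit_pairwise_maxent(mean_target, corr_target, n_steps=5000, lr=0.1):
    """Gradient ascent on the likelihood; J stays symmetric with zero diagonal."""
    n = len(mean_target)
    states = all_states(n)
    h, J = np.zeros(n), np.zeros((n, n))
    for _ in range(n_steps):
        p = model_distribution(h, J, states)
        mean_model = p @ states
        corr_model = states.T @ (states * p[:, None])
        h += lr * (mean_target - mean_model)
        dJ = lr * (corr_target - corr_model)
        np.fill_diagonal(dJ, 0.0)          # s_i * s_i = 1 carries no information
        J += dJ
    return h, J, states

# Synthetic data: 5 toy neurons driven by a shared input plus private noise.
rng = np.random.default_rng(2)
n_cells, n_bins = 5, 20_000
common = rng.standard_normal((n_bins, 1))
private = rng.standard_normal((n_bins, n_cells))
s = np.where(0.5 * common + private > 1.0, 1.0, -1.0)

mean_target = s.mean(axis=0)
corr_target = s.T @ s / n_bins

h, J, states = fit_pairwise_maxent(mean_target, corr_target)
p = model_distribution(h, J, states)
mean_error = float(np.max(np.abs(p @ states - mean_target)))
corr_error = float(np.max(np.abs(states.T @ (states * p[:, None]) - corr_target)))
print(f"largest mismatch in means: {mean_error:.1e}, in correlations: {corr_error:.1e}")
```

Once h and J are in hand, the model assigns a probability to every one of the 2^N states, so quantities that were not fitted (triplet correlations, the distribution of the number of cells spiking together) become predictions that can be compared with the data.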


Of course, if we expect interesting collective effects in these networks, we must look at more than N=10 cells.


Optimization principles


Many biological systems operate very close to the borders of what the laws of physics allow, so that they achieve the best possible performance given the physical constraints; this idea has been explored in examples ranging from photon counting in vision to bacterial metabolism to human movement control.  The most exciting possibility is that we can use the idea of optimality as a variational principle from which we could derive the properties of real biological systems, in detail.  I'll review (quickly) some successes of this approach, and then dig into two examples where we will soon have very quantitative confrontations between theory and experiment. 


In the first case, we consider the estimation of motion in the visual system.  The general theory of optimal estimation shows that the best estimates can be constructed as functional integrals over the joint distribution of input (movies falling on the retina) and desired output (movement velocity); where previous work has made models of these distributions, recent experiments make it possible to evaluate the relevant integrals by Monte Carlo sampling from the real distributions. 
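
To illustrate what “evaluating the integrals by sampling” can mean in the simplest setting, here is a toy sketch.  It assumes that the quantity to be estimated is a single number v and that the observation is a single number x (a stand-in for the movie, which in reality is very high dimensional), and it assumes that “best” means minimum mean-square error, in which case the optimal estimate is the conditional mean of v given x.  The conditional mean is approximated by averaging samples of v within bins of x; the relation between x and v is invented for the example.

```python
import numpy as np

def fit_conditional_mean(x, v, n_bins=50):
    """Approximate E[v | x] by averaging sampled v within quantile bins of x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

    def bin_index(values):
        idx = np.searchsorted(edges, values, side="right") - 1
        return np.clip(idx, 0, n_bins - 1)   # out-of-range values go to the end bins

    idx = bin_index(x)
    sums = np.bincount(idx, weights=v, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    table = sums / counts                    # Monte Carlo estimate of E[v | bin]
    return lambda x_new: table[bin_index(x_new)]

rng = np.random.default_rng(3)
n = 100_000
v = rng.standard_normal(2 * n)                       # quantity to be estimated
x = v ** 3 + 0.1 * rng.standard_normal(2 * n)        # toy nonlinear, noisy observation

x_train, v_train = x[:n], v[:n]
x_test, v_test = x[n:], v[n:]

estimator = fit_conditional_mean(x_train, v_train)
slope = np.dot(x_train, v_train) / np.dot(x_train, x_train)   # best linear estimator

mse_conditional = float(np.mean((estimator(x_test) - v_test) ** 2))
mse_linear = float(np.mean((slope * x_test - v_test) ** 2))
print(f"mean-square error, sampled conditional mean: {mse_conditional:.3f}")
print(f"mean-square error, best linear estimator:    {mse_linear:.3f}")
```

The point of the comparison is only that the estimator built directly from samples of the joint distribution requires no model of that distribution; binning in this way is of course hopeless for real movies, where the hard problem is the dimensionality of the input.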


In the second case, we will look at the transmission of information from the concentration of transcription factors to the expression levels of the genes that they regulate, and ask how the qualitative architecture and quantitative parameters of these networks can be chosen to optimize information flow; here the connections are to emerging experiments on the initial events in fruit fly development.  In both cases, we have direct evidence for performance near the physical limits, and the opportunity to test the predictions of a theory which promotes this optimization to a theoretical principle.  We'll also try to see to what extent such optimization is likely to be a general principle, as opposed to a series of anecdotes about particular systems.
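
A minimal numerical version of the information transmission problem is sketched below.  It assumes a single gene whose mean expression level is a Hill function of the transcription factor concentration c, with Gaussian noise of fixed size in the expression level g; the Hill coefficient, the noise level and the grids are all invented for illustration, and real noise levels depend on concentration.  The mutual information I(c;g) is computed for a given distribution of inputs, and the input distribution is then optimized with the Blahut-Arimoto iteration, a standard algorithm for channel capacity that I am using here as a convenient stand-in for the optimization.

```python
import numpy as np

def channel_matrix(c, g, hill_n=5.0, c_half=1.0, noise=0.1):
    """P(g|c): Gaussian noise around a Hill-function mean (toy assumptions)."""
    mean_g = c ** hill_n / (c ** hill_n + c_half ** hill_n)
    p = np.exp(-0.5 * ((g[None, :] - mean_g[:, None]) / noise) ** 2)
    return p / p.sum(axis=1, keepdims=True)   # strictly positive on this grid

def mutual_information(p_c, p_g_given_c):
    """I(c;g) in bits for a discrete input distribution p_c."""
    p_g = p_c @ p_g_given_c
    return float(np.sum(p_c[:, None] * p_g_given_c * np.log2(p_g_given_c / p_g[None, :])))

def blahut_arimoto(p_g_given_c, n_iter=500):
    """Iteratively reweight inputs toward the capacity-achieving distribution."""
    p_c = np.full(p_g_given_c.shape[0], 1.0 / p_g_given_c.shape[0])
    for _ in range(n_iter):
        p_g = p_c @ p_g_given_c
        # Kullback-Leibler divergence of each row P(g|c) from the output marginal.
        d = np.sum(p_g_given_c * np.log(p_g_given_c / p_g[None, :]), axis=1)
        p_c = p_c * np.exp(d)
        p_c /= p_c.sum()
    return p_c

c = np.logspace(-1, 1, 50)              # input concentrations, in units of c_half
g = np.linspace(-0.3, 1.3, 200)         # expression levels, in units of the maximum
p_g_given_c = channel_matrix(c, g)

p_uniform = np.full(len(c), 1.0 / len(c))       # uniform in log concentration
p_optimal = blahut_arimoto(p_g_given_c)

info_uniform = mutual_information(p_uniform, p_g_given_c)
info_optimal = mutual_information(p_optimal, p_g_given_c)
print(f"I(c;g) with inputs uniform in log c: {info_uniform:.2f} bits")
print(f"I(c;g) with optimized inputs:        {info_optimal:.2f} bits")
```

The same calculation can be repeated while varying the parameters of the input/output relation, which is one simple version of asking how the regulatory element itself should be chosen to optimize information flow.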