**Learning, complexity and relevant information**


William Bialek

Papers
are in chronological order, most recent papers at the bottom. Numbers refer to a full list of publications.

[60.] Field theories for learning probability
distributions. W Bialek, CG Callan
& SP Strong, *Phys Rev Lett* **77,**
4693-4697 (1996).

[80.] Occam factors and model-independent
Bayesian learning of continuous distributions. I Nemenman & W Bialek, *Phys
Rev E* **65,** 026137
(2002).

[69.] The information bottleneck method. N Tishby, FC Pereira, & W Bialek,
in *Proceedings of the 37th Annual Allerton Conference on Communication,
Control and Computing*,
B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999).

When Shannon developed information theory he left open the problem of assigning
relevance to a signal. Here we showed that if we observe one signal but are
interested in another, then the statistical associations between these signals
define what is relevant, and one can (selectively) compress the observed signal
to "squeeze out" the relevant bits by formulating the efficient representation
of relevant information as an optimization principle. Crucially,
this formulation does not require any assumptions about what it means for
signals to be similar; indeed the various signals need not even live in a
metric space. There are deep connections to clustering (especially to the
statistical mechanics formulation in which separate clusters emerge through a
series of phase transitions), and many different problems from signal processing
to learning can be cast into this unified information theoretic framework. We
believe that this is a fundamentally new and principled approach to a wide variety
of problems of interest both as models of the problems solved by the brain and
as practical problems in their own right.
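
The optimization principle described above can be made concrete with a small numerical sketch. Assuming discrete signals with a known joint distribution p(x, y), the trade-off is commonly written as minimizing I(X;T) - β I(T;Y) over soft assignments p(t|x), and the self-consistent equations given in [69] can then be iterated directly. The function names, the toy distribution, the number of clusters and the value of β below are illustrative assumptions, not taken from the paper.

```python
import math
import random


def mutual_information(joint):
    """Mutual information (bits) of a joint distribution given as a list of rows."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (p_a[i] * p_b[j]))
    return total


def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the self-consistent equations for the soft assignment p(t|x).

    p_xy is a list of rows p(x, y); every x is assumed to have p(x) > 0.
    Returns p(t|x) as a list of rows, one row per x.
    """
    rng = random.Random(seed)
    eps = 1e-12  # guards against log(0) and division by zero
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_x = [[p / p_x[i] for p in row] for i, row in enumerate(p_xy)]

    # Random soft initial assignment p(t|x).
    q = []
    for _ in range(n_x):
        w = [rng.random() + eps for _ in range(n_clusters)]
        s = sum(w)
        q.append([v / s for v in w])

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[i] * q[i][t] for i in range(n_x)) for t in range(n_clusters)]
        # p(y|t) = (1 / p(t)) sum_x p(y|x) p(t|x) p(x)
        p_y_t = [
            [sum(p_x[i] * q[i][t] * p_y_x[i][y] for i in range(n_x)) / (p_t[t] + eps)
             for y in range(n_y)]
            for t in range(n_clusters)
        ]
        # p(t|x) proportional to p(t) * exp(-beta * KL[p(y|x) || p(y|t)])
        new_q = []
        for i in range(n_x):
            logits = []
            for t in range(n_clusters):
                kl = sum(p * math.log((p + eps) / (p_y_t[t][y] + eps))
                         for y, p in enumerate(p_y_x[i]) if p > 0)
                logits.append(math.log(p_t[t] + eps) - beta * kl)
            m = max(logits)  # subtract the maximum for numerical stability
            w = [math.exp(v - m) for v in logits]
            s = sum(w)
            new_q.append([v / s for v in w])
        q = new_q
    return q


def compressed_joint(p_xy, q):
    """Joint distribution p(t, y) = sum_x p(t|x) p(x, y)."""
    n_t, n_y = len(q[0]), len(p_xy[0])
    return [[sum(q[i][t] * p_xy[i][y] for i in range(len(p_xy))) for y in range(n_y)]
            for t in range(n_t)]


# Toy example (assumed): four values of x, two of which predict y the same way.
p_xy = [[0.225, 0.025], [0.225, 0.025], [0.025, 0.225], [0.025, 0.225]]
q = information_bottleneck(p_xy, n_clusters=2, beta=10.0)
relevant_bits = mutual_information(compressed_joint(p_xy, q))
```

Note that nothing in the update refers to a distance between values of x: two values are treated alike only insofar as they predict y alike, which is the sense in which no metric on the signal space is needed. In this sketch the Kullback-Leibler divergence is measured in nats, so β multiplies nats, while `mutual_information` reports bits. Sweeping β upward from small values is one way to explore how the assignments split into separate clusters.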

[76.] Predictability,
complexity and learning. W Bialek, I Nemenman & N Tishby, *Neural
Comp* **13,** 2409-2463 (2001).

[79.] Complexity
through nonextensivity. W Bialek, I Nemenman & N Tishby, *Physica A* **302,** 89-99
(2001).

We
have reached an understanding of the connections between learning and
complexity as unified by the idea of *predictive information*, which is equivalent to subextensive components in
the entropy. The results provide a conclusive answer to the long-standing problem
of how to characterize the complexity of time series, and serve to unify ideas from
different areas of physics and
computer science. In particular we can classify data streams by their
complexity, and if there is something to be learned from the data stream then
this classification corresponds to measures for the complexity of the model
that can be learned. From a technical
point of view it was essential to have a calculable example in the regime where
models to be learned cannot be described by a finite number of parameters, and
in related work we showed how these nonparametric learning problems could be
given a field theoretic formulation [60, 80]. Perhaps the most interesting
direction to grow out of this work is the possibility of measuring directly the
complexity of models used by humans and other animals as they learn about the
world.
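
The relation between predictive information and the subextensive part of the entropy can be written compactly. The notation here is introduced for illustration and is meant to follow [76]: S(T) denotes the entropy of a segment of the data stream of duration T, S_0 the entropy rate, and S_1(T) the subextensive remainder.

```latex
\begin{align}
S(T) &= S_0\,T + S_1(T), \qquad \lim_{T\to\infty} \frac{S_1(T)}{T} = 0, \\
I_{\rm pred}(T,T') &= S(T) + S(T') - S(T+T'), \\
I_{\rm pred}(T) &\equiv \lim_{T'\to\infty} I_{\rm pred}(T,T') = S_1(T).
\end{align}
```

The extensive term S_0 T cancels in the second line, so only the subextensive term carries information from the past about the future. In this notation the classification of data streams is, roughly, by how I_pred(T) behaves at large T: it can remain finite, grow as (K/2) log T when the data are described by a model with K parameters, or grow as a power of T (with exponent less than one) in the nonparametric regime.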

[84.] Thinking
about the brain. W Bialek, in *Physics
of Biomolecules and Cells: Les Houches Session LXXV,* H Flyvbjerg, F Jülicher, P Ormos, & F David, eds,
pp 485-577 (EDP Sciences, Les Ulis; Springer-Verlag, Berlin, 2002).

[83.] Entropy and inference, revisited. I
Nemenman, F Shafee, & W Bialek, in *Advances in Neural Information
Processing 14,* TG Dietterich, S
Becker & Z Ghahramani, eds, pp 471-478 (MIT Press, Cambridge, 2002).

[101.] Entropy and information in neural spike
trains: Progress on the sampling
problem. I Nemenman, W Bialek
& R de Ruyter van Steveninck, physics/0306063.

[95.] Ambiguous model learning made unambiguous
with 1/f priors. GS Atwal
& W Bialek, to appear in *Advances in Neural Information Processing 16*, (MIT
Press, Cambridge, 2004).

[96.] Optimal manifold representation
of data: An information
theoretic perspective. D Chigirev & W Bialek, to appear in *Advances
in Neural Information Processing 16*,
(MIT Press, Cambridge, 2004).