The Watson Lecture: Biology in the Era of Complete Genomes
President Shirley M. Tilghman
April 14, 2003
Presented at the National Human Genome Research Institute, the National Institutes of Health
It is a great honor to be with all of you today – so many of you friends for over 25 years – to give this lecture in honor of James Dewey Watson and celebrate his monumental contribution to science, as well as the sequencing of the human genome. The 50th anniversary of his discovery with Francis Crick of the double helix has been celebrated this year across the country and across the world. The unveiling of the double helix in 1953 marked the beginning of a revolution in the then nascent field of molecular biology, and the ripples continue to be felt, most especially today with the human genome. But as profound as that discovery proved to be, in my mind it is matched by Jim’s extraordinary leadership of his beloved Cold Spring Harbor Laboratory and his impact on the broader scientific community.
Jim is a consummate institution builder, with a phenomenal nose for scientific talent, the highest of high standards, which he applied ferociously but fairly, and a commitment to reaching out beyond the shores of Long Island to create a place that became a home away from home for the entire life sciences community. The lab has been a magnet for the brightest minds interested in science: from middle school students who discover the wonders of DNA at the learning center, to the graduate students and post docs at the Watson Graduate School receiving an innovative and intense graduate education designed to give them a fast start to an independent career; from students and post docs who come to the lab to study and, each summer, take courses that will change their scientific lives by opening up new areas of exploration, to the thousands of scientists like myself, ages 18 to 80, who attend the meetings, always held in an atmosphere of high energy, heated scientific debate, lots of beer, and no sleep; from the judges, policy makers, and philosophers who came every year to the Banbury Center to explore the interface of science and society, to all of us who benefit from the Cold Spring Harbor Press. It’s a breathtaking assembly of broad intellectual footprints, and it’s due to Jim’s vision and enormous skill at making it a reality.
Jim’s leadership was critical at the dawn of the genome era when he saw before many other scientists the absolute imperative to sequence the human genome. By the force of his prodigious intellect and the force of his interesting personality, he kept our eye on the prize and set in motion the forerunner of the National Human Genome Research Institute. In so doing, he broke every regulation in the NIH rule book. It’s rumored he was single-handedly responsible for every gray hair in Building One. There are some eyebrows that have yet to descend back to normal levels. But we have the human genome ahead of schedule and under budget, and while there are many who share the credit for this enormous accomplishment, without Jim’s visionary leadership at the very beginning, I predict we would not be here celebrating.
I have been given the task of talking today about the impact of the Human Genome Project on biology. Now, I suspect that the reason I have been asked to give this lecture is because I gave a similar lecture almost seven years ago in which I said a lot of nice things about genome biologists. So, I am guessing that Francis Collins was hoping I am in a similarly generous mood today and will roughly do the same thing. What he’s failed to take into account is the fact that I now spend my day thinking about the sorry state of the economy and college athletics, and consequently, I have developed a really big mean streak. But in preparing for this lecture, I returned to the lecture that I gave seven years ago, just to see whether some of the predictions that I made at the time have come to pass, and I thought it would be fitting to begin as I did that time by quoting Richard McIntosh, a distinguished cell biologist at the University of Colorado, who said the following in 1996: “Future biologists will be working in a wondrous wealth of information about genome structure. It is mind-boggling to think of the ways our experimental lives will be changed as a result. No field of biology will be untouched.”
This was a bold prediction to make in 1996, especially because the person who made it is not a molecular geneticist but a very ultra-structurally oriented cell biologist. His work could not have been further from the world of genomics. Seven years later, has this prediction, which I agreed with at the time, held up to the test of time? For my money, the most profound impact of the Genome Project has been the way in which it has demonstrated and legitimized what can, for want of a better phrase, be described as data-driven science, to distinguish it from hypothesis-driven science. This, I think, is precisely what Eric Lander was referring to when he said that biology has become information. The way I have put it is we have come to realize the degree to which information is power. It is not that hypothesis-driven science, the tried and true method whereby a scientist sets out to test a very specific hypothesis, is diminished in its importance. This has been the overwhelmingly dominant model for 20th-century biology and one whose success is simply beyond dispute. It is rather that biologists have come to appreciate a different and equally powerful way of promoting the progress of science by generating a large body of data and then using it to construct hypotheses to be tested by the tried-and-true method.
I do not wish to suggest, by the way, that this was an entirely new insight for biologists. One could argue, for example, that the genetic screens conducted by Christiane Nüsslein-Volhard and Eric F. Wieschaus in the early 1980s, which garnered for them the Nobel Prize in Medicine in 1995, were forerunners of this paradigm. These two extraordinary scientists created a large community resource in the form of mutant fly strains that were defective in some aspect of early development. Those strains became grist for the mill of the Drosophila developmental biology field and kept many scientists busy for the next 20 years.
But examples like this were relatively rare. I have often told the story of my visit to the Princeton Physics Department not long after the NRC report on the Genome Project was published. The physicists were curious about this new initiative and wanted to know what all the excitement was about. Despite my own enthusiasm for the Project, I was trying hard to be evenhanded and explained why the biology community was not embracing the idea of sequencing the genome without reserve. I raised with them the criticism that we do not have the tools to understand the genome and, therefore, it was possible that the data would sit in databases for many years without the means of interpreting it. They were incredulous that this was a concern. They pointed to shelves of books that contained data they had collected – for example, satellite sky surveys that remained to be explored. For physicists, this was standard operating procedure. Today, we biologists have a great deal more respect for the power of uninterpretable data hanging over the heads of scientists to drive innovation. Had we not created the need for better algorithms to identify patterns in DNA, it is unlikely that such programs would exist today. The key lesson learned: data are inherently good.
The Genome Project has also taught us to take full advantage of efficiencies of scale when the goal is to create a resource that will benefit the community. Biology has always been a cottage industry, the unit of work consisting of a small group of scientists, almost always trainees. It was obvious to everyone from the beginning of the Project that the genome was not going to be sequenced using that model, despite the fact that the model had spawned some of the most creative and important science of the 20th century. So, a new lesson learned is taking advantage of efficiencies of scale. The ecological niche of a genome scientist became a place that looked fundamentally different from a traditional biology laboratory. The decision to adopt such a different model was not taken lightly, but it was a critical decision because without it, the genome would never have been sequenced.
The power that derives from adopting a high throughput, cost-effective way to generate large amounts of data is now being harnessed in many projects across biology. And the lesson we have learned here is, if something is worth doing once, it is worth doing 384 times. Let me just give you some examples of this. The first is taken from structural biology. A group of scientists in the Northeast have banded together to develop methods to do automated NMR analysis. Now, for anyone in the audience who knows anything about NMR analysis, what you know is that it is incredibly slow and time-consuming and usually is done on a one-protein-at-a-time basis. What the Genome Project, I think, implanted in the minds of NMR scientists is the idea of doing NMR analysis on a large scale with random proteins as a way to begin to assign function to proteins. They have not succeeded yet; this is very much a work in progress. But I think the idea of doing it in a high throughput way to get around the slow and laborious NMR analysis has as its inspiration some of the strategies that were put in place for the genome.
We are going to hear from Pat Brown regarding the real revolution that has occurred for us who study gene expression: from painstakingly studying the expression of a single gene to now being able to measure the expression of all genes in a cell, all measured simultaneously. But I think an interesting variant of this strategy has been developed by Anthea Letsou and Jim Metherall in Utah, who are taking the idea of microarrays one step further and looking, in a high throughput way, at very specific expression patterns in Drosophila embryos. So, beginning with a random cDNA library, each cDNA is then, in a very high throughput way, being used for in situ hybridization to identify those that have highly interesting, unique patterns of expression that suggest they might be important in development. Again, the notion that you can take a technique like in situ hybridization and use it in a high throughput way is, I think, inspired by the genome.
Another example comes from Science just a few weeks ago, and that is the idea that you could screen for the function of genes using RNAi analysis in a whole genome-wide way. This is an experiment that was done in Gary Ruvkun’s lab, clearly identifying genes in C. elegans that are involved in the regulation of fat by essentially taking random cDNAs and looking for those that affect the pattern of fat deposition in worms. Once again, I think, you can see the flavor of the Genome Project coming through research of this kind.
I would argue that all these projects were inspired by the powerful example of the Genome Project, especially in the way in which data, gathered by brute force, repetition of a powerful technique, can move science forward. What these approaches are also pointing us to is a fundamental paradigm shift from a reductionist approach to science, in which we focused on a single gene, a single protein, a single cell surface modification, toward an integrative way of thinking. For the past 25 years, biology has moved forward by investigators approaching the study of an organism much as in the fable of the blind men surrounding the elephant, each one touching a different part and therefore describing the elephant in very different ways. Legions of scientists have spent a lifetime studying one protein on the surface of a cell, describing in exquisite detail the ways in which the protein transduces information from the outside of the cell to the inside.
Now we have the potential to know the identities of all the cell surface proteins expressed in a cell. We can begin to ask an entirely new kind of question – does the cell coordinate the activities of all these cell surface proteins? Is there a conductor orchestrating the music of the cell, or is it a cacophony with the loudest instrument winning the day? Using a different metaphor, this is the difference between taking the radio apart and putting it back together. These integrative approaches will inexorably lead us to a brand of biology that is far more quantitative and will therefore call upon biologists who have much more rigorous training in mathematics and in computer science. The capacity to extract information from large data sets, as in the example that Eric just gave us, and to use that information to create theoretical models for experimentalists to test will become increasingly important. The close interplay between theory, modeling, and experiment has dominated many other branches of science, particularly physics and astrophysics, but it had little impact on biology until now. I used to refer to the Journal of Theoretical Biology as the cure to insomnia. No longer will I be able to say that. The genome and its vast accumulation of data have the potential to change all of that by opening the doors for scientists with more analytical and theoretical bents. Such scientists tend to be very smart and think completely differently from experimental biologists, and that is a good thing for biology.
The Genome Project also reminded us in the starkest terms of the central role that technology and particularly the development of new technology has in science. It is now much more widely accepted that as much as ideas are fundamental to the advancement of science, technological innovation is the engine of scientific innovation. As Emanuel Farber, one of my pathology professors, taught me 25 years ago, ideas are a dime a dozen; it is experiments that count. And without advances in technology, ideas will often remain a glimmer in a scientist’s eye. Technology is often the rate-limiting step between a good idea and an experiment to test it, and the Genome Project made it crystal clear that biology had undervalued the importance of technology and thus had not organized the infrastructure to support its development, which is both expensive and risky. Hopefully, that has now changed.
I now would like to turn in the last few minutes to some specific examples in biology that have, in my view, been transformed by genomics. In the first example – and this could actually apply to all of the examples – the Genome Project has delivered a classical “good news, bad news” message. The good news is that it has created research projects and thus gainful employment for young scientists for many years to come. The bad news is that the Genome Project has uncovered our great ignorance about the nature of the information contained within the genome. No one who has studied the genome as I have for the past 20 years could be anything but humbled when the human and mouse genomes were aligned and compared. And here I was intending to make a point that I think was just beautifully made by Eric Lander, and I know it is going to be reinforced by several of our next speakers. But clearly, what is immediately apparent when you look at any part of those two genomes that have been compared is that evolution has indeed been hard at work, conserving far more of the genome than we could explain by genes and their closely allied regulatory elements. Let me give you a close-up example of one random part of the genome I picked in which these blue boxes are recognizing homologies that correspond to the exons of a gene. But these red boxes, these lines indicating homology between mouse and human, are elements for which we have no knowledge about what they could possibly be doing.
Scientists should have a field day trying to understand what evolution had in mind when she paid so much attention to these little segments of DNA. One can’t help but be reminded of Sydney Brenner’s great exhortation to us at the beginning of the Genome Project when we were making pronouncements about all the junk DNA in the human genome. What Sydney reminded us was, don’t forget that there is a difference between junk and garbage. Junk is what you store up in the attic because you know it is going to be useful some day. Garbage is what you throw away. Had we not made the critical decision to sequence all the genome, and not just the cDNAs or the genes, we would have missed the junk, which I predict is going to be some of the most interesting parts of the genome.
Indeed, there is no field of biology that is going to be more affected by the Genome Project than evolutionary biology. And again, I think you got a foretaste of that from Eric’s talk. Some of the questions now open to evolutionary biologists, including some they have been debating for 100 years, are now possible to answer. Where do genes come from? How often when a gene arises in evolution is it through duplication? Is it through mixing and matching of exons through recombination within genes? How often are genes literally created de novo from the junk of the genome? How do genomes change with time? What drives the expansion of a genome like maize or the contraction of a genome like Fugu? A point that Eric just mentioned, how much of the genome is under positive selection versus neutral selection versus negative selection? How often do we select for something as opposed to selecting against something? And, finally, how much are mutation rates affected by where you are in the genome? Is there an architecture of the genome inside the nucleus that makes it more or less likely that point mutations will develop in those genes? These are critical and fundamental questions to evolutionary biologists, and these are questions that now can begin to be approached with genomic tools.
One of my favorite evolution stories involves a place where genomics has come face-to-face with Charles Darwin, allowing us to span 150 years. For a quarter of a century, two evolutionary biologists at Princeton, Peter and Rosemary Grant, have been traveling each year to the Galapagos Islands off the coast of South America to live for three months on a rock and study how populations of Darwin’s finches have changed as a result of very dramatic climatic differences: from large beaks designed to crack very hard, tough nuts and seeds, which are the only things that survive drought, to narrow beaks designed for times when the islands are wet, when there are flowers and such beaks are needed to extract the nectar from deep inside them. What the Grants have been doing over 25 years is characterizing these birds morphologically so they understand in enormous detail the structural differences among them. What they are now doing is sequencing their genomes and correlating these structural differences, which are critically important to survival, and the genetic changes that led to them. Darwin meeting the 21st century!
Likewise, conservation biologists and ecologists are experiencing the enormous impact of the genome on their work as they begin to define the level of biodiversity on the planet. In a review in Nature that was written several months ago, the authors demonstrate the enormous ignorance that we currently have about the level of biodiversity on the planet. For example, among invertebrates, it is estimated that of the total number of predicted organisms, only a tiny percentage have actually been identified and studied. What genomics is now beginning to help do is to find the relationships between animals within groups to begin to understand population dynamics. Scientists who study the dynamics of forests are beginning to understand who is related to whom in a forest. When a large tree sends out sprouts, are those sprouts daughters, cousins, or second cousins twice removed? All of the tools that were developed for genome sequencing are now being used by scientists engaged in some of the most important scientific questions that confront us today. What is clear is that DNA-based taxonomy is going to be of enormous value in resolving many of the serious and fundamental problems that scientists are facing. The availability of genome sequences of obscure organisms – obscure from the point of view of mainstream biology, of course, not necessarily obscure in the view of Mother Nature – will have, I predict, a dramatic effect on biology going forward.
When the Genome Project began, it began with what Jerry Fink called the security council of organisms. These were the privileged few whose genomes were identified for sequencing other than the human genome. They were chosen because they had good genetic systems and large communities of scholars working on these organisms. I would argue those were the right choices, but today because of the improvements in technology, we now have the “united nations.” The “united nations” is a much broader, much more curious and interesting group of organisms, some of which are going to allow us to ask questions that we could not with the classical model organisms.
I was having lunch with Marc Kirschner yesterday – many of you know him – a very distinguished cell biologist at Harvard Medical School, and he was regaling me, on the placemat of the restaurant, with stories about the acorn worm, a small marine animal that he is now working on. The questions he is asking with this organism could not be asked with any of the security council organisms. This is going to be a wonderful boon for biology. The fact that we can sequence the genomes of lots of different organisms is really going to make biology a much more inclusive and interesting science.
So, in this short lecture, I have been able to mention only a handful of ways in which I see biology changing in the next decade as a result of the output of the many genome projects that will be undertaken. It is a wonderful time to be in biology, for the frontiers feel decidedly different to me and hold out the promise for an extraordinary ride ahead. However, I would only urge us to take this ride with the last and most important lesson that I learned from both Jim Watson and from the leaders in the Genome Project, which is to aim high and to be bold. Thank you.