Mouse Genetics: Concepts & Applications

Copyright ©1995 Lee M. Silver

7. Mapping in the mouse: An overview

7.1 Genetic maps come in various forms

7.1.1 Definitions

7.1.2 Linkage maps

7.1.3 Chromosome maps

7.1.4 Physical maps

7.1.5 Connections between maps

7.2 Mendel’s genetics, linkage, and the mouse

7.2.1 Historical overview

7.2.2 Linkage and recombination

7.2.3 Crossover sites are not randomly distributed

7.2.4 A history of mouse mapping

7.3 General strategies for mapping mouse loci

7.3.1 Novel DNA clones

7.3.2 Transgene insertion sites

7.3.3 Verification of region-specific DNA markers

7.3.4 Loci defined by polypeptide products

7.3.5 Mutant phenotypes

7.4 The final chapter of genetics

7.4.1 From gene to function

7.4.2 From phenotype to gene

7.4.3 The molecular basis of complex traits

 

7.1 Genetic maps come in various forms

The remaining chapters in this book will be devoted to the process and practice of genetic mapping in the mouse. Although mapping was once viewed as a sleepy pastime performed simply for the satisfaction of knowing where a gene mapped as an end unto itself, it is now viewed as a critical tool of importance to many different areas of biological and medical research. Mapping can provide a means for moving from important diseases to clones of the causative genes which, in turn, can provide tools for diagnosis, understanding, and treatment. In the opposite direction, mapping can be used to uncover functions for newly-derived DNA clones by demonstrating correlations with previously-described variant phenotypes. Mapping can also be used to dissect out the heritable and non-heritable components of complex traits and the mechanisms by which they interact. The purpose of this chapter is to provide a primer on classical genetics and to give an overview of mapping in the mouse, with further details provided in subsequent, more focused chapters.

7.1.1 Definitions

7.1.1.1 Genes and loci

In the pre-recombinant DNA era, all genes were defined by the existence of alternative alleles that produced alternative phenotypes that segregated in genetic crosses. Today, with the use of molecular technologies, the ability to recognize genes has expanded tremendously. Monomorphic genes (those with only a single allele) can now be recognized through their transcriptional activity alone. Recognition of putative genes within larger genomic sequences can also be accomplished through the identification of open reading frames, flanking tissue-specific enhancers and other regulatory elements, internal splicing signals, and sequence conservation across evolutionary lines. Sequence-specific epigenetic phenomena such as imprinting, methylation, and DNase sensitivity can also be used to elucidate the existence of functional genomic elements.

Mouse geneticists use the term locus to describe any DNA segment that is distinguishable in some way by some form of genetic analysis. In the pre-recombinant DNA era, only genes distinguished by phenotype could be recognized as loci. But today, with the use of molecular tools, it is possible to distinguish "loci" in the genome that have no discernible function at all. In fact, any change in the DNA sequence, no matter how small or large, whether in a gene or elsewhere, can be followed potentially as an alternative allele in genetic crosses. When alternative alleles exist in a genomic sequence that has no known function, the polymorphic site is called an anonymous locus. With an average rate of polymorphism of one base difference in a thousand between individual chromosome homologs within a species, the pool of potential anonymous loci is enormous. Classes of anonymous loci and the methods by which they are detected and used as genetic markers will be the subject of chapter 8.

7.1.1.2 Maps

A genetic map is simply a representation of the distribution of a set of loci within the genome. The loci included by an investigator in any one mapping project may bear no relation to each other at all, or they may be related according to any of a number of parameters including functional or structural homologies or a pre-determined chromosomal assignment. Mapping of these loci can be accomplished at many different levels of resolution. At the lowest level, a locus is simply assigned to a particular chromosome without any further localization. At a step above, an assignment may be made to a particular subchromosomal region. At a still higher level of resolution, the relative order and approximate distances that separate individual loci within a linked set can be determined. With ever-increasing levels of resolution, the order and inter-locus distances can be determined with greater and greater precision. Finally, the ultimate resolution is attained when loci are mapped onto the DNA sequence itself.

The simplest genetic maps can contain information on as few as two linked loci. At the opposite extreme will be complete physical maps that depict the precise physical location of all of the thousands of genes that exist along an entire chromosome. The first step toward the generation of these complete physical maps has recently been achieved with the establishment of single contigs of overlapping clones across the length of two complete human chromosome arms (Chumakov et al., 1992; Foote et al., 1992). By the time this book is actually read, it is likely that complete contigs across other human — as well as mouse — chromosomes will also be attained. However, it is still a long journey from simply having a set of clones to deciphering the genetic information within them.

There are actually not one but three distinct types of genetic maps that can be derived for each chromosome in the genome (other than the Y). The three types of maps (linkage, chromosomal, and physical) are illustrated in figure 7.1 and are distinguished both by the methods used for their derivation and the metric used for measuring distances within them.

7.1.2 Linkage maps

The linkage map, also referred to as a recombination map, was the first to be developed soon after the re-discovery of Mendel’s work at the beginning of the twentieth century. Linkage maps can only be constructed for loci that occur in two or more heritable forms, or alleles. Thus, monomorphic loci — those with only a single allele — cannot be mapped in this fashion. Linkage maps are generated by counting the number of offspring that receive either parental or recombinant allele combinations from a parent that carries two different alleles at two or more loci. Analyses of this type of data allow one to determine whether loci are "linked" to each other and, if they are, their relative order and the relative distances that separate them (see section 7.2).

A chromosomal assignment is accomplished whenever a new locus is found to be in linkage with a previously assigned locus. Distances are measured in centimorgans, with one centimorgan equivalent to a crossover rate of 1%. The linkage map is the only type based on classical breeding analysis. The term "genetic map" is sometimes used as a false synonym for "linkage map"; a genetic map is actually more broadly defined to include both chromosomal and physical maps as well.

7.1.3 Chromosome maps

The chromosome map (or cytogenetic map) is based on the karyotype of the mouse genome. All mouse chromosomes are defined at the cytogenetic level according to their size and banding pattern (see figure 5.1), and ultimately, all chromosomal assignments are made by direct cytogenetic analysis or by linkage to a locus that has previously been mapped in this way. Chromosomal map positions are indicated with the use of band names (figures 5.2 and 7.1). Inherent in this naming scheme is a means for ordering loci along the chromosome (see section 5.2).

Today, several different approaches, with different levels of resolution, can be used to generate chromosome maps. First, in some cases, indirect mapping can be accomplished with the use of one or more somatic cell hybrid lines that contain only portions of the mouse karyotype within the milieu of another species’ genome. By correlating the presence or expression of a particular mouse gene with the presence of a mouse chromosome or subchromosomal region in these cells, one can obtain a chromosomal, or subchromosomal, assignment (see section 10.2.3).

The second approach can be used in those special cases where karyotypic abnormalities appear in conjunction with particular mutant phenotypes. When the chromosomal lesion and the phenotype assort together, from one generation to the next, it is likely that the former causes the latter. When the lesion is a deletion, translocation, inversion, or duplication, one can assign the mutant locus to the chromosomal band that has been disrupted.

Finally, with the availability of a locus-specific DNA probe, it becomes possible to use the method of in situ hybridization to directly visualize the location of the corresponding sequence within a particular chromosomal band. This approach is not dependent on correlations or assumptions of any kind and, as such, it is the most direct mapping approach that exists. However, it is technically demanding and not nearly as high resolving as linkage or physical approaches (see section 10.2.2).

7.1.4 Physical maps

The third type of map is a physical map. All physical maps are based on the direct analysis of DNA. Physical distances between and within loci are measured in basepairs (bp), kilobasepairs (kb) or megabasepairs (mb). Physical maps are arbitrarily divided into short range and long range. Short range mapping is commonly pursued over distances ranging up to 30 kb. In very approximate terms, this is the average size of a gene and it is also the average size of cloned inserts obtained from cosmid-based genomic libraries. Cloned regions of this size can be easily mapped to high resolution with restriction enzymes and, with advances in sequencing technology, it is becoming more common to sequence interesting regions of this length in their entirety.

Direct long range physical mapping can be accomplished over megabase-sized regions with the use of rare-cutting restriction enzymes together with various methods of gel electrophoresis referred to generically as pulsed field gel electrophoresis, or PFGE, which allow the separation and sizing of DNA fragments of 6 mb or more in length (Schwartz and Cantor, 1984; den Dunnen and van Ommen, 1991). PFGE mapping studies can be performed directly on genomic DNA followed by Southern blot analysis with probes for particular loci (see section 10.3.2). It becomes possible to demonstrate physical linkage whenever probes for two loci detect the same set of large restriction fragments upon sequential hybridizations to the same blot.

Long range mapping can also be performed with clones obtained from large insert genomic libraries such as those based on the Yeast Artificial Chromosome (YAC) cloning vectors, since regions within these clones can be readily isolated for further analysis (see section 10.3.3). In the future, long range physical maps consisting of overlapping clones will cover each chromosome in the mouse genome. Short range restriction maps of high resolution will be merged together along each chromosomal length, and ultimately, perhaps, the highest level of mapping resolution will be achieved with whole chromosome DNA sequences.

7.1.5 Connections between maps

In theory, linkage, chromosomal, and physical maps should all provide the same information on chromosomal assignment and the order of loci. However, the relative distances that are measured within each map can be quite different. Only the physical map can provide an accurate description of the actual length of DNA that separates loci from each other. This is not to say that the other two types of maps are inaccurate. Rather, each represents a version of the physical map that has been modulated according to a different parameter. Cytogenetic distances are modulated by the relative packing of the DNA molecule into different chromosomal regions. Linkage distances are modulated by the variable propensity of different DNA regions to take part in recombination events (see section 7.2.3).

In practice, genetic maps of the mouse are often an amalgamation of chromosomal, linkage, and physical maps. But, at the time of this writing, it is still the case that classical recombination studies provide the great bulk of data incorporated into such integrated maps. Thus, the primary metric used to chart inter-locus distances has been the centimorgan. However, it seems reasonable to predict that, within the next five years, the megabase will overtake the centimorgan as the unit for measurement along the chromosome.

7.2 Mendel’s genetics, linkage, and the mouse

7.2.1 Historical overview

By the time the chemical nature of the gene was uncovered, genetics was already a mature science. In fact, Mendel’s formulation of the basic principles of heredity was not even dependent on an understanding of the fact that genes existed within chromosomes. Rather, the existence of genes was inferred solely from the expression in offspring of visible traits at predicted frequencies based on the traits present in the parental and grandparental generations. Today, of course, the field of genetics encompasses a broad spectrum of inquiry from molecular studies on gene regulation to analyses of allele frequencies in natural populations, with many subfields in-between. To distinguish the original version of genetics — that of Mendel and his followers — from the various related fields that developed later, several terms have been coined including "formal" genetics, "transmission" genetics, or "classical" genetics. Transmission genetics is the most informative term since it speaks directly to the feature that best characterizes the process by which Mendelian data are obtained — through an analysis of the transmission of genotypes and phenotypes from parents to offspring.

Mendel himself only formulated two of the three general features that underlie all studies in transmission genetics from sexually reproducing organisms. His formulations have been codified into two laws. The first law states, in modern terms, that each individual carries two copies of every gene and that only one of these two copies is transmitted to each child. At the other end of this equation, a child will receive one complete set of genes from each parent, leading to the restoration of a genotype that contains two copies of every gene. Individuals (and cells) that carry two copies of each gene are considered "diploid."

Mendel’s first law comes into operation when diploid individuals produce "haploid" gametes — sperm or eggs — that each carry only a single complete set of genes. In animals, only a certain type of highly specialized cell — known as a "germ cell" — is capable of undergoing the transformation from the diploid to the haploid state through a process known as meiosis. At the cell division in which this transformation occurs, the two copies of each gene will separate or segregate from each other and move into different daughter (or brother) cells. This event provides the name for Mendel’s first law: "The law of segregation." Segregation can only be observed from loci that are heterozygous with two distinguishable alleles. As a result of segregation, half of an individual’s gametes will contain one of these alleles and half will contain the other. Thus, a child can receive either allele with equal probability.

While Mendel’s first law is concerned with the transmission of individual genes in isolation from each other, his second law was formulated in an attempt to codify the manner in which different genes are transmitted relative to each other. In modern terms, Mendel’s second law states that the segregation of alleles from any one locus will have no influence on the segregation of alleles from any other locus. In the language of probability, this means that each segregation event is independent of all others and this provides the name for Mendel’s second law: "The law of independent assortment."

Independent assortment of alleles at two different loci — for example, A and B — can only be observed from an individual who is heterozygous at both with a genotype of the form A/a, B/b as illustrated in figure 7.2. Each gamete produced by such an individual will carry only one allele from the A locus and only one allele from the B locus. Since the two alleles are acquired independently of each other, it is possible to calculate the probability of any particular allelic combination by simply multiplying together the probability of occurrence of each alone. For example, the probability that a gamete will receive the A allele is 0.5 (from the law of segregation) and the probability that this same gamete will receive the b allele is similarly 0.5. Thus, the probability that a gamete will have a combined A b genotype is 0.5 x 0.5 = 0.25. The same probabilities are obtained for all four possible allelic combinations (A B, a b, A b, a B). Since the number of gametes produced by an individual is very large, these probabilities translate directly into the frequencies at which each gamete type is actually present and, in turn, the frequency with which each will be transmitted to offspring (figure 7.2).
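The multiplication rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `gamete_frequencies` and the tuple representation of a genotype are assumptions made for the example, not notation from the text.

```python
from fractions import Fraction
from itertools import product


def gamete_frequencies(genotype):
    """Expected gamete frequencies under independent assortment.

    `genotype` is a list of allele pairs, one pair per unlinked locus,
    e.g. [("A", "a"), ("B", "b")] for an A/a, B/b double heterozygote.
    """
    freqs = {}
    # Each gamete receives one allele per locus (law of segregation), each
    # with probability 1/2, independently across loci (independent assortment).
    prob = Fraction(1, 2) ** len(genotype)
    for combo in product(*genotype):
        freqs[combo] = freqs.get(combo, Fraction(0)) + prob
    return freqs


freqs = gamete_frequencies([("A", "a"), ("B", "b")])
for gamete, freq in freqs.items():
    print(" ".join(gamete), float(freq))
```

Each of the four allelic combinations comes out at a frequency of 1/4, as stated in the text.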

As we all know today, Mendel’s second law holds true only for genes that are not linked together on the same chromosome. When genes A and B are linked, the numbers expected for each of the four allele sets become skewed from 25% (figure 7.3). Two allele combinations will represent the linkage arrangements on the parental chromosomes (for example, A B and a b), and these combinations will each be transmitted at a frequency greater than 25%. The remaining two classes will represent recombinant arrangements that will be transmitted at a frequency below 25%. In the extreme case of absolute linkage, only the two parental classes will be transmitted, each at a frequency of 50%. At intermediate levels of linkage, transmission of the two parental classes together will be greater than 50% but less than 100%.
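The skew away from 25% can be made concrete with a short sketch. It assumes a parent of genotype AB/ab and takes the recombination fraction `r` between the two loci as a given input; the function name is hypothetical.

```python
def linked_gamete_frequencies(r):
    """Expected gamete frequencies from an AB/ab parent.

    `r` is the recombination fraction between the two loci:
    0 means absolute linkage, 0.5 means independent assortment.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    parental = (1.0 - r) / 2.0   # each of the two parental classes
    recombinant = r / 2.0        # each of the two recombinant classes
    return {"A B": parental, "a b": parental,
            "A b": recombinant, "a B": recombinant}


for r in (0.0, 0.1, 0.5):
    print(r, linked_gamete_frequencies(r))
```

With `r = 0` only the parental classes appear, each at 50%; with `r = 0.5` all four classes return to 25%, which is indistinguishable from the unlinked case.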

In 1905, when evidence for linkage was first encountered in the form of loci whose alleles did not assort independently, its significance was not appreciated (Bateson et al., 1905). The terms coupling and repulsion were coined to account for this unusual finding through some sort of underlying physical force. In a genetics book from 1911, Punnett imagined that alleles of different genes might "repel one another, refusing, as it were, to enter into the same zygote, or they may attract one another, and becoming linked, pass into the same gamete, as it were by preference" (Punnett, 1911). What this hypothesis failed to explain is why alleles found in repulsion to each other in one generation could become coupled to each other in the next generation. But even as Punnett’s genetics text was published, an explanation was at hand. In 1912, Morgan and his colleagues proposed that coupling and repulsion were actually a consequence of co-localization of genes to the same chromosome: coupled alleles are those present on the same parental homolog, and alleles in repulsion are those present on alternative homologs (Morgan and Cattell, 1912 and figure 7.3). Through the process of crossing over, alleles that are in repulsion in one generation (for example the A and b alleles in figure 7.3) can be brought together on the same homolog, and thus become coupled, in the next generation. In 1913, Sturtevant used the rates at which crossing over occurred between different pairs of loci to develop the first linkage map with six genes on the Drosophila X chromosome (Sturtevant, 1913). Although the original rationale for the terms coupling and repulsion was eliminated with this new understanding, the terms themselves have been retained in the language of geneticists (especially human geneticists). Whether alleles at two linked loci are coupled or in repulsion is referred to as the phase of linkage.

The purpose of this chapter is to develop the concepts of transmission genetics as they are applied to contemporary studies of the mouse. This discussion is not meant to be comprehensive. Rather, it will focus on the specific protocols and problems that are most germane to investigators who seek to place genes onto the mouse linkage map and those who want to determine the genetic basis for various traits that are expressed differently by different animals or strains.

7.2.2 Linkage and recombination

7.2.2.1 The backcross

Genetic linkage is a direct consequence of the physical linkage of two or more loci within the same pair of DNA molecules that define a particular set of chromosome homologs within the diploid genome. Genetic linkage is demonstrated in mice through breeding experiments in which one or both parents are detectably heterozygous at each of the loci under investigation. In the simplest form of linkage analysis — referred to as a backcross — only one parent is heterozygous at each of two or more loci, and the other parent is homozygous at these same loci. As a result, segregation of alternative alleles occurs only in the gametes that derive from one parent, and the genotypes of the offspring provide a direct determination of the allelic constitution of these gametes. The backcross greatly simplifies the interpretation of genetic data because it allows one to jump directly from the genotypes of offspring to the frequencies with which different meiotic products are formed by the heterozygous parent.

For each locus under investigation in the backcross, one must choose appropriate heterozygous and homozygous genotypes so that the segregation of alleles from the heterozygous parents can be followed in each of the offspring. For loci that have not been cloned, the genotype of the offspring can only be determined through a phenotypic analysis. In this case, if the two alleles present in the heterozygous parent show a complete dominant/recessive relationship, then the other parent must be homozygous for the recessive allele. For example, the A allele at the agouti locus causes a mouse to have a banded "agouti" coat color, whereas the a allele determines a solid "non-agouti" coat color. Since the A allele is dominant to a, the homozygous parent must be a/a. In an A/a x a/a backcross, the occurrence of agouti offspring would indicate the transmission of the A allele from the heterozygous parent, and the occurrence of non-agouti offspring would indicate the transmission of the a allele.

In the case just described, the wild-type allele (A) is dominant and the mutant allele (a) is recessive. Thus, the homozygous parent must carry the mutant allele (a/a) and express a non-agouti coat color. In other cases, however, the situation is reversed with mutations that are dominant and wild-type alleles that are recessive. For example, the T mutation at the T locus causes a dominant shortening of the tail. Thus, if the T locus were to be included in a backcross, the heterozygous genotype would be T/+ and the homozygous genotype would be wild-type (+/+) to allow one to distinguish the transmission of the T allele (within short-tailed offspring) from the + allele (within normal-tailed offspring).

As discussed in chapter 8, most loci are now typed directly by DNA-based techniques. As long as both DNA alleles at a particular locus can be distinguished from each other, it doesn’t matter which is chosen for inclusion in the overall genotype of the homozygous parent. The same holds true for all phenotypically-defined loci at which pairs of alleles act in a co-dominant or incompletely dominant manner. In all these cases, the heterozygote (A1/A2 for example) can be distinguished from both homozygotes (A1/A1 and A2/A2).
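The reason a backcross lets one read gametes directly from offspring can be shown with a small sketch: because the homozygous parent always contributes the same allele, removing that allele from the offspring genotype leaves the allele transmitted by the heterozygous parent. The function name and the tuple encoding of genotypes are assumptions made for illustration.

```python
def transmitted_allele(offspring_genotype, homozygous_allele):
    """Allele contributed by the heterozygous parent in a backcross.

    `offspring_genotype` is a pair of alleles, e.g. ("A1", "A2");
    `homozygous_allele` is the single allele carried by the homozygous parent.
    """
    alleles = list(offspring_genotype)
    if homozygous_allele not in alleles:
        raise ValueError("offspring lacks the homozygous parent's allele")
    # Remove one copy contributed by the homozygous parent; what remains
    # came from the heterozygous parent.
    alleles.remove(homozygous_allele)
    return alleles[0]


# A/a x a/a backcross: agouti offspring (A/a) received A, non-agouti (a/a) received a.
print(transmitted_allele(("A", "a"), "a"))
print(transmitted_allele(("a", "a"), "a"))
```

For phenotypically typed loci with complete dominance, the same logic applies once the phenotype has been translated into a genotype, which is why the homozygous parent must carry the recessive allele.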

7.2.2.2 Map distances

In the example presented in figure 7.3, an animal is heterozygous at both of two linked loci, which results in two complementary sets of coupled alleles — A B and a b. The genotype of this animal would be written as follows: AB/ab. In the absence of crossing over between homologs during meiosis, one or the other coupled set — either A B or a b — will be transmitted to each gamete. However, if a crossover event does occur between the A and B loci, a non-parental combination of alleles will be transmitted to each gamete. In the example shown in figure 7.3, the frequency of recombination between loci A and B can be calculated directly by determining the percentage of offspring formed from gametes that contain one of the two non-parental, or "recombinant," combinations of alleles. In this example, the recombination frequency is 10%.
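The calculation of a recombination frequency from backcross data can be sketched as follows. The offspring counts are hypothetical, chosen only so that the result matches the 10% figure quoted above; they are not taken from figure 7.3.

```python
def recombination_fraction(offspring_counts, parental_classes):
    """Fraction of offspring derived from recombinant gametes.

    `offspring_counts` maps each gamete class (as inferred from the
    offspring genotype) to the number of offspring observed;
    `parental_classes` lists the two coupled allele sets of the parent.
    """
    total = sum(offspring_counts.values())
    if total == 0:
        raise ValueError("no offspring were typed")
    recombinants = sum(count for gamete, count in offspring_counts.items()
                       if gamete not in parental_classes)
    return recombinants / total


# Hypothetical backcross of an AB/ab parent; counts are illustrative only.
counts = {"A B": 45, "a b": 45, "A b": 5, "a B": 5}
r = recombination_fraction(counts, parental_classes={"A B", "a b"})
print(f"recombination frequency: {r:.0%}")
```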

To a first degree, crossing over occurs at random sites along all of the chromosomes in the genome. A direct consequence of this randomness is that the farther apart two linked loci are from each other, the more likely it is that a crossover event will occur somewhere within the length of chromosome that lies between them. Thus, the frequency of recombination provides a relative estimate of genetic distance. Genetic distances are measured in centimorgans (cM) with one centimorgan defined as the distance between two loci that recombine with a frequency of 1%. Thus, as a further example, if two loci recombine with a frequency of 2.5%, this would represent an approximate genetic distance of 2.5 cM. In the mouse, correlations between genetic and physical distances have demonstrated that one centimorgan is, on average, equivalent to 2,000 kilobases. It is important to be aware, however, that the rate of equivalence can vary greatly due to numerous factors discussed in section 7.2.3.

Although the frequency of recombination between two loci is roughly proportional to the length of DNA that separates them, when this length becomes too large, the frequency will approach 50%, which is indistinguishable from that expected with unlinked loci. The average size of a mouse chromosome is 75 cM. Thus, even when genes are located on the same chromosome, they are not necessarily linked to each other according to the formal definition of the term. However, a linkage group does include all genes that have been linked by association. Thus, if gene A is linked to gene B, and gene B is linked to gene C, the three genes together, A B C, form a linkage group even if the most distant members of the group do not exhibit linkage to each other.
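The idea of a linkage group formed "by association" amounts to taking the transitive closure of pairwise linkage. A minimal sketch, assuming that pairwise linkage has already been established and is supplied as a list of locus pairs (the function name and the union-find approach are choices made for this illustration):

```python
def linkage_groups(loci, linked_pairs):
    """Group loci so that any two loci joined by a chain of pairwise
    linkages fall into the same linkage group."""
    parent = {locus: locus for locus in loci}

    def find(x):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in linked_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    return sorted(sorted(group) for group in groups.values())


# A is linked to B and B to C; A and C need not show linkage directly.
print(linkage_groups(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]))
```

Here A, B, and C end up in one group even though no A to C linkage was supplied, while D remains on its own.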

7.2.2.3 Genetic interference

A priori, one might assume that all recombination events within the same meiotic cell should be independent of each other. A direct consequence of this assumption is that the linear relationship between recombination frequency and genetic distance — apparent in the single digit centimorgan range — should degenerate with increasing distances. The reason for this degeneration is that as the distance between two loci increases, so does the probability that multiple recombination events will occur between them. Unfortunately, if two, four, or any other even number of crossovers occur, the resulting gametes will still retain the parental combination of coupled alleles at the two loci under analysis as shown in figure 7.4. Double (as well as quadruple) recombinants will not be detectably different from non-recombinants. As a consequence, the observed recombination frequency will be less than the actual recombination frequency.

Consider, for example, two loci that are separated by a real genetic distance of 20 cM. According to simple probability theory, the chance that two independent recombination events will occur in this interval is the product of the predicted frequencies with which each will occur alone which is 0.20 for a 20 cM distance. Thus, the probability of a double recombination event is 0.2 x 0.2 = 0.04. The failure to detect recombination in 4% of the gametes means that two loci separated by 20 cM will only show recombination at a frequency of 0.16. A similar calculation indicates that at 30 cM, the observed frequency of recombinant products will be even further removed at 0.21. In 1919, Haldane simplified this type of calculation by developing a general equation that could provide values for recombination fractions at all map distances based on the formulation just described. This equation is known as the "Haldane mapping function" and it relates the expected fraction of offspring with detectable recombinant chromosomes (r) to the actual map distance in morgans (m) that separates the two loci (Haldane, 1919):

$$ r = \frac{1}{2}\left(1 - e^{-2m}\right) \qquad (7.1) $$
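As a rough numerical check, the sketch below compares the simple product-rule estimate worked through in the text with Haldane's function, assuming its standard published form r = (1/2)(1 - e^(-2m)). The function names are hypothetical.

```python
import math


def product_rule_estimate(m):
    """Observed recombination fraction using the text's simple estimate:
    the true distance minus the probability of a double crossover."""
    return m - m * m


def haldane_r(m):
    """Expected recombination fraction at map distance m (in morgans),
    assuming independent crossovers (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_m(r):
    """Inverse of haldane_r: map distance in morgans for a fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * r)


for m in (0.05, 0.20, 0.30):
    print(f"m = {m:.2f}  product rule: {product_rule_estimate(m):.3f}  "
          f"Haldane: {haldane_r(m):.3f}")
```

The two estimates agree closely at short distances and drift apart as the distance grows, since Haldane's function also accounts for gametes with more than two crossovers.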

After working through this hypothetical adjustment to recombination rates, it is now time to state that multiple events of recombination on the same chromosome are not independent of each other. In particular, a recombination event at one position on a chromosome will act to interfere with the initiation of other recombination events in its vicinity. This phenomenon is known, appropriately, as "interference." Interference was first observed within the context of significantly lower numbers of double crossovers than expected in the data obtained from some of the earliest linkage studies conducted on Drosophila (Muller, 1916). Since that time, interference has been demonstrated in every higher eukaryotic organism for which sufficient genetic data have been generated.

Significant interference has been found to extend over very long distances in mammals. The most extensive quantitative analysis of interference has been conducted on human chromosome 9 markers that were typed in the products of 17,316 meiotic events (Kwiatkowski et al., 1993). Within 10 cM intervals, only two double-crossover events were found; this observed frequency of 0.0001 is 100-fold lower than expected in the absence of interference. Within 20 cM intervals, there were 10 double-crossover events (including the two above); this observed frequency of 0.0005 is still 80-fold lower than predicted without interference. As map distances increase beyond 20 cM, the strength of interference declines, but even at distances of up to 50 cM, its effects can still be observed (Povey et al., 1992).
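The comparison between observed and expected double crossovers can be reproduced with a few lines of arithmetic. The sketch assumes the no-interference expectation is the simple product rule used earlier in the chapter (interval length in morgans, squared); the function name is hypothetical.

```python
def double_crossover_deficit(observed_doubles, meioses, interval_cm):
    """Return (observed frequency, expected frequency, fold reduction).

    The expected frequency is the product-rule value for two independent
    crossovers within an interval of the given length.
    """
    observed = observed_doubles / meioses
    expected = (interval_cm / 100.0) ** 2
    return observed, expected, expected / observed


# Counts quoted in the text for human chromosome 9 (17,316 meiotic events).
for doubles, interval in [(2, 10), (10, 20)]:
    obs, exp, fold = double_crossover_deficit(doubles, 17316, interval)
    print(f"{interval} cM: observed {obs:.5f}, expected {exp:.2f}, "
          f"about {fold:.0f}-fold lower")
```

Run on the unrounded frequencies, the ratios come out somewhat below the 100-fold and 80-fold figures in the text, which follow from first rounding the observed frequencies to 0.0001 and 0.0005. The conclusion is unchanged: double crossovers are far rarer than independence would predict.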

If one assumes that human chromosome 9 is not unique in its recombinational properties, the implication of this analysis is that for experiments in which fewer than 1000 human meiotic events are typed, multiple crossovers within 10 cM intervals will be extremely unlikely, and within 25 cM intervals, they will still be quite rare. Data evaluating double crossovers in the mouse are not as extensive, but they suggest a similar degree of interference (King et al., 1989). Thus, for all practical purposes, it is appropriate to convert recombination fractions of 0.25, or less, directly into centimorgan distances through a simple multiplication by 100.

When it is necessary to work with recombination fractions that are larger than 0.25, it is helpful to use a mapping function that incorporates interference into an estimate of map distance. Since the effects of interference can only be determined empirically, one cannot derive such a mapping function from first principles. Instead, equations have been developed that fit the results observed in various species (Crow, 1990). The best-known and most widely-used mapping function is an early one developed by Kosambi (1944):

$$ m_K = \frac{1}{4}\ln\left(\frac{1 + 2r}{1 - 2r}\right) \qquad (7.2) $$

By solving equation 7.2 for the observed recombination fraction, r, one obtains the "Kosambi estimate" of the map distance, mK, which is converted into centimorgans through multiplication by 100. Later, Carter and Falconer (1951) developed a mapping function that assumes even greater levels of interference based on the results obtained with linkage studies in the mouse:

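$$ m_{CF} = \frac{1}{4}\left[\frac{1}{2}\ln\left(\frac{1 + 2r}{1 - 2r}\right) + \tan^{-1}(2r)\right] \qquad (7.3) $$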

Although it is clear that the Carter-Falconer mapping function is the most accurate for mouse data, the Kosambi equation was more easily solvable in the days before cheap, sophisticated hand-held calculators were available. Although the Carter-Falconer function is readily solvable today, it is not as well-known and not as widely used as it should be.
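Both mapping functions are easy to evaluate today. The sketch below assumes their standard published closed forms, m = (1/4) ln[(1 + 2r)/(1 - 2r)] for Kosambi and m = (1/4){(1/2) ln[(1 + 2r)/(1 - 2r)] + arctan(2r)} for Carter-Falconer; the function names are hypothetical.

```python
import math


def _check(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def kosambi_cm(r):
    """Kosambi map distance, in centimorgans, for recombination fraction r."""
    _check(r)
    return 100.0 * 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def carter_falconer_cm(r):
    """Carter-Falconer map distance, in centimorgans, for recombination fraction r."""
    _check(r)
    log_term = 0.5 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
    return 100.0 * 0.25 * (log_term + math.atan(2.0 * r))


for r in (0.05, 0.10, 0.25, 0.35):
    print(f"r = {r:.2f}  direct: {100 * r:5.1f} cM  "
          f"Kosambi: {kosambi_cm(r):5.1f} cM  "
          f"Carter-Falconer: {carter_falconer_cm(r):5.1f} cM")
```

Under the Carter-Falconer function, a recombination fraction of 0.25 still corresponds to a distance close to 25 cM, consistent with the practice of converting small recombination fractions directly into centimorgans.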

Interference works to the benefit of geneticists performing linkage studies for two reasons. First, the approximate linearity between recombination frequency and genetic distance is extended out much further than anticipated from strictly independent events. Second, the very low probability of multiple recombination events can serve as a means for distinguishing the correct gene order in a three-locus cross, since any order that requires double recombinants among markers within a 20 cM interval is suspect. When all possible gene orders require a double or triple crossover event, it behooves the investigator to go back and re-analyze the sample or samples in which the event supposedly occurred. Finally, if the genotypings are shown to be correct, one must consider the possibility that an isolated gene conversion event has occurred at the single locus that differs from those flanking it.
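The gene-order argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each backcross offspring is represented by a hypothetical dictionary giving the parental origin (0 or 1) of the allele it received at three loci named A, B, and C, and the panel itself is invented for the example. For each candidate middle locus, the code counts the offspring that would have to be double recombinants and prefers the order that requires the fewest.

```python
def count_double_recombinants(haplotypes, middle):
    """Count offspring whose `middle` locus differs from two concordant flanking loci."""
    left, right = [locus for locus in haplotypes[0] if locus != middle]
    return sum(1 for h in haplotypes if h[left] == h[right] != h[middle])

def best_order(haplotypes):
    """Return (middle locus, counts); the middle locus minimizes double recombinants."""
    counts = {locus: count_double_recombinants(haplotypes, locus)
              for locus in haplotypes[0]}
    return min(counts, key=counts.get), counts

def hap(a, b, c):
    """Hypothetical haplotype record: parental origin (0 or 1) at loci A, B, C."""
    return {"A": a, "B": b, "C": c}

# Invented panel of 98 backcross offspring.
panel = (
    [hap(0, 0, 0)] * 40 + [hap(1, 1, 1)] * 40   # non-recombinant
    + [hap(0, 1, 1)] * 5 + [hap(1, 0, 0)] * 5   # single crossover between A and B
    + [hap(0, 0, 1)] * 4 + [hap(1, 1, 0)] * 4   # single crossover between B and C
)

middle, counts = best_order(panel)
print("middle locus:", middle)
print("double recombinants required per candidate middle locus:", counts)
```

In this invented panel, placing B in the middle requires no double recombinants, whereas placing A or C in the middle would require ten or eight, so the order A-B-C is the one favored by interference.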

7.2.3 Crossover sites are not randomly distributed

7.2.3.1 Theoretical considerations in the ideal situation

Although genetic interference will restrict the randomness with which crossover events are distributed relative to each other within individual gametes, it will not affect the random distribution of crossover sites observed in large numbers of independent meiotic products. Thus, a priori, one would still expect the resolution of a linkage map to increase linearly with the number of offspring typed in a genetic cross. Assuming random sites of recombination, the average distance, in centimorgans, between crossover events observed among the offspring from a cross can be calculated according to the simple formula [100/N] where N is the number of meiotic events that are typed. For example, in an analysis of 200 meiotic events (200 backcross offspring or 100 intercross offspring), one will observe, on average, one recombination event every 0.5 cM. With 1000 meiotic events, the average distance will be only 0.1 cM which is equivalent to approximately 200 kb of DNA. Going further according to this formula, with 10,000 offspring, one would obtain a genetic resolution that approached 20 kb. This would be sufficient to separate and map the majority of average-size genes in the genome relative to each other.
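The arithmetic behind these figures can be checked with a short Python sketch. It assumes the genome-wide average of roughly 2000 kb per centimorgan that underlies the numbers in this section; the function and constant names are illustrative only.

```python
KB_PER_CM = 2000  # assumed average equivalence of physical to linkage distance

def mean_crossover_spacing(n_meioses):
    """Return the average spacing between observed crossover sites as (cM, kb)."""
    spacing_cm = 100 / n_meioses
    return spacing_cm, spacing_cm * KB_PER_CM

for n in (200, 1000, 10_000):
    cm, kb = mean_crossover_spacing(n)
    print(f"{n:>6} meiotic events: {cm:g} cM, about {kb:g} kb")
```

The calculation holds only under the idealized assumption of randomly distributed crossover sites.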

Once again, however, the results obtained in actual experiments do not match the theoretical predictions. In fact, the distribution of recombination sites can deviate significantly from randomness at several different levels. First, in general, the telomeric portions of all chromosomes are much more recombinogenic than are those regions closer to the centromere in both mice (de Boer and Groen, 1974) and humans (Laurie and Hulten, 1985). This effect is most pronounced in males and it leads to a rubber-band-like effect when one tries to orient male and female linkage maps relative to each other (Donis-Keller et al., 1987). Second, different sites along the entire chromosome are more or less prone to undergo recombination. Third, even within the same genomic region, rates of recombination can vary greatly depending on the particular strains of mice used to produce the hybrid used for analysis (Seldin et al., 1989; Reeves et al., 1991; Watson et al., 1992). Finally, the sex of the hybrid can also have a dramatic effect on rates of recombination (Reeves et al., 1991).

7.2.3.2 Gender-specific differences in rates of recombination

Gender-specific differences in recombination rates are well-known. In general, it can be stated that recombination occurs less frequently during male meiosis than during female meiosis. An extreme example of this general rule is seen in Drosophila melanogaster where recombination is eliminated completely in the male. In the mouse, the situation is not as extreme with males showing a rate of recombination that is, on average, 50 to 85% of that observed in females (Davisson et al., 1989). However, the ratio of male-to-female rates of recombination can vary greatly among different regions of the mouse genome. In a few regions, the recombination rates are indistinguishable between sexes, and in even fewer regions yet, the male rates of recombination exceed female rates. Nevertheless, the general rule of higher recombination rates in females can be used to maximize data generation by choosing gender appropriately for a heterozygous F1 animal in a backcross. For example, to maximize chances of finding initial evidence for linkage, one could choose males as the F1 animals, but to maximize the resolution of a genetic map in a defined region, it would be better to use females. These considerations are discussed further in section 9.4.

7.2.3.3 Recombinational hotspots

The most serious blow to the unlimited power of linkage analysis has come from the results of crosses in which many thousands of offspring have been typed for recombination within small well-defined genomic regions. When the recombinant chromosomes generated in these crosses were examined at the DNA level, it was found that the distribution of crossover sites was far from random (Steinmetz et al., 1987). Instead, they tended to cluster in very small "recombinational hotspots" of a few kilobases or less in size (Zimmerer and Passmore, 1991; Bryda et al., 1992). The accumulated data suggest that these small hotspots may be distributed at average distances of several hundred kilobases apart from each other with 90% or more of all crossover events restricted to these sites.

The finding of recombinational hotspots in mice is surprising because it was not predicted from very high resolution mapping studies performed previously in Drosophila which showed an excellent correspondence between linkage and physical distances down to the kilobase level of analysis (Kidd et al., 1983). Thus, this genetic phenomenon — like genomic imprinting (section 5.5) — might be unique to mammals. Unlike imprinting, however, the locations of particular recombinational hotspots do not appear to be conserved among different subspecies or even among different strains of laboratory mice.

Figure 7.5 illustrates the consequences of hotspot-preferential crossing over on the relationship between linkage and physical maps. In this example, two thousand offspring from a backcross were analyzed for recombination events between the fictitious A and F loci. These loci are separated by a physical distance of 1500 kb and, in our example, 17 crossover events (indicated by short vertical lines on the linkage map) were observed among the 2000 offspring. A recombination frequency of 17/2000 translates into a linkage distance of 0.85 cM. This linkage distance is very close to the 0.75 cM predicted from the empirically-determined equivalence of 2000 kb to one centimorgan. However, when one looks further at loci between A and F, the situation changes dramatically. The B and C loci are only 20 kb apart from each other on the physical map but are 0.4 cM apart from each other on the linkage map because a hotspot occurs in the region between them. With random sites of crossing over, the linkage value of 0.4 cM would have predicted a physical distance of 800 kb. The reciprocal situation occurs for the loci D and E which are separated by a physical distance of 400 kb but which show no recombination in 2000 offspring. In this case, random crossing over would have predicted a physical distance of less than 100 kb.
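The numbers in this example can be reproduced with the short Python sketch below. The helper names are illustrative, and the treatment of the D-E interval is an assumed reading: with no recombinants among 2000 offspring, the linkage distance is taken to be smaller than the distance that a single recombinant would have implied.

```python
KB_PER_CM = 2000   # empirically determined average equivalence cited in the text
N_OFFSPRING = 2000

def linkage_cm(recombinants, n_offspring):
    """Linkage distance in cM from a count of recombinant offspring."""
    return 100 * recombinants / n_offspring

def predicted_cm(physical_kb):
    """Linkage distance expected for a physical distance if crossing over were random."""
    return physical_kb / KB_PER_CM

def predicted_kb(distance_cm):
    """Physical distance expected for a linkage distance if crossing over were random."""
    return distance_cm * KB_PER_CM

print("A-F observed cM:", linkage_cm(17, N_OFFSPRING))
print("A-F predicted cM from 1500 kb:", predicted_cm(1500))
print("B-C kb predicted from 0.4 cM:", predicted_kb(0.4))
# D-E: zero recombinants, so the distance is below that implied by one recombinant.
print("D-E upper bound in kb:", predicted_kb(linkage_cm(1, N_OFFSPRING)))
```

Comparing the predicted values with the physical distances of 20 kb for B-C and 400 kb for D-E shows how far a single hotspot, or its absence, can pull the two kinds of map apart.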

The existence and consequences of recombinational hotspots can be viewed in analogy to the quantized nature of matter. For experiments conducted at low levels of resolution — for example, in measurements of grams or centimorgans — the distribution of both matter and crossover sites will appear continuous. At very high levels of resolution, however, the discontinuous nature of both will become apparent. In practical terms, the negative consequences of hotspots on the resolution of a mouse linkage map will only begin to show up as one goes below the 0.2 cM level of analysis.

With the limited number of very-large-sample linkage studies performed to date, it is not possible to estimate the portion of the mouse genome that is dominated by hotspot-directed recombination. Furthermore, it is still possible that some genomic regions will allow unrestricted recombination as in Drosophila. Nevertheless, the available data suggest that for much of the genome, there will be an upper limit to the resolution that can be achieved in linkage studies based on a single cross. This limit will be reached at a point when the density of crossover sites passes the density of hotspots in the region under analysis. From the data currently available, it appears likely that this point will usually be crossed with the analysis of 1000 meiotic events corresponding to 0.1 cM or 200 kb. One strategy that can be used to overcome this limitation is to combine information obtained from several crosses with different unrelated inbred partners, each of which is likely to be associated with different hotspot locations. This approach is discussed more fully in section 9.4.
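A toy simulation can illustrate why typing more offspring stops improving resolution once hotspots dominate. The sketch below is deliberately simplified and rests on assumptions that are not data: a 2000-kb region treated as 1 cM, five hypothetical hotspot positions, and every crossover falling in a hotspot (the text suggests 90% or more). The function name and the fixed random seed are likewise illustrative.

```python
import random

def distinct_crossover_sites(n_meioses, hotspots_kb, region_cm=1.0, seed=0):
    """Simulate meioses and return the set of distinct crossover positions observed."""
    rng = random.Random(seed)        # fixed seed so the sketch is reproducible
    p_crossover = region_cm / 100    # chance of a crossover in the region per meiosis
    sites = set()
    for _ in range(n_meioses):
        if rng.random() < p_crossover:
            sites.add(rng.choice(hotspots_kb))  # assumption: all crossovers hit a hotspot
    return sites

# Hypothetical hotspot positions (kb) within a 2000-kb region.
HOTSPOTS_KB = [200, 600, 1000, 1400, 1800]

for n in (100, 500, 5000, 50_000):
    sites = distinct_crossover_sites(n, HOTSPOTS_KB)
    print(f"{n:>6} meiotic events: {len(sites)} distinct crossover sites")
```

However many offspring are typed, the number of distinct crossover sites in this model can never exceed the number of hotspots, so the achievable resolution is capped by the hotspot spacing rather than by sample size.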

7.2.3.4 Frequencies of recombination can vary greatly between different chromosomal regions

As mentioned previously, the telomeric portions of chromosomes show higher rates of recombination per DNA length than more centrally located chromosomal regions. However, there is still great variation in recombination rates even among different non-telomeric regions. Some one-megabase regions produce recombinants at a rate equivalent to 2 cM or greater, whereas other regions of equivalent size only recombine with a rate equivalent to 0.5 cM or less in animals of the same gender. This variation could be due to differences in the number and density of recombination hotspots. In addition, the "strength" of individual hotspots, in terms of recombinogenicity, may differ from one site to another. Such differences could be specified by the DNA sequences at individual hotspots or by the structure of the chromatin that encompasses multiple hotspots in a larger interval. A final variable may be generalized differences in the rates at which recombination can occur in regions between hotspots. Many more empirical studies will be required to sort through these various explanations.

7.2.4 A history of mouse mapping

7.2.4.1 The classical era

Although its significance was not immediately recognized, the first demonstration of linkage in the mouse was published in 1915 by the great twentieth century geneticist J.B.S. Haldane (1915). What Haldane found was evidence for coupling between mutations at the albino (c) and pink-eyed dilution (p) loci, which we now know to lie 15 cM apart on Chr 7. Since that time, the linkage map of the mouse has expanded steadily at a near-exponential pace. During the first 65 years of work on the mouse map, this expansion took place one locus at a time. First, each new mutation had to be bred into a strain with other phenotypic markers. Then further breeding was pursued to determine whether the new mutation showed linkage to any of these other markers. This process had to be repeated with different groups of phenotypic markers until linkage to one other previously mapped marker was established. At this point, further breeding studies could be conducted with additional phenotypic markers from the same linkage group to establish a more refined map position.

In the first compendium of mouse genetic data published in the Biology of the Laboratory Mouse in 1941 (Snell, 1941), a total of 24 independent loci were listed, of which 15 could be placed into seven linkage groups containing either two or three loci each; the remaining nine loci were found not to be linked to each other or to any of the seven confirmed linkage groups. By the time the second edition of the Biology of the Laboratory Mouse was published in 1966, the number of mapped loci had grown to 250, and the number of linkage groups had climbed to 19, although in four cases, these included only two or three loci (Green, 1966).

With the 1989 publication of the second edition of the Genetic Variants and Strains of the Laboratory Mouse (Lyon and Searle, 1989), 965 loci had been mapped on all twenty recombining chromosomes. However, even at the time that this map was actually prepared for publication (circa late 1987), it was still the case that the vast majority of mapped loci were defined by mutations that had been painstakingly incorporated into the whole genome map through extensive breeding studies.

7.2.4.2 The middle ages: recombinant inbred strains

The first important conceptual breakthrough aimed at reducing the time, effort, and mice required to map single loci came with the conceptualization and establishment of recombinant inbred (abbreviated RI) strains by Donald Bailey and Benjamin Taylor at the Jackson Laboratory (Bailey, 1971; Taylor, 1978; Bailey, 1981). As discussed in detail in section 9.2, a set of RI strains provides a collection of samples in which recombination events between homologs from two different inbred strains are preserved within the context of new inbred strains. The power of the RI approach is that loci can be mapped relative to each other within the same "cross" even though the analyses themselves may be performed many years apart. Since the RI strains are essentially pre-formed and immortal, typing a newly-defined locus requires only as much time as the typing assay itself.

Although the RI mapping approach was extremely powerful in theory, during the first two decades after its appearance, its use was rather limited because of two major problems. First, analysis was only possible with loci present as alternative alleles in the two inbred parental strains used to form each RI set. This ruled out nearly all of the many loci that were defined by gross phenotypic effects. Only a handful of such loci — primarily those that affect coat color — were polymorphic among different inbred strains. In fact, in the pre-recombinant DNA era, the only other loci that were amenable to RI analysis were those that encoded: (1) polymorphic enzymes (called allozymes or isozymes) that were observed as differentially migrating bands on starch gels processed for the specific enzyme activity under analysis (Womack, 1979); (2) immunological polymorphisms detected at minor histocompatibility loci (Graff, 1978); and (3) other polymorphic cell surface antigens (called alloantigens or isoantigens) that could be distinguished with specially developed "allo-antisera" (Boyse et al., 1968). In retrospect, it is now clear that RI strains were developed ahead of their time; their power and utility in mouse genetics is only now — in the 1990s — being fully unleashed.

7.2.4.3 DNA markers and the mapping panel era

Two events that occurred during the 1980s allowed the initial development of a whole genome mouse map that was entirely based on DNA marker loci. The first event was the globalization of the technology for obtaining DNA clones from the mouse genome and all other organisms. Although the techniques of DNA cloning had been developed during the 1970s, stringent regulations in the U.S. and other countries had prevented their widespread application to mammalian species like the mouse (Watson and Tooze, 1981). These regulations were greatly reduced in scope during the early years of the 1980s so that investigators at typical biological research facilities could begin to clone and characterize genes from mice. The globalization of the cloning technology was greatly hastened in 1982 by the publication of the first highly detailed cloning manual from Cold Spring Harbor Laboratory, officially entitled Molecular Cloning: A Laboratory Manual, but known unofficially as "The Bible" (Maniatis et al., 1982).

Although DNA clones were being recovered at a rapid rate during the 1980s from loci across the mouse genome, their general utilization in linkage mapping was not straightforward. The only feasible technique available at the time for mapping cloned loci was the typing of restriction fragment length polymorphisms (RFLPs). Unfortunately, as discussed earlier in this book (sections 2.3 and 3.2), the common ancestry of the traditional inbred strains made it difficult, if not impossible, to identify RFLPs between them at most cloned loci.

The logjam in mapping was broken not through the development of a new molecular technique, but rather, through the development of a new genetic approach. This was the second significant event in terms of mouse mapping during the 1980s — the introduction of the interspecific backcross. François Bonhomme and his French colleagues had discovered that two very distinct mouse species — M. musculus and M. spretus — could be bred together in the laboratory to form fertile F1 female hybrids (Bonhomme et al., 1978). With the three million years that separate these two Mus species (section 2.3), basepair substitutions have accumulated to the point where RFLPs can be rapidly identified for nearly every DNA probe that is tested. Thus, by backcrossing an interspecific super-heterozygous F1 female to one of its parental strains, it becomes possible to follow the segregation of the great majority of loci that are identified by DNA clones through the use of RFLP analysis.

Although the "spretus backcross" could not be immortalized in the same manner as a set of RI strains, each of the backcross offspring could be converted into a quantity of DNA that was sufficient for RFLP analyses with hundreds of DNA probes. In essence, it became possible to move from a classical three-locus backcross to a several-hundred-locus backcross. Furthermore, the number of loci could continue to grow as new DNA probes were used to screen the members of the established "mapping panel" (until DNA samples were used up). The spretus backcross revolutionized the study of mouse genetics because it provided the first complete linkage map of the mouse genome based on DNA markers and because it provided mapping panels that could be used to rapidly map essentially any new locus that was defined at the DNA level.

7.2.4.4 Microsatellites

The most recent major advance in genetic analysis has come not from the development of new types of crosses but from the discovery and utilization of PCR-based DNA markers that are extremely polymorphic and can be rapidly typed in large numbers of animals with minimal amounts of sample material. These powerful new markers — especially microsatellites — have greatly diminished the essential need for the spretus backcross and they have breathed new life into the usefulness of the venerable RI strains. Most importantly, it is now possible for individual investigators with limited resources to carry out independent, sophisticated mapping analyses of mutant genes or complex disease traits. As Philip Avner of the Institut Pasteur in Paris states: "If the 1980s were the decade of Mus spretus — whose use in conjunction with restriction fragment length polymorphisms revolutionized mouse linkage analysis, and made the mouse a formidably efficient system for genome mapping — the early 1990s look set to be the years of the microsatellite" (Avner, 1991). Microsatellites and other PCR-typable polymorphic loci are discussed at length in section 8.3.

7.3 General strategies for mapping mouse loci

How should one go about performing a mapping project? The answer to this question will be determined by the nature of the problem at hand. Is there a particular locus, or loci, of interest that you wish to map? If so, at what level is the locus defined, and at what resolution do you wish to map it? Is the locus associated with a DNA clone, a protein-based polymorphism, or a gross phenotype visible only in the context of the whole animal? Are you interested in mapping a transgene insertion site unique to a single line of animals? Do you have a new mutation found in the offspring from a mutagenesis experiment? Alternatively, are you isolating clones to be used as potential DNA markers for a specific chromosome or subchromosomal region with the need to know simply whether each clone maps to the correct chromosome or not? The answers to these questions will lead to the choice of a general mapping strategy.

7.3.1 Novel DNA clones

Gene cloning has become a standard tool for analysis by biologists of all types from those studying protein transport across cell organelles to those interested in the development of the nervous system. Genes are often cloned based on function or pattern of expression. With a cloned gene in-hand, how does one determine its location in the genome? Today, the answer to this question is always through the use of an established mapping panel as described at length in chapter 9. Mapping with established panels is relatively painless and very quick. Furthermore, it can provide the investigator with a highly accurate location within a single chromosome of the mouse genome. With these results in-hand, it is always worthwhile to determine whether the newly-mapped clone could correspond to a locus previously defined by a related trait or disease phenotype. This can be accomplished by consulting the most recent version of the genetic map for the region of interest. Maps and further genetic information for each mouse chromosome are prepared annually in reports by individual mouse chromosome committees. These reports are published together as a compendium in a special issue of Mammalian Genome. This information is also available electronically from the Jackson Laboratory (see Appendix B). If a relationship is suspected between a cloned locus and a phenotypically-defined locus, further genetic studies of the type described in chapter 9 should be pursued.

7.3.2 Transgene insertion sites

Transgene insertion sites are unique in that the inserted foreign sequence is present in its particular genomic location only in the founder of the transgenic line and those descendants to which the transgene has been transmitted. This uniqueness rules out the use of mapping panels for analysis when only the transgene itself is available as a probe. There are several general approaches to the mapping of transgene insertion sites, and each has advantages and disadvantages. The first approach is in situ hybridization (section 10.2). The first advantage here is that the actual DNA used for embryo injection can now be used as a probe for mapping. Thus, one avoids the need to clone endogenous sequences that flank the insertion site in each and every founder line to be analyzed. A second advantage is that the analysis can be performed on a single animal and there is no need to carry out extensive crosses. The main disadvantage is the specialized nature of the in situ technique as mentioned previously.

A second approach is to clone genomic sequences that flank the inserted DNA from each founder line of interest. Once a flanking sequence is obtained, it can be analyzed like any other novel DNA sequence with the use of mapping panels as described in section 9.3. The advantage to this approach is that it requires only standard molecular biology protocols. The disadvantage is that an additional cloning step is required for each founder line. Cloning endogenous sequences may be complicated by the chaotic nature of most transgene insertion events, in which multiple copies of the transgene sequence are intermingled with endogenous sequences.

A third approach is to follow the segregation of the transgene in relation to DNA markers that span the mouse genome in a standard backcross or intercross analysis as described in section 9.4. The advantages to this approach are that only standard molecular biology protocols are required and there is no need for any cloning of endogenous sequences. The main disadvantage is the time and expense of generating and typing a novel mouse mapping panel.

The choice of a mapping approach will be highly dependent on what is viewed as common practice in each investigator’s laboratory. If one has access to the in situ hybridization technology, this will be the fastest and least expensive approach. If genomic library production and screening are commonly performed protocols, then the second approach would likely be the best one to follow. Finally, if an investigator has an active breeding program and is facile at producing and analyzing large panels of mice, the third approach might be the easiest to follow.

7.3.3 Verification of region-specific DNA markers

When investigators are interested in the genetic analysis of a particular chromosome or subchromosomal region, they often begin by screening a specialized library that is enriched for clones from the region of interest (section 8.4). In such cases, initial genetic mapping is limited to the question of whether a cloned sequence localizes to this region or not. The most efficient way to answer this question for a large number of clones is through the analysis of one or a few somatic cell hybrid lines that contain the chromosome of interest within the genetic background of another host species as described in section 10.2. In the simplest cases, hybridization to a blot that contains restriction enzyme-digested DNA from three samples — mouse, the somatic cell hybrid line, and a cell line from the somatic cell host species — will provide the answer. Clones that are found to map to the region of interest can then be analyzed in more detail with mapping panels or other genetic tools developed for the particular project.

7.3.4 Loci defined by polypeptide products

In some cases, even today, the protein product of a locus may be identified before the locus itself is cloned. If the protein is truly of interest, it is likely that this state will be a temporary one, since numerous protocols have been devised to proceed backwards from a protein product to its coding sequence in the genome. Nevertheless, it is sometimes possible to map the gene which encodes a defined protein before a DNA clone becomes available. If the protein is associated with an enzymatic activity that is expressed constitutively — a so-called housekeeping function — it is often possible to assay for its expression among a panel of somatic cell hybrid lines, each of which contains a defined subset of mouse chromosomes as described in section 10.2. As long as the mouse enzyme is generally expressed in somatic cells and is distinguishable from the homologous protein produced by the host species used to construct the somatic cell hybrid panel, a chromosomal assignment can be attained. Following along this line of analysis, subchromosomal mapping can be performed when somatic cell hybrid lines are available that contain defined segments of the chromosome in question. However, in most cases, the level of mapping resolution will still be quite low.

Linkage analysis can only be performed in those cases where different strains of mice are found to express distinguishable allelic forms of the protein. Protein polymorphisms are detectable in a number of different ways. In the earliest pre-recombinant DNA studies, assays were developed to detect specific enzymatic activities within mixtures of cellular proteins that had been separated by starch gel electrophoresis. Allelic differences involving charged amino acids caused enzyme molecules to migrate with different mobilities in a starch gel and the in situ detection system allowed the visualization of these alternative enzyme forms which are known as "isozymes."

A more general approach to detecting allelic charge differences in proteins relies on the technique of isoelectric focusing, usually within the context of a two-dimensional polyacrylamide gel where the second dimension involves a molecular weight-based separation with SDS (O’Farrell, 1975). High resolution two-dimensional gel electrophoresis can resolve up to 2,000 polypeptide spots from whole cell extracts (Garrels, 1983). Although this approach to mapping has been used with success in the past (Elliott, 1979; Silver et al., 1983), in most cases it is rather tedious since a separate two-stage gel must be run for each animal to be typed. However, when the sample size is small, for example, with two members of a congenic pair, a two-dimensional search for polypeptide polymorphisms becomes much more feasible (Silver et al., 1983).

A special class of polypeptide polymorphisms are those that are detected as antigenic differences through any of a variety of immunological assays. Most immuno-assays are quick and easy to perform and this allows the rapid mapping of genes that encode polymorphic antigens. A variety of other biochemical differences can result from alternative alleles at some loci, such as differences in enzyme kinetics. Any easily-assayed difference can be exploited to map the underlying gene. Finally, in those cases where no polymorphism is detected, it makes sense to wait for a clone of the gene that can be used as a direct tool for mapping.

7.3.5 Mutant phenotypes

For loci defined by phenotype alone, rapid mapping is usually not possible. Interest in the new phenotype is likely to lie within its novelty and, as such, the parental strains used in all standard mapping panels are almost certain to be wild-type at the guilty locus. Thus, a broad-based recombinational analysis can be accomplished only by starting from scratch with a cross between mutant animals and a standard strain. Before one embarks on such a large-scale effort, it makes sense to consider whether the mutant phenotype, or the manner in which it was derived, can provide any clues to the location of the underlying mutation. Is the mutant phenotype similar to one that has been previously described in the literature? Does the nature of the phenotype provide insight into a possible biochemical or molecular lesion?

The most efficient way to begin a search for potentially-related loci is to search through the detailed compilation of mouse loci and their effects in the Mouse Locus Catalog (MLC) published in the Genetic Variants and Strains of the Laboratory Mouse (Lyon and Searle, 1989) and available on-line through an Internet Gopher Hole at the Jackson Laboratory (see Appendix B). It is also worthwhile to consult the human equivalent of MLC called Mendelian Inheritance in Man and edited by Victor McKusick (1988). This database is also available on-line (and called OMIM) through the Gopher Hole at the Genome Database maintained at Johns Hopkins University (see Appendix B). Phenotypically-related loci can be uncovered by searching each of these electronic databases for the appearance of well-chosen keywords. Finally, one can carry out a computerized on-line search through the entire biomedical literature. Once again, this search need not be confined to the mouse since similarity to a human phenotype can be informative as well.

When a possible relationship with a previously characterized locus is uncovered, genetic studies should be directed at proving or disproving identity. This is most readily accomplished when the previously characterized locus — either human or mouse — has already been cloned. A clone can be used to investigate the possibility of aberrant expression from mice that express the new mutation. And with the strategies described in section 9.4, one can follow the segregation of the cloned locus in animals that segregate the new mutation. Absolute linkage would provide evidence in support of an identity between the new mutation and the previously-characterized locus.

Even if the previously-characterized mutant locus has not yet been cloned, it may still be possible to test a relationship between it and the newly defined mutation. If the earlier mutation exists in a mouse strain that is still alive (or frozen), it becomes possible to carry out classical complementation analysis. This analysis is performed by breeding together animals that carry each mutation and examining the phenotype of offspring that receive both. If the two mutations — m1 and m2, for example — are at different loci, then the double mutant animals will have a genotype of [+/m1, +/m2]. If both mutations express a recessive phenotype, then this double mutant animal, with wild-type alleles at both loci, would appear wild-type; this would be an example of complementation. On the other hand, if the two mutations are at the same locus, then the double mutant animal would have a compound heterozygous genotype of m1/m2. Without any wild-type allele at this single locus, one would expect to see expression of a mutant phenotype; this would be an example of non-complementation.
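The logic of the complementation test can be expressed as a small Python sketch. It assumes, as in the paragraph above, that both mutations are fully recessive; the genotype representation (a dictionary from locus name to a pair of alleles, with "+" standing for wild-type) and the locus names are hypothetical conventions chosen for the example.

```python
def phenotype(genotype):
    """Return 'mutant' if any locus lacks a wild-type ('+') allele, else 'wild-type'.

    Assumes every mutant allele is fully recessive.
    """
    lacks_wild_type = any("+" not in alleles for alleles in genotype.values())
    return "mutant" if lacks_wild_type else "wild-type"

# m1 and m2 at different loci: the double heterozygote [+/m1, +/m2].
different_loci = {"locus1": ("+", "m1"), "locus2": ("+", "m2")}

# m1 and m2 at the same locus: the compound heterozygote m1/m2.
same_locus = {"locus1": ("m1", "m2")}

print("different loci:", phenotype(different_loci))  # complementation
print("same locus:", phenotype(same_locus))          # non-complementation
```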

Even if the previously-characterized mutation is extinct, it may still be possible to use its previously-determined map position as a test for the possibility that it did lie at the same locus as the newly uncovered mutation. This is accomplished by following the transmission to offspring of the newly uncovered mutation along with a polymorphic DNA marker that maps close to the previously-determined mutant map position (methods for identifying appropriate DNA markers are discussed in chapter 8). Close linkage between the new mutation and a DNA marker for the old mutation would suggest, although not prove, that the two mutations occurred at the same locus.
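As a worked illustration of what "close linkage" means numerically, the sketch below summarizes a backcross in which each offspring has been scored as recombinant or non-recombinant between the new mutation and the DNA marker. The counts are invented for illustration, and the chi-square test against the 1:1 ratio expected for unlinked loci is one conventional choice of test, not a method prescribed in this section.

```python
def linkage_summary(recombinants, total):
    """Return (recombination fraction, chi-square statistic) for a backcross.

    The chi-square compares the observed recombinant and non-recombinant
    classes with the 1:1 ratio expected under independent assortment
    (one degree of freedom).
    """
    nonrecombinants = total - recombinants
    expected = total / 2  # each class expected at 50% if unlinked
    chi2 = ((recombinants - expected) ** 2
            + (nonrecombinants - expected) ** 2) / expected
    return recombinants / total, chi2


# Hypothetical data: 8 recombinants among 100 backcross offspring.
r, chi2 = linkage_summary(8, 100)
print(f"recombination fraction = {r:.2f}, chi-square = {chi2:.2f}")
# A chi-square above 3.84 (1 degree of freedom) rejects independent
# assortment at the 5% level.
print("evidence for linkage" if chi2 > 3.84 else "no evidence for linkage")
```

Even a very small recombination fraction only places the new mutation near the old map position; as noted above, it cannot prove that the two mutations lie at the same locus.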

Finally, a similar approach can often be followed when the previously characterized mutation is uncloned but mapped in the human genome rather than the mouse. Most regions of the human genome have been associated with homologous regions in the mouse genome (Copeland et al., 1993; O’Brien et al., 1993). Thus, one can choose DNA markers from the region (or regions) of the mouse genome that is likely to carry the mouse gene showing homology to the mutant human locus. These markers can then be tested for linkage to the new mouse mutation. Again, the data would be only suggestive of an association.

In some cases, new mutations will be found to be associated with gross chromosomal aberrations. This is especially likely to be the case if the new mutation was first observed in the offspring from a specific mutagenesis study. Two mutagenic agents in particular, X-irradiation and the chemical chlorambucil, often cause chromosomal rearrangements (section 6.1). Rearrangements can also occur spontaneously; a mutant line that is difficult to breed provides a hint that a rearrangement may be present. In any case where the suspicion of a chromosomal abnormality exists, it is worthwhile analyzing the karyotype of the mutant animals. The observation of an aberrant chromosome (with a visible deletion, inversion, or translocation) should be followed up by a small breeding study to determine if the aberration shows complete linkage to the mutant phenotype. If it does, one can be almost certain that the mutation is associated with the aberration in some way. If the chromosomal aberration is a deletion, the mutant gene is likely to lie within the deleted region. With a translocation or inversion, the mutant phenotype is likely to be due to the disruption of a gene at a breakpoint. In all cases, the next step would be to perform linkage analysis with DNA markers that have been mapped close to the sites affected by the chromosomal aberration. The aberration itself may also be useful later as a tool for cloning the gene. This is especially true for translocations since the breakpoint will provide a distinct physical marker for the locus of interest.

Another possibility to consider is whether the mutation is sex-linked. This is easily demonstrated when the mutation is only transmitted to mice of one sex. Sex linkage almost always means X chromosome linkage. If the mutation is recessive, a female carrier mated to a wild-type male will produce all normal females and 50% mutant males. If the mutation is dominant, a mutant male mated to a wild-type female will produce all normal males and all mutant females. Finally, if all efforts to map the novel phenotype by association fail, it will be necessary to set up a new mapping cross from scratch in which DNA markers from across the genome can be tested for linkage.
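The offspring ratios quoted above for X-linked mutations can be checked by enumerating the four equally likely gamete combinations in each cross. The following is a minimal sketch; the function name and the allele symbols ("+" for wild-type, "m" for mutant) are illustrative choices, and the model assumes simple X linkage with full penetrance.

```python
from collections import Counter
from itertools import product


def x_linked_cross(mother, father_x, dominant):
    """Enumerate offspring classes from an X-linked cross.

    mother:   tuple of her two X-linked alleles, e.g. ("+", "m")
    father_x: the allele on his single X chromosome
    dominant: True if the mutant allele "m" is dominant
    Returns counts of (sex, phenotype) over the four equally likely
    combinations of maternal and paternal gametes.
    """
    outcomes = Counter()
    for maternal_x, paternal in product(mother, (father_x, "Y")):
        if paternal == "Y":
            sex, alleles = "male", (maternal_x,)          # hemizygous
        else:
            sex, alleles = "female", (maternal_x, paternal)
        if dominant:
            affected = "m" in alleles       # one mutant allele suffices
        else:
            affected = "+" not in alleles   # no wild-type allele present
        outcomes[(sex, "mutant" if affected else "normal")] += 1
    return dict(outcomes)


# Recessive mutation: carrier female x wild-type male.
print(x_linked_cross(("+", "m"), "+", dominant=False))
# Dominant mutation: wild-type female x mutant male.
print(x_linked_cross(("+", "+"), "m", dominant=True))
```

The first cross yields only normal females and equal numbers of normal and mutant males; the second yields only mutant females and only normal males, matching the expectations stated in the text.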

7.4 The final chapter of genetics

The fundamental goal of molecular genetics is to understand, at the molecular level, how genotype is translated into phenotype. To accomplish this goal, investigators everywhere are busy dissecting the genome into its component parts — the genes — and then tracking the pathway from the gene to its product to its role in the overall scheme of life. This contemporary approach to biological understanding can be divided into two parts: first, an investigator must obtain a clone of the gene, then second, he or she can use the clone in a large variety of experiments aimed at investigating the function of the gene.

7.4.1 From gene to function

There are two very different pathways to the analysis of gene function. One pathway begins with a mutant or variant phenotype and follows this back to a clone of the guilty gene; this pathway will be discussed in the next section. The other pathway begins with a clone of a transcription unit whose function is not understood, and proceeds to utilize this clone in various experiments aimed at uncovering gene function. With tens of thousands of uncharacterized transcription units sitting in every cDNA library, how does one go about choosing which ones to study? Often, clones have been chosen in a manner akin to a fishing expedition in which an investigator recovers clones from a cDNA library and selects a subset that show a pattern of expression (among tissues or developmental stages) indicative of a potential role in a particular biological process. However, an ultimate goal of the human genome project is to characterize and understand the function of all genes in the genome (Hochgeschwender, 1992). In a more directed approach toward this goal, it is possible to walk down a cloned chromosomal region and pick up each transcription unit one-by-one for further analysis of function.

The first step in the analysis of a newly cloned gene is always to determine its sequence and compare it with all other sequences stored in databases such as GenBank and others. Sequence homologies in-and-of-themselves can often be used to predict characteristics of the polypeptide encoded by the new gene under investigation. In some cases, the new product will contain a "domain" with homology to a specific "peptide motif" that is associated with a particular function in groups of previously-characterized polypeptides. For example, one or more peptide motifs have been identified that are characteristic of DNA binding domains, membrane-associated domains, various enzymatic activities, receptor functions, and many others. Peptide motifs are almost always degenerate amino acid sequences; they can vary in length from just three amino acids to over one hundred residues.
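Because peptide motifs are degenerate, they are naturally expressed as patterns rather than fixed strings. The sketch below scans a protein sequence for one widely used short consensus, the N-glycosylation site N-{P}-[S/T]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). This motif is chosen here only as a familiar example and is not taken from the text; the test sequence is invented, and the function name is hypothetical.

```python
import re

# N-{P}-[ST]-{P}: a four-residue degenerate consensus.
# The lookahead lets overlapping matches be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (0-based start position, matched residues) for each
    occurrence of the motif in a one-letter-code protein sequence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Invented sequence for illustration only.
seq = "MKNGSAANPTLNVTQ"
print(find_motif(seq))
```

In the invented sequence, the asparagine followed by proline (NPTL) is skipped because proline is excluded at the second position. A match of this kind is only a hint; as the text notes, short degenerate motifs suggest a possible function that further experiments must confirm.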

Even when the new gene product does not contain any previously-defined peptide motifs, standard search algorithms will sometimes allow the identification of previously-cloned genes that are related by descent from a common ancestral sequence. Once again, if the function of the previously-characterized gene has been determined, it can be used as a starting point for understanding the function of the new gene under investigation.

Although the sequence can sometimes provide clues to gene function, further experiments will always be required to demonstrate conclusively the role played by a particular gene product in the overall scheme of life. These further experiments can take two different forms: biochemical and genetic. A biochemical investigation is often begun by cloning the open reading frame into an expression vector which is placed back into an appropriate host cell system for the "in vitro" synthesis of large quantities of the gene product, which can then be used to immunize rabbits or mice for the production of polyclonal antisera or monoclonal antibodies. These antibodies can then be used as a tool to investigate the expression and localization of the protein both among cells and within cells, and to purify the native protein from the mouse. The purified native protein can be analyzed for enzymatic activities and for its interactions with other molecules. Biochemical studies can often provide critical insight into the function of a particular polypeptide.

The genetic approach to understanding function flows from the ability to manipulate the expression of the selected gene within the mouse and then follow the phenotypic consequences of this manipulation. The two most powerful approaches to gene manipulation are based on targeted mutagenesis and the insertion of transgene constructs of any conceivable kind into the germline of the mouse; both of these approaches are discussed in depth in chapter 6. Targeted mutagenesis allows an investigator to produce a null mutation at the locus of interest, and determine how and where the absence of the corresponding gene product affects the animal, its tissues, and its cells. The transgenic technology can be used to produce animals that misexpress the gene and/or its product: in the wrong place, at the wrong time, or in the wrong form. The rationale for the use of both targeted mutagenesis and directed transgenesis is that by examining the perturbations in phenotype that occur in response to perturbations of the genotype, one can gain insight, by contrast, into the true function of the normal wild-type locus.

In some cases, the genetic approach will be the one that uncovers the function of a gene, but in other cases, it will be the biochemical approach. However, these two approaches are entirely complementary and thus together they are likely to provide more information than either one can alone.

7.4.2 From phenotype to gene

The second pathway to deciphering the relationship between genotype and phenotype is based on the initial observation of an interesting new variant that distinguishes one group of individuals from another. Variants may be observed in the context of either deleterious mutations or polymorphic differences in common traits such as growth, life span, disease resistance, or various physiological parameters. In all of these cases, the phenotype will be available for analysis before the causative gene or genes. The process by which one moves from a phenotypic difference to the gene (or genes) responsible has been outlined in section 7.3.5. As discussed previously, if all other approaches fail, an investigator will be forced to take a path that is referred to as positional cloning.

There are two stages in the process of positional cloning. The first stage is the focus of a major portion of this book: the use of formal linkage analysis and other genetic approaches — as tools — to find flanking DNA markers that must lie very close to the gene of interest. With these markers in-hand, one can move to the second stage of this pathway: cloning across the region that must contain the gene responsible for the phenotype, and then identifying the gene itself apart from all other genes and non-genic sequences within this region. This second stage will be discussed in chapter 10.

7.4.3 The molecular basis of complex traits

With all of the new approaches to mapping that have been developed over the last few years, it has become possible, for the first time, to follow the segregation of the whole genome from each parent to each offspring in a cross. This, in turn, has allowed investigators to consider the exciting possibility of approaching the genetic basis for quantitative, polygenic, and multifactorial traits. In fact, most common types of phenotypic differences that distinguish one individual from another are due to the interaction of alleles at more than one locus, and expression is often modified by environmental factors as well. The available inbred strains provide a treasure chest of polygenic differences that control characteristics as diverse as size, life span, reproductive performance, aggression, and levels of susceptibility or resistance to particular diseases, both infectious and inherited. The golden age of mammalian genetics beckons: the genetic components of any and all traits that show variation between different mice are now amenable to dissection with classical genetic tools that can provide a means for obtaining clones of all of the genes involved which can, in turn, be used as tools to understand each trait at the molecular level.