Mouse Genetics by Lee M. Silver, chapter 3

Mouse Genetics: Concepts & Applications

3. Laboratory Mice

3.1 Sources of laboratory mice

3.2 Mouse crosses and standard strains

3.2.1 Outcrosses, backcrosses, intercrosses, and incrosses

3.2.2 The generation of inbred strains

3.2.3 The classical inbred strains

3.2.4 Segregating inbred strains

3.2.5 Newly derived inbred strains

3.2.6 F1 hybrids

3.2.7 Outbred stocks

3.3 Coisogenics, congenics, and other specialized strains

3.3.1 The need to control genetic background

3.3.2 Coisogenic strains

3.3.3 Congenic and related strains

3.3.4 Recombinant inbred and related strains

3.4 Standardized nomenclature

3.4.1 Introduction

3.4.2 Strain symbols

3.4.3 Locus names and symbols

3.4.4 Alleles

3.4.5 Transgene loci

3.4.6 Further details

3.5 Strategies for record-keeping

3.5.1 General requirements

3.5.2 The mating unit system

3.5.3 The animal/litter system

3.5.4 Comparison of record-keeping systems

3.5.5 A computer software package for mouse colony record-keeping

3.1 Sources of laboratory mice

One of the unique advantages of working with mice, rather than other experimental organisms, is the availability of standard strains such as C57BL/6 (abbreviated B6), BALB/c, and many others that are used in thousands of laboratories around the world each year. With the use of the same standard inbred strain, it is possible to eliminate genetic variability as a complicating factor in comparing results obtained from experiments performed in Japan, Canada, Germany, or any other country in the world. Furthermore, for the most part, results obtained in 1992 can be directly compared to results obtained in 1962 or any other year. But where do these standard strains come from, and how can one be sure that a mouse advertised as BALB/c is actually a BALB/c mouse?

When two animals have the same strain name — such as BALB/c — it means that they can both trace their lineage back through a series of brother-sister matings to the very same mating pair of inbred animals. The breeding protocol through which these original progenitors became inbred is discussed later in this chapter. However, the important point is that unlike the world of computers, where there can be many independent imitation models of a standard such as the IBM PC, there is no such thing as an imitation BALB/c mouse. Two animals either have a common heritage or they do not. If not, they cannot share the same name. Thus, a strain name implies a history, and the histories of the traditional inbred strains are well-documented (see Table 3.2).

A handful of U.S. suppliers provide various strains of mice to researchers. Addresses and phone numbers for each are provided in the appendix; all will provide free catalogs upon request. The Jackson Laboratory (or the JAX as it is commonly abbreviated) maintains an extensive mouse breeding facility with a very large collection of commonly used (and not so commonly used) strains for sale to other scientists. Their 1991 catalog lists hundreds of different inbred strains and substrains of many different types including all of the "standards" as well as newly developed strains and mice that carry various mutant alleles or chromosomal aberrations. Other U.S. suppliers have a more limited selection, but the largest of these — Charles River Laboratory, Taconic Farms, and Harlan Sprague Dawley — may actually sell even more mice than the JAX. Each of these three companies stocks a set of common inbred strains — including BALB/c, C57BL/6, C3H, DBA/2 and several others — as well as hybrids and non-inbred strains. Two other U.S. companies — Hilltop Lab Animals and Life Sciences, Inc. — have more focused lists of strains with the latter devoted essentially to the sale of special athymic strains used in various immunological studies. All of these suppliers provide high quality, disease-free animals that are constantly monitored for genetic purity.

Once a supplier has been chosen for a particular set of experiments, it is best to stay with that supplier for all future orders of mice. Even though all suppliers propagate their stocks with constant brother-sister matings, and B6 mice sold by JAX, Charles River, or Taconic can all trace their pedigree back to common founder animals, each independently maintained line will slowly drift apart genetically from its ancestors and distant cousins. Most of the standard inbred strains sold by companies are derived from a genetic resource maintained by the National Institutes of Health (NIH). The NIH inbred lines have been maintained separately from corresponding Jackson Laboratory lines since at least the early 1950s. The B6 strain was only at generation F32 when this separation occurred; by the beginning of 1994, the JAX strain had reached generation F187 and NIH-derived strains sold by Taconic and Charles River had reached generations F155 and F160, respectively (Table 3.2). The differences that have accumulated over this large number of generations may, or may not, have an impact upon the particular genetic characteristics of importance to any particular experiment, but it is critical to be aware of this possibility. To foster this awareness, independently maintained inbred strains are given different "substrain" designations which follow the standard name and provide an account of past history. For example, the full name for the standard B6 mouse sold by the Jackson Laboratory is C57BL/6J, where J is the symbol for JAX. The B6 mice sold by both Charles River and Taconic have substrain symbols with multiple parts, including N as an indication of their NIH derivation and BR to indicate that they are maintained in barrier facilities. Each supplier has also incorporated a sub-symbol that uniquely identifies the animals that they sell: the full names are C57BL/6NCrlBR for B6 mice supplied by Charles River Laboratory and C57BL/6NTacfBR for B6 mice supplied by Taconic Farms.

3.2 Mouse crosses and standard strains

3.2.1 Outcrosses, backcrosses, intercrosses, and incrosses

A formal classification system has been developed to describe the various types of crosses that can be set up between mice having defined genetic relationships relative to each other at one or more loci. For the sake of simplicity in describing these crosses, I will arbitrarily use a single locus (the A locus) with two alleles (A and a) to represent the situation encountered for the whole genome. With a simple two allele system, there are only four generalized classes of crosses that can be carried out: each of these is defined in Table 3.1 and described in more detail in the following discussion.

At the start of most breeding experiments, there is usually an outcross, which is defined as a mating between two animals or strains considered unrelated to each other. In many experiments, the starting material for this outcross is two inbred strains. As described in the next section, all members of an inbred strain are, for all practical purposes, homozygous across their entire genome and genetically identical to each other. Thus, an outcross between two inbred strains can be symbolized as A/A x a/a, and the offspring resulting from such a cross are called the first filial generation, symbolized by F1. All F1 animals that derive from an outcross between the same pair of inbred strains are identical to each other with a heterozygous genome symbolized as A/a. However, when either or both parents are not inbred, as indicated in the second more generalized outcross mating shown in Table 3.1, F1 siblings will not be identical to each other.

An outcross between two inbred strains or between one inbred strain and a non-inbred animal that contains a genetic variant of interest is almost always the first breeding step performed in a linkage analysis. The F1 animals obtained from this outcross can be used in two types of crosses commonly performed by mouse geneticists — backcrosses and intercrosses. A mating between a heterozygous F1 animal (with an A/a genotype) and one that is homozygous for either the A or a allele is called a backcross. This term is derived from the vision of an F1 animal being mated "back" to one of its parents. In actuality, a backcross is usually accomplished by mating F1 animals with other members of a parental strain rather than a parent itself. The two generation outcross-backcross combination is one of the major breeding protocols used in linkage analysis as described in detail in chapter 9. From Mendel’s first law of segregation, we know that the offspring from a backcross to the a/a parent will be distributed in roughly equal proportions between two genotypes at any single locus — approximately 50% will be heterozygous A/a, and approximately 50% will be homozygous a/a.

A mating set up between brothers and sisters from the F1 generation, or between any other two animals that are identically heterozygous at a particular locus under investigation, is called an intercross. The two generation outcross-intercross series was the classic breeding scheme used by Mendel in the formulation of his laws of heredity, and it is the second major breeding protocol used today for linkage analysis in mice. Again, according to Mendel’s first law, the offspring from an intercross will be distributed among three genotypes at any single locus — 50% will be heterozygous A/a, 25% will be homozygous A/A, and 25% will be homozygous a/a. The particular uses of each of the two major protocols for linkage analysis — outcross-backcross and outcross-intercross — are discussed in chapter 9.

A mating between two members of the same inbred strain, or between any two animals having the same homozygous genotype is called an incross. The incross serves primarily as a means for maintaining strains of animals that are inbred or carry particular alleles of interest to the investigator. All offspring from an incross will have the same homozygous genotype which is identical to that present in both parents.
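The four classes of crosses can be checked by enumerating gametes at the A locus. The Python sketch below is an illustration only: the function name `cross` and the 'A/a' string notation for genotypes are assumptions made here, not part of the classification in Table 3.1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Expected offspring genotype fractions for a one-locus cross.

    Genotypes are written as two alleles separated by '/', e.g. 'A/a'.
    (Hypothetical helper for illustration.)
    """
    gametes1 = parent1.split("/")
    gametes2 = parent2.split("/")
    counts = Counter()
    for g1, g2 in product(gametes1, gametes2):
        # Sort the alleles so that 'a/A' and 'A/a' count as one genotype.
        genotype = "/".join(sorted((g1, g2)))
        counts[genotype] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


if __name__ == "__main__":
    for name, p1, p2 in [
        ("outcross", "A/A", "a/a"),
        ("backcross", "A/a", "a/a"),
        ("intercross", "A/a", "A/a"),
        ("incross", "A/A", "A/A"),
    ]:
        print(name, p1, "x", p2, cross(p1, p2))
```

Run on the crosses discussed above, the sketch gives only A/a offspring for the outcross A/A x a/a, equal halves of A/a and a/a for the backcross A/a x a/a, a 1/4 : 1/2 : 1/4 split among A/A, A/a, and a/a for the intercross, and only A/A for the incross A/A x A/A.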

3.2.2 The generation of inbred strains

The offspring that result from a mating between two F1 siblings are referred to as members of the "second filial generation" or F2 animals, and a mating between two F2 siblings will produce F3 animals, and so on. An important point to remember is that the filial (F) generation designation is only valid in those cases where a protocol of brother-sister matings has been strictly adhered to at each generation subsequent to the initial outcross. Although all F1 offspring generated from an outcross between the same pair of inbred strains will be identical to each other, this does not hold true in the F2 generation, which results from an intercross where three different genotypes are possible at every locus. However, at each subsequent filial generation, genetic homogeneity among siblings is slowly recovered in a process referred to as inbreeding. Eventually, this process will lead to the production of inbred mice that are genetically homogeneous and homozygous at all loci. The International Committee on Standardized Nomenclature for Mice has ruled that a strain of mice can be considered "inbred" at generation F20 (Committee on Standardized Genetic Nomenclature for Mice, 1989).

The process of inbreeding becomes understandable when one realizes that at each generation beyond F1, there is a finite probability that the two siblings chosen to produce the subsequent generation will be homozygous for the same allele at any particular locus in the genome. If, for example, the original outcross was set up between animals with genotypes A/A and a/a at the A locus, then at the F2 generation, there would be animals with three genotypes A/A, A/a, and a/a present at a ratio of 0.25:0.50:0.25. When two F2 siblings are chosen randomly to become the parents for the next generation, there is a defined probability that these two animals will be identically homozygous at this locus as shown in figure 3.1. Since the genotypes of the two randomly chosen animals are independent events, one can derive the probability of both events occurring simultaneously by multiplying the individual probabilities together according to the "law of the product". Since the probability that one animal will be A/A is 0.25, the probability that both animals will be A/A is 0.25 x 0.25 = 0.0625 (figure 3.1). Similarly, the probability that both animals will be a/a is also 0.0625. The probability that either of these two mutually exclusive events will occur is derived by simply adding the individual probabilities together according to the "law of the sum" to obtain 0.0625 + 0.0625 = 0.125.
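The arithmetic in this paragraph can be reproduced in a few lines of Python. This is a minimal sketch: the dictionary `f2_frequencies` and the function name `prob_pair_identically_homozygous` are names chosen here for illustration, and the two siblings are assumed to be drawn independently, as the text states.

```python
from fractions import Fraction

# Genotype frequencies among F2 animals at the A locus (from the text).
f2_frequencies = {
    "A/A": Fraction(1, 4),
    "A/a": Fraction(1, 2),
    "a/a": Fraction(1, 4),
}


def prob_pair_identically_homozygous(frequencies):
    """Chance that two independently chosen animals share a homozygous genotype."""
    total = Fraction(0)
    for genotype, p in frequencies.items():
        allele1, allele2 = genotype.split("/")
        if allele1 == allele2:
            # Law of the product: both animals have this genotype.
            # Law of the sum: add over the mutually exclusive homozygous classes.
            total += p * p
    return total


if __name__ == "__main__":
    result = prob_pair_identically_homozygous(f2_frequencies)
    print(result, float(result))  # 1/8 0.125
```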

If there is a 12.5% chance that both F2 progenitors are identically homozygous at any one locus, then approximately 12.5% of all loci in the genome will fall into this state at random. The consequence for these loci is dramatic: all offspring in the following F3 generation, and all offspring in all subsequent filial generations will also be homozygous for the same alleles at these particular loci. Another way of looking at this process is to consider the fact that once a starting allele at any locus has been lost from a strain of mice, it can never come back, so long as only brother-sister matings are performed to maintain the strain.

At each filial generation subsequent to F3, the class of loci fixed for one parental allele will continue to expand beyond 12.5%. This is because all fixed loci will remain unchanged through the process of incrossing, while all unfixed loci will have a certain chance of reaching fixation at each generation. At each locus which has not been fixed, matings can be viewed as backcrosses, outcrosses, or intercrosses, which are all inherently unstable since they can all yield offspring with heterozygous genotypes as shown in Table 3.1.

Figure 3.2 shows the level of homozygosity reached by individual mice at each generation of inbreeding along with the percentage of the genome that is fixed identically in both animals chosen to produce the next filial generation according to the formulas given by Green (1981). After 20 generations of inbreeding, 98.7% of the loci in the genome of each animal should be homozygous (Green, 1981). This is the operational definition of inbred. At each subsequent generation, the level of heterozygosity will fall off by 19.1%, so that at 30 generations, 99.8% of the genome will be homozygous and at 40 generations, 99.98% will be homozygous.
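The homozygosity figures quoted here can be approximated with the standard recurrence for brother-sister mating, in which the expected heterozygosity at generation n is one half of the value at generation n-1 plus one quarter of the value at generation n-2. That recurrence is an assumption of this sketch, since the formulas of Green (1981) are not reproduced in this chapter. The starting values (1.0 at F1 and 0.5 at F2) correspond to an outcross between two inbred strains, counting only loci at which the two strains differ. The function names are hypothetical.

```python
def heterozygosity(generations):
    """Return a list h where h[n] is the expected heterozygous fraction at Fn.

    Index 0 is unused so that list indices match filial generation numbers.
    """
    h = [None, 1.0, 0.5]
    for n in range(3, generations + 1):
        # Assumed full-sib mating recurrence: H(n) = H(n-1)/2 + H(n-2)/4
        h.append(h[n - 1] / 2 + h[n - 2] / 4)
    return h[: generations + 1]


def homozygosity(generation):
    """Expected homozygous fraction of the genome at the given filial generation."""
    return 1.0 - heterozygosity(generation)[generation]


if __name__ == "__main__":
    for n in (20, 30, 40):
        print(f"F{n}: {100 * homozygosity(n):.2f}% homozygous")
    h = heterozygosity(40)
    print(f"per-generation ratio near F40: {h[40] / h[39]:.3f}")
```

Under these assumptions the sketch gives roughly 98.7% homozygosity at F20, 99.8% at F30, and 99.98% at F40, and the ratio between successive generations settles near 0.809, which corresponds to the 19.1% per-generation decline in heterozygosity.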

These calculations are based on the simplifying assumption of a genome that is infinitely divisible with all loci assorting independently. In reality, the size of the genome is finite and, more importantly, linked loci do not assort independently. Instead, large chromosomal chunks are inherited as units, although the boundaries of each chunk will vary in a random fashion from one generation to the next. As a consequence, there is an ever-increasing chance of complete homozygosity as mice pass from the thirtieth to sixtieth generation of inbreeding (Bailey, 1978). In fact, by 60 generations, one would be virtually assured of a homogeneous homozygous genome if it were not for the continual appearance of new spontaneous mutations (most of which will have no visible effect on phenotype). However, every new mutation that occurs will soon be fixed or eliminated from the strain through further rounds of inbreeding. Thus, for all practical purposes, mice at the F60 generation or higher can be considered 100% homozygous and genetically indistinguishable from all siblings and close relatives (Bailey, 1978). All of the classical inbred strains (including those in Table 3.2 and many others) have been inbred for at least 60 generations.

3.2.3 The classical inbred strains

During the first three decades of the twentieth century, a series of inbred strains were developed from mice obtained through the fancy trade (see chapter 1). A small number of these "classical strains" have, through the years, become the standards for research in most areas of mouse biology. The most important of these strains are listed in Table 3.2 along with their uses, other characteristics, and the number of generations of sequential brother-sister matings that had been accomplished, as of 1993, in the colonies of the major suppliers. Other characteristics relevant to the reproductive performance of many of the classical inbred strains are tabulated in Table 4.1. Pictures of several classical and newly derived mouse strains are presented in figure 3.3.

3.2.4 Segregating inbred strains

A special class of inbred strains is produced and maintained by brother-sister mating in the same manner just described, with one major exception. Instead of selecting animals randomly at each generation for further matings to maintain the strain, an investigator purposefully selects individuals heterozygous for a mutant allele at a particular locus of interest. This "forced heterozygosity" at each generation results in the development of a "segregating inbred strain" with the same properties as all other inbred strains in regions of the genome not linked to the "segregating locus". In almost all cases, segregating inbred strains are developed around mutant loci that cause lethality, severely reduced viability, or sterility in the homozygous state. Some mutant genes, including Steel (Sl), Yellow (Ay), Brachyury (T), and Disorganization (Ds), can be recognized through the expression of a dominant phenotype that allows direct selection of heterozygotes at each generation. With other mutant genes, heterozygotes cannot be recognized directly and must be identified by progeny testing or through closely linked marker alleles that are recognizable in the heterozygous state.

At each generation of breeding, a segregating inbred strain will produce two classes of animals: those that carry the mutant allele and those that do not. Thus, it is possible to use sibling animals as "experimental" and "control" groups to investigate the phenotypic effects of the mutation in a relatively uniform genetic background. Segregating inbred strains are conceptually similar to congenic strains, and the reader should consult section 3.3.3 for more information on the advantages and limitations of this approach to genetic analysis.

3.2.5 Newly derived inbred strains

When the genomes of the traditional inbred strains were first analyzed with molecular probes during the 1980s, it became clear that their common origin from the fancy mouse trade had led to a great reduction in inter-strain polymorphism at many loci (as discussed in section 2.3.4). Since polymorphisms are essential for formal linkage analysis, crosses between the traditional inbred strains were less than ideal for this purpose. This problem could be overcome with the development of new inbred strains that were genetically distinct from the traditional ones. Another driving force in the development of new strains from scratch was the realization that none of the traditional strains were derived from a single subspecies or population; instead, they were all undefined genomic mixtures from two or more subspecies. Thus, the classical laboratory mice do not actually represent any animal that exists in nature. Although for many investigators this would not appear to be an important problem, it is likely to become more relevant in future studies that are focused on the interactions among multiple genes rather than single genes in isolation. Within the traditional strains, unnatural combinations of alleles could have subtle unnatural effects on the operation of polygenic traits. To overcome this problem, new inbred strains are routinely derived from a pair of animals captured from a single well-defined wild population. Over the last several decades, inbred strains have been developed from animals representing each of the major subspecies in the house mouse group as well as somewhat more distant species that still form fertile hybrid females with M. musculus. Inbred individuals from M. m. musculus (CZECH II/Ei), M. m. domesticus (WSB/Ei, ZALENDE/Ei), M. m. castaneus (CAST/Ei), M. spicilegus (previously M. hortulanus; PANCEVO/Ei), M. spretus (SPRET/Ei), and the faux subspecies M. molossinus (MOLF/Ei) can all be purchased from the Jackson Laboratory (see figure 2.2 for the phylogenetic relationships that exist among these various species and subspecies).

The major hurdle that must be overcome in the development of new inbred strains from wild populations is inbreeding depression, which occurs most strongly between the F2 and F8 generations. The cause of this depression is the load of deleterious recessive alleles that are present in the genomes of wild mice as well as all other animal species. These deleterious alleles are constantly generated at a low rate by spontaneous mutation, but their number is normally held in check by the force of negative selection acting upon homozygotes. With constant replenishment and constant elimination, the load of deleterious alleles present in any individual mammal reaches an equilibrium level of approximately ten. Different unrelated individuals are unlikely to carry the same mutations, and as a consequence, the effects of these mutations are almost never observed in large randomly mating populations.

Thus, it is not surprising that during the early stages of mouse inbreeding, many of the animals will be sickly or infertile. At the F2 to F8 generations, the proportion of sterile mice is often so great that the earliest mouse geneticists thought that inbreeding was a theoretical impossibility (Strong, 1978). Obviously they were wrong. But to succeed, one must begin the production of a new strain with a very large number of independent F1 x F1 lines, followed by multiple branches at each following generation. Most of these lines will fail to breed in a productive manner. But an investigator can continue to breed the few most productive lines at each generation; these are likely to have segregated away most of the deleterious alleles. The depression in breeding will begin to fade away by the F8 generation with the elimination of all of the deleterious alleles. Inbreeding depression will not occur when a new inbred strain is begun with two parents who are themselves already inbred, because no deleterious genes are present at the outset in this special case.

3.2.6 F1 hybrids

The most obvious advantage of working with inbred strains is genetic uniformity over time and space. Researchers can be confident that the B6 mice used in experiments today are essentially the genetic equivalent of B6 mice used ten years ago. Furthermore, one can be confident that there will always be B6 mice around to conduct experiments on. Thus, the existence of inbred strains serves to eliminate the contribution of genetic variability to the interpretation of experimental results. However, there is a serious disadvantage to working with inbred mice in that a completely inbred genome is an abnormal condition with detrimental phenotypic consequences. The lack of genomic heterozygosity is responsible for a generalized decrease in a number of fitness characteristics including body weight, life span, fecundity, litter size, and resistance to disease and experimental manipulations.

It is possible to generate mice that are genetically uniform without suffering the consequences of whole genome homozygosity. This is accomplished by simply crossing two inbred strains. The resulting F1 hybrid animals express hybrid vigor in all of the fitness characteristics just listed with an overall life span that will exceed that of both inbred parents (Green and Witham, 1991). Furthermore, as long as there are both B6 mice and DBA mice, for example, it will be possible to produce F1 hybrids between the two, and all F1 hybrids obtained from a cross between a B6 female and a DBA male will be genetically identical to each other over time and space. This particular F1 hybrid is the most common of those used and is available directly from most suppliers. All F1 hybrid animals are named with an abbreviated form of the female progenitor first, followed by the male progenitor and the "F1" symbol. The F1 hybrid generated from a cross between B6 females and DBA/2 males is named B6D2F1. Of course, uniformity will not be preserved in the offspring that result from an "intercross" between two F1 hybrids; instead random segregation and independent assortment will lead to F2 animals that are all genotypically distinct.
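The naming rule for F1 hybrids amounts to a simple string construction. The sketch below is illustrative only: the function name `f1_hybrid_name` and the small abbreviation table are assumptions made here, and the table holds only the two abbreviations that appear in this chapter (B6 for C57BL/6, and D2 for DBA/2 as implied by the name B6D2F1).

```python
# Strain abbreviations taken from this chapter; any other strain would need
# its own entry (this table is a hypothetical illustration, not a full list).
ABBREVIATIONS = {
    "C57BL/6": "B6",
    "DBA/2": "D2",
}


def f1_hybrid_name(female_strain, male_strain, abbreviations=ABBREVIATIONS):
    """Build an F1 hybrid name: female abbreviation, male abbreviation, 'F1'."""
    return abbreviations[female_strain] + abbreviations[male_strain] + "F1"


if __name__ == "__main__":
    print(f1_hybrid_name("C57BL/6", "DBA/2"))  # B6D2F1
```

Because the female progenitor is named first, swapping the two arguments produces a different name for the reciprocal cross.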

3.2.7 Outbred stocks

A large number of the laboratory mice sold and used by investigators around the world are considered to be outbred or random-bred. Popular stocks of such mice in the U.S. include CD-1 (Charles River Breeding Laboratories), Swiss Webster (Taconic Farms), and ICR and NIH Swiss (both from Harlan Sprague Dawley). Outbred mice are used for the same reasons as F1 hybrids: they exhibit hybrid vigor with long life spans, high disease resistance, early fertility, large and frequent litters, low neonatal mortality, rapid growth, and large size. However, unlike F1 hybrids, outbred mice are genetically undefined. Nevertheless, outbred mice are bought and used in large numbers simply because they are less expensive than any of the genetically defined strains.

Outbred mice are useful in experiments where the precise genotype of animals is not important and when they will not contribute their genome toward the establishment of new strains. They are often ideal as a source of material for biochemical purification and as stud males for the stimulation of pseudo-pregnancy in females to be used as foster mothers for transgenic or chimeric embryos. It is unwise to use outbred males as progenitors for any strain of mice that will be maintained and studied over multiple generations; the random-bred parent will contribute genetic uncertainty, which could lead to unexpected results down the road.

If a stock of mice were truly random-bred, it would be maintained through matings that were set up randomly among the breeding-age members of the population. Accordingly, matings would sometimes occur between individuals as closely related as siblings. In fact, most commercial suppliers follow breeding schemes that avoid crosses between closely related individuals in order to maintain the maximal level of heterozygosity in all offspring. Thus, random-bred is a misnomer; stocks of this type should always be called non-inbred or outbred.

3.3 Coisogenics, congenics, and other specialized strains

3.3.1 The need to control genetic background

With the many new tools of molecular genetics described throughout this book, it has become easier and easier to clone genes defined by mutant phenotypes. Often, mutant phenotypes involve alterations in the process of development or physiology. In these cases, simply having a cloned copy of a gene is often not enough to critically examine the full range of effects exerted by that gene on the developmental or physiological process. In particular, normal development and physiology can vary significantly from one strain of mice to the next, and in the analysis of mutants, it is often not possible to distinguish subtle effects due to the mutation itself from effects due to other genes within the background of the mutant strain. To make this distinction, it is essential to be able to compare animals in which differences in the genetic background have been eliminated as a variable in the experiment. This is accomplished through the placement of the mutation into a genome derived from one of the standard inbred strains. It is then possible to perform a direct comparison between mutant and wild-type strains that differ only at the mutant locus. Phenotypic differences that persist between these strains must be a consequence of the mutant allele.

3.3.2 Coisogenic strains

In the best of all possible worlds, the mutation of interest will have occurred spontaneously within a strain of mice that is already inbred. In this case, one can be reasonably confident that the mutant animal differs at only a single locus from non-mutant animals of the same strain. If the mutation allows homozygous viability and fertility, it can be propagated as a strain unto itself by inbreeding offspring from the original mutant animal. If the mutation cannot be propagated in the homozygous state, it will be maintained by continuous backcrossing of heterozygous animals to the original inbred strain. In both cases, the new mutant strain is considered coisogenic because its genome is identical (isogenic) to that of its sister strain except at the mutant locus. In the past, coisogenic strains could only be obtained by luck — when a spontaneous mutation happened to occur within an inbred strain. Today, one can initiate the production of coisogenic strains at any cloned locus through the use of the gene targeting technology described in section 6.4.

Coisogenic strains are named with a compound symbol consisting of two parts separated by a hyphen: the first part is the full or abbreviated symbol for the original inbred strain; the second part is the symbol for the mutation or variant allele. If the mutation is maintained in a homozygous state within the coisogenic strain, the mutant symbol is used alone; if the mutation is maintained in a heterozygous state, the +/m genotype symbol is used (where m is the mutation). For example: if the mutation nude (nu) appeared in the BALB/cJ strain, and the new coisogenic strain was homozygous for this mutation, its complete symbol would be [BALB/cJ-nu]; if the semidominant lethal mutation T appeared in the C57BL/6J strain, and the new coisogenic strain was maintained by backcrossing to the parental strain, its symbol would be [B6-T/+].

3.3.3 Congenic and related strains

3.3.3.1 Historical perspective: the Major Histocompatibility Complex

A large number of mouse mutations and variants with interesting phenotypic effects have been identified and characterized over the last 90 years. Most of these mutations were not found within strains that were already inbred and, to date, most of the genes that underlie these mutations remain uncloned. Thus, in all of these cases, coisogenicity is not a possibility. But even when a gene has been cloned, and the generation of a coisogenic mutant through the gene targeting technology is a possibility, this approach is still extremely tedious and, at the time of this writing, there is no guarantee of a successful outcome. There are other reasons why spontaneous mouse mutations are often important even when the gene underlying the mutation has been cloned. The spontaneous mutation may not be a "knockout" but instead may exert a more subtle effect on gene function which could provide special insight into the action of the wild-type allele. Furthermore, the phenotypic effects of many older mutations have been studied in tedious detail by classical embryologists and other scientists, and it can be advantageous to a contemporary scientist to build upon these classical studies.

The "low-tech" solution to the elimination of genetic background effects in the analysis of an established mutation, or any other genetic variant, is to use breeding protocols, rather than molecular biology, to generate strains of mice that approximate coisogenics to the greatest extent possible. Mice that have been bred to be essentially isogenic with an inbred strain except for a selected differential chromosomal segment are called congenic strains. The conceptual basis for the development of congenic mice was formulated by George Snell at the Jackson Laboratory during the 1940s and it led to the first and only Nobel Prize for work strictly in the field of mouse genetics.

Snell was interested in the problem of tissue transplantation. Long before 1944, it was known that tissues could be readily transplanted between individuals of the same inbred strain without immunological rejection, but that mice of different strains would reject tissue transplants from each other. Although these observations were a clear indication of the fact that genetic differences were responsible for tissue rejection, the number and types of genes involved remained entirely unknown. In absentia, these genes were named histocompatibility (or H) loci. The assumption was that the histocompatibility genes were responsible — directly or indirectly — for the production of tissue (or "histological") markers that could be distinguished as "self" or "non-self" by an animal’s immune system. If transplanted tissue and a host recipient carried identical genotypes at all H loci, there would be no immunological response and the transplant would "take." However, if a single foreign allele at any H locus was present in the tissue, it would be recognized as foreign and attacked.

Although the number of histocompatibility loci was unknown, it was assumed to be large because of the rarity with which unrelated individuals — both mice and humans — accept each other’s tissues. The logic behind this assumption was the empirical finding that polymorphic loci are most often di-allelic and not usually associated with more than three common alleles. If H loci showed a similar level of polymorphism, a large number would be required to ensure that there would almost always be at least one allelic difference between any two unrelated individuals. The experimental problem was to identify and characterize each of the histocompatibility loci in isolation from all of the others.

Snell’s approach to this problem was to use a novel multi-generation breeding protocol based on repeated backcrossing to trap a single H locus from one mouse strain (the donor) in the genetic background of another (the inbred partner). The basic approach (developed mathematically in the following section) caused the newly forming congenic strain to become increasingly similar to the inbred partner at each generation, but only those offspring who remained histo-incompatible with the inbred partner were selected to participate in the next round of backcrossing. It was assumed that a difference at any one H locus would be sufficient to allow full histo-incompatibility. Thus, at the end of the process, Snell expected to find that each independently derived congenic line would have trapped the donor strain allele at a single random H locus. With random selection, all H loci could be isolated in different congenic strains so long as a large enough number were generated.

With this outcome in mind, Snell began the production of histo-incompatible congenic strains (originally called "congenic resistant" strains) with 125 independent lines of matings (Snell, 1978). Of these, 27 were carried through to the point at which it was possible to determine which H locus had been trapped. Surprisingly, 22 of the 27 lines had trapped the same locus, which was given the name H-2 (by chance, it was the second one identified). Contrary to expectations, the H-2 locus (now called the H2 complex since it is known to be a tightly linked complex of genes) acts, for all effective purposes, as the only strong determinant of histocompatibility. Snell and his predecessors were misled by the false assumption that only a limited number of alleles are possible at any one locus. Instead, a subset of genes within the H2 complex — known as the class I genes — are the most polymorphic in the genome with hundreds of alleles at each individual locus. The generic term "major histocompatibility complex" (MHC) is now used to designate this complex locus in mice as well as its homolog in all other mammalian species including humans, where it was historically called HLA.

3.3.3.2 Creation of a congenic strain

In the past, there were several different breeding schemes used to produce congenic mice depending on whether animals heterozygous for the donor allele at the differential locus were phenotypically distinguishable — through a dominant form of expression — from those not carrying the donor allele. It was often the case that the heterozygote could not be distinguished and, as a consequence, congenic strains had to be created through complex breeding schemes that allowed the generation of homozygotes for the variant allele in alternating generations. Today, identifying the heterozygote is almost never a problem since one will almost certainly map the locus of interest before undertaking the production of a congenic strain, and with a map position will come closely linked DNA markers. Therefore, the following discussion will be limited to the most direct, simple and efficient method of congenic construction known as the backcross or NX system which is illustrated in figure 3.4 (Flaherty, 1981).

The backcross system of congenic strain creation is straightforward in both concept and calculation. The first cross is always an outcross between the recipient inbred partner and an animal that carries the donor allele. The donor animals need not be inbred or homozygous at the locus of interest, but the other partner must be both. The second generation cross and all those that follow to complete the protocol are backcrosses to the recipient inbred strain. At each generation, only those offspring who have received the donor allele at the differential locus are selected for the next round of backcrossing.

The genetic consequences of this breeding protocol are easy to calculate. First, one can start with the conservative assumption that the donor (D) and recipient (R) strains are completely distinct with different alleles at every locus in the genome. Then, all F1 animals will be 100% heterozygous D/R at every locus. According to Mendel’s laws, equal segregation and independent assortment will act to produce gametes from these F1 animals that carry R alleles at a random 50% of their loci and D alleles at the remaining 50%. When these gametes combine with gametes produced by the recipient inbred partner (which, by definition, will have only R alleles at all loci), they will produce N2 progeny having genomes in which approximately 50% of all loci will be homozygous R/R and the remaining loci will be heterozygous D/R as illustrated in figure 3.4. Thus, in a single generation, the level of heterozygosity is reduced by about 50%. Furthermore, it is easy to see that at every subsequent generation, random segregation at the remaining heterozygous loci will cause a further ~50% overall reduction in heterozygosity.

In mathematical terms, the fraction of loci that are still heterozygous at the Nth generation can be calculated as $(1/2)^{N-1}$, with the remaining fraction $1 - (1/2)^{N-1}$ homozygous for the inbred strain allele. These functions are represented graphically in figure 3.5. At the fifth generation, after only four backcrosses, the developing congenic line will be identical to the inbred partner across ~94% of the genome. By the tenth generation, identity will increase to ~99.8%. It is at this stage that the new strain is considered to be a certified congenic. As one can see by comparing figures 3.2 and 3.5, the development of a congenic line will take approximately half the time that it takes to develop a simple inbred line from scratch. The reason for this more rapid pace is the fact that one of the two mates involved at every generation of congenic development is already inbred.
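As a rough illustration, the two expressions can be tabulated directly. The sketch below assumes, as the text does, that the donor and recipient differ at every locus and that only loci unlinked to the selected locus are being counted. The function names are illustrative and do not come from any published software.

```python
def heterozygous_fraction(generation: int) -> float:
    """Expected fraction of unlinked loci still heterozygous (D/R).

    Generation 1 is the F1 outcross; generations 2, 3, ... are N2, N3, ...
    """
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 0.5 ** (generation - 1)


def recipient_identity(generation: int) -> float:
    """Expected fraction of loci homozygous for the recipient (R/R) allele."""
    return 1.0 - heterozygous_fraction(generation)


if __name__ == "__main__":
    # Print the expected identity with the inbred partner at a few generations.
    for n in (2, 5, 10, 20):
        print(f"N{n}: {recipient_identity(n):.4%} identical to the inbred partner")
```

These are expectations only; individual lines at the same generation will scatter around these values.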

Backcrossing can continue indefinitely after the tenth generation, but if the donor allele does not express a dominant effect that is visible in heterozygous animals, it will be easier to maintain it in a homozygous state. To achieve this state, two tenth generation or higher carriers of the selected donor allele are intercrossed and homozygous donor offspring are selected to continue the line through brother-sister matings in all following generations. The new congenic strain is now effectively inbred, and in conjunction with the original inbred partner, the two strains are considered a "congenic pair."

In some cases, it will be possible to distinguish animals heterozygous for the donor allele from siblings that do not carry it. In a subset of these cases, as well as others, a donor allele may have recessive deleterious effects on viability or fertility. In all such instances, it is advisable to maintain the congenic strain by a continuous process of backcrossing and selection for the donor allele at every generation. Congenic strains that are maintained in this manner are considered to be in a state of "forced heterozygosity". There are two major advantages to pursuing this strategy whenever possible. First, the level of background heterozygosity will continue to be reduced by ~50% through each round of breeding. Second, the use of littermates with and without the donor allele as representatives of the two parts of the congenic pair will serve to reduce the effects of extraneous variables on the analysis of the specific phenotypic consequences of the donor allele.

The rapid elimination of heterozygosity occurs only in regions of the genome that are not linked to the donor allele which, of course, is maintained by selection in a state of heterozygosity throughout the breeding protocol. Unfortunately, linkage will also cause the retention of a significant length of chromosome flanking the differential locus which is called the differential chromosomal segment. Even for congenic lines at the same backcross generation, the length of this segment can vary greatly because of the inherently random distribution of crossover sites. Nevertheless, the expected average length of the differential chromosomal segment in centimorgans can also be calculated as $200(1 - 2^{-N})/N$ where N is the generation number. For all values of N greater than 5, this equation can be simplified to $200/N$. This function is represented graphically in figure 3.6. As one can see, the average size of the differential segment decreases very slowly. At the tenth generation, there will still be, on average, a 20 cM region of chromosome encompassing the differential locus derived from the donor strain.
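The slow decline in segment length is easy to see numerically. The following sketch evaluates the full expression and the simplified one side by side. The function names are hypothetical, and the results are averages that say nothing about the spread between individual lines.

```python
def expected_segment_cm(generation: int) -> float:
    """Average differential segment length (cM): 200 * (1 - 2**-N) / N."""
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 200.0 * (1.0 - 2.0 ** -generation) / generation


def approximate_segment_cm(generation: int) -> float:
    """Simplified form 200 / N, intended for N greater than 5."""
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 200.0 / generation


if __name__ == "__main__":
    # Compare the exact and simplified averages at selected generations.
    for n in (6, 10, 20, 40):
        exact = expected_segment_cm(n)
        approx = approximate_segment_cm(n)
        print(f"N{n}: exact {exact:.2f} cM, simplified {approx:.2f} cM")
```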

It is possible to reduce the length of the differential chromosomal segment more rapidly by screening backcross offspring for the occurrence of crossovers between the differential locus of interest and nearby DNA markers. As an example of this strategy, one could recover fifty congenic offspring from the tenth backcross generation and test each for the presence of donor alleles at DNA markers known to map at distances of one to five centimorgans on both sides of the locus of interest. It is very likely that at least one member of this backcross generation will show recombination between the differential locus and a nearby marker. The animal with the closest recombination event can be backcrossed again to the recipient strain to produce congenic mice of the eleventh backcross generation. By screening a sufficient number of these N11 animals, it should be possible to identify one or more that show recombination on the opposite side of the differential locus. In this manner, an investigator should be able to obtain a founder for a congenic strain with a defined differential chromosomal segment of five centimorgans or less after just eleven generations of breeding.
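The claim that a recombinant is very likely to appear can be checked with a back-of-the-envelope calculation that is not given in the text. The sketch below makes three assumptions: every screened animal carries the donor allele at the selected locus, each animal is an independent trial, and over short intervals the recombination fraction is approximately the map distance in centimorgans divided by 100. The function name is illustrative.

```python
def prob_at_least_one_recombinant(n_offspring: int, distance_cm: float) -> float:
    """Chance that at least one of n offspring shows a crossover in the interval.

    Assumes recombination fraction r = distance_cm / 100, which is only a
    reasonable approximation for short intervals.
    """
    if n_offspring < 0:
        raise ValueError("n_offspring cannot be negative")
    r = distance_cm / 100.0
    if not 0.0 <= r <= 0.5:
        raise ValueError("distance must correspond to 0 <= r <= 0.5")
    return 1.0 - (1.0 - r) ** n_offspring


if __name__ == "__main__":
    # Fifty carrier offspring, markers at 1 to 5 cM on one side of the locus.
    for d in (1, 2, 5):
        p = prob_at_least_one_recombinant(50, d)
        print(f"{d} cM marker: P(at least one recombinant) = {p:.2f}")
```

Under these assumptions, a marker 5 cM away gives better than a 90% chance of at least one recombinant among fifty animals, whereas a marker only 1 cM away gives a little under 40%. That difference is one reason to type markers at several distances.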

As the preceding discussion indicates, congenic strains differ from the previously described coisogenic strains in two important respects which must always be considered in the interpretation of unexpected data. First, congenic strains, especially those that have undergone only a minimum number of backcrosses, will have small random remnants of the donor strain — so-called passenger genes — scattered throughout the genome. In congenic strains maintained by inbreeding, the same passenger genes will be present in all members of the strain. In rare instances, traits attributed to the selected donor allele may actually result from one of these cryptic passenger genes. Such effects can be sorted out by breeding the congenic strain back to its original inbred partner. If a trait is due to a passenger gene, it will assort independently of the donor locus in subsequent backcrosses.

The second difference between a congenic strain and a coisogenic strain is in the chromosomal vicinity of the differential locus. Congenic strains will always differ from their inbred partner along a significant length of chromosome flanking the differential locus; coisogenic strains will only differ at the differential locus itself and nowhere else. Thus, there is always the possibility that phenotypic differences between the two members of a congenic pair are actually caused by a closely linked gene rather than the selected differential locus. This potential problem is much more difficult to resolve by simple breeding protocols.

3.3.3.3 Nomenclature

The nomenclature used for congenic strains is so similar to that used for coisogenic strains that it is sometimes not possible to distinguish between the two by name alone. In such cases, it is necessary to go back to the original source publication for clarification. There are, however, two nomenclature components which are unique to congenic strains. The first is used in those cases where a mutant or variant allele is transferred from one defined genetic background onto another. For example, one might wish to transfer the albino (c) mutation from the BALB/c strain onto a B6 background. In cases of this type, the strain which "donates" the variant allele is symbolized after the recipient strain with the two strain symbols separated by a period. This is followed by a hyphen and the symbol for the variant allele. Thus, in the example just described, the congenic strain would be named B6.BALB-c.

The final nomenclature component is an indication of the number and type of crosses that have occurred subsequent to the original mating between the recipient and donor animals. In the derivation of any new congenic strain, the first cross is always an outcross, and the offspring are considered members of the F1 generation. The second cross is always a backcross, and the offspring are considered members of the N2 generation. (Note that there is no such thing as an N1 generation). The letter ‘N’ is always used, followed by a subscripted number ($N_i$), to describe a series of backcross events leading to a particular generation of animals. But, remember that N10 generation offspring are the result of one outcross followed by an uninterrupted sequence of nine backcrosses to the same parental strain. Once a congenic strain is established, backcrossing to the parental strain is often stopped, and future generations are propagated by a simple inbreeding protocol. The number of generations of inbreeding is indicated, as always, with the filial generation symbol ‘F’. For example, suppose that the albino mutation has been placed onto the B6 background by an outcross followed by 14 generations of backcrosses, after which a brother-sister mating regime is begun and followed for eight more generations. The offspring produced at this stage would be considered to be members of the N15F8 generation. When generational information is incorporated into the name of a congenic strain, the numbers are no longer subscripted. So, in this example, the complete name for the congenic animals at the stage indicated would be B6.BALB-c (N15F8).
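The naming steps just described can be sketched as a small helper, mainly to make the off-by-one between the number of backcrosses and the N generation explicit. The function names and argument layout are assumptions made for illustration and are not part of the nomenclature rules.

```python
from typing import Optional


def backcross_generation(n_backcrosses: int) -> int:
    """The outcross yields F1, so the k-th backcross yields generation N(k+1)."""
    if n_backcrosses < 1:
        raise ValueError("at least one backcross is required")
    return n_backcrosses + 1


def congenic_name(recipient: str, donor: str, allele: str,
                  n_generation: Optional[int] = None,
                  f_generation: int = 0) -> str:
    """Build recipient.donor-allele, optionally followed by (N..F..)."""
    name = f"{recipient}.{donor}-{allele}"
    if n_generation is not None:
        generation = f"N{n_generation}"
        if f_generation > 0:
            generation += f"F{f_generation}"
        name += f" ({generation})"
    return name


if __name__ == "__main__":
    # Outcross, 14 backcrosses, then 8 generations of brother-sister mating.
    print(congenic_name("B6", "BALB", "c", backcross_generation(14), 8))
```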

3.3.3.4 Consomic strains

Consomic strains are a variation on congenic strains in which a whole chromosome, rather than one local chromosomal region, is backcrossed from a donor strain onto a recipient background. In almost all cases, the donor chromosome is the Y. Like congenics, consomics are produced after a minimum of 10 backcross generations. Backcrossing to obtain consomics for the Y chromosome must be carried out in a single direction: males that contain the donor chromosome are always crossed to inbred females of the recipient strain. For example, to obtain a B6 strain consomic for the M. m. castaneus Y chromosome, one would start with an outcross between a B6 female and a castaneus male. F1 males, and those from all subsequent generations, would also be mated with B6 females. After ten generations, the genetic background would be essentially B6, but the Y chromosome would be castaneus. This strain could be symbolized as B6-Y$^{CAS}$. Obtaining strains consomic for chromosomes other than the Y is not practical at the present time.

3.3.3.5 Conplastic strains

Conplastic strains are another variation on the congenic theme, except that in this case, the donor genetic material is the whole mitochondrial genome which is placed into an alternative host. Since the mitochondrial genomes carried by all of the classical inbred strains are indistinguishable, conplasticity makes sense only in the context of interspecific or inter-subspecies crosses. Conplastic lines are generated by sequential backcrossing of females from the donor strain to recipient males; this protocol is reciprocal to the one used for the generation of Y chromosome-consomics. For example, to obtain a B6 strain conplastic for the M. m. castaneus mitochondrial genome, one would start with an outcross between a B6 male and a castaneus female. F1 females, and those from all subsequent generations, would also be mated with B6 males. After ten generations, the nuclear genome would be essentially B6 with the same statistics that hold for congenic production (figure 3.5), but all mitochondria would be derived from castaneus. This strain could be symbolized as B6-mt$^{CAS}$.

3.3.4 Recombinant inbred and related strains

3.3.4.1 Recombinant inbred strains

Recombinant inbred (RI) strains are formed from an initial cross between two different inbred strains followed by an F1 intercross and 20 generations of strict brother-sister mating. This breeding protocol allows the production of a family of new inbred strains with special properties relative to each other that are discussed fully in section 9.2. Different RI strains derived from the same pair of original inbred parents are considered members of a set. Each RI set is named by joining an abbreviation of each parental strain together with an ‘X’. For example, RI strains derived from a C57BL/6J (B6) female and a DBA/2J male are members of the BXD set, and RI strains derived from AKR/J and C57L/J are members of the AKXL set. A complete listing of commonly used RI sets is given in Table 9.3. Each RI strain in a particular set is distinguished by appending a hyphen to the series name followed by a letter or number. Thus, BXD-15 is a particular RI strain that has been formed from an initial cross between a B6 female and a DBA male. At any point in time, it is always possible to add a new strain to a particular set through an outcross between the same two progenitor strains followed by 20 generations of inbreeding. The RI strains represent an important tool in the arsenal available for linkage studies of newly defined DNA loci.

3.3.4.2 Recombinant congenic strains

Recombinant congenic strains (abbreviated as RC strains) are a variation on the recombinant inbred concept (Demant and Hart, 1986). As with RI strains, the initial cross is between two distinct inbred strains. However, the next two generations are generated by backcrossing, without selection, to one of the parental strains. This sequence is followed by brother-sister mating for at least 14 generations. Whereas standard RI strains have genomes that are a mosaic of equal parts derived from both parents (as detailed in section 9.2.2), RC strains will have mosaic genomes that are skewed in the direction of the parent to which the backcrossing occurred such that a random 7/8 fraction of the genome will be derived from this parent, and a random 1/8 fraction will be derived from the other parent. Sets of RC strains have some interesting properties in terms of limiting the amount of the genome that has to be searched for multiple genes involved in quantitative traits. However, with the new PCR-based methods for genotyping highly polymorphic loci discussed in section 8.3, the advantages of the RC strains appear to have been superseded and they have not been used widely by the mouse genetics community.
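The 7/8 and 1/8 figures follow from halving the donor contribution at each unselected backcross. A minimal sketch of that arithmetic is given below. It computes expected fractions only, on the assumption that the later brother-sister matings fix alleles at random without changing the expected proportions. The function name is hypothetical.

```python
from fractions import Fraction


def donor_fraction(n_backcrosses: int) -> Fraction:
    """Expected donor share of the genome after an outcross plus backcrosses.

    The F1 is one-half donor; each unselected backcross halves that share.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses cannot be negative")
    return Fraction(1, 2) ** (n_backcrosses + 1)


if __name__ == "__main__":
    donor = donor_fraction(2)  # the two backcross generations of the RC protocol
    print(f"donor parent: {donor}, backcross parent: {1 - donor}")
```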

3.4 Standardized nomenclature

3.4.1 Introduction

Mouse genetics is, by its very nature, a collaborative field of scientific investigation. This is because the interpretation of data collected by any one scientist is highly dependent on data collected by others. High resolution genetic maps are often formed through the integration of results obtained in many individual studies, and as each new result is published, it can be swept up into a system of databases. Large-scale integration has been possible only because all mouse geneticists speak the same language. The definition of this language is provided by the International Committee on Standardized Nomenclature for Mice which has been in existence since 1939. This committee is charged with the task of establishing and updating rules and guidelines for genetic nomenclature. The continued functioning of this committee is critical because, as the analysis of the genome becomes ever more sophisticated, new genetic entities become apparent, and these must be named in a standard fashion.

For a complete description of the "Rules and Guidelines for Gene Nomenclature," one should consult the Lyon and Searle book (Committee on standardized genetic nomenclature for mice, 1989), and updates published regularly in Mouse Genome. Here, I will briefly review the salient features of this nomenclature system with a focus on the naming of newly defined genes and loci. Once an investigator has chosen a new name and symbol for a locus, the chair of the Committee should be contacted for confirmation that the rules have been followed properly, and the names do not conflict with others already in use.

3.4.2 Strain symbols

There is no rhyme or reason to the names given to the original inbred strains derived at the beginning of the century. The name of the famous BALB/c line was derived by co-joining the name of the investigator (Bagg) with the color of the mouse (albino). Bagg’s ALBino became BALB. Other famous strains have names based on animal numbers; for example, female no. 57 (from Abbie Lathrop’s farm) gave rise to both the C57BL/6 and C57BL/10 strains which are commonly abbreviated as B6 and B10 respectively. New inbred strains can be named freely by their originators as long as certain rules are followed: the name should be brief, and it should begin with a capital letter followed by other capital letters or (less preferably) numbers. Strains with a common origin that have been separated prior to the F20 generation must be given separate symbols, although these symbols can indicate their relationship to each other. All names should be registered with the appropriate contact person who is indicated prominently in the current issue of Mouse Genome.

Substrains can arise whenever two or more colonies of an established inbred strain are maintained in isolation from each other for a sufficient period of time to allow detectable genetic differences to become fixed. There are three specific instances where substrain formation can be considered to have occurred: (1) when branches of an inbred strain are separated before the F40 generation when residual heterozygosity is still likely, (2) when a branch has been maintained separately from other branches for 100 or more generations, and (3) when genetic differences from other branches are uncovered. Such differences can be caused by any one or more of three factors: residual heterozygosity at the time of branching, mutation, or contamination.

Substrains are indicated by appending a slash (/) to the strain symbol followed by an appropriate substrain symbol, for example, DBA/1 and DBA/2. A laboratory registration code is often included within the substrain designation, for example, C57BL/6J and C57BL/10J are two substrains of C57BL that are both maintained at the Jackson Laboratory (indicated with a J). On the other hand, a different nomenclature has been formulated recently to distinguish the same strain maintained without any apparent genetic differences by two or more laboratories. In this case, the "@" character is appended to the strain symbol followed by the laboratory registration code. For example, the SJL strain maintained by the Jackson Laboratory would be symbolized as SJL@J.

3.4.3 Locus names and symbols

In the first set of rules for distinguishing gene symbols laid down by the Committee on Mouse Genetics Nomenclature in 1940, it was stated that "the initial letter of the mutant gene symbol shall be the same as the initial letter of the mutant gene, e.g. d for dilution. Additional letters shall be added to the initial letter if necessary to distinguish it from symbols already in use" (Snell, 1941 p.242). With over three thousand independent loci identified as of 1993, the necessity of using symbols that contain more than one letter is now obvious. In fact, the recent explosion in gene and locus identifications in the mouse has brought about a re-evaluation of the entire basis for the naming of chromosomal entities. At the time of this writing, a final consensus has not yet been reached. Thus, investigators are cautioned to contact members of the International Committee on Standardized Nomenclature for Mice before settling on a name for a new genetic entity.

Each mouse locus is given a unique name and a unique symbol. In devising new names, investigators should consider their suitability for inclusion into databases. Thus, names should be limited in length to fewer than 40 characters (including spaces), and should not include Greek letters or Roman numerals. The symbol is a highly abbreviated version of the name. In published articles, locus symbols (but not names) are always set in italic font. Symbols always begin with a letter followed by any combination of letters or Arabic numbers without internal white space. In the past, symbols were typically three to eight characters in length. Today, database considerations set a preferred maximum number of characters at 10, although this rule is frequently broken.
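Because these rules are motivated by database use, they lend themselves to a simple automated check. The sketch below encodes only the constraints stated in this paragraph. It does not detect Roman numerals, and it would reject the hyphenated symbols permitted in the special cases described later in this section, so it should be read as a partial illustration and not as an official validator. All names in it are hypothetical.

```python
import re
from typing import List

# A letter followed by letters or Arabic numerals, with no white space.
SYMBOL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")
GREEK_PATTERN = re.compile(r"[\u0370-\u03FF]")
PREFERRED_MAX_SYMBOL_LENGTH = 10


def check_locus(name: str, symbol: str) -> List[str]:
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    if len(name) >= 40:
        problems.append("name should be fewer than 40 characters")
    if GREEK_PATTERN.search(name):
        problems.append("name should not include Greek letters")
    if not SYMBOL_PATTERN.fullmatch(symbol):
        problems.append("symbol must be a letter followed by letters or digits")
    if len(symbol) > PREFERRED_MAX_SYMBOL_LENGTH:
        problems.append("symbol exceeds the preferred maximum of 10 characters")
    return problems


if __name__ == "__main__":
    print(check_locus("Esterase 3", "Es3"))
    print(check_locus("Esterase 3", "Es 3"))
```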

Loci that are members of a related series of some kind are given the same primary stem and symbol followed by a distinguishing number or letter. Thus, the third esterase gene to be defined is named "Esterase 3", with the symbol Es3, and the second homeo box gene cluster to be identified is named "homeo box B cluster" with the symbol Hoxb. In the past, a hyphen was often used to separate the numeral designation from the body of the gene symbol, e.g. Es-3. This practice has now been discontinued and hyphens have been deleted from all symbols except in the special cases discussed just below.

When one member of a series has been further duplicated into a closely-linked cluster of related genes, a number can be appended to the cluster name; individual genes in the Hoxb cluster will be named homeo box B1, homeo box B2 etc. with symbols Hoxb1, Hoxb2 etc. For clusters that were initially named with an appended number — like Lamb1 — the individual symbols can be named by appending the cluster name with a hyphen followed by a number to obtain the symbols Lamb1-1, Lamb1-2 etc.

All loci can be broadly separated into two classes. The first class includes loci known to be functional or homologous to functional loci. With few exceptions, these loci are genes or pseudogenes. The second class includes sequences identified solely on the basis of DNA variation. Members of this latter class are referred to as anonymous loci because their function or lack thereof is unknown. The rules for naming each of these classes of loci follow below:

Gene names should convey in a concise form, and as accurately as possible, the character by which the gene is recognized. Genes can be named according to an expressed phenotype (retinal degeneration or shiverer), an enzyme or protein name or function (Glyoxalase-1, hemoglobin alpha chain, or octamer binding transcription factor 1), a pattern of expression (t-complex testes-expressed-1), a combination of these (Myosin light chain-alkali-fast skeletal muscle), or by homology to genes characterized in other organisms (homeo box A, B, etc.; retinoblastoma). Except in the case of genes that are first characterized through a recessive mutation, names and symbols should begin with an upper case letter. With all symbols, all letters that follow the initial character should be lower case.

Whenever a mouse gene is characterized based on homology to a gene already named in another species, the mouse homolog should be given essentially the same name and symbol. Of course, one should always check the mouse gene databases (see appendix B) to be certain that the symbol has not already been assigned. In the translation from human to mouse symbols, characters beyond the first should be converted from upper to lower case.
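The case conversion is mechanical and can be expressed in one line. The helper below is a hypothetical illustration of the stated rule only. It does not check the mouse gene databases, which remains a necessary manual step.

```python
def mouse_symbol_from_human(human_symbol: str) -> str:
    """Keep the first character and convert the remaining ones to lower case."""
    if not human_symbol:
        raise ValueError("symbol cannot be empty")
    return human_symbol[0] + human_symbol[1:].lower()


if __name__ == "__main__":
    # Example human-style symbols, written in upper case.
    for symbol in ("HOXB1", "PLG"):
        print(symbol, "->", mouse_symbol_from_human(symbol))
```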

All pseudogenes are defined by homology to known genes. Their symbol is a combination of the known gene name (as a stem), and the pseudogene designation (ps) followed by a serial number. Thus, the third alpha globin pseudogene has been given the name "Hemoglobin alpha 3 pseudogene", with the symbol Hba-ps3.

When new loci are uncovered by hybridization with known genes, the functionality of the new locus is usually unknown. In these cases, where the locus could be either a functional gene or a pseudogene, it should be named with a "related sequence" symbol (rs). Thus, if a new locus is uncovered by cross-hybridization with a probe for the Plasminogen gene (symbolized Plg), it would be named "Plasminogen related sequence-1" and would be symbolized as Plg-rs1. If a new locus is uncovered with a probe for one member of a series of loci, the rs symbol is appended without the hyphen. Thus, a locus related to Ela1 would be symbolized as Ela1rs1.
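The pseudogene and related-sequence conventions differ only in the infix and in whether a hyphen is used. The sketch below reproduces the examples from the text. Note the assumption, which is not part of the stated rules, that a stem ending in a digit is treated as a member of a series; a real implementation would need to know series membership explicitly.

```python
def pseudogene_symbol(stem: str, serial: int) -> str:
    """Gene stem, a hyphen, 'ps', and a serial number, e.g. Hba-ps3."""
    return f"{stem}-ps{serial}"


def related_sequence_symbol(stem: str, serial: int) -> str:
    """Append 'rs' and a serial number to the symbol of the probe gene.

    Assumption: a trailing digit marks the stem as a member of a series,
    in which case the hyphen is omitted.
    """
    if not stem:
        raise ValueError("stem cannot be empty")
    separator = "" if stem[-1].isdigit() else "-"
    return f"{stem}{separator}rs{serial}"


if __name__ == "__main__":
    print(pseudogene_symbol("Hba", 3))
    print(related_sequence_symbol("Plg", 1))
    print(related_sequence_symbol("Ela1", 1))
```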

Anonymous DNA loci are named in a straightforward manner. The symbol should begin with the character ‘D’ (for DNA), followed by an integer representing the chromosomal assignment, followed by a two to three letter registration code representing the laboratory or scientist that described the locus, followed by a unique serial number given to the locus to distinguish it from others on the same chromosome described by the same investigator. For example, the twenty-third anonymous locus mapped to chromosome 14 by the Pasteur Institute would be given the symbol D14Pas23. The name for this locus would include all of this information in longhand form: "DNA segment Chromosome 14 Pasteur 23". This DNA locus nomenclature system should be used for all loci defined only as DNA segments including, but not limited to, microsatellites, minisatellites and RFLPs. To obtain a unique laboratory or investigator registration code, please contact the Institute for Laboratory Animal Resources, USA National Academy of Sciences, Washington, D.C.

Mouse homologs of anonymous DNA loci first mapped in humans are named in a somewhat different format in order to allow the connection between the two species to be perfectly transparent. In these cases, the symbol should still begin with the character ‘D’ and the mouse chromosomal assignment, but this should now be followed by the character ‘h’, the chromosomal assignment of the human homolog and its identification number. For example, a probe to the human locus D17S111 — the 111th single copy (S) anonymous locus mapped to human chromosome 17 — is used to identify a mouse homolog on Chr 1. This corresponding mouse homolog will now be named D1h17S111.
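Both forms of anonymous locus symbol are simple concatenations, which the following sketch makes explicit. The regular expressions are assumptions: they accept X and Y as chromosome assignments alongside integers, and they recognize only human loci of the single copy (S) type used in the example. The function names are illustrative.

```python
import re
from typing import Tuple

ANONYMOUS_PATTERN = re.compile(r"D(\d+|X|Y)([A-Za-z]{2,3})(\d+)")
HUMAN_PATTERN = re.compile(r"D(\d+|X|Y)S(\d+)")


def anonymous_locus_symbol(chromosome: str, lab_code: str, serial: int) -> str:
    """'D', chromosome, two to three letter lab code, serial number."""
    if not re.fullmatch(r"[A-Za-z]{2,3}", lab_code):
        raise ValueError("lab code must be two or three letters")
    return f"D{chromosome}{lab_code}{serial}"


def parse_anonymous_locus(symbol: str) -> Tuple[str, str, int]:
    """Split a symbol such as D14Pas23 into (chromosome, lab code, serial)."""
    match = ANONYMOUS_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not an anonymous locus symbol: {symbol}")
    return match.group(1), match.group(2), int(match.group(3))


def human_homolog_symbol(mouse_chromosome: str, human_locus: str) -> str:
    """'D', mouse chromosome, 'h', then the human symbol minus its leading 'D'."""
    if HUMAN_PATTERN.fullmatch(human_locus) is None:
        raise ValueError(f"unrecognized human locus symbol: {human_locus}")
    return f"D{mouse_chromosome}h{human_locus[1:]}"


if __name__ == "__main__":
    print(anonymous_locus_symbol("14", "Pas", 23))
    print(parse_anonymous_locus("D14Pas23"))
    print(human_homolog_symbol("1", "D17S111"))
```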

3.4.4 Alleles

In the case of a gene defined initially by a mutant phenotype, the symbol for the first defined mutant allele becomes both the gene symbol and the symbol for that allele. The corresponding wild-type allele is indicated by a + sign. For example, an animal heterozygous at the tf locus with a wild-type and the defining mutant allele would have a genotype symbolized as +/tf. In this case, the context is sufficient to indicate the association of the + symbol with the tf locus. When the context is not sufficient to indicate association, the wild-type allele of a specific locus should have the locus symbol appended to it as a superscript. Thus, the wild-type allele at the tf locus can also be designated as +$^{tf}$.

In all other cases, alleles are designated by the locus symbol followed by an allele-defining symbol that is usually one or two characters in length and set in superscript, with the entire expression set in italics. This rule also applies to mutant alleles beyond the first one that are uncovered at a phenotypically-defined locus. For computer presentation with only ASCII (text format) code, the allele designation can be set off from the locus symbol by prefixing it with a * or with angular brackets; for example, Hbb$^{d}$ becomes Hbb*d or Hbb<d>.

The simplest means for assigning allele names is through a series of lowercase letters, beginning with a. Thus, written in ASCII form, the Hba-ps4 gene has alleles Hba-ps4<a>, Hba-ps4<b>, etc. In many cases, it can be useful to provide information within the allele symbol. For example, a M. spretus-specific allele may be given the designation s as in DXPas4<s>. This type of nomenclature can be extended to alleles associated with the common inbred strains such as B6 (signified by the b allele) and DBA (signified by the d allele) as well as the subspecies musculus (m), castaneus (c), and domesticus (d).

New mutations, including targeted knockouts, at previously characterized genes are denoted by a superscript m followed by a serial number and the one-to-three letter code representing the laboratory or scientist that described the new allele. Thus, the third knockout allele created at Princeton University by gene targeting at the cftr locus would be designated, in ASCII form, as cftr<m3Pri>. To obtain a unique laboratory or investigator registration code, please contact the Institute for Laboratory Animal Resources, USA National Academy of Sciences, Washington, D.C.

3.4.5 Transgene loci

The experimental introduction of foreign DNA into the germ line of a mouse results in the creation of a new transgene locus at the site of integration. The official symbol for a transgene locus has five parts. First is the designation Tg for transgene. Second is a letter indicating the mode by which the transgene was inserted; N is used for nonhomologous insertion, R for insertion with a retroviral vector, and H for homologous recombination. With the standard production of transgenic mice by embryo injection, N would be used; with homologous recombination in embryonal stem cells followed by chimera formation to rescue the transgene into the germ line, H would be used; for transgenic animals produced by retroviral infection of embryos, R would be used.

The third part of the symbol contains a mnemonic, of six characters or fewer, that describes the salient features of the transgene insert written within parentheses. If the insert includes a defined gene, the gene symbol should be incorporated into the mnemonic without hyphens. Other standard abbreviations for use within the mnemonic include: An for anonymous sequence; Nc for noncoding sequence; Rp for reporter sequence; Et for enhancer trap; Pt for promoter trap; and Sn for synthetic sequence. The fourth part of the symbol is an investigator-assigned one-to-five digit number. The fifth and last part is the laboratory code. An example of this nomenclature is as follows. Castle has injected mouse embryos with a construct containing the Pgk2 coding sequence as a reporter. He names the transgene locus present in the fourth line that he recovers TgN(RpPgk2)4Cas.
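The five parts can be assembled as in the following sketch, which also enforces the length limits stated above. The function name and the form of the error messages are hypothetical; no check is made on the laboratory code, since its registration lies outside the program.

```python
# Mode letters and their meanings, as listed in the text.
MODES = {
    "N": "nonhomologous insertion",
    "R": "insertion with a retroviral vector",
    "H": "homologous recombination",
}


def transgene_symbol(mode, mnemonic, line_number, lab_code):
    """Assemble Tg + mode + (mnemonic) + line number + laboratory code."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if not 1 <= len(mnemonic) <= 6:
        raise ValueError("mnemonic must have six characters or fewer")
    if not 1 <= line_number <= 99999:
        raise ValueError("line number must have one to five digits")
    return f"Tg{mode}({mnemonic}){line_number}{lab_code}"


print(transgene_symbol("N", "RpPgk2", 4, "Cas"))
```

With the values from Castle's example, the function returns TgN(RpPgk2)4Cas.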

When the insertion of a transgene at a particular site results in a new mutation through the disruption of a gene present normally in the genome, this mutation should be named independently of the transgene locus itself. The rationale for this rule is that the contents of the transgene are independent of the locus uncovered through insertional mutagenesis. However, the mutant allele associated with the transgene should incorporate the transgene symbol as the superscripted allele designation. For example, if Castle’s construct became inserted into the Hbb locus in the fifth line that he derived, the new Hbb allele that was created would be called, in ASCII form, Hbb<TgNRpPgk25Cas>. Notice that the parentheses have been removed from the allele symbol. If a new mutation has been induced at a previously unidentified locus, the mutant phenotype should be used to name the new locus.
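The conversion from transgene symbol to allele designation amounts to dropping the parentheses, which can be sketched as follows. The function name is hypothetical, the result is written with the angular bracket convention for ASCII text described in section 3.4.4, and the capitalization of the mnemonic is simply carried over from the transgene symbol.

```python
def insertional_allele(gene, transgene):
    """Name the allele created when a transgene disrupts a known gene."""
    # The allele designation is the transgene symbol without parentheses.
    designation = transgene.replace("(", "").replace(")", "")
    return f"{gene}<{designation}>"


print(insertional_allele("Hbb", "TgN(RpPgk2)5Cas"))
```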

3.4.6 Further details

In this section, I have only touched upon those issues of nomenclature that will be of most concern to the majority of molecular biologists involved in studies of the mouse genome. In fact, the nomenclature rules developed for the mouse are rather extensive and are discussed in much greater detail in the Lyon and Searle compendium (Committee on standardized genetic nomenclature for mice, 1989), with additions and changes published regularly in Mouse Genome. As a final note, one must keep in mind that mouse genetic nomenclature will continue to evolve with the field as a whole. As new types of genetic elements and inter-relationships are uncovered, it will be the charge of the Nomenclature Committee to keep the rules internally consistent and up-to-date.

3.5 Strategies for record-keeping

3.5.1 General requirements

A breeding mouse colony differs significantly from a static one in the type and complexity of information that is generated. In a non-breeding colony, there are only the animals and the results obtained from observations and experiments on each one. In a breeding colony, there are animals, matings, and litters, with specific genetic connections among various members of each of these data sets. Classical genetic analysis is based on the transmission of information between generations, and as a consequence, the network of associations among individual components of a colony is as important as the components are in and of themselves.

An ideal record-keeping system would allow one to keep track of: (1) individual animals, their ancestors, siblings and descendants; (2) matings between animals; (3) litters born from such matings, and the individuals within litters that are used in experiments or to set up the next generation of matings; and (4) experimental material (tissues and DNA samples) obtained from individual animals. Ideally, one would like to maintain records in a format that readily allows one to determine the relationship, if any, that exists between any two or more components of the colony, past or present.

Based on these general requirements, two different systems for record-keeping have been developed by mouse geneticists over the last 60 years. The "mating-unit" system focuses on the mating pair as the primary unit for record-keeping. The "animal/litter" system treats each animal and litter as a separate entity. As discussed below, there are advantages and disadvantages to each approach.

3.5.2 The mating unit system

With this system, each mating unit is assigned a unique number and is given an individual record. When record-keeping is carried out with a notebook and pencil, each mating pair is assigned a page in the book. The cage that holds the mating pair can be identified with a simple card on which the record number is indicated; this provides immediate access to the corresponding page in the record book.

When litters are born, they are recorded within the mating record. Each litter is normally given one line on which the following information is recorded in defined columns: (1) a number indicating whether it is the first, second, third, or a subsequent litter born to the particular mating pair; (2) the date of birth; (3) the number of pups; and (4) other characteristics of the litter that are of importance to the investigator. Individual mice within any litter can be identified uniquely with a code that includes the mating unit number, followed by a hyphen, the litter number, and a letter that distinguishes siblings from each other. For example, the fourth pup in the third litter born to mating unit 7371 would be numbered 7371-3d. This system provides for the individual numbering of animals in a manner that immediately allows one to identify siblings and parents.
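The numbering scheme is easy to generate and to decode by program. In this sketch the function names are hypothetical, and the limit of 26 pups per litter follows only from the use of a single lowercase letter.

```python
import re
from string import ascii_lowercase

# mating unit number, hyphen, litter number, sibling letter
ID_RE = re.compile(r"^(\d+)-(\d+)([a-z])$")


def animal_id(mating_unit, litter, pup):
    """Build an identifier; pup counts from 1, so pup 4 receives 'd'."""
    if not 1 <= pup <= 26:
        raise ValueError("a single letter can distinguish at most 26 pups")
    return f"{mating_unit}-{litter}{ascii_lowercase[pup - 1]}"


def parse_animal_id(text):
    """Return (mating unit, litter number, pup position) for an identifier."""
    match = ID_RE.match(text)
    if match is None:
        raise ValueError(f"not an animal identifier: {text!r}")
    unit, litter, letter = match.groups()
    return int(unit), int(litter), ascii_lowercase.index(letter) + 1


print(animal_id(7371, 3, 4))
print(parse_animal_id("7371-3d"))
```

Two animals are siblings from the same litter when the first two values returned by parse_animal_id agree, and they share parents when the first value agrees.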

At the outset, parental numbers are incorporated into each mating record, and since these are linked implicitly to the litters from which they come, it becomes possible to trace a complete pedigree back from any individual. It also becomes possible to trace pedigrees forward if, as a matter of course, one cross-references all new matings within the litter records from which the parents derive. For example, if one sets up a new mating unit that is assigned the number 8765 with female 5678-2e and male 5543-1c, the number 8765 could be inscribed on appropriate lines in records 5678 and 5543.

There are several important advantages to a record-keeping system based on the mating unit: (1) only a single set of primary record numbers is required; (2) one can easily keep track of the reproductive history of each mating pair; and (3) information on siblings can be readily viewed within a single location. Furthermore, it is easy to incorporate this system of record-keeping into a simple spreadsheet file that can be maintained on a desktop computer or fileserver. This can be accomplished most readily by having each row represent an individual litter (or even individuals within a litter) with columns for (1) Mating unit number, (2) Father’s number, (3) Mother’s number, (4) Litter number, (5) Birth date, (6) Number of pups, and (7) further information. To record information on individuals within particular litters, new rows having the same format can be formed but with the litter number-sibling letter combination used in place of the litter number alone in column 4.

New litters can be recorded initially as they are born in empty rows at the bottom of the file. By re-sorting the database according to the first column, an investigator would be able to see all litters born to a particular mating pair in sequential rows. Upon sorting according to birth date, the list of litters could be displayed according to age. With search or find commands, it would be possible to identify ancestors, descendants, and siblings related to each animal.
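The same operations can be sketched in Python in place of a spreadsheet, with one record per litter and the seven columns listed above. The class and function names are hypothetical, and the dates, pup counts, and the parents of mating unit 5678 are made-up example data; only mating unit 8765 with female 5678-2e and male 5543-1c comes from the example in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LitterRow:
    mating_unit: int
    father: str
    mother: str
    litter: int
    birth_date: date
    pups: int
    notes: str = ""


# Rows in the order in which the litters were recorded (invented data).
rows = [
    LitterRow(8765, "5543-1c", "5678-2e", 1, date(1993, 3, 2), 7),
    LitterRow(5678, "4001-1a", "4002-2b", 2, date(1992, 11, 20), 9),
    LitterRow(8765, "5543-1c", "5678-2e", 2, date(1993, 4, 1), 6),
]

# Re-sorting on the first column groups the litters of each mating pair.
by_unit = sorted(rows, key=lambda r: (r.mating_unit, r.litter))

# Sorting on birth date lists the litters from oldest to youngest.
by_age = sorted(rows, key=lambda r: r.birth_date)


def units_with_parent(rows, animal):
    """Mating units in which the given animal is the father or mother."""
    return sorted({r.mating_unit for r in rows if animal in (r.father, r.mother)})


def birth_unit_rows(rows, animal):
    """Litter rows of the mating unit into which the animal was born."""
    unit = int(animal.split("-")[0])
    return [r for r in rows if r.mating_unit == unit]


print([r.mating_unit for r in by_unit])
print(units_with_parent(rows, "5678-2e"))
```

Descendants are found by calling units_with_parent, and ancestors by reading the father and mother columns of the rows returned by birth_unit_rows; repeating either step traces the pedigree further.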

The major disadvantage to this form of record-keeping is that records are focused on mating pairs and litters rather than individual animals. Thus, it is not well-suited for investigators who need to record and retrieve animal-specific information. It is also less than ideal for situations where the mating unit is not sacrosanct and animals are frequently moved from one mate to another. Under these circumstances, the animal/litter system described below is more appropriate for record-keeping.

3.5.3 The animal/litter system

In a second system developed originally by one of the earliest mouse geneticists, L. C. Dunn, there are two primary units for record-keeping — the individual animal and the individual litter. Each breeding animal is assigned a unique sequential number (at the time of weaning) that is associated with an individual record occupying one row across facing pages within an "animal record book" or a spreadsheet file. Each animal record contains the numbers of both parents and through these it is possible to trace back pedigrees. A separate "litter record book" or spreadsheet file is used to keep track of litters which are also assigned unique sequential numbers attached to one row in the database. In this record-keeping system, animal numbers and litter numbers are assigned independently of each other.

A third independent set of numbers is assigned to individual cages. Cage numbers can be assigned in a systematic manner so that related matings are in cages with related numbers. For example, different matings that derive from the same founder of a particular transgenic line may be placed in cages numbered from 2311 to 2319. A second set of matings that carry the same transgene from a different founder could be placed into cages numbered 2321 to 2329, and so on. Thus, the cages between 2300 and 2399 would all have animals that carried the same transgene; however, different sets of ten would be used for different founder lines. For matings of animals with a second transgene, one might choose to use the cages numbered 2400 to 2499. This type of numbering allows one to classify cages, which represent matings, in a hierarchical manner. Although at any point in time, every cage in the colony will have a different number, once a particular cage is dismantled, its number can be re-assigned to a new mating. Cage cards from dismantled matings can be saved for future reference.
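With the four-digit scheme used in this example, the position of a cage in the hierarchy can be read directly from its number. The sketch below assumes exactly that scheme (hundreds for the transgene, tens for the founder line); the function and field names are hypothetical, and a colony numbered differently would need different divisors.

```python
def classify_cage(cage):
    """Split a cage number into transgene block, founder set and position."""
    transgene_block, remainder = divmod(cage, 100)
    founder_set, position = divmod(remainder, 10)
    return {
        "transgene_block": transgene_block,
        "founder_set": founder_set,
        "position": position,
    }


def same_transgene(cage_a, cage_b):
    """True when two cages fall within the same block of one hundred."""
    return cage_a // 100 == cage_b // 100


def same_founder(cage_a, cage_b):
    """True when two cages fall within the same set of ten."""
    return cage_a // 10 == cage_b // 10


print(classify_cage(2311))
print(same_transgene(2311, 2329), same_founder(2311, 2329))
```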

When a litter is born, the litter record is initiated with entries into a series of columns for (1) an identifying number, (2) the birth date, (3 & 4) the numbers of the parents, (5) the number of the cage in which the litter was born, (6) the number of pups born, and (7) any other information of importance to the investigator. In addition, the litter number can be inscribed on the cage card (which may or may not have additional information about the mating pair). When an animal is weaned from a litter for participation in the breeding program, an animal record is initiated. The most important information in the animal record is the number of the litter from which it came, the cage that it goes into, and the date of that move (all entered into pre-defined columns). The cage number is particularly important in allowing one to trace pedigrees forward from any individual at a future date. If an animal is moved from one cage to another at some later date, this can be added to the record in another column.
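The linkage between the two kinds of records can be sketched as follows. The class names, field names, and all record values are invented for the illustration. For simplicity, the forward search uses the parent columns of the litter records rather than cage numbers, and animals with no record of their own are treated as founders.

```python
from dataclasses import dataclass


@dataclass
class Litter:
    number: int
    birth_date: str
    father: int
    mother: int
    cage: int
    pups: int


@dataclass
class Animal:
    number: int
    litter: int   # number of the litter from which the animal came
    cage: int     # cage the animal was placed into at weaning
    moved_on: str


# Invented example records, keyed by their independent serial numbers.
litters = {
    1: Litter(1, "1993-01-05", father=10, mother=11, cage=2311, pups=8),
    2: Litter(2, "1993-04-20", father=101, mother=12, cage=2312, pups=6),
}
animals = {
    101: Animal(101, litter=1, cage=2312, moved_on="1993-02-02"),
    205: Animal(205, litter=2, cage=2313, moved_on="1993-05-18"),
}


def parents(animal_no):
    """Return (father, mother), or None for an animal without a record."""
    record = animals.get(animal_no)
    if record is None:
        return None
    litter = litters[record.litter]
    return litter.father, litter.mother


def ancestors(animal_no):
    """Collect every recorded ancestor by walking back through litters."""
    found, pending = set(), [animal_no]
    while pending:
        pair = parents(pending.pop())
        for parent in pair or ():
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def offspring_litters(animal_no):
    """Trace forward: numbers of the litters sired or borne by the animal."""
    return sorted(
        l.number for l in litters.values() if animal_no in (l.father, l.mother)
    )


print(sorted(ancestors(205)))
print(offspring_litters(101))
```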

3.5.4 Comparison of record-keeping systems

With three unrelated sets of numbers and the need for extensive cross-referencing, the animal/litter system is complex, and implementation on paper is labor-intensive. However, it does provide the investigator with additional power for analysis. For example, by choosing cage numbers wisely and saving cage cards in numerical order, it becomes possible to go back at any point in the future and look at all of the litters born to a particular category of matings over any period of time. With the mating unit system, this could only be accomplished by using different files for different categories of matings. However, it then becomes very difficult to keep track of matings formed with animals taken from different files.

Another difference between the mating unit system and the animal/litter system is the ease with which it is possible to keep track of animals that are moved from one mating unit to another. The mating unit system is most effective for colonies where "animals are mated for life." The animal/litter system is effective for colonies of this type as well, but is also amenable to those where animals are frequently switched from one mate to another.

3.5.5 A computer software package for mouse colony record-keeping

The animal/litter system of record-keeping has been incorporated into more extensive computer software packages that greatly facilitate data entry, with automatic cross-referencing and extensive error checking. These software packages, called MacMice and Mendel's Lab, are specialized database programs that allow users to record and retrieve information on animals, litters, tissue and DNA samples, and restriction digests generated from a breeding mouse colony (Silver, 1993b). Data are entered through a series of queries and answers. With automatic cross-referencing and specialized protocols, the same information never has to be entered more than once. Hard copy printouts can be obtained for cage cards, individual records, or sets of records (in abbreviated form) that have been recovered through searches for positive or negative matches to particular words or parameters. Search protocols are highly versatile; for example, it is possible to print out a cage-ordered list of litters that are old enough for weaning on a particular date, or a list of live mice from a particular set of breeding cages that are ordered according to the cage in which they were born. Mendel's Lab provides investigators with the ability to maintain control over a complex breeding program with instant access to each record, current and past. It can store up to 100,000 records in each of four files for animals, litters, DNA/tissue samples, and restriction digests. For licensing and other information, please refer to appendix B (or send a FAX to Mendel Software at 609-924-4382).