Proc. Natl. Acad. Sci. USA Vol. 77, No. 10, pp. 6037-6041, October 1980
Rate of spontaneous mutation at human loci encoding protein structure (human electrophoretic variants/erythrocyte enzymes/mutational base line)
JAMES V. NEEL, HARVEY W. MOHRENWEISER, AND MIRIAM H. MEISLER Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109
Contributed by James V. Neel, June 24, 1980
ABSTRACT The techniques of electrophoresis were used in a search for evidence of mutation affecting protein structure, the indicators being hemoglobin and a set of serum proteins and erythrocyte enzymes. Among 94,796 locus tests on Amerindians from Central and South America, there was no evidence for mutation. Among 105,649 locus tests on newborn infants in Ann Arbor, Michigan, there was also no evidence for mutation. We have previously failed to encounter any mutations in a series of 208,196 locus tests involving Japanese children [Neel, J. V., Satoh, C., Hamilton, H. B., Otake, M., Goriki, K., Kageoka, T., Fujita, M., Neriishi, S. & Asakawa, J. (1980) Proc. Natl. Acad. Sci. USA 77, 4221-4225], and H. Harris, D. A. Hopkinson, and E. B. Robson [(1974) Ann. Hum. Genet. 37, 237-253] found no mutations in 113,478 locus tests on inhabitants of the United Kingdom. This failure to demonstrate any mutations of this type in a total of 522,119 locus tests excludes, at the 95% level of probability, a mutation rate greater than 0.6 × 10⁻⁵/locus per generation in this combination of populations.
The rate at which human genes "spontaneously" mutate not only is a parameter of basic evolutionary interest but also has certain immediate practical implications: the lower the rate, the more demanding is the demonstration of an altered rate. Until recently, for the higher eukaryotes (Drosophila, mice, and humans), estimates of mutation rate have been based largely on genetic events resulting either in gross phenotypic departures from normal or in death. Such estimates suffer from two obvious possible sources of bias: (i) the loci chosen for study are generally those at which clear-cut genetic variation (sometimes demonstrably due to mutation) has already been observed, which may result in the selection for study of the more mutable loci, and (ii) the relationship of the observed change to alterations of the genetic code is unknown, and one cannot specify what fraction of the mutational spectrum is detectable. Furthermore, it is difficult to compare mutation rates among different species with any rigor when the molecular basis for the phenotype is unknown. The advent of the ability to detect variant proteins by electrophoresis has dramatically altered the strategy for studying mutation. Now the investigator can study a battery of proteins without reference to the previous occurrence of variation at the loci in question, and, as the study of the abnormal hemoglobins so well documents, one can not only estimate the proportion of possible mutational events that are detectable but also elucidate the nature of the events at the DNA level. In this communication we report the results of direct tests of the frequency of occurrence of mutation altering the electrophoretic properties of selected proteins in two different
human populations. These findings will then be combined with the data of Harris et al. (1) on a sample drawn largely from the United Kingdom and with data on a Japanese population (2) to provide a current "best estimate" of the human mutation rate. The findings are notable in that in a total of approximately 522,000 locus tests, no probable instance of mutation has yet been encountered. Although it is clear from the occurrence of diverse protein variants in human populations that the true rate for mutations of this type is nonzero, the present data do not provide an estimate of that rate, but do permit placing an upper bound on it for the populations studied. The possibility of a conflict between these results based on a direct approach and other results using an indirect approach will be recognized.

Data on Amerindians

During the past 18 years, members of this department and our colleagues have been engaged in extensive multidisciplinary studies of 15 different Amerindian tribes. One aspect of those studies has been electrophoretic surveys of the 25 proteins for which there are entries in the Amerindian column of Table 1, with particular reference to the occurrence of "rare" or "private" electrophoretic variants (or both). These are defined as variants restricted to one or more members of a single tribe, or adjacent tribes, usually present in a few individuals but in eight instances reaching polymorphic proportions within the tribe (allele frequency >0.01). The findings have recently been summarized in several papers (3, 4) in which references to the precise biochemical techniques used in these studies will also be found. Because the conditions of the field work usually precluded obtaining a repeat sample when a variant was found, only completely unambiguous findings have been accepted as evidence for a variant. For this and other reasons well detailed by Harris et al. (1), these variant frequencies are minimal.
The design of the field work involved obtaining blood samples from as many members of each village contacted as possible. For each of the 111 villages in which we have worked, the family relationships of the persons from whom samples were obtained were established to the fullest extent possible under the circumstances. This has resulted over the years in the identification of 1141 putative nuclear families (father, mother, and one or more children) within the total of 10,561 persons from whom blood samples have been obtained. A direct estimate of mutation rate can be calculated from the frequency with which a child exhibits a variant not present in either parent. Before this family material can be used as the basis for a study of mutation, steps must be taken to ensure that the pedigree information is as accurate as possible. For a variety of reasons, the frequency of children whose blood types suggest that the alleged parentage is not the true parentage is relatively high
in Amerindian tribes (5). In addition to the above-mentioned biochemical determinations, all samples have been typed with reference to the following polymorphic blood group or serum protein systems: Rh (C, D, E, c, e), MNSs, Kell-Cellano, Duffy, Kidd, Diego, group-specific component (Gc), and the immunoglobulin allotypes of the Gm and Inv systems. Typings were also performed with reference to the ABO system, but because Central and South American Indians are generally assumed to have only blood type O unless admixture has occurred and because our studies have concentrated on tribes with little or no genetic exchange with non-Indians, this system is of very limited value in the present context. The findings from these typings have been used to detect discrepancies between stated and actual parentage on the basis of standard genetic principles; i.e., children have been scrutinized for an attribute not present in either parent. In addition, the typings referenced in Table 1 (in which the abbreviations to be used throughout this paper are given) also supply data on a number of genetic polymorphisms that can be used in this context: HP types 1 and 2, ACP1 types A and B, PGD types A and C, PGM1 types 1 and 2, ESD types 1 and 2, and GALT types normal and Duarte. The "private" polymorphisms encountered in certain tribes (3, 6) were also used where possible to detect nonparentage. In our program each protein is usually typed once, repeat typings being performed only when there is ambiguity in the results of the first typing (or when special studies are undertaken). In the present context, however, in all instances in which the evidence for a discrepancy between stated and biological parentage rested on the results from a single system and in which the appropriate stored samples were available, that system was retyped in all members of the pertinent trio.
After exclusion of children with one or more genetic traits discrepant from their parents, there remained 1043 nuclear families, in which a total of 2242 children had been examined and not excluded. The numbers of tests for mutation are listed by protein in Table 1. Differing numbers of determinations between series reflect the sequential introduction of new tests or technical factors (or both) which resulted in failure to examine some samples for some systems. In no instance was a child found to have a variant not present in one or the other parent. There is thus no evidence for the occurrence of mutation in this series. The number of locus tests to which the series corresponds is 94,796, computed as twice the number of determinations, with allowance for the fact that two of the proteins (LDH and HGBA1) are composed of two independently coded polypeptides. There are no sex-linked traits in this data set, so that no correction for hemizygosity is necessary. Because ESA is probably a heterooligomer (but was scored as a monomer), this calculation of number of locus tests is conservative.

Placental cord blood series

The second series of original data stems from an effort under way at the University of Michigan Medical Center to investigate the conditions under which a population of newborn infants can most effectively be studied with respect to the rate with which mutation affects protein components of erythrocytes, leukocytes, and blood plasma (7). Blood samples are being collected from the placentas of all newborn infants for whom the parents will give informed consent and will also contribute venous samples themselves. The parents are predominantly Caucasian, but 8% are American Blacks, 2% are Mongoloids, and there is no information for 4% of the sample. Currently the samples from the infants are being screened for the proteins for which there are entries in Table 1.
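The locus-test bookkeeping used for both series can be sketched in a few lines of Python. This is only our reading of the counting rule stated in the text and in the Table 1 footnotes: two alleles per determination, a further factor of two for proteins scored as two independently coded polypeptides, and one allele per male for sex-linked loci. The class name, function name, and all determination counts below are hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class ProteinSystem:
    """One electrophoretic system; all numbers used below are hypothetical."""
    name: str
    determinations: int        # children successfully typed for this system
    polypeptides: int = 1      # independently coded polypeptides scored
    sex_linked: bool = False   # True for X-linked structural loci
    males: int = 0             # number of males typed (used only if sex_linked)


def locus_tests(system: ProteinSystem) -> int:
    """Count locus tests under the assumed rule described in the lead-in."""
    if system.sex_linked:
        females = system.determinations - system.males
        alleles = system.males + 2 * females   # males are hemizygous
    else:
        alleles = 2 * system.determinations    # two alleles per child
    return alleles * system.polypeptides


# Hypothetical example: an autosomal monomer, a two-polypeptide protein,
# and a sex-linked enzyme.
example = [
    ProteinSystem("ALB", 1000),
    ProteinSystem("LDH", 1000, polypeptides=2),
    ProteinSystem("G6PD", 1000, sex_linked=True, males=520),
]
total_locus_tests = sum(locus_tests(s) for s in example)
print(total_locus_tests)
```

Keeping the rule in one small function makes it easy to see how a change in scoring (for example, treating ESA as a heterooligomer rather than a monomer) would change the total.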
Starch gel electrophoresis for PGM1, PGM2, ESA, and ESD followed the same techniques referenced for the studies of
Amerindians. For the remaining determinations with starch gel, the following electrophoretic systems have been introduced. (i) For ACP1, we used the buffers that Harris and Hopkinson (8) describe for PGM. (ii) For peptidases, we used the Tris/phosphate buffer system of Harris and Hopkinson (8). (iii) For HK and NP, the electrode buffer was 100 mM Tris/5 mM EDTA adjusted to pH 7.7 with phosphoric acid. The gel buffer was a 1:10 dilution of electrode buffer containing 10 mM glucose and 1 mM 2-mercaptoethanol. Electrophoresis was carried out at 400 V for 16 hr at 4°C. (iv) For AK1, IDH2, GALT, and GOT1, the electrode buffer was 0.22 M Tris/86 mM citric acid adjusted to pH 6.2 with NaOH. The gel buffer was a 1:30 dilution of the electrode buffer. Electrophoresis was carried out at 200 V for 16 hr at 4°C. (v) For ESB, electrophoretic conditions were as described for ESA and ESD (9) with α-naphthyl butyrate as substrate. The following systems were studied by polyacrylamide gel electrophoresis: TF, CRPL, ALB, HGBA1, ADA, TPI, LDH, MDH, GPI, PGD, and PGK. With the exceptions of albumin (10), PGD, and PGK, the acrylamide electrophoresis was carried out in 7% polyacrylamide gels in 40 mM Tris HCl (Sigma, 7-9)/0.18 M glycine as gel and reservoir buffers. PGD was analyzed on 8% polyacrylamide gels; the gel buffer was 0.1 M Tris/glycine (pH 8.4) containing 0.05% 2-mercaptoethanol and the reservoir buffer was 90 mM Tris borate (pH 8.4) containing 0.01% mercaptoethanol. NADP (3.3 mg/ml) was added to the cathodal reservoir buffer in this case. PGK was analyzed on 6% polyacrylamide gels with 0.1 M Tris citrate buffer (pH 7.4) as reservoir buffer and 5 mM histidine/5 mM citrate/NaOH, pH 7.4, in the gels. In addition, G6PD was analyzed by electrophoresis on cellulose acetate Titan III (Helena Laboratories, Beaumont, TX) (11) and UPS by isoelectric focusing (12).
The enzymes were stained as described by Harris and Hopkinson (8); ALB and TF were stained with Coomassie brilliant blue; CRPL was stained as described by Shreffler et al. (13). (The possibility of maternal transplacental transmission of variant serum protein types has been excluded by appropriate studies of 500 "trios.") HP, CA1, and CA2 occur in such low concentrations in newborn infants that accurate determinations were not deemed possible. The discrepancy in Table 1 between the number of determinations for PGM1 and PGM2, run on the same gel, is due to the occurrence of five ambiguous PGM1 determinations for which samples for repeat typings were unavailable. As was true for the Amerindians, a number of these systems (ACP1, PGD, PGM1, ADA, ESD, and GALT) are characterized by widely distributed genetic polymorphisms, but in this context the primary concern was with relatively uncommon electromorphs (allele frequency < [...]

[Table 1 legend; table body not recovered] [...] allele frequency >0.01 in Caucasians, Mongoloids, or Negroes (or all three) have been excluded from the table when they occurred in the indicated population group: CRPLA, CRPL New Haven, CRPL Michigan, HGBA1S, TFD1, G6PDA, GOT2, GOT3, and PEPA2. -, Not determined.
* HGBA1 and HGB2 are tetramers of two polypeptides, one of which is common to both proteins. Accordingly, for estimating locus tests we have scored the α polypeptide only in conjunction with HGBA1.
† LDH is a tetramer of two independently coded polypeptides.
‡ The structural loci for these enzymes are sex linked; the number of locus tests is computed on the basis of the actual number of male and female infants studied.

[...]
been explained by paternity exclusion. In two cases the variant was a slowly migrating TF; in a third case, it was a rapidly migrating CRPL. In the first two cases, there were three-system paternity exclusions; in the third case, a one-system exclusion (MNSs system). The fourth instance was a rapidly migrating variant of G6PD with reduced activity in a male child. Although the mother exhibited reduced G6PD activity (52% of normal), only a normal band of G6PD activity was visible in her hemolysate on repeated electrophoretic determinations. Because this is a sex-linked trait, paternity was not at issue.
When a fresh blood sample was obtained, the variant was detected electrophoretically in preparations of the mother's leukocytes, but was still not detected in erythrocytes. A second son showed a normal G6PD pattern, but a daughter exhibited both a normal band and, very faintly, the variant band seen in her younger brother. We conclude that the mother is a heterozygote in whom failure to detect the variant G6PD in hemolysate resulted from the reduced activity of the variant combined with nonrandom lyonization of the X chromosome bearing the variant allele in the erythropoietic precursor. There are thus
no instances of putative mutation in this series. The number of locus tests, computed as described earlier, was 105,649.

Other pertinent data

There are two other sources of data pertinent to this treatment. The first set is derived from a study of the potential genetic effects of the atomic bombs in Japan. For this purpose, two panels of children have been established, one born to parents one or both of whom were within 2000 m of the hypocenter at the time of the explosions (referred to as "proximally exposed"), the other born to parents more than 2500 m from the hypocenter (referred to as "distally exposed"). The findings in the latter, whose average radiation exposure is thought to be less than 1 rem, serve as control on the findings in the former and can properly be included in the present analysis. Blood samples are being obtained from both groups of children and essentially the same determinations are being made as in the studies on Amerindians and by the same techniques. The number of determinations per system for the children of distally exposed parents ranged from 4415 to 4847. Whenever a variant was encountered, a family study was attempted. In a preliminary report on this study at its approximate midpoint (2), no probable mutations were encountered in children of the distally exposed parents in an estimated total of 208,196 locus determinations embracing most of the systems enumerated in Table 1. The other set of data was assembled over a period of more than 10 years by members of the Human Biochemical Genetics Group at University College, London (1). Data have been collected on the incidence of rare alleles with respect to 43 different enzyme loci, the number of examinations per system varying from 349 to 11,237. The great majority of the subjects were residents of the United Kingdom. When a variant was encountered, an effort was usually made to carry out family studies.
In an estimated total of 113,478 locus tests, no evidence for mutation has been encountered. In the data on Amerindians and placental blood samples collected in the U.S., the studies have been organized so that samples from both parents are available in the event a variant is encountered in a child. In the data from Japan and from England, by contrast, family studies were initiated after a variant was ascertained, and it was not always feasible to obtain blood samples from both parents. The number of locus tests given above for these latter two populations is therefore an estimate obtained by multiplying the total number of determinations (corrected for dimeric and sex-linked proteins) by the proportion of variants for which complete family studies have been performed, the result being rounded to the nearest whole number.

Discussion

A concern at the outset of this study was the ability to deal with such a low-frequency event as mutation of this type was assumed to be. Would specimen mislabeling, laboratory errors, and discrepancies between legal and biological parentage introduce so much "noise" that the resulting estimate of mutation rate was of little value? The results indicate that this is not the case. In this connection, however, we must point out that because of the requirements of informed consent for the Ann Arbor program and the favorable circumstances with reference to the other bodies of data, the occurrence of undetected discrepancies between legal and biological parentage may be substantially lower in these data than in many other of the world's subpopulations. The present data confront us with the rather surprising failure to detect any mutational events resulting in electrophoretic variants in an estimated total of 522,119 locus tests. In
the absence of any apparent mutations, a mutation rate of course cannot be estimated. On the other hand, we can estimate the upper frequency of the mutation rate with which the totality of the findings to date is consistent. The mutation rate at which there would have been a probability of 0.95 of observing at least one instance of mutation in 522,119 locus tests is given by

$(1 - 0.95) = (1 - \mu)^{n}$

in which n is the number of locus tests and μ is the frequency of mutation. This upper limit, at the 5% confidence level, is 0.6 × 10⁻⁵/locus per generation. It is clear that the true rate of mutation resulting in electrophoretic variants in humans cannot be zero. Six indirect estimates based on studies of electrophoretic variants in tribal populations assumed to have been in approximate equilibrium for some centuries have resulted in an average value of 1.1 × 10⁻⁵/locus per generation; if one extremely low estimate is deleted, the average of the remaining five is 1.3 × 10⁻⁵. The direct approach has for the most part been based on civilized, industrialized populations residing in temperate regions, and the indirect on primitive, nonindustrialized populations residing for the most part in the tropics, and the frequency of the type of electrophoretic variant on which these estimates are based is greater in the primitive populations. Thus, the possibility of real differences between populations must be considered. However, the direct approach involves fewer assumptions and is more valid for the types of populations on which it is based. For now, we will consider the average mutation rate for electromorphs in populations such as those forming the basis for this paper as some value less than 0.6 × 10⁻⁵/locus per generation. Considerable caution must, of course, be exercised in the use of this upper bound. For most of these proteins, only a single set of electrophoretic conditions was used in the search for variants. No one set of conditions will detect all the electrophoretic variants of a given protein. For those systems in which genetic polymorphisms occur, these interfere with the detection of additional variants of similar mobility. Thus, for these and other reasons well detailed by Harris et al. (1), any mutation rates generated by this approach are probably biased downwards.
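The upper bound quoted in this section follows from solving the zero-event relation for μ, which gives μ = 1 - 0.05^(1/n). A minimal Python sketch of that arithmetic is given below, using the four series sizes reported in the abstract. The function name and dictionary keys are our own labels, and the calculation assumes, as the relation itself does, that locus tests are independent and share a single rate.

```python
import math


def upper_bound_rate(n_locus_tests: int, probability: float = 0.95) -> float:
    """Rate mu at which at least one mutation would be seen with the given
    probability in n tests, i.e. the solution of (1 - probability) = (1 - mu)**n."""
    # expm1 keeps precision when the result is very close to zero.
    return -math.expm1(math.log(1.0 - probability) / n_locus_tests)


# Series sizes as reported in the abstract.
series = {
    "Amerindians": 94_796,
    "Ann Arbor newborns": 105_649,
    "Japanese children": 208_196,
    "United Kingdom": 113_478,
}

total_tests = sum(series.values())
mu_upper = upper_bound_rate(total_tests)

print(f"total locus tests: {total_tests}")
print(f"upper bound: {mu_upper:.2e} per locus per generation")
```

Because n is large, the bound is very close to -ln(0.05)/n, or roughly 3/n, which is a convenient way to see how slowly the bound tightens as more locus tests accumulate.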
Only for Drosophila melanogaster have comparable extensive studies been performed. Tobari and Kojima (14) determined the average spontaneous mutation rate at 10 loci encoding enzymes, on the basis of the equivalent of 669,904 locus tests, some of the loci functionally homologous to several included in the present studies. They discovered three mutants, resulting in an estimated mutation rate of 0.5 × 10⁻⁵/locus per generation. Mukai and Cockerham (15), in an estimated equivalent of 1,658,308 locus tests equally distributed over five enzymes at least one of which (MDH) is homologous with one included in our studies, detected three mutants, for a rate of 0.2 × 10⁻⁵/locus per generation. Given the well-known strain differences in mutation rates in Drosophila, the need for a broader data base is clear, but for now these are the only data available. The design of the experiment of Mukai and Cockerham (15) permitted the detection of the complete loss of enzyme activity (i.e., "nulls"). Seventeen such occurrences were encountered. Unfortunately, rigorous genetic studies were not performed on these nulls, but they were presumed to reflect mutational events at a rate of 1.0 × 10⁻⁵. This rate cannot be termed a locus rate because, in theory, mutation at operator-controller loci, as well as structural loci, can result in nulls. The mutation rates are an average of the rate in the two sexes. In both of these experiments, mutations were accumulated over many generations (up to 175) in a marked-inversion-balanced-lethal system, which carried the mutations in the heterozygous condition until they were
revealed by the appropriate enzyme studies. There was thus the opportunity for selection against mutations that are deleterious when heterozygous. This possibility, presumably greater with respect to the nulls than the electromorphs, must be borne in mind in interpreting these findings; i.e., for these and other (technical) reasons, these are minimal estimates. On the other hand, because the genetic basis for the nulls was not rigorously verified, their frequency relative to electromorphs might be overestimated. Efforts are being made to generate estimates of the frequency with which mutation produces nulls in humans. For the present, we must restrict precise comparison between Drosophila and humans to electrophoretic variants. The current data, taken at face value, suggest that the human rate is no greater than the Drosophila rate and could even be less. Soares (16) has recently studied the mutagenic potential of triethylenemelamine with respect to 11 loci encoding isozymes in mice. No mutations were encountered in 62,986 control locus tests. It is unfortunate that these appear to be the only data of this type available for mice. The comparison of the types of estimates of mutation rates generated by these electrophoretic studies with those generated by other types of direct estimates in fruit flies, mice, and humans presents many points of interest. Here we comment on only three implications of the present findings. (i) The present findings, even at this early stage, have important implications for programs designed to study the impact of potentially mutagenic exposures on human populations. Simply stated, the less common the phenomenon, the larger the sample size necessary to detect a departure from the norm of some given magnitude.
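The scaling just stated (the rarer the event, the larger the sample needed) can be illustrated with a textbook normal-approximation formula for comparing two proportions. This is a rough sketch only: it is not the power function of ref. 17, the function name and the one-sided test are our assumptions, and its output should be expected to agree with any exact power calculation only in order of magnitude. The two baseline rates, 1 × 10⁻⁵ and 0.5 × 10⁻⁵ per locus per generation, are chosen for illustration.

```python
from statistics import NormalDist


def samples_per_group(baseline: float,
                      relative_increase: float = 0.5,
                      alpha: float = 0.05,
                      beta: float = 0.20) -> float:
    """Approximate locus tests needed in each of two samples to detect a
    relative increase over a baseline rate (one-sided normal approximation)."""
    p1 = baseline
    p2 = baseline * (1.0 + relative_increase)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # type I error, one-sided
    z_beta = NormalDist().inv_cdf(1.0 - beta)    # type II error
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2


n_at_1e5 = samples_per_group(1.0e-5)
n_at_half = samples_per_group(0.5e-5)

print(f"baseline 1.0e-5: about {n_at_1e5:.2e} locus tests per sample")
print(f"baseline 0.5e-5: about {n_at_half:.2e} locus tests per sample")
print(f"ratio: {n_at_half / n_at_1e5:.2f}")
```

For rates this small the required sample size is very nearly inversely proportional to the baseline rate, so halving the baseline doubles the number of locus tests needed in each sample.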
For instance, we have recently developed the power function for demonstrating specified differences between two samples in the frequency with which mutation results in electromorphs, on the assumption of a baseline frequency of 1 × 10⁻⁵/locus per generation (17). The demonstration of a 50% increase, which is surely the maximal increase that should go undetected, with a type I error of 0.05 and a type II error of 0.20, requires two samples of approximately 7,400,000 observations each. However, should that baseline rate in fact be 0.5 × 10⁻⁵, then with the same assumptions and precision the required sample sizes are approximately 15,000,000! (ii) Kimura and colleagues (18, 19) have vigorously promulgated the concept that the rate of amino acid substitution per unit time in any one protein has been essentially the same along widely divergent lines of evolutionary descent and have suggested that this requires that most of these substitutions must result from the fixation of mutations that were selectively neutral. It would seem to be an inescapable corollary of this argument that to achieve this constancy in rate of genetic fixation, the mutation rate resulting in selectively neutral alleles (i.e., not necessarily the total mutation rate) should be much higher in the long-lived primate line of descent than in a short-lived dipteran line of descent, as exemplified by Drosophila. To the extent that present procedures would detect such alleles, the present studies do not support the corollary. (iii) We find it noteworthy that at present the upper bound on the mutation rate that we postulate for humans is consistent with the Drosophila rate despite the fact that the human germ line is subjected to an average temperature at least 10°C greater than for Drosophila; undergoes (especially in men) significantly more cell divisions each generation than that of Drosophila; and, per generation, is subjected to whatever mutagens exist in the environment and reach germinal tissue for an average time span from fertilization to mean age at reproduction roughly 500 times as long as obtained in the Drosophila experiments. The facts as they stand suggest the evolution in a long-lived mammal such as humans of remarkably effective mechanisms for "neutralizing" potential mutagens or repairing genetic damage in the human germ line and should add to the difficulties in extrapolating to the human situation from data based on experimental animals.

We acknowledge the excellent technical assistance provided by T. Krasteff, L. Passamani, C. Yoshihara, and K. H. Wurzinger. The investigations were supported in part by the Department of Energy.

1. Harris, H., Hopkinson, D. A. & Robson, E. B. (1974) Ann. Hum. Genet. 37, 237-253.
2. Neel, J. V., Satoh, C., Hamilton, H. B., Otake, M., Goriki, K., Kageoka, T., Fujita, M., Neriishi, S. & Asakawa, J. (1980) Proc. Natl. Acad. Sci. USA 77, 4221-4225.
3. Neel, J. V. (1978) Am. J. Hum. Genet. 30, 465-490.
4. Neel, J. V. & Rothman, E. D. (1978) Proc. Natl. Acad. Sci. USA
75, 5585-5588.
5. Neel, J. V. & Weiss, K. (1975) Am. J. Phys. Anthropol. 42, 25-52.
6. Neel, J. V. (1980) in Population Structure and Genetic Disorders, ed. Eriksson, A. (Academic, London), pp. 173-193.
7. Neel, J. V., Mohrenweiser, H., Satoh, C. & Hamilton, H. B. (1979) in Genetic Damage in Man Caused by Environmental Agents, ed. Berg, K. (Academic, New York), pp. 29-47.
8. Harris, H. & Hopkinson, D. A. (1976) Handbook of Enzyme Electrophoresis in Human Genetics (American Elsevier, New York).
9. Neel, J. V., Tanis, R. J., Migliazza, E. C., Spielman, R. S., Salzano, F., Oliver, W. J., Morrow, M. & Bachofer, S. (1977) Hum. Genet. 36, 81-107.
10. Heredero, L., Granda, H., Aguilar, J. A. S. & Altland, K. (1974) Humangenetik 21, 167-177.
11. Ellis, N. & Alperin, J. B. (1972) Am. J. Clin. Pathol. 57, 534-536.
12. Meisler, M. H. & Carter, M. C. (1980) Proc. Natl. Acad. Sci. USA 77, 2848-2852.
13. Shreffler, D. C., Brewer, G. J., Gall, J. C. & Honeyman, M. S. (1967) Biochem. Genet. 2, 101-115.
14. Tobari, Y. N. & Kojima, K.-I. (1972) Genetics 70, 397-403.
15. Mukai, T. & Cockerham, C. C. (1977) Proc. Natl. Acad. Sci. USA 74, 2514-2517.
16. Soares, E. R. (1979) Environ. Mutagen. 1, 19-25.
17. Neel, J. V. Proc. XIV Int. Cong. Genet., in press.
18. Kimura, M. (1969) Proc. Natl. Acad. Sci. USA 63, 1181-1188.
19. Kimura, M. & Ohta, T. (1971) Nature (London) 229, 467-