The Mutation Rates of Di-, Tri- and Tetranucleotide Repeats in ...

The Mutation Rates of Di-, Tri- and Tetranucleotide Repeats in Drosophila melanogaster

Malcolm D. Schug,* Carolyn M. Hutter,* Kris A. Wetterstrand,* Mara S. Gaudette,* Trudy F. C. Mackay,† and Charles F. Aquadro*

*Section of Genetics and Development, Cornell University; and †Department of Genetics, North Carolina State University

In a recent study, we reported that the combined average mutation rate of 10 di-, 6 tri-, and 8 tetranucleotide repeats in Drosophila melanogaster was $6.3 \times 10^{-6}$ mutations per locus per generation, a rate substantially below that of microsatellite repeat units in mammals studied to date (range = $10^{-2}$–$10^{-5}$ per locus per generation). To obtain a more precise estimate of mutation rate for dinucleotide repeat motifs alone, we assayed 39 new dinucleotide repeat microsatellite loci in the mutation accumulation lines from our earlier study. Our estimate of mutation rate for a total of 49 dinucleotide repeats is $9.3 \times 10^{-6}$ per locus per generation, only slightly higher than the estimate from our earlier study. We also estimated the relative difference in microsatellite mutation rate among di-, tri-, and tetranucleotide repeats in the genome of D. melanogaster using a method based on population variation, and we found that tri- and tetranucleotide repeats mutate at rates 6.4 and 8.4 times slower than that of dinucleotide repeats, respectively. The slower mutation rates of tri- and tetranucleotide repeats appear to be associated with a relatively short repeat unit length of these repeat motifs in the genome of D. melanogaster. A positive correlation between repeat unit length and allelic variation suggests that mutation rate increases as the repeat unit lengths of microsatellites increase.

Key words: Drosophila melanogaster, microsatellite, simple sequence length polymorphism, SSR, mutation rate, effective population size.

Address for correspondence and reprints: Malcolm D. Schug, Section of Genetics and Development, 403 Biotechnology Building, Cornell University, Ithaca, New York 14853. E-mail: [email protected].

Mol. Biol. Evol. 15(12):1751–1760. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Several laboratories, including ours, have been interested in using highly variable microsatellites (short, tandemly repeated units of DNA) to obtain a detailed picture of population colonization and demographic history for various organisms, including humans and Drosophila. Microsatellites are abundant in the genome of Drosophila melanogaster (Goldstein and Clark 1995; England, Briscoe, and Frankham 1996; Schug et al. 1998b) and have high mutation rates relative to base pair substitutions in DNA sequences (Schug, Mackay, and Aquadro 1997). In addition to providing information about very recent evolutionary events (Michalakis and Veuille 1996), microsatellites may be useful as markers to identify specific regions of the genome affected by natural selection in local populations (Schlötterer, Vogl, and Tautz 1997) or to test different models of selection (Slatkin 1995; Schug et al. 1998a).

To estimate the dates of evolutionary events and distinguish between different models of selection using microsatellites, it is necessary to know their mutation rates. Using mutation accumulation lines, we recently estimated the average spontaneous mutation rate of 10 di-, 6 tri-, and 8 tetranucleotide repeats in D. melanogaster to be $6.3 \times 10^{-6}$ per locus per generation (Schug, Mackay, and Aquadro 1997). Although this rate is several orders of magnitude higher than base pair substitution rates (~$10^{-8}$ per site per generation; Li 1997), it is substantially lower than estimates of microsatellite mutation rates in mammalian pedigrees, which range from $10^{-2}$ to $10^{-5}$ per locus per generation (e.g., Dallas 1992; Dietrich et al. 1992; Serikawa et al. 1992; Weber and Wong 1993; Ellegren 1995). We proposed that the relatively low mutation rate of microsatellites in D. melanogaster is a function of the short repeat unit lengths (Schug, Mackay, and Aquadro 1997).

In humans, microsatellite mutation rates vary considerably among different repeat motifs. For example, Weber and Wong (1993) assayed both di- and tetranucleotide repeat microsatellites in cell lines representing human pedigrees and found that mutations at tetranucleotide repeats were more frequent than those at dinucleotide repeats. Chakraborty et al. (1997) used a method based on population variation to estimate relative differences in mutation rate and reported that dinucleotide repeats mutate at a rate 1.2–2.4 times higher than that of tri- or tetranucleotide repeats.

Our estimate of microsatellite mutation rate in D. melanogaster was based on a combination of di-, tri-, and tetranucleotide repeats (Schug, Mackay, and Aquadro 1997). Furthermore, our estimate of mutation rate was based on the observation of a single spontaneous mutation at one dinucleotide repeat locus after screening 157,680 allele generations in the mutation accumulation lines. In this paper, we significantly extend our original analysis by obtaining a more precise estimate of mutation rate for dinucleotide repeats alone, the most polymorphic and abundant class of microsatellite repeat motifs in the D. melanogaster genome. This analysis includes 39 new dinucleotide repeats that we assayed in the same mutation accumulation lines of D. melanogaster described in our earlier study. We also estimate the relative difference in mutation rate among di-, tri-, and tetranucleotide repeats using a method based on population variation. We combine the results of these analyses to infer the mutation rates of di-, tri-, and tetranucleotide repeat loci individually. Finally, we investigate the influence of repeat unit length on population variation.


Materials and Methods

Estimates of Dinucleotide Repeat Mutation Rate

We assayed 30 mutation accumulation lines of D. melanogaster for different length alleles at 39 new dinucleotide microsatellite loci in the same manner as in our previous study (Schug, Mackay, and Aquadro 1997). Briefly, a highly inbred strain (Harwich) was divided into replicate sublines, 19 of which were maintained for 230 generations by randomly choosing parents each generation (Mackay, Lyman, and Hill 1995) and 11 of which were selected for bristle number for 200 generations (Mackay, Lyman, and Hill 1994), for a total of 6,570 allele generations per locus. In all replicate sublines, 10 males and 10 females were chosen as parents for each generation. Since the initial population was essentially homozygous at all loci, any differences among the sublines must be due to spontaneous mutation. Contamination by exogenous flies was ruled out by scoring chromosomal locations of dispersed transposable element insertions (Mackay, Lyman, and Hill 1994, 1995).

Microsatellites were amplified by PCR from a single fly as described in Schug, Mackay, and Aquadro (1997) and Schug et al. (1998b), including a negative control with no DNA. Mutations were confirmed by a subsequent PCR amplification using at least five different flies from the same subline. Microsatellite mutation rate, $\mu$, could then be calculated from the substitution rate by dividing the number of mutations by the product of the number of allele generations (6,570) and the number of loci scored. Confidence intervals (CIs) are calculated assuming the mutations follow a Poisson distribution such that the probability of observing $k$ mutational events is $P(k) = e^{-\lambda}\lambda^{k}/k!$, where $\lambda$ is the expected number of mutations (Casella and Berger 1990). Solving for $\lambda$ such that $e^{-\lambda}\lambda^{k}/k! = 0.025$ and $0.975$ thus gave the numbers of mutations expected at the upper and lower 95% CIs. These numbers were then divided by the number of allele generations times the number of loci assayed to yield the mutation rate at each CI.

The original 10 dinucleotide repeat loci analyzed in our previous study (Schug, Mackay, and Aquadro 1997) were identified by a search of GenBank. The 39 new loci analyzed here include four loci that were identified by us in GenBank (Schug et al. 1998b; DRO17DC2Z, DRO17DC4Z, DRO14DC95Z, and DRO13DC98), five that were reported from a GenBank search by Goldstein and Clark (1995; DMMP20, DRONINAC, DMZ3K25Z, DROTROPONI, and DROYANET), and 30 that were identified from our DNA library screen (Schug et al. 1998b; DM28, DM30, DM38, DM40, DM42, DM54, DM55, DM58, DM61, DM73, DM75, DM78, DM80, DM83, DM85, DM86, DM87, DM88, DM89, DM92, DM94, DM95, DM97, DM98, DM100, DM102, DM114, DM116, DM122, and DM123). Primer sequences and assay conditions for all of the loci used in this study are available from our www site (http://www.bio.cornell.edu/genetics/aquadro/aquadro.html). All sequences have been submitted to GenBank (AF091999–AF092029).
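The mutation-rate and confidence-interval arithmetic described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the 0.025 and 0.975 criteria apply to the cumulative Poisson probability of observing at most k mutations (an interpretation, not stated in the text), and it plugs in the counts reported for the 49 dinucleotide loci (3 mutations, 6,570 allele generations per locus). The function names are hypothetical.

```python
import math

ALLELE_GENERATIONS = 6570  # per locus, from the text
N_LOCI = 49                # dinucleotide loci scored
N_MUTATIONS = 3            # spontaneous mutations observed


def poisson_cdf(k, lam):
    """P(K <= k) for a Poisson variable with mean lam."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))


def solve_lambda(k, target, lo=1e-9, hi=100.0, iters=200):
    """Bisection for the lam at which P(K <= k) equals target.

    The cumulative probability falls as lam grows, so a value above the
    target means the root lies at a larger lam.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


denominator = ALLELE_GENERATIONS * N_LOCI
rate = N_MUTATIONS / denominator

# Assumption: limits are defined on the cumulative probability P(K <= k).
lam_low = solve_lambda(N_MUTATIONS, 0.975)
lam_high = solve_lambda(N_MUTATIONS, 0.025)
rate_low = lam_low / denominator
rate_high = lam_high / denominator

print(f"rate = {rate:.2e}, interval = {rate_low:.2e} to {rate_high:.2e}")
```

Under this cumulative reading, the point estimate and upper limit come out near $9.3 \times 10^{-6}$ and $2.72 \times 10^{-5}$, and the lower limit comes out near $3.4 \times 10^{-6}$.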

Estimates of Microsatellite Variability in Natural Populations

Segregating variation is a function of the mutation rate and effective population size ($N_e$). Thus, we can assay microsatellite variation in natural populations to determine if our estimates of mutation rate from the mutation accumulation lines in this study are consistent with levels of segregating variation in nature. Population variation can also be used to infer the relative differences in mutation rate among repeat types (e.g., Chakraborty et al. 1997). Heterozygosity was calculated as $H = [n/(n - 1)][1 - \sum q_i^2]$, where $n$ is the number of chromosomes surveyed and $q_i$ is the frequency of the $i$th allele.

We calculated heterozygosity and variance in repeat unit number for 48 of the dinucleotide repeat microsatellites. The 49th locus, DMRHO, was assayed in the mutation accumulation lines but was not included in the population assays, because the PCR fragment also included a tetranucleotide repeat variation, and the dinucleotide repeat could not be assayed alone. One individual was assayed from each of 18–25 isofemale lines established from a natural population in Beltsville, Md., and 10–23 isofemale lines established from a natural population in Harare, Zimbabwe. For each heterozygous individual, one individual allele was chosen at random such that each line represented a single chromosome from the population (Goldstein and Clark 1995; Schug, Mackay, and Aquadro 1997).

In addition, both heterozygosity and variance in repeat number were calculated for nine di-, six tri-, and eight tetranucleotide repeat microsatellites in 18–20 isofemale lines established from each of five populations collected from Africa, China, Australia, Ecuador, and the U.S.A. (Wetterstrand 1997). These di-, tri-, and tetranucleotide repeat microsatellites represent the longest repeats revealed in an extensive search of GenBank (Schug et al. 1998b).
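The two summary statistics used throughout, heterozygosity and variance in repeat number, can be sketched for a single locus as below. The sample of repeat counts is made up for illustration, the function names are hypothetical, and the use of the unbiased (n − 1) variance is an assumption, since the text does not state which estimator was used.

```python
from collections import Counter


def heterozygosity(alleles):
    """H = [n/(n-1)] * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    counts = Counter(alleles)
    sum_q2 = sum((c / n) ** 2 for c in counts.values())
    return (n / (n - 1)) * (1.0 - sum_q2)


def repeat_variance(alleles):
    """Sample variance in repeat number (n - 1 denominator, an assumption)."""
    n = len(alleles)
    mean = sum(alleles) / n
    return sum((a - mean) ** 2 for a in alleles) / (n - 1)


# Hypothetical sample: repeat counts for one allele per isofemale line.
sample = [10, 10, 11, 12]
H = heterozygosity(sample)
V = repeat_variance(sample)
print(round(H, 3), round(V, 3))
```

Each entry in the sample stands for one chromosome, matching the one-allele-per-line sampling described above.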
Furthermore, all are located in noncoding regions of the genome, eliminating any potential effect of restrictions on variation imposed by selection to maintain reading frame in protein-coding regions.

We used the number of repeat units determined by direct sequencing (Amersham dideoxy terminator cycle sequencing kit) in the most common allele in the U.S.A. and Africa populations as a reference to determine the number of repeat units in the other alleles. For the loci identified in our DNA library screen, we used the clone sequence as a reference to determine the number of repeat units in the most common allele. Since the DNA library was constructed from a pool of flies from the Africa population, this method is likely accurate.

ANOVAs and Pearson correlations were calculated using SYSTAT (SPSS, Inc.). All data were approximately normally distributed unless otherwise noted. Bonferroni probabilities were used to evaluate the significance of the correlations.

Results

Dinucleotide Repeat Mutation Rate

Of the 49 dinucleotide repeat loci now scored, three have shown a single spontaneous mutation (DMU1951, previously reported in Schug, Mackay, and Aquadro [1997], and DMZ3K25Z and DM86, reported here), each in independent lines, yielding a mutation rate of $9.3 \times 10^{-6}$ (= 3/[49(6,570)]; 95% CI = $2.72 \times 10^{-5}$ to $3.35 \times 10^{-6}$), where 49 is the number of loci and 6,570 is the number of allele generations per locus. Each of the three mutant alleles was a single repeat unit larger than the allele in the other mutation accumulation lines. The repeat unit lengths in the mutation accumulation lines were 20, 13, and 10 for DMU1951, DMZ3K25Z, and DM86, respectively. Repeat unit lengths of the other loci ranged from 5 to 30 (mean = 11.69 ± 5.13).

If our estimate of dinucleotide repeat mutation rate reflects the mutation rate of dinucleotide microsatellites in natural populations, it should be consistent with levels of segregating variation and effective population sizes ($N_e$) of D. melanogaster populations in nature. We can test this consistency by using our empirical estimates of mutation rate and levels of heterozygosity in natural populations to calculate $N_e$. Similarity between our estimates of $N_e$ and those derived independently from studies of other types of genome variation in natural populations would suggest that our mutation rate estimates reflect those in nature.

Estimates of $N_e$ from Microsatellites

We assayed a sample of isofemale lines established from Africa and the U.S.A., representing an ancestral and a derived population, respectively (David and Capy 1988), to measure microsatellite heterozygosity ($H$) and variance in repeat unit number ($V$) for 48 dinucleotide repeat microsatellite loci (table 1). Heterozygosity was significantly higher in the African population (paired $t$ = 2.72, df = 47, $P < 0.01$). However, $V$ was not significantly different (paired $t = -0.133$, df = 47, $P$ = NS). Under a stepwise mutation model, $H = 1 - 1/\sqrt{1 + 8N_e\mu}$ (Ohta and Kimura 1973).
By rearranging the equation, solving for $N_e$ at each locus, and averaging across all 48 loci, we obtain an estimate of $N_e$ = 216,819 (range = 0–$2.6 \times 10^{6}$) for the U.S.A. population and 328,278 (range = 0–$2.1 \times 10^{6}$) for the Africa population. Pooling the U.S.A. and Africa populations gives similar results (average $N_e$ = 268,368). For the infinite-alleles model, $H = 4N_e\mu/(1 + 4N_e\mu)$. This model leads to estimates of $N_e$ = 65,512 (range = 0–$1.3 \times 10^{5}$) for the U.S.A. population, 80,823 (range = 0–$3.1 \times 10^{5}$) for the Africa population, and 73,087 for the two populations combined.

We also estimated $N_e$ from heterozygosity in five populations of D. melanogaster from throughout the world for the nine dinucleotide repeat loci isolated from GenBank (table 2) using our empirically determined dinucleotide repeat mutation rate of $9.3 \times 10^{-6}$. The average $N_e$ across these loci for our worldwide sample of D. melanogaster is 530,466 (range = 0–$5.1 \times 10^{5}$) for the stepwise mutation model and 160,281 (range = 0–$5.3 \times 10^{6}$) for the infinite-alleles model.

A similar calculation can be done using $V$. Under a single-step stepwise mutation model, $V = 4N_e\mu$ (Slatkin 1995). Based on our measures of $V$, $N_e$ = 182,578 (range = 0–$1.5 \times 10^{6}$) for the U.S.A. population, $N_e$ = 210,764 (range = 0–$1.0 \times 10^{6}$) for the Africa population, and $N_e$ = 196,671 (range = 0–$1.5 \times 10^{6}$) for the populations combined. For our worldwide sample of D. melanogaster, average $N_e$ = 202,377 (range = 0–$8.1 \times 10^{5}$). Calculations of $N_e$ based on $V$ under an infinite-alleles or two-phase model require an estimate of the variance of the distribution of mutations larger than one step (e.g., DiRienzo et al. 1994), which we do not have at this time.

The stepwise mutation model and the infinite-alleles model represent two extreme models of mutational processes. The mutation process of microsatellites in natural populations of D. melanogaster is most likely a mix between the models and may be better represented by a two-phase model which assumes that some of the mutations are stepwise, while others are independent of previous allelic states (DiRienzo et al. 1994). Accordingly, the true value of $N_e$ at each locus probably lies between the estimates based on each model. We also note that in other studies (Schug et al. 1998a; unpublished data), we have demonstrated that variation at microsatellites in regions of low recombination is reduced due to selective sweeps or background selection. Thus, our estimate of $N_e$ averaged across all microsatellite loci based on either mutation model may be an underestimate of the species $N_e$.

An Independent Estimate of $N_e$ from Single-Copy Nuclear Genes

The most widely cited estimate of $N_e$ for D. melanogaster ($N_e = 3.3 \times 10^{6}$; Kreitman 1983) is based on DNA sequence variation at Adh and estimates of mutation rate for electrophoretically detectable protein variants. Since population variation has been strongly influenced by balancing selection at this locus (Kreitman and Hudson 1991), this estimate of $N_e$ may be inflated. We thus estimated $N_e$ from seven additional genes for which measures of population variation at the DNA sequence level are available in the literature (table 3).
Since selection reduces nucleotide variation in regions of low recombination (Begun and Aquadro 1992), $N_e$ should be lowest in these regions and increase with rate of recombination. Our estimates of $N_e$ for the eight single-copy nuclear genes follow this general pattern with the exceptions of Adh, z, and white (table 3). However, as noted above, $N_e$ is most likely inflated at Adh due to polymorphism maintained by balancing selection (Kreitman and Hudson 1991). $N_e$ at white appears to be reduced, possibly reflecting directional selection at this locus (Kirby and Stephan 1995). Variation at v also seems high, and while tests of neutrality were not violated, patterns of linkage disequilibrium raise the possibility of selection at or near this locus as well (e.g., Begun and Aquadro 1995). To the extent that the substitution rates are underestimates of the neutral mutation rate due to the selective constraints associated with codon bias, our estimates of $N_e$ for all of the single-copy genes may, in fact, overestimate the species $N_e$.

Ideally, we would like to compare estimates of $N_e$ from microsatellites located very close to single-copy genes for


Table 1
Dinucleotide Repeat Variation Among Individuals Sampled from Two Populations of Drosophila melanogaster

                U.S.A.                               Africa
LOCUS       RU  n   Mean   MCA   Max   H     V       n   Mean   MCA   Max   H     V
DM28        CA  22  21.09  22    23    0.52  2.50    19  21.0   21    23    0.46  1.50
DM30        CA  22  5.98   6     6     0.45  0.48    21  5.98   6     6     0.18  0.19
DM38        CA  28  7.73   8     9     0.46  0.41    18  6.60   8     5     0.71  2.60
DM40        AG  19  12.97  13    16    0.83  6.96    13  13.25  13    17    0.87  4.80
DM42        CA  21  8.90   9     14    0.49  4.45    19  8.90   9     14    0.37  3.66
DM54        AC  23  26.37  24    32.5  0.53  11.90   15  24.50  24    32.5  0.53  14.95
DM55        AC  24  6.88   8     8     0.43  9.95    23  7.40   8     8     0.50  1.73
DM58        CA  25  11.02  9.5   13    0.65  3.48    13  8.23   11.5  11.5  0.73  3.83
DM61        AC  23  8.70   9     10    0.64  1.38    20  10.65  10    13    0.77  1.16
DM73        AC  20  34.80  36    46    0.95  55.0    12  33.50  30    39    0.92  15.20
DM75        AC  27  10.88  12    13    0.62  2.50    19  9.71   9     15    0.74  4.37
DM78        AC  22  2.95   3     4     0.33  0.69    18  3.60   10    7     0.54  0.97
DM80        AC  29  8.97   9     11    0.36  4.42    19  7.84   9     10    0.75  2.05
DM83        AC  27  8.83   8     12    0.58  0.88    21  10.19  10    18    0.80  6.86
DM85        AC  26  5.07   6     6     0.52  1.03    16  4.65   5     6.5   0.58  0.91
DM86        CA  27  8.77   10    11    0.45  6.38    15  5.50   10    11    0.89  21.32
DM87        CA  25  8.62   8.5   10.5  0.16  0.19    18  8.60   8.5   9.5   0.21  0.95
DM88        AG  27  12.87  12    15.5  0.83  2.47    17  12.60  12    14    0.71  0.71
DM89        CA  28  2.48   2     15.5  0.87  6.50    12  6.75   9     9     0.77  8.75
DM92        CA  26  12.20  12    13    0.49  2.57    16  12.59  13    17.5  0.74  3.33
DM94        CA  22  12.04  12.5  15.5  0.82  3.80    17  11.50  16    16    0.90  14.07
DM95        CA  22  8.93   9.5   9.5   0.40  7.84    16  7.25   9.5   9.5   0.84  12.27
DM97        CA  25  15.14  12.5  25    0.85  57.25   18  9.13   6.5   17    0.78  7.40
DM98        AC  24  8.45   7     13    0.57  3.99    21  9.30   10    11    0.66  1.91
DM100       AG  21  12.61  12    14.5  0.75  3.23    19  12.50  12    14    0.74  1.13
DM102       AC  25  9.80   10    11    0.59  1.66    19  9.78   9     11    0.68  0.62
DM114       CA  13  10.07  10    14    0.79  2.30    10  11.70  13    14    0.76  10.85
DM116       CA  29  10.93  11    11    0.07  0.55    21  10.0   10    11    0.19  2.82
DM122       AC  26  23.73  24    24    0.15  0.93    19  22.86  24    25    0.80  2.82
DM123       AC  22  6.77   6     14    0.55  5.50    19  7.21   6     9     0.22  1.89
DMMP20      CA  48  9.84   10    12    0.57  3.31    18  9.92   10    12    0.18  7.77
DMPROSPER   AG  48  10.18  11    14    0.53  2.08    18  8.33   8     11    0.68  1.87
DMU1951     AT  48  19.23  20    27    0.85  8.14    18  20.45  21    31    0.89  10.63
DMWHITE     AT  48  17.84  14    19    0.77  3.78    18  18.36  12    15.5  0.79  21.74
DMZ3K25Z    AT  48  13.43  13    15.5  0.88  5.40    18  19.25  19    21    0.90  4.07
DRO13DC98   AC  48  14     14    14    0     0       18  14     14    14    0     0
DRO14DC95Z  AC  48  6      6     6     0     0       18  6      6     6     0     0
DRO17DC2Z   AC  48  14.55  13    15    0.65  0.95    18  15.78  13    15    0.46  0.72
DRO17DC4Z   AC  48  11.93  12    13    0.55  1.06    18  11     12    13    0.73  4.0
DROABDB     AC  48  15.17  15    21    0.54  2.90    18  14.51  15    19    0.80  3.50
DROACS2     AG  48  6.72   6     8     0.51  0.99    18  6.88   6     6     0.18  0.10
DROGPAD     AC  48  23.81  24    28    0.45  3.85    18  24.43  27    27    0.88  3.57
DRONANOS    AT  48  20.79  19    30    0.82  7.51    18  18.62  18    29    0.62  13.09
DRONINAC    AT  48  15.63  15    21    0.74  0.79    18  10.11  15    16.5  0.26  0.24
DROSEV1     AC  48  17.09  17    17    0.50  16.56   18  16.02  17.5  18    0.65  30.19
DROTROPONI  AC  48  6.46   12    12    0.71  12.49   18  6.46   9     10    0.76  5.57
DROTROPI1   AT  48  19.98  20    24    0.69  12.64   18  19.97  19    20    0.57  11.38
DROYANET    AC  48  16.96  13    23    0.85  29.06   18  12.84  12    30    0.84  6.59

NOTE: RU is the type of repeat unit, n is the number of chromosomes sampled, Mean is the mean number of repeat units in the sample, MCA is the number of repeat units in the most common allele in the sample, and Max is the repeat unit length of the longest allele in the sample. H and V are heterozygosity and variance in repeat units, respectively.

which sequence polymorphism is available. At present, this is possible at only one microsatellite locus, DMWHITE, which is near the white gene. Our estimate from the microsatellite is $N_e = 2.9 \times 10^{5}$ (estimated from $H$ using the stepwise mutation model), very close to the estimate from silent-site nucleotide variation ($N_e = 2.3 \times 10^{5}$). These results indicate that estimates of $N_e$ from microsatellite heterozygosity are within the range of independent estimates from DNA sequence variation, and thus support our contention that the mutation rate estimates from our mutation accumulation lines reflect the mutation rate in natural populations.
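The $N_e$ estimates from microsatellites follow from inverting the model expressions given in the text: $H = 1 - 1/\sqrt{1 + 8N_e\mu}$ for the stepwise model, $H = 4N_e\mu/(1 + 4N_e\mu)$ for the infinite-alleles model, and $V = 4N_e\mu$ for the single-step model. A minimal sketch of those inversions follows, using the dinucleotide rate of $9.3 \times 10^{-6}$ as the default. The function names are hypothetical, and using $H$ = 0.79 (the Africa value for DMWHITE in table 1) as the example input is an assumption about which value underlies the quoted figure.

```python
MU_DINUCLEOTIDE = 9.3e-6  # per locus per generation, from the text


def ne_stepwise(h, mu=MU_DINUCLEOTIDE):
    """Invert H = 1 - 1/sqrt(1 + 8*Ne*mu) for Ne (requires h < 1)."""
    return (1.0 / (1.0 - h) ** 2 - 1.0) / (8.0 * mu)


def ne_infinite_alleles(h, mu=MU_DINUCLEOTIDE):
    """Invert H = 4*Ne*mu / (1 + 4*Ne*mu) for Ne (requires h < 1)."""
    return h / (4.0 * mu * (1.0 - h))


def ne_from_variance(v, mu=MU_DINUCLEOTIDE):
    """Invert V = 4*Ne*mu for Ne."""
    return v / (4.0 * mu)


# Example input: H = 0.79, assumed here to be the relevant DMWHITE value.
h_white = 0.79
ne_smm = ne_stepwise(h_white)
ne_iam = ne_infinite_alleles(h_white)
print(f"stepwise: {ne_smm:.3g}, infinite alleles: {ne_iam:.3g}")
```

With $H$ = 0.79 the stepwise inversion gives roughly $2.9 \times 10^{5}$, in line with the microsatellite estimate quoted for DMWHITE. A monomorphic locus ($H$ = 0 or $V$ = 0) yields $N_e$ = 0, which explains the lower bound of the reported ranges.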

Relative Mutation Rates Among Repeat Types

We used a two-way factorial ANOVA on the raw microsatellite allele length data to determine the amount of variation in repeat unit length among individuals accounted for by different loci, populations, and repeat types. In this model, Repeat type, Locus, and Population are fixed effects, Repeat type and Population are cross-classified main effects, and Locus is nested within Repeat type. All main effects and interactions are highly significant (table 4). Significant differences in repeat unit length among loci may be associated with the different repeat lengths of the loci


Table 2
Variance in Repeat Number for Di-, Tri-, and Tetranucleotide Repeat Microsatellites in Five Worldwide Populations of Drosophila melanogaster

                         POPULATION
LOCUS        REPEAT    U.S.A.  Africa  Australia  China   Ecuador  H
Dinucleotide
DROABDB      (AC)19    10.15   3.20    6.79       7.10    5.40     0.76
DROACS2      (AG)7     0.45    0.84    0          0.05    0        0.17
DROGPAD      (TG)9     7.50    11.52   2.35       0.85    9.24     0.68
DRONANOS     (AT)18    5.61    20.66   30.03      12.17   16.14    0.88
DMPROSPER    (AG)12    0.75    2.47    1.91       3.50    3.89     0.70
DMRHO        (AC)10    2.24    2.82    2.10       1.64    4.58     0.83
DROTROPONI   (AT)14    28.93   10.99   13.20      17.62   20.70    0.86
DMU1951      (AT)16    5.41    10.67   4.05       7.74    8.98     0.90
DMWHITE      (AT)13    2.88    24.74   2.66       2.31    2.85     0.88
Trinucleotide
DMCATHPO     (ACC)6    0.21    0.38    0          0       1.68     0.10
DROFAS       (AGG)5    1.21    1.71    0.45       0.45    1.84     0.35
DMZ60MEX     (GTT)8    1.63    0.98    0.35       0.89    2.26     0.66
DROSEV       (GTT)9    3.21    1.92    3.0        2.72    1.43     0.72
DROTYK       (ACA)5    0.06    0.06    0          0       0.12     0.10
DROYP3       (ACG)6    1.72    2.68    3.88       0.30    0        0.72
Tetranucleotide
DMDELTEX     (AGTT)6   1.89    1.57    1.20       0.89    2.06     0.70
DMEHAB       (AGCC)5   0.38    0.05    0.20       0.22    0.62     0.27
DROEXO       (CATA)8   0.05    5.14    1.54       1.90    2.12     0.23
DROHSP       (CAGC)5   0.11    0.95    0.54       0       0.96     0.66
DMPCX        (AAAC)5   1.63    1.59    1.52       3.23    3.94     0.82
DRORUD       (AATT)5   0.12    0.40    0.13       0.08    0.02     0.73
DROZFP       (TATG)7   0       0       0          0.42    0        0.04
DRO15DC96Z   (AAAT)5   0       0       0          0       0.48     0.04

NOTE: Average heterozygosities are reported for each locus across populations. Allele frequencies, sample sizes (n = 18–20 chromosomes), and individual heterozygosities are reported in Wetterstrand (1997) and are available on request from the authors.

selected from our GenBank search from those identified in the DNA library screen. Since it is not possible to separate these effects, we cannot interpret the biological significance of this difference. Furthermore, inspection of the data suggests significant departures from a normal distribution, thus violating the assumptions of the ANOVA. Significant differences among populations and among repeat types and the interactions may reflect population history, evolutionary history of microsatellites in particular regions of the genome with different rates of recombination, and/or differences in mutation rate among repeat types. Analysis of population history and influences of recombination for this data set will be presented elsewhere. Here, we focus on differences in mutation rate among repeat types.

Slatkin (1995) and Kimmel et al. (1996) have shown that the variance in repeat number within populations is expected to be proportional to the product of $N_e$ and $\mu$. Thus, the natural log of the expectation of the variance in repeat number ($V$) equals the natural log of $N_e$ plus the natural log of $\mu$ plus a constant. Because this is a linear relationship, a two-way ANOVA on $\ln(V)$

Table 3
Estimates of $N_e$ for Single-Copy Genes

Gene   ACE × 100 (a)  π (b)   Divergence  $N_e$                  Reference
Adh    1.98           0.0278  0.047       $3.4 \times 10^{6}$    Laurie, Bridgham, and Choudhary (1991)
Adhr   1.98           0.0059  0.132       $2.7 \times 10^{5}$    Kreitman and Hudson (1991)
z      2.40           0.0133  0.142       $7.8 \times 10^{5}$    Hey and Kliman (1993)
Zw     3.08           0.0125  0.096       $2.8 \times 10^{5}$    Eanes, Kirchner, and Yoon (1993)
Rh3    3.25           0.0036  0.093       $2.4 \times 10^{5}$    Ayala, Chang, and Hartl (1993)
v      4.80           0.0175  0.149       $7.3 \times 10^{5}$    Begun and Aquadro (1995)
white  6.17           0.0046  0.125 (c)   $2.3 \times 10^{5}$    Kirby and Stephan (1995)
boss   7.34           0.0595  0.106       $1.1 \times 10^{6}$    Ayala and Hartl (1993)

NOTE: $N_e$ was calculated by rearranging $\pi = 4N_e\mu$ for autosomal genes, or $\pi = 3N_e\mu$ for X-linked genes, to solve for $N_e$. Divergence at silent sites between Drosophila melanogaster and Drosophila simulans was used to estimate substitution rate ($\mu$) assuming five generations per year and that the lineages diverged 2.5 MYA. Measures of silent site $\pi$ and divergence are from referenced studies except where noted.
(a) Rates of recombination, measured as the adjusted coefficients of exchange (ACEs), are from Kindahl (1994).
(b) Estimates based on silent sites only.
(c) Estimated from unpublished data.
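The calculation summarized in the note to table 3 can be sketched as follows. This is an illustrative reading, not the authors' procedure. It assumes the per-generation substitution rate is the silent-site divergence divided by twice the number of generations since the split (the factor of 2, for two diverging lineages, is an assumption), and the function names are hypothetical. The sketch is not claimed to reproduce every tabulated value.

```python
GENERATIONS_PER_YEAR = 5       # from the table note
DIVERGENCE_TIME_YEARS = 2.5e6  # from the table note


def substitution_rate(divergence):
    """Per-site, per-generation rate from pairwise silent-site divergence.

    Assumption: divergence accumulates along both lineages, so it is
    divided by twice the number of generations since the split.
    """
    generations = DIVERGENCE_TIME_YEARS * GENERATIONS_PER_YEAR
    return divergence / (2.0 * generations)


def ne_from_pi(pi, divergence, x_linked=False):
    """Solve pi = 4*Ne*mu (autosomal) or pi = 3*Ne*mu (X-linked) for Ne."""
    mu = substitution_rate(divergence)
    factor = 3.0 if x_linked else 4.0
    return pi / (factor * mu)


# Example: the Rh3 row of table 3, treated as autosomal.
ne_rh3 = ne_from_pi(pi=0.0036, divergence=0.093)
print(f"{ne_rh3:.3g}")
```

For the Rh3 inputs this returns roughly $2.4 \times 10^{5}$. Other rows may differ from the table depending on details the note does not spell out.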


Table 4
Two-way ANOVA on Raw Data (PCR Fragment Lengths of Di-, Tri-, and Tetranucleotide Repeats Among Five Population Samples of Drosophila melanogaster)

Source                            df     SS         MS         F-ratio   P
Repeat type                       2      32,495.25  16,247.63  4,653.93  <0.0001
Population                        4      70.16      17.54      5.02      <0.0001
Repeat type × Population          8      120.39     15.05      4.31      <0.0001
Locus(Repeat type)                20     10,979.91  549.00     157.25    <0.0001
Population × Locus(Repeat type)   80     1,145.33   14.32      4.10      <0.0001
Error                             2,041  7,125.47   3.49

NOTE: df = degrees of freedom, SS = sum of squares, MS = mean square.

can be used to determine the amount of variance due to differences in $N_e$ and $\mu$ among populations and an interaction between repeat type and $N_e$ (Chakraborty et al. 1997). The model assumes the mutational process is similar for all loci and that the loci assayed have similar $N_e$ values within a population. The di-, tri-, and tetranucleotide repeat loci in this study are distributed across the genome and are thus not clustered in chromosomal locations with different $N_e$ values (Wetterstrand 1997; Schug et al. 1998b).

To perform this analysis, we first eliminated the data points for which $V = 0$, and then log transformed $V$. These transformed data fit a normal distribution. Relative mutation rates for di-, tri-, and tetranucleotide repeats are significantly different (table 5). There is no effect of population size and no interaction effect. Multiple-comparison tests of pairwise differences in variation between repeat types demonstrate that only the difference in mutation rate between dinucleotide repeats and tri- and tetranucleotide repeats is significant. The same analysis with 0.01 added to each value of $V$, such that the zero values were not eliminated, gave nearly identical results.

Since we find no significant contribution of $N_e$ to variation among populations and no interaction effect, the difference between the natural log of the variance values among repeat types (= $\ln V_i - \ln V_j$, where $i$ and $j$ are different repeat types) can be used to estimate the relative differences in mutation rate among repeat types (Chakraborty et al. 1997). Under these conditions, we can calculate the magnitude of the differences among repeat types by calculating the average variance in repeat unit number among loci over all populations for each repeat type and determining their ratios.
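The averaging-and-ratio step just described can be sketched with the variances listed in table 2. The values below are transcribed from that table (one row per locus, one column per population). Taking a plain arithmetic mean over all locus-by-population entries, zeros included, is an assumption about how the averages were formed, and the variable names are hypothetical.

```python
# Variance in repeat number from table 2; columns are
# U.S.A., Africa, Australia, China, Ecuador.
VARIANCES = {
    "di": [
        [10.15, 3.20, 6.79, 7.10, 5.40],
        [0.45, 0.84, 0.0, 0.05, 0.0],
        [7.50, 11.52, 2.35, 0.85, 9.24],
        [5.61, 20.66, 30.03, 12.17, 16.14],
        [0.75, 2.47, 1.91, 3.50, 3.89],
        [2.24, 2.82, 2.10, 1.64, 4.58],
        [28.93, 10.99, 13.20, 17.62, 20.70],
        [5.41, 10.67, 4.05, 7.74, 8.98],
        [2.88, 24.74, 2.66, 2.31, 2.85],
    ],
    "tri": [
        [0.21, 0.38, 0.0, 0.0, 1.68],
        [1.21, 1.71, 0.45, 0.45, 1.84],
        [1.63, 0.98, 0.35, 0.89, 2.26],
        [3.21, 1.92, 3.0, 2.72, 1.43],
        [0.06, 0.06, 0.0, 0.0, 0.12],
        [1.72, 2.68, 3.88, 0.30, 0.0],
    ],
    "tetra": [
        [1.89, 1.57, 1.20, 0.89, 2.06],
        [0.38, 0.05, 0.20, 0.22, 0.62],
        [0.05, 5.14, 1.54, 1.90, 2.12],
        [0.11, 0.95, 0.54, 0.0, 0.96],
        [1.63, 1.59, 1.52, 3.23, 3.94],
        [0.12, 0.40, 0.13, 0.08, 0.02],
        [0.0, 0.0, 0.0, 0.42, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.48],
    ],
}


def mean_variance(rows):
    """Arithmetic mean over every locus-by-population entry."""
    values = [v for row in rows for v in row]
    return sum(values) / len(values)


means = {kind: mean_variance(rows) for kind, rows in VARIANCES.items()}
ratio_di_tri = means["di"] / means["tri"]
ratio_di_tetra = means["di"] / means["tetra"]
ratio_tri_tetra = means["tri"] / means["tetra"]
print(means, ratio_di_tri, ratio_di_tetra, ratio_tri_tetra)
```

With the table 2 values as transcribed here, the means come out near 7.55, 1.17, and 0.90, and the ratios near 6.4, 8.4, and 1.3.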
This analysis indicates that dinucleotide repeats mutate 6.4 times faster than trinucleotide repeats (= 7.55/1.17) and 8.4

Table 5
Two-Way ANOVA on Variance in Repeat Number of Di-, Tri-, and Tetranucleotide Repeats Among Five Population Samples of Drosophila melanogaster

Source                     df  SS      MS     F-ratio  P
Repeat type                2   89.44   44.72  24.56    <0.0001
Population                 4   5.52    1.38   0.76     0.5549
Repeat type × Population   8   3.48    0.43   0.24     0.9824
Error                      84  152.94  1.83

NOTE: df = degrees of freedom, SS = sum of squares, MS = mean square.

times faster than tetranucleotide repeats (= 7.55/0.854). Trinucleotide repeats mutate 1.3 times faster than tetranucleotide repeats (= 1.17/0.854). If we use our empirically determined mutation rate of $9.3 \times 10^{-6}$ for dinucleotide repeats as a baseline, then we can infer the mutation rate for trinucleotide repeats to be $1.5 \times 10^{-6}$ (= $9.3 \times 10^{-6}$/6.4) and that for tetranucleotide repeats to be $1.1 \times 10^{-6}$ (= $9.3 \times 10^{-6}$/8.4).

Differences in mutation rate among di-, tri-, and tetranucleotide repeats estimated from variance in repeat number are consistent with mutation rates inferred from microsatellite heterozygosity. For the nine di-, six tri-, and eight tetranucleotide repeats, average $H$ = 0.74, 0.44, and 0.45, respectively, in the worldwide sample of D. melanogaster populations (table 2; Wetterstrand 1997). Using the expression $N_e\mu = H/[4(1 - H)]$ for the infinite-alleles model, $N_e\mu$ = 0.71, 0.19, and 0.20 for di-, tri-, and tetranucleotide repeats, respectively. Assuming $N_e$ is the same for each repeat type, the relative magnitude of $\mu$ can be estimated by determining their ratios: 3.73 (= 0.71/0.19) for di- versus trinucleotide repeats, 3.55 (= 0.71/0.20) for di- versus tetranucleotide repeats, and 0.95 (= 0.19/0.20) for tri- versus tetranucleotide repeats. Using the expression for the stepwise mutation model gives similar results. Thus, the relative differences in the levels of microsatellite heterozygosity are also consistent with dinucleotide repeats having a mutation rate approximately four times higher than that of tri- and tetranucleotide repeats.

Repeat Unit Length and Population Variation

To directly test the hypothesis that the differences in mutation rate among repeat types are a function of microsatellite repeat unit length, we compare variation at dinucleotide repeats with short repeat lengths (similar to those of tri- and tetranucleotide repeats) and long tri- and tetranucleotide repeats (similar to those of dinucleotide repeats).
Six of the dinucleotide repeat microsatellites identified from the DNA library search were shorter than eight repeat units (DM30, DM38, DM85, DM87, DM116, and DM123). These tend to have among the lowest levels of variation for dinucleotide repeats (table 1). Our GenBank search did not reveal any long tri- or tetranucleotide repeats in noncoding DNA. Thus, we cannot make the reciprocal comparison.

FIG. 1. Relationship between number of repeat units and variance in repeat number for 48 dinucleotide repeat microsatellites in natural populations of D. melanogaster from Maryland and Zimbabwe. Mean allele length = the average number of tandemly repeated units in a sample of 13–48 isofemale lines established from a natural population. Most common allele = the number of repeat units in the most common allele, and Maximum repeat unit length = the number of repeat units in the longest allele. * P < 0.05; ** P < 0.01.

A correlation between population variation and repeat unit length should reflect a difference in mutation rate among microsatellites of different lengths. To test this hypothesis, we examined the relationship between three measures of repeat unit length (Mean = mean repeat unit length, MCA = length of most common allele, and Max = maximum repeat unit length) and two measures of population variation (V and H) in D. melanogaster populations from the U.S.A. and Africa for 48 of the 49 dinucleotide repeats (table 1). V was log transformed so that it was normally distributed. The correlations between log(V) and all three measures of repeat unit length were positive, but they were stronger and significant more often in the U.S.A. population than in the Africa population (fig. 1). In both populations, log(V) is more strongly correlated with Max than with Mean or MCA (fig. 1). Correlations between H and measures of repeat unit length show similar patterns but are not significant, possibly reflecting the fact that H is a frequency measure and thus saturates rather quickly as the number and frequency of alleles increase. We have seen similar
results in our studies of the influences of recombination rates on microsatellite variation (Schug et al. 1998a).

Discussion

Dinucleotide Repeat Mutation Rate

Our best direct estimate of the dinucleotide repeat microsatellite mutation rate in D. melanogaster is $9.3 \times 10^{-6}$ (95% CI = $1.92 \times 10^{-6}$ to $2.72 \times 10^{-5}$) per locus per generation. Although the inclusion of tri- and tetranucleotide repeat loci in our earlier study decreased our estimate of mutation rate slightly, the mutation rate of dinucleotide repeats alone remains much lower than that reported for mammals ($\mu = 10^{-3}$–$10^{-5}$ per locus per generation; e.g., Dietrich et al. 1992; Serikawa et al. 1992; Weber and Wong 1993; Ellegren 1995).

The low mutation rate we estimate for microsatellites does not appear to be confined to the mutation accumulation lines we assayed. We have shown here that estimates of $N_e$ calculated using our mutation rate estimate for microsatellites and levels of segregating variation in natural populations of D. melanogaster are consistent with independent estimates of $N_e$ at several single-copy nuclear genes. Furthermore, Schlötterer et al. (1998) have recently found a similar mutation rate ($6.3 \times 10^{-6}$ per locus per generation for 24 dinucleotide repeats) for D. melanogaster in an independent set of mutation accumulation lines.

Rubinsztein et al. (1995) and Amos and Harwood (1998) have proposed that microsatellite mutation rates might be higher in larger populations because the increased number of heterozygotes will increase the probability that unequal length alleles will pair, thus increasing microsatellite instability by unequal crossing over. This hypothesis might lead one to propose that our empirical estimates of mutation rates are low relative to those for mammals because of the small population sizes maintained in our mutation accumulation lines ($N_e \approx 14$).
The congruence between estimates of $N_e$ based on our empirical estimate of microsatellite mutation rate and microsatellite heterozygosity in natural populations and independent estimates of $N_e$ based on DNA sequence variation at single-copy nuclear genes indicates that our mutation rate is not biased downward. In fact, the consistency between our measure of $N_e$ using microsatellites and measures of $N_e$ using nucleotide substitution polymorphism indicates that our estimates are similar to the mutation rate of microsatellites in natural populations of D. melanogaster. Our data, then, do not support the population size/mutation rate correlation hypothesis of Rubinsztein et al. (1995) and Amos and Harwood (1998).

Repeat Unit Length and Mutation Rate

Several lines of evidence suggest that the differences in mutation rate among di-, tri-, and tetranucleotide repeats are associated with repeat unit length, and that longer microsatellites have higher mutation rates. First, the dinucleotide repeats assayed in our study are considerably longer on average (mean = 13.1) and more variable than tri- and tetranucleotide repeats (means = 6.5 and 5.75, respectively; table 2). It is unlikely that the lower levels of variation at tri- and tetranucleotide repeats reflect differences in $N_e$ associated with regional rates of recombination, because they are interspersed throughout the genome. Second, the positive correlations between repeat unit length and variance in repeat unit number are consistent with higher mutation rates in longer microsatellites. Goldstein and Clark (1995) observed a similar positive correlation between maximum repeat unit length and variation for 11 di- and 7 trinucleotide repeats in an earlier study of D. melanogaster microsatellites. Our study and that of Goldstein and Clark (1995) are also consistent with in vitro studies of yeast which report that long dinucleotide repeat units are more mutable than shorter dinucleotide repeats (Wierdl, Dominska, and Petes 1997). Similar results come from direct studies of microsatellite mutations in humans (Brinkmann et al. 1998) and barn swallows (Primmer et al. 1998). Schlötterer et al. (1998) also found that a particularly long dinucleotide repeat allele (28 repeat units) in D. melanogaster had an unusually high mutation rate in mutation accumulation lines.

From the correlation index ($r^2$), we estimate that 14%–30% of the variation among microsatellites in the U.S.A. population and 13%–25% of that in the Africa population is due to differences in mutation rate associated with repeat unit length. In other studies, we have demonstrated that selection and recombination also contribute to differences in microsatellite variation among loci (Schug et al. 1998a). Other factors which may contribute to differences among loci within a population include different modes of mutation, as has been recently shown for humans (DiRienzo et al. 1998), and statistical and evolutionary variance associated with the measurement of microsatellite variation such as variance in repeat unit (Kimmel and Chakraborty 1996; Pritchard and Feldman 1996).
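The $r^2$ values discussed above come from correlating log-transformed variance in repeat number with a measure of repeat unit length. The following minimal sketch shows the form of that calculation under stated assumptions: the helper `pearson_r` and the two short data lists are hypothetical and purely illustrative (they are not values from table 1), and the natural logarithm is used because the base of the log transform is not specified in the text and does not affect the correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only; NOT data from the study.
max_repeat_length = [8, 10, 12, 15, 18, 22]              # Max, in repeat units
variance_in_repeat_number = [0.2, 0.5, 0.4, 1.5, 2.0, 6.0]  # V

# Log-transform V before correlating, as described in the text.
log_v = [math.log(v) for v in variance_in_repeat_number]

r = pearson_r(max_repeat_length, log_v)
r_squared = r * r  # proportion of variance in log(V) associated with length

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

In this reading, $r^2$ is the proportion of the among-locus variance in log(V) that is linearly associated with repeat unit length, which is how the 14%–30% and 13%–25% figures above are interpreted.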
Whether the difference in $r^2$ we observe between the U.S.A. and Africa populations of D. melanogaster is associated with different sample sizes used for the analysis, different population histories, or perhaps different frequencies of inversion polymorphisms at chromosome positions at which the microsatellites are located is unclear. We have, however, observed similar differences between U.S.A. and Africa populations in the effects of selection and rate of recombination on microsatellite variation for 18 dinucleotide repeats from the GenBank screen (Schug et al. 1998a; unpublished data).

In our study of D. melanogaster, the differences in mutation rate among repeat types appear to be larger than those for the human microsatellites examined by Chakraborty et al. (1997). Both our data and those of Chakraborty et al. (1997) conflict with the study of human di- and tetranucleotide repeats by Weber and Wong (1993). In assays of pedigrees, Weber and Wong (1993) observed a larger number of spontaneous mutations at tetranucleotide repeat loci than at dinucleotide repeat loci. Chakraborty et al. suggested that the discrepancy between their study and Weber and Wong's might reflect a stricter
constraint on the number of repeats for tetranucleotide than for dinucleotide repeats in natural populations. An alternative explanation suggested by our study is that the tetranucleotide repeat loci assayed by Weber and Wong are longer than the dinucleotide repeat loci assayed by Chakraborty et al., hence the higher levels of variation and higher estimates of mutation rate. The lack of sequence information on repeat length of the loci assayed in these two studies prevents a test of this hypothesis.

The relationship between repeat unit length and mutation rate reported here and in previous studies raises the question of whether mutation rates are low in D. melanogaster because the distribution of repeat unit lengths is short (Schug, Mackay, and Aquadro 1997; Schug et al. 1998b) or the distribution of repeat unit lengths is short in D. melanogaster because mutation rates are low. Using a Markov chain model, Kruglyak et al. (1998) recently demonstrated that the distribution of repeat unit lengths across the genomes of D. melanogaster, humans, mice, and yeast can be explained by a balance between the rate at which DNA slippage generates new length mutations and the rate at which point mutations interrupt microsatellite repeats, causing them to "decay." By holding the substitution rate constant across species and repeat types and varying the slippage rates, they could predict the distribution of repeat unit lengths for different repeat types within D. melanogaster and the distribution of dinucleotide repeat lengths among species. Under this model, slippage rates for different repeat types in D. melanogaster are similar to those estimated in this study. Furthermore, the model predicted slippage rates for dinucleotide repeats in D. melanogaster, humans, mice, and yeast that are similar to empirical estimates.
This suggests that differences in repeat unit lengths among di-, tri-, and tetranucleotide repeats and among species are due to simple differences in DNA slippage rates.

The results we present here and those of other studies (Brinkmann et al. 1998; Harr et al. 1998; Primmer and Ellegren 1998; Primmer et al. 1998) indicate that mutation rates vary considerably among repeat types and that repeat unit length reflects the mutation rate both among and within classes of repeat types. It will thus be important to account for differences in mutation rate among microsatellite loci when studying population structure (e.g., England, Briscoe, and Frankham 1996; Irvin et al. 1998), examining genomewide relationships between recombination and variation (Michalakis and Veuille 1996; Schug et al. 1998a), testing models of selection (Schug et al. 1998a), estimating $N_e$ for natural populations (Lehmann et al. 1998), and using microsatellites to date evolutionary events (Tishkoff et al. 1996; Hammer et al. 1998; Stephens et al. 1998).

Acknowledgments

We thank R. Durrett, S. Kruglyak, M. Noor, M. Pascual, and members of the Aquadro lab for advice. This work was supported by a National Institutes of Health National Research Service Fellowship to M.D.S., National Institutes of Health grants to T.F.C.M. and C.F.A., and a Howard Hughes Medical Institute predoctoral fellowship to C.M.H.

LITERATURE CITED

AMOS, W., and J. HARWOOD. 1998. Factors affecting levels of genetic diversity in natural populations. Philos. Trans. R. Soc. Lond. B Biol. Sci. 353:177–186.
AYALA, F. J., B. S. W. CHANG, and D. L. HARTL. 1993. Molecular evolution of the rh3 gene in Drosophila. Genetica 92:23–32.
AYALA, F. J., and D. L. HARTL. 1993. Molecular drift of the bride of sevenless (boss) gene in Drosophila. Mol. Biol. Evol. 10:1030–1040.
BEGUN, D. J., and C. F. AQUADRO. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356:519–520.
BEGUN, D. J., and C. F. AQUADRO. 1995. Molecular variation at the vermilion locus in geographically diverse populations of Drosophila melanogaster and Drosophila simulans. Genetics 140:1019–1032.
BRINKMANN, B., M. KLINTSCHAR, F. NEUHUBER, J. HUHNE, and B. ROLF. 1998. Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet. 62:1408–1415.
CASELLA, G., and R. L. BERGER. 1990. Statistical inference. Duxbury Press, Belmont, Calif.
CHAKRABORTY, R., M. KIMMEL, D. N. STIVERS, L. J. DAVISON, and R. DEKA. 1997. Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94:1041–1046.
DALLAS, J. F. 1992. Estimation of microsatellite mutation rates in recombinant inbred strains of mouse. Mamm. Genome 3:452–456.
DAVID, J. R., and P. CAPY. 1988. Genetic variation of Drosophila melanogaster natural populations. Trends Genet. 4:106–111.
DIETRICH, W., H. KATZ, S. E. LINCOLN, H. S. SHIN, J. FRIEDMAN, N. C. DRACOPOLI, and E. S. LANDER. 1992. A genetic map of the mouse suitable for typing intraspecific crosses. Genetics 131:423–447.
DIRIENZO, A., P. DONNELLY, C. TOOMAJIAN, B. SISK, A. HILL, M. L. PETZL-ERLER, G. K. HAINES, and D. H. BARCH. 1998. Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148:1269–1284.
DIRIENZO, A., A. C. PETERSON, J. D. GARZA, A. M. VALDEZ, M. SLATKIN, and N. B. FREIMER. 1994. Mutational processes of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91:3166–3170.
EANES, W. F., M. KIRCHNER, and J. YOON. 1993. Evidence for adaptive evolution of the G6pd gene in the Drosophila melanogaster and Drosophila simulans lineages. Proc. Natl. Acad. Sci. USA 90:7475–7479.
ELLEGREN, H. 1995. Mutation-rates at porcine microsatellite loci. Mamm. Genome 6:376–377.
ENGLAND, P. R., D. A. BRISCOE, and R. FRANKHAM. 1996. Microsatellite polymorphisms in a wild population of Drosophila melanogaster. Genet. Res. Camb. 67:285–290.
GOLDSTEIN, D. B., and A. G. CLARK. 1995. Microsatellite variation in North American populations of Drosophila melanogaster. Nucleic Acids Res. 23:3882–3886.
GOLDSTEIN, D. B., L. A. ZHIVOTOVSKY, K. NAYAR, A. R. LINARES, L. L. CAVALLI-SFORZA, and M. W. FELDMAN. 1996. Statistical properties of the variation at linked microsatellite loci: implications for the history of human Y chromosomes. Mol. Biol. Evol. 13:1213–1218.
HAMMER, M. F., T. KARAFET, A. RASANAYAGAM, E. T. WOOD, T. K. ALTHEIDE, T. JENKINS, R. C. GRIFFITHS, A. R. TEMPLETON, and S. L. ZEGURA. 1998. Out of Africa and back again: nested cladistic analysis of human Y chromosome variation. Mol. Biol. Evol. 15:427–441.
HARR, B., B. ZANGERL, G. BREM, and C. SCHLÖTTERER. 1998. Conservation of locus-specific microsatellite variability across species: a comparison of two Drosophila sibling species, D. melanogaster and D. simulans. Mol. Biol. Evol. 15:176–184.
HEY, J., and R. M. KLIMAN. 1993. Population-genetics and phylogenetics of DNA-sequence variation at multiple loci within the Drosophila melanogaster species complex. Mol. Biol. Evol. 10:804–822.
IRVIN, S. D., K. A. WETTERSTRAND, C. M. HUTTER, and C. F. AQUADRO. 1998. Genetic variation and differentiation of microsatellite loci in Drosophila simulans: evidence for founder effects in new world populations. Genetics 150:777–790.
KIMMEL, M., and R. CHAKRABORTY. 1996. Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50:345–367.
KIMMEL, M., R. CHAKRABORTY, D. STIVERS, and R. DEKA. 1996. Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143:549–555.
KINDAHL, E. C. 1994. Recombination and DNA polymorphism on the third chromosome of Drosophila melanogaster. Ph.D. dissertation, Cornell University, Ithaca, N.Y.
KIRBY, D. A., and W. STEPHAN. 1995. Haplotype test reveals departure from neutrality in a segment of the white gene of Drosophila melanogaster. Genetics 141:1483–1490.
KREITMAN, M. 1983. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304:412–417.
KREITMAN, M., and R. R. HUDSON. 1991. Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics 127:565–582.
KRUGLYAK, S., R. DURRETT, M. D. SCHUG, and C. F. AQUADRO. 1998. Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95:10774–10778.
LAURIE, C. C., J. T. BRIDGHAM, and M. CHOUDHARY. 1991. Associations between DNA sequence variation and variation in expression of the Adh gene in natural populations of Drosophila melanogaster. Genetics 129:489–499.
LEHMANN, T., W. A. HAWLEY, H. GREBERT, and F. H. COLLINS. 1998. The effective population size of Anopheles gambiae in Kenya: implications for population structure. Mol. Biol. Evol. 15:264–276.
LI, W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.
MACKAY, T. F. C., R. T. LYMAN, and W. G. HILL. 1994. Polygenic mutation in Drosophila melanogaster: estimates from response to selection of inbred strains. Genetics 136:937–951.
MACKAY, T. F. C., R. T. LYMAN, and W. G. HILL. 1995. Polygenic mutation in Drosophila melanogaster: nonlinear divergence among unselected strains. Genetics 139:849–859.
MICHALAKIS, Y., and M. VEUILLE. 1996. Length variation of CAG/CAA trinucleotide repeats in natural populations of Drosophila melanogaster and its relation to the recombination rate. Genetics 143:1713–1725.
OHTA, T., and M. KIMURA. 1973. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22:201–204.
PRIMMER, C. R., and H. ELLEGREN. 1998. Patterns of molecular evolution in avian microsatellites. Mol. Biol. Evol. 15:997–1008.
PRIMMER, C. R., N. SAINO, A. P. MOLLER, and H. ELLEGREN. 1998. Unraveling the processes of microsatellite evolution through analysis of germ line mutations in barn swallows, Hirundo rustica. Mol. Biol. Evol. 15:1047–1054.
PRITCHARD, J. K., and M. W. FELDMAN. 1996. Statistics for microsatellite variation based on coalescence. Theor. Popul. Biol. 50:325–344.
RUBINSZTEIN, D. C., W. AMOS, J. LEGGO, S. GOODBURN, S. JAIN, S. H. LI, R. L. MARGOLIS, C. A. ROSS, and M. A. FERGUSON-SMITH. 1995. Microsatellite evolution: evidence for directionality and variation in rate between species. Nat. Genet. 10:337–343.
SCHLÖTTERER, C., R. RITTER, B. HARR, and G. BREM. 1998. High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele specific mutation rates. Mol. Biol. Evol. 15:1269–1274.
SCHLÖTTERER, C., C. VOGL, and D. TAUTZ. 1997. Polymorphism and locus-specific effects on polymorphism at microsatellite loci in natural Drosophila melanogaster populations. Genetics 146:309–320.
SCHUG, M. D., T. F. C. MACKAY, and C. F. AQUADRO. 1997. Low mutation rates of microsatellites in Drosophila melanogaster. Nat. Genet. 15:99–102.
SCHUG, M. D., C. M. HUTTER, M. A. F. NOOR, and C. F. AQUADRO. 1998a. Mutation and evolution of microsatellites in D. melanogaster. Genetica 102/103:359–367.
SCHUG, M. D., K. A. WETTERSTRAND, M. S. GAUDETTE, R. H. LIM, C. M. HUTTER, and C. F. AQUADRO. 1998b. The distribution and frequency of microsatellites in Drosophila melanogaster. Mol. Ecol. 7:57–70.
SERIKAWA, T., T. KURAMOTO, P. HILBERT et al. (11 co-authors). 1992. Rat gene mapping using PCR-analyzed microsatellites. Genetics 131:701–721.
SINGH, R. S. 1989. Population genetics and evolution of species related to Drosophila melanogaster. Annu. Rev. Genet. 23:425–453.
SLATKIN, M. 1995. Hitchhiking and associative overdominance at a microsatellite locus. Mol. Biol. Evol. 12:473–480.
STEPHENS, D., D. E. REICH, D. B. GOLDSTEIN et al. (39 co-authors). 1998. Dating the origin of the CCR5-Δ32 AIDS-resistance allele by the coalescence of haplotypes. Am. J. Hum. Genet. 62:1507–1515.
TISHKOFF, S. A., E. DIETZSCH, W. SPEED et al. (14 co-authors). 1996. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271:1380–1387.
WEBER, J. L., and C. WONG. 1993. Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123–1128.
WETTERSTRAND, K. S. 1997. Microsatellite polymorphism and divergence in worldwide populations of Drosophila melanogaster and D. simulans. M.Sc. thesis, Cornell University, Ithaca, N.Y.
WIERDL, M., M. DOMINSKA, and T. D. PETES. 1997. Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics 146:769–779.

DAVID M. RAND, reviewing editor

Accepted September 15, 1998