Evolution of the Cycloidea Gene Family in Antirrhinum and Misopates

Cristina P. Vieira, Jorge Vieira, and Deborah Charlesworth
Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, Scotland

Studies at the nucleotide level on the nuclear flower development gene cycloidea (cyc) in seven Antirrhinum, two Misopates, one Linaria, one Cymbalaria, and one Digitalis species revealed that cyc is a member of a gene family composed of at least five apparently functional genes. The estimated ages of the duplication events that created this gene family are from 7.5 Myr to more than 75 Myr. We also report the first estimates of DNA sequence diversity for species of Antirrhinum and Misopates. Low between-species variability suggests that this group of species may have diverged recently.

Introduction

At least six genes are known to be involved in development of the zygomorphic flowers in Antirrhinum (Reeves and Olmstead 1998), yet little is known about how these genes evolve. While many development genes (like the MADS-box genes) may be pleiotropic (Kramer, Dorit, and Irish 1998), cycloidea (cyc) seems to be involved only in the shaping of the angiosperm flower (Luo et al. 1995). This gene has an uninterrupted open reading frame encoding a putative protein of 286 amino acids that is necessary to establish full dorsoventral asymmetry (Luo et al. 1995). There is a putative nuclear localization signal, suggesting that cyc may play a role in transcription regulation. At least one other gene, dichotoma, which is also necessary for dorsoventral asymmetry, apparently has high amino acid similarity to cyc (unpublished results cited in Luo et al. 1995). It is therefore possible that cyc belongs to a small gene family. Genetic analyses of mutants obtained by transposon mutagenesis suggested that the radialis gene, which is linked to cyc, is also involved in dorsoventral asymmetry. However, the alternative explanation that cyc is a complex locus composed of two interacting functional components was not ruled out (Carpenter and Coen 1990).

In the course of a study attempting to estimate DNA sequence diversity for nuclear loci within species of Antirrhinum and Misopates, we found evidence that cyc is a member of a gene family. Here, we report evidence for the existence of at least five functional genes: the cyc gene already known (which we will call cyc1A, see below) plus cyc1B, cyc2, cyc3, and cyc4. These genes are present in all of the species analyzed (seven Antirrhinum, two Misopates, one Linaria, one Cymbalaria, and one Digitalis species), suggesting that these gene duplications occurred before the split between the lineage leading to Digitalis and Antirrhinum/Misopates/Linaria/Cymbalaria.
Because it has been suggested that both allozyme diversity (reviewed by Schoen and Brown 1991; Hamrick and Godt 1996; Charlesworth and Yang 1998) and DNA sequence variation (Awadalla and Ritland 1997; Liu, Zhang, and Charlesworth 1998) are affected by the mating system, we were interested in testing whether breeding system affects DNA sequence diversity in the taxa studied here. The duplications make the study of diversity very difficult. Nevertheless, diversity appears to be reduced in inbreeders.

Key words: Cycloidea, gene family, DNA sequence variation, Antirrhinum.
Address for correspondence and reprints: Deborah Charlesworth, I.C.A.P.B., University of Edinburgh, Ashworth Laboratories, King's Buildings, West Mains Road, Edinburgh EH9 3JT, U.K.
Mol. Biol. Evol. 16(11):1474–1483. 1999
© 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Materials and Methods

Plant Material

Leaves from all of the species except Antirrhinum valentinum, Antirrhinum siculum, and the Alg population of Antirrhinum majus subsp. cirrhigerum were collected in the field in summer 1997 and 1998 (see table 1). Leaves from A. valentinum and A. siculum were kindly provided by Isabel Mateu, while leaves from A. majus subsp. cirrhigerum (population Alg) were kindly provided by Andrew Hudson. Based on morphological characters, the status of several Antirrhinum taxa is disputed, with some authors regarding them as different species and others regarding them as subspecies (Halliday and Beadle 1983; Doaigey and Harkiss 1991). We follow the identification of Franco (1971) for both Antirrhinum and Misopates.

DNA Extraction and PCR Amplification

Genomic DNA was prepared from leaves of individual plants using the method of Ingram et al. (1997). The cyc gene sequence deposited in GenBank (accession number Y16313) was used to construct the following primer pairs (all positions are relative to the cyc start codon): P1 (5'-TTGGGAAGAACACATACCTA-3') (position 5) and P2 (5'-AATTGATGAACTTGTGCTGAT-3') (position 859), and Cu (5'-CTTGAGTCCACCGCTTTGTT-3') (position 154) and Cl (5'-CGTTGCCATAGTTTTGCTGA-3') (position 772). The primers C1 (5'-ACCACCACGGCCACCACCA-3') and C2 (5'-AAATCCAAACATTGAAGGG-3') were designed based on our determination of the sequence of cyc4 (see below); they specifically amplify cyc3 and cyc4 but not cyc1A, cyc1B, or cyc2 (see below).
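The primer positions listed above determine the sizes of the amplification products. The following is a minimal sketch of that arithmetic, assuming that each quoted position marks the outermost base of the primer and that both end positions are counted; the function name `amplicon_length` and the dictionary `PRIMER_PAIRS` are illustrative and not part of the original study. Under that assumption the P1/P2 and Cu/Cl pairs give the 855-bp and 619-bp products reported for these primers.

```python
def amplicon_length(forward_pos: int, reverse_pos: int) -> int:
    """Length in bp of a product spanning both positions, ends included.

    Assumption (not stated in the paper): each position marks the outermost
    base of its primer on the amplified fragment.
    """
    if reverse_pos < forward_pos:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return reverse_pos - forward_pos + 1


# Positions relative to the cyc start codon, as listed for each primer pair.
PRIMER_PAIRS = {
    "P1/P2": (5, 859),
    "Cu/Cl": (154, 772),
}

if __name__ == "__main__":
    for name, (fwd, rev) in PRIMER_PAIRS.items():
        print(f"{name}: {amplicon_length(fwd, rev)} bp")
```

The C1/C2 pair is left out because no positions are given for those primers.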
Based on the GenBank sequence, PCR reactions using primers P1 and P2 are expected to amplify an 855-bp fragment; Cu and Cl are expected to yield a 619-bp fragment (the latter set of primers was used to amplify the genes cyc1A, cyc1B, and cyc2; see below). Based on the new cyc3/cyc4 genes (see below), C1 and C2 are expected to yield a 749-bp fragment. Standard amplification conditions were 30 cycles of denaturation at 94°C for 30 s, primer annealing at 53°C for 30 s, and primer extension at 72°C for 2 min.

Table 1
Taxa, Code Numbers, Collection Localities, and Years

Species                       Population Code     Locality                       Collection Year
Misopates
  orontium                    orontiumB           Braganca, Portugal             1997
                              orontiumG           Vila Nova de Gaia, Portugal    1997
  calycinum                   calycinum           Coimbra, Portugal              1998
Antirrhinum
  molle lopesianum            molle               Braganca, Portugal             1998
  braun-blanquetii            braun-blanquetii    Braganca, Portugal             1998
  graniticum                  graniticumB         Braganca, Portugal             1997
                              graniticumS         Chaves, Portugal               1998
  majus subsp. linkianum      linkianum           Coimbra, Portugal              1998
  majus subsp. cirrhigerum    cirrhigerumAve      Aveiro, Portugal               1997
                              cirrhigerumAlg      Algarve, Portugal
  valentinum                  valentinum
  siculum                     siculum2            Sicily, Italy
                              siculum6            Napoli, Italy
                              siculum7            Israel
Linaria triornithophora       Linaria             Braganca, Portugal             1998
Cymbalaria muralis            Cymbalaria          Edinburgh, Scotland            1998
Digitalis purpurea            Digitalis           Braganca, Portugal             1998

Cloning and Sequencing

The PCR products were cloned into the pCR 2.1 vector using the TA cloning kit (Invitrogen). DNA sequencing was performed with an Applied Biosystems model 377 DNA sequencing system with the ABI PRISM Dye Terminator cycle-sequencing kit (Perkin Elmer) using the primers for the M13 forward and M13 reverse priming sites of the pCR 2.1 vector. Since a low rate of nucleotide misincorporation occurs in PCR reactions, it is known that this approach will lead to some sequencing errors. Such errors may be a problem for detailed diversity studies but are usually negligible when one is characterizing gene duplications or estimating divergence, because they represent a small proportion of the number of sequence differences. Some of the newly characterized cycloidea genes are very similar (see Results and Discussion), however, and therefore it is technically very difficult to detect possible sequencing errors.

Analyses of the Sequences

DNA sequences were deposited in GenBank (accession numbers AF146833–146848 for cyc1A, AF146849–146862 for cyc1B, AF146863–146871 for cyc2, AF146872–146873 for cyc3, and AF146874–146880 for cyc4). The nucleotide sequences to be compared were aligned using CLUSTAL X, version 1.64b (Thompson et al. 1997), and minor manual adjustments were performed using SeqPup, version 0.6f (Gilbert 1995). The numbers of synonymous and nonsynonymous differences between pairs of sequences were calculated using the DnaSP software (Rozas and Rozas 1997). Neighbor-joining trees were generated using the homologous regions sequenced in all the genes with MEGA, version 1.01 (Kumar, Tamura, and Nei 1994). The alignment of the 468-bp sequence region used to generate the neighbor-joining trees is deposited in the EMBL Nucleotide Sequence Database (alignment number ds38407).

Results

Characterization of Antirrhinum cyc Loci

Evidence for a New Cycloidea Locus, cyc2

When PCR was performed using the primers P1 and P2, two bands with different sizes were observed. One was 855 bp (the expected fragment size; see Materials and Methods), and the other 950 bp. The smaller band was cloned from two individuals of all taxa analyzed (table 1), and in 30 cases the nucleotide sequences were determined. When these sequences were analyzed, it was evident that while some sequences from Misopates orontium were only 840 bp long, others were 855 bp long. This 15-bp difference cannot be resolved in a 1.6% agarose gel. There were 103 variable nucleotide sites and four variable indels among the 30 DNA sequences analyzed. The sequences with the shorter and longer amplification products differ by 40 fixed differences and 3 indels. The large number of differences strongly suggests that these two groups of sequences represent two genes. We therefore tested whether both types of sequences are present in every individual analyzed. Between the two groups of sequences there is a fixed difference for an MspI restriction site (present in the 840-bp group of sequences but not in the 855-bp sequences). We used a
shorter amplification product (619 bp, using the Cu and Cl primers; see Materials and Methods) that still included the region containing the MspI site to test for the presence of the sequence types. Figure 1 shows the restriction patterns of the products obtained from two randomly chosen individuals of each species. (Note that four individuals of each species were analyzed, but only two of each species are shown in the figure because no differences in the banding patterns were seen.) These results show that both types of sequences are present in every individual of every species analyzed; i.e., we see the pattern expected if there are two different genes. The cyc gene deposited in the GenBank database does not have the MspI restriction site, and we therefore named the gene with the MspI restriction site cyc2. The fixed MspI restriction site difference between the genes also allows us to clone and sequence the cyc2 gene.

FIG. 1. Evidence that the cyc1 and cyc2 genes are present in every individual of each species analyzed. A, Schematic representation of the MspI restriction sites present in the 619-bp amplification product (obtained with the Cu and Cl primers) of the cyc1 and cyc2 genes. The fixed difference in the MspI restriction site between cyc1 and cyc2 can be used to distinguish the two genes. B, MspI restriction pattern of the 619-bp amplification product of cyc1 and cyc2. The arrows point to the expected 568- and 460-bp bands that are always observed; it is not possible to resolve the 51- and 108-bp bands in a 1.6% agarose gel. Sequence names are according to the population codes in table 1. From left to right at the top: 1-kb and 100-bp DNA ladder, orontiumB-1 and -2, orontiumG-1 and -2, calycinum-1 and -2, molle-1 and -2, braun-blanquetii-1 and -2, cirrhigerumAlg-1 and -2, cirrhigerumAve-2 and -3, and linkianum-7 and -9; from left to right at the bottom: 1-kb and 100-bp DNA ladder, graniticumB-1 and -2, graniticumS-1 and -2, valentinum-241, siculum6-13, siculum7-2, Digitalis, Cymbalaria, and Linaria.

Evidence for a Further Cycloidea Locus, cyc1B

As shown above, the expected 855-bp band obtained with the P1 and P2 primers is not homogeneous, but is a mixture of two bands with different sizes (855 and 840 bp). When the sequence data from the 855-bp band were analyzed, the complete set of sequences could again be classified into two groups, differing by three nucleotides (at positions 253 and 254, group 1 has G and G and group 2 has A and A, while at position 539, group 1 has C and group 2 has G). In addition, a further variant was found between the two groups at position 786 (A in most group 1 sequences and G in group 2). This difference is not completely fixed, because the sequence from the Alg population of A. majus subsp. cirrhigerum belongs to group 2 based on positions 253, 254, and 539, but it has an A at position 786. These sequences could be alleles of the same gene, or they could represent a recent duplication. The two nucleotide differences at positions 253 and 254 create an additional AciI restriction site in group 1 that is absent in group 2; this can be used to test whether these two groups of sequences are different alleles or different genes, as in the cyc1/cyc2 analysis. For this test, once again, the shorter amplification product with the Cu and Cl primers was used. Unfortunately, the restriction pattern of the amplification products obtained from two randomly chosen individuals of each species was not conclusive (data not shown). We could not show the presence of the AciI site in some of our positive controls (genomic DNA from individuals that were known from our previous sequencing data to possess the AciI restriction site); this is probably because of unequal amplification of the two types of sequences. However, we had previously cloned the 855-bp band (obtained with the P1 and P2 primers) from one individual of each species, and we therefore screened several clones to check if both types of sequences were present in every individual. The screening procedure involved amplifying 619-bp products from clones and digesting with AciI. These tests clearly established that both types of sequences are amplified from every individual from every species analyzed (fig. 2), the expected pattern if these are two different genes. We denote by cyc1A the gene that (like the cyc gene deposited in GenBank) does not have the AciI restriction site, while the gene with the AciI restriction site was named cyc1B.

FIG. 2. Evidence that the cyc1A and cyc1B genes are present in every species analyzed. A, Schematic representation of the AciI restriction sites present in the 619-bp amplification product (using the Cu and Cl primers) of the cyc1A and cyc1B genes. The fixed difference in the AciI restriction site between cyc1A and cyc1B can be used to distinguish the two genes. B, AciI restriction pattern of the 619-bp amplification products. Each pair of lanes is from the same individual (different colonies) and shows the presence of the two genes cyc1A and cyc1B. These colonies were obtained after cloning an apparently homogeneous PCR product (855-bp band, obtained with the P1 and P2 primers) from one individual of each species. The arrows point to the expected 255-, 144-, and 111-bp bands that allow the two genes to be distinguished. Sequence names are according to the population code in table 1. From left to right at the top: 100-bp DNA ladder; lanes 2 and 3: orontiumB-5; lanes 4 and 5: calycinum-2; lanes 6 and 7: molle-1; lanes 8 and 9: braun-blanquetii-1; lanes 10 and 11: cirrhigerumAlg-1; lanes 12 and 13: linkianum-7; lanes 14 and 15: graniticumB-1; lanes 16 and 17: valentinum-241; lanes 18 and 19: siculum7-2; lanes 20 and 21: Digitalis; lanes 22 and 23: Cymbalaria; lanes 24 and 25: Linaria; lane 26: 100-bp DNA ladder.

Evidence for a Further Cycloidea Locus, cyc3

Two bands with different sizes are observed on 1.6% agarose gels when PCR is performed using
primers P1 and P2 (see above). Besides the 855-bp band (the expected fragment size; see Materials and Methods), an additional 950-bp amplification product is also observed. To test whether this is a further cycloidea sequence, we cloned and sequenced this 950-bp amplification product from A. majus subsp. cirrhigerum. When the amino acid sequence deduced from this was compared (by BLAST searching; Altschul et al. 1997) with the genes deposited in GenBank, the highest similarity was observed with cyc1A. Nevertheless, when the two sequences were compared, there were 12 indels and 86 amino acid differences out of a total of 283 amino acids. This level of divergence strongly suggests that the 950-bp amplification product is not an allele of the cyc1A gene, but a cyc-related gene. From this sequence, we designed primers C1 and C2, which do not amplify cyc1A, cyc1B, or cyc2. A band with the expected size (749 bp) was observed in all four individuals of every species analyzed (two of each species are shown in fig. 3). This pattern is consistent with the presence of a new cyc-related gene (cyc3). The C1 and C2 primers should allow us to amplify the cyc3 gene specifically. However, when these primers are used, additional bands are also observed, suggesting that even more cyc-related genes may remain to be discovered.

FIG. 3. Evidence that the cyc4 gene is present in every individual of every species analyzed. The arrows point to the expected 749-bp amplification product using the primers C1 and C2. Sequence names are according to the population code in table 1. From left to right at the top: 1-kb and 100-bp DNA ladder, orontiumB-1 and -2, orontiumG-1 and -2, calycinum-1 and -2, molle-1 and -2, braun-blanquetii-1 and -2, cirrhigerumAlg-1 and -2, cirrhigerumAve-2 and -3, linkianum-7 and -9, and 1-kb DNA ladder. From left to right at the bottom: 1-kb and 100-bp DNA ladder; graniticumB-1 and -2; graniticumS-1 and -2; valentinum-241; siculum6-613; siculum7-2; Digitalis, Cymbalaria, and Linaria; and 1-kb DNA ladder.

Evidence for cyc4

Because, as shown above, the primers C1 and C2 do not amplify cyc1A, cyc1B, or cyc2, it was expected that these primers could be used to amplify the cyc3 gene specifically. However, DNA sequences based on such reactions from M. orontium, A. majus subsp. cirrhigerum, Antirrhinum graniticum, and Digitalis purpurea again fall into two groups, differing by 43 fixed nucleotide differences. Within each group, only a few (on average, two) nucleotide differences were found in pairwise comparisons between the sequences from these different species. The large number of differences between the two sequence groups strongly suggests that they represent two genes. One of the 43 nucleotide differences can be identified by the presence or absence of an AvaII restriction site. The gene without the AvaII restriction site was named cyc3, while the gene with the AvaII restriction site was named cyc4. This difference in presence of the AvaII restriction site enabled us to test whether the two groups of sequences are different alleles or different genes. However, as in the case of cyc1A versus cyc1B, the restriction pattern of the amplification products obtained from two randomly chosen individuals of each species was not conclusive (data not shown). Furthermore, the C1 and C2 primers amplify several bands of different sizes and unequal intensities that confound the analysis.

DNA Variability Within and Between Species

Sequence variation was estimated within and between species (see tables 2–5) in terms of the average numbers of synonymous differences per synonymous site (Ps) and of nonsynonymous differences per nonsynonymous site (Pa). To test for an effect of breeding system on sequence diversity, we assume that, as described in the literature (Harrison and Darby 1955; Herrmann 1973; Sutton 1988), the species of Misopates, Antirrhinum majus subsp. majus, and A. siculum are self-compatible, while the remaining Antirrhinum species are all self-incompatible. This categorization is conservative, as our own observations indicate self-compatibility in some populations of species classified as self-incompatible, as in the case of the Ave population of A. majus subsp. cirrhigerum.

Table 2
cyc1A Sequence Variation Within and Between Species
Taxa compared: Misopates orontium; Antirrhinum majus subsp. majus; Antirrhinum majus subsp. cirrhigerum (N = 2); Antirrhinum majus subsp. linkianum (N = 3); Antirrhinum molle; Antirrhinum braun-blanquetii; Antirrhinum graniticum (N = 4); Antirrhinum valentinum; Antirrhinum siculum (N = 2); Digitalis.
Values, in the order extracted (the row and column alignment of the matrix is not recoverable): 24.5 24.5 6.1 18.3 0.0 24.5 23.0 23.0 24.5 24.5; 30.8 24.5 30.7 34.0; 30.8 30.8; 34.0 40.3; 24.5 18.2 24.5 15.2; 23; 34.0 27.7 40.2 37.1; 39.9; 5.3 - - 24.5; 5.3 3.5 8.8 8.8 0.0 1.7 7.0 5.3; 5.3 5.3 10.6 0.0 6.1 27.7 27.7 7.0 5.3 - 27.6 1.7 - 18.3 21.4 - 6.1 24.5 27.7; 7.5 5.6 7.5 9.3 4.0 5.7; 7.4 19.4 23.0 16.8 19.9 23.0; - 34.0; 29.2; 7.6 5.8 5.8 9.4 4.1 5.8 6.3; 4.7 8.1 17.7; 3.5 4.4 3.5 7.9 7.0 7.9 1.7 2.6 3.5 4.4 3.9 6.6 2.3 6.8; 4.4 5.3 43.4 29.7; 5.3 6.2.
NOTE. Mean pairwise numbers of nonsynonymous differences per nonsynonymous site, Pn (above the diagonal), and synonymous differences per synonymous site, Ps (below the diagonal). All values are multiplied by 10^3. The sequences compared are 814 bp long. When more than one sequence from a species is included, the number of sequences used is shown (N).

Table 3
cyc1B Sequence Variation Within and Between Species
Taxa compared: Antirrhinum majus subsp. cirrhigerum (N = 2); Antirrhinum braun-blanquetii; Antirrhinum molle; Antirrhinum valentinum (N = 2); Antirrhinum siculum (N = 2); Misopates orontium; Misopates calycinum (N = 2); Digitalis; Linaria triornithophora or Cymbalaria muralis.
Values, in the order extracted (the row and column alignment of the matrix is not recoverable): 0.0 5.8 0.0 14.7; 0.0 - 5.8 20.7; 1.7 1.7 - 14.7; 0.0; 5.8; 0.0; 3.4 3.4 5.1 3.4 29.7 14.7; 0.0 2.9; 5.8 8.8; 0.0 2.9; 5.8 0.0; 11.7 5.8; 5.8 0.0; 0.9 0.9 2.6 4.3; 1.7 1.7 3.4 5.1; 3.4 3.4 5.1 5.1; 5.1 5.1 6.8 8.5; 0.0 0.0 1.7 3.4; 2.6; 4.3; 6.0; 0.9; 14.7 17.7; 1.7 0.0 0.0 2.9; - 2.9; 6.8 8.5; 1.7 3.4; 20.8 14.7; 5.8 0.0; 5.8 0.0; 5.1 6.8 5.8 8.8 2.9; - 5.8; 5.1 -.
NOTE. See table 2 for details. The sequences being compared are 814 bp long.

Within the group of self-compatible species, the level of synonymous diversity (π; see Nei 1987) ranges from 0 to 10 × 10^-3 (average 4.4 × 10^-3). Within the self-incompatible group of species, the level of synonymous polymorphism ranges from 0 to 43.4 × 10^-3 (average 16.7 × 10^-3). If the synonymous polymorphism values (π, in units of 10^-3) are divided into two classes (π < 10 and π ≥ 10), the numbers of observations falling into each class differ significantly between the self-compatible and the self-incompatible groups (χ^2 = 4.84; P < 0.05), in the expected direction of lower values in more inbreeding species. For the cyc1A, cyc2, and cyc4 genes, all Pa/Ps ratios, both within and between species, are less than 1
(table 6). These ratios suggest that purifying selection is acting on these genes. For cyc1B, however, both within and between populations, Pa/Ps ratios higher than 1 are obtained for approximately 50% of the comparisons. The 14 cyc1B sequences analyzed include 12 nonsynonymous and 13 synonymous variants, suggesting that we may again be pooling together sequences from different cyc-related genes. At present, however, there are no evident differences that could be used to separate the cyc1B sequences into two groups.

Genealogical Relationships Among the cyc Genes

The relationships among the different cyc-related DNA sequences are shown in the gene tree in figure 4. The sequences for each putative gene group together, and bootstrap values are high, supporting our interpretation that the different groups of sequences correspond to different genes. The numbers of fixed nucleotide differences between the different genes or groups of genes are also indicated (fig. 4). Due to the low level of divergence observed between the species, it is not possible, using any of these genes, to resolve the phylogenetic relationships of the taxa studied.

Because four of the five genes analyzed (cyc1A, cyc1B, cyc2, and cyc4) are also found in Digitalis, these gene duplications must have occurred before the split between the lineages leading to Digitalis and Antirrhinum. The time of the divergence of Digitalis and Antirrhinum/Misopates can be roughly estimated to be 5 MYA if we assume that Scrophulariaceae and Solanaceae have been separated for 40 Myr (Xue et al. 1996) and that branch lengths in figure 15B of Chase et al. (1993) are linearly related to time, which is, of course, dubious. With this dating, the synonymous mutation rates for these loci can be estimated by dividing the Ps values between all Antirrhinum/Misopates sequences and the putative D. purpurea sequences of the same gene by 10 Myr (twice the estimated divergence time). The mutation rates per year were estimated as the averages of these values, and are as follows: cyc1A, 2.7 × 10^-9; cyc1B, 1.0 × 10^-9; cyc2, 1.1 × 10^-9; and cyc4, 1.2 × 10^-9. These may be overestimates, because other phylogenies (Reeves and Olmstead 1998; Wolfe and dePamphilis 1998) have suggested an earlier date for the split between Digitalis and Antirrhinum/Misopates.

Table 4
cyc2 Sequence Variation Within and Between Species
Taxa compared: Antirrhinum majus subsp. cirrhigerum (N = 2); Antirrhinum graniticum (N = 2); Misopates orontium (N = 4); Digitalis.
Values, in the order extracted (the row and column alignment of the matrix is not recoverable): 2.9; 2.1; 5.6; 5.0; 8.5; 4.3 10.0 10.1; 7.9; 0.0 20.1 15.1; 12.6; 5.7 10.0 10.0; 10.0; 15.1.
NOTE. See table 2 for details. The sequences being compared are 462 bp long.

Table 5
cyc4 Sequence Variation Within and Between Species
Taxa compared: Antirrhinum majus subsp. cirrhigerum (N = 2); Antirrhinum graniticum (N = 2); Misopates orontium (N = 2); Digitalis.
Values, in the order extracted (the row and column alignment of the matrix is not recoverable): 7.6; 3.7; 3.8; 3.8; 3.8; 0.0 0.0 0.0; 0.0; 7.5 19.9 30.7; 10.0; 0.0 0.0 26.7; 10.0; 26.7.
NOTE. See table 2 for details. The sequences being compared are 711 bp long.

Discussion

Our data show that cyc is a member of a gene family composed of at least five genes. Four of the five genes (cyc1A, cyc1B, cyc2, and cyc4) are present in all of the species analyzed, and cyc3 is present in at least M. orontium. Using the same assumptions as for the neutral mutation-rate estimates in the preceding section, the ages of the duplication events can be estimated by dividing the average Ps values between the different genes by the average mutation rate for all of the cyc genes (1.5 × 10^-9). If the split of Digitalis and Antirrhinum/Misopates were older than the putative 5 Myr, our estimates of the duplication events would be older. The duplication events between cyc3/cyc4 and between cyc2/(cyc1A/cyc1B) are then estimated to have occurred more than 75 MYA. This date can be compared with the time of the origin of the angiosperms, estimated to have occurred between 160 and 348 MYA based on molecular clocks (with considerable uncertainty; see Goremykin, Hansmann, and Martin 1997) and no earlier than the early Cretaceous based on fossil evidence. No reliable angiosperm fossils older than about 130 Myr have been found (Crane, Friis, and Pedersen 1995; but see Sun et al. [1998] for the best-supported example of a Jurassic angiosperm). The fossil record indicates that the angiosperms radiated rapidly between 130 and 90 MYA (e.g., Crane, Friis, and Pedersen 1995; for further discussion, see Li 1997). The cyc3/cyc4 and cyc2/(cyc1A/cyc1B) genes are therefore probably present in many other species, as well as those studied here.

There is no reason to believe that any of the cyc genes is a pseudogene. We have no direct evidence for expression, but none of the genes has any interruption of the open reading frame, and, both within and between species, the number of synonymous substitutions per synonymous site is usually much larger than the number of nonsynonymous substitutions per nonsynonymous site (tables 2–6), as expected for genes under purifying selection. It is not known whether these five cyc-related genes perform different functions or not. The temporal and spatial patterns of expression of each gene will be of interest but are beyond the scope of this study.

There is no apparent correlation between the gene duplications and any change in the shape of the flowers. The estimated age of the cyc1A/cyc1B duplication event is 7.5 Myr, suggesting that this duplication occurred after the split between the Scrophulariaceae and Callitrichaceae on the one hand, and the Acanthaceae on the other hand (based on information in Chase et al. 1993). Having irregular flowers is, however, a derived characteristic that seems to have appeared much longer ago than the cyc1A/cyc1B gene duplication, just before the split between most Scrophulariales and the Lamiales (Chase et al. 1993). The estimated 75 Myr of divergence that separates the cyc4/cyc3 and the cyc2/(cyc1A and cyc1B) genes suggests that these duplications occurred before the diversification of the clade including Solanaceae, Cornaceae, Garryaceae, Ericaceae, and Hydrangeaceae (Chase et al. 1993). All of these families have regular flowers (Heywood 1985).

Estimates of the ages of the duplication events depend heavily on the mutation rates used. If the mutation rates are underestimates, the ages of the duplication events are overestimated. Our mutation rate estimate (1.5 × 10^-9; the average of the rates for the different cyc genes) is lower than those obtained for nuclear genes in short-lived monocotyledonous plants (5.1 × 10^-9 to
Table 6 Proportions of Pa/Ps Ratios Higher than 1, Within and Between Species, for the cyc1A, cyc1B, cyc2, and cyc4 Genes WITHIN SPECIES

cyc1A . . . . . cyc1B . . . . . cyc2. . . . . . . cyc4. . . . . . . a

BETWEEN SPECIES

No. of Comparisons

Pa/Ps . 1a

Mean Pa/Ps Ratiob

No. of Comparisons

Pa/Ps . 1a

Mean Pa/Ps Ratiob

4 4 3 3

0 2 0 0

0.27 — 0.33 0.38

45 36 6 6

0 17 0 0

0.23 — 0.46 0.26

When more than one sequence was available for the same species, the average of the Pa/Ps ratio was used. The mean Pa/Ps ratios are the averages of the ratios that could be calculated, i.e., only the cases where Ps is nonzero. Altogether, for cyc1A, cyc2, and cyc4, there were only two such cases between species and two within species. For cyc1B, Pa/Ps ratios were not calculated because of heterogeneity of the values. b

Cycloidea—The Evolution of a Gene Family

1481

FIG. 4.—Neighbor-joining tree using the Jukes-Cantor distance, showing the relationships among the different cyc-related DNA sequences studied (only the 468-bp sequence region that is common to all of the DNA sequences was used). Sequence names are according to table 1. The numbers after the sequence names are sample codes. The lengths of the terminal branches are not proportional to the distance values, because branch lengths in the tree are proportional to the numbers of inferred nucleotide site changes according to the reconstruction, rather than according to the distance values. Percentages of bootstrap replicates supporting the branches are shown where the values exceeded 75%. Represented in brackets are the numbers of fixed nucleotide differences and fixed indels, respectively, in the region analyzed, between the different genes or groups of genes. The topology of the tree is unchanged when other distances, including Kimura two-parameter distances, are used.

7.1 × 10⁻⁹; Wolfe, Sharp, and Li 1989; Gaut et al. 1996) and in palms (2.61 × 10⁻⁹; Gaut et al. 1996), for which it has been suggested that a generation-time effect may cause a reduced mutation rate. However, our estimate is similar to the first published estimate for a dicotyledonous plant (Gossypium sp.; 1.47 × 10⁻⁹ to 2.05 × 10⁻⁹; Small, Ryburn, and Wendel 1999).

For each cyc gene, divergence between the species studied is low. This observation suggests that the cyc genes are not a good choice for phylogenetic analyses in these species and/or that the species evolved very recently. The observations that species of Antirrhinum can hybridize (Mather 1947; Harrison and Darby 1955; Herrmann 1973) and that this group of species is geographically restricted (von Wettstein 1891; Rothmaler 1956; Doaigey and Harkiss 1991) are compatible with recent origin, but the similarity of sequences from Digitalis is puzzling.

Our data also suggest that the average level of DNA polymorphism at synonymous sites is influenced by the mating system of species of Antirrhinum and Misopates. The average synonymous-site π value for inbreeding species is similar to the lower values found for other inbreeding plants. Values ranging from 4.73 × 10⁻³ to 28.6 × 10⁻³ have been reported for the inbreeding species Arabidopsis thaliana, Hordeum vulgare, and Ipomoea purpurea (Huttley et al. 1997; Charlesworth and Charlesworth 1998), and values as low as zero were found in inbreeding species of Leavenworthia (Liu, Zhang, and Charlesworth 1998). The values we found in outcrossers are similar to the range of values for other such species (49.3 × 10⁻³ in Zea mays, 30.7 × 10⁻³ in Leavenworthia stylosa, and 6.66 × 10⁻³ in Pennisetum glaucum; see Cummings and Clegg 1998; Liu, Zhang, and Charlesworth 1998). Note that most of these data, including ours, are for diversity measured on samples of plants taken from different populations of a species, rather than from a single population. Total species diversity levels are expected to be less reduced for inbreeders than are within-population values (see Charlesworth, Nordborg, and Charlesworth 1997). Despite this expectation of only a moderate difference, and despite the fact that our comparisons use pooled data from several different loci, the inbreeding taxa appear to have only about one fourth of the diversity seen in the outcrossers, although the errors on the estimates are large. However, given the difficulties of defining orthologous loci, these genes appear to be unsuitable for detailed studies of sequence diversity.

The existence of this moderate-sized gene family could very easily have escaped our notice had we not examined diversity within the study species. Paralogous loci with very different sequences (such as cyc1A, cyc1B, and cyc2 versus cyc3 and cyc4) will, of course, become evident as soon as sequence data are collected, but sequences that are similar (such as cyc1A and cyc1B) will often be assumed to be allelic variants of a single locus. It is, of course, widely recognized that orthologous genes must be compared for phylogenetic inferences (e.g., Li 1997), as well as for the study of within-species diversity. Southern blotting studies are often used to provide evidence for orthology, but definitive evidence is not always generated, because this approach relies on the assumption that sequences flanking the genes will differ. This may not always be the case, especially for very recent duplications, such as those found in this study.
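The diversity values compared in this section are expressed as π, the average number of pairwise differences per site among sampled sequences. The sketch below is a simplified illustration of that statistic in Python. It assumes an ungapped alignment and counts all sites, whereas the values discussed here are restricted to synonymous sites, so it is not a reimplementation of the analysis. The function name and the example sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) for aligned sequences.

    Simplification: every site is counted and no gaps are expected;
    synonymous-site estimates would need codon-aware site counting.
    """
    if len(sequences) < 2:
        raise ValueError("at least two sequences are needed")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    total_differences = 0
    n_pairs = 0
    for a, b in combinations(sequences, 2):
        total_differences += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_differences / (n_pairs * length)


# Hypothetical sample of three 4-bp sequences (not data from the paper).
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

If sequences from two similar paralogs, such as cyc1A and cyc1B, were pooled as if they were alleles of one locus, a calculation of this kind would count the fixed differences between the genes as polymorphism and inflate π, which is one reason why orthology has to be established first.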
At present, data on rates of accumulation of diversity in sequences flanking plant genes are few, apart from evidence for high diversity at maize and teosinte loci (Sanmiguel et al. 1997; Wang et al. 1999) and in the flanking regions of some self-incompatibility loci (Coleman and Kao 1992; Boyes et al. 1997). In general, there are no rules as to what level of difference suggests that more than a single gene is present. Length differences in introns often occur between different alleles at a single locus (see Liu, Charlesworth, and Kreitman 1998), so a length difference cannot be taken as evidence that a sequence is not allelic. Given the existence of diversity differences between loci under different selective regimes (such as high diversity close to sites under balancing selection; see, e.g., Hughes and Yeager 1998; Charlesworth and Awadalla 1998) and the fact that loci in genomic regions where recombination rarely occurs tend to show low diversity (Begun and Aquadro 1992; Stephan and Langley 1998), it is unlikely that there is any general way to recognize paralogy from sequence differences alone. Phylogenetic analyses, although generally useful, cannot reveal paralogy of very recent duplications unless many species are studied, as here. Genetic evidence of segregation is not necessarily helpful, because if there is genetic diversity at loci, one cannot always distinguish between a single locus and a tandem duplication. To be rigorous, careful investigation of the presence or absence of sequences within and between individuals, i.e., of diversity, is therefore needed to establish whether a putative gene is truly a single locus. This can sometimes be simpler, as well as being more likely to be definitive, than Southern blotting. It is not widely realized how much work may be needed to obtain good evidence for orthology, nor is there widespread awareness that, perhaps unlike the situation for Drosophila, small gene families are common among plant nuclear loci, which are being increasingly used for phylogenetic analyses (e.g., Galloway, Malmberg, and Price 1998).

Acknowledgments

We thank Isabel Mateu and Andrew Hudson for sending us plant material. We also thank Carlos Aguiar and Ana Maria Carvalho for helping us with the fieldwork. C.P.V. is supported by the Commission of European Communities (grant ERBFMBICT 972455). J.V. is supported by the Fundação para a Ciência e Tecnologia (PRAXIS XXI/BPD/14120/97), and D.C. is supported by an NERC Senior Research Fellowship.

LITERATURE CITED

ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. ZHANG, Z. ZHANG, W. MILLER, and D. J. LIPMAN. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
AWADALLA, P., and K. RITLAND. 1997. Microsatellite variation and evolution in the Mimulus guttatus species complex with contrasting mating systems. Mol. Biol. Evol. 14:1023–1034.
BEGUN, D. J., and C. F. AQUADRO. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519–520.
BOYES, D. C., M. E. NASRALLAH, J. VREBALOV, and J. B. NASRALLAH. 1997. The self-incompatibility (S) haplotypes of Brassica contain highly divergent and rearranged sequences of ancient origin. Plant Cell 9:237–247.
CARPENTER, R., and E. S. COEN. 1990. Floral homeotic mutations produced by transposon-mutagenesis in Antirrhinum majus. Genes Dev. 4:1483–1493.
CHARLESWORTH, B., M. NORDBORG, and D. CHARLESWORTH. 1997. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70:155–174.
CHARLESWORTH, D., and P. AWADALLA. 1998. The molecular population genetics of flowering plant self-incompatibility polymorphisms. Heredity 81:1–9.
CHARLESWORTH, D., and B. CHARLESWORTH. 1998. Sequence variation: looking for effects of genetic linkage. Curr. Biol. 8:R658–R661.
CHARLESWORTH, D., and Z. YANG. 1998. Allozyme diversity in Leavenworthia populations with different inbreeding levels. Heredity 81:453–461.
CHASE, M. W., D. E. SOLTIS, R. G. OLMSTEAD et al. (42 co-authors). 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80:528–580.


COLEMAN, C. A., and T.-H. KAO. 1992. The flanking regions of Petunia inflata S alleles are heterogeneous and contain repetitive sequences. Plant Mol. Biol. 18:725–737.
CRANE, P. R., E. M. FRIIS, and K. R. PEDERSEN. 1995. The origin and early diversification of angiosperms. Nature 374:27–33.
CUMMINGS, M. P., and M. T. CLEGG. 1998. Nucleotide sequence diversity at the alcohol dehydrogenase 1 locus in wild barley (Hordeum vulgare ssp. spontaneum): an evaluation of the background selection hypothesis. Proc. Natl. Acad. Sci. USA 95:5637–5642.
DOAIGEY, A. R., and K. J. HARKISS. 1991. Application of epidermal characters to the taxonomy of European species of Antirrhinum (Scrophulariaceae). Nord. J. Bot. 11:513–524.
FRANCO, J. A. 1971. Nova Flora de Portugal (Continente e Açores): 1. Asa Editores. Lisboa, Portugal.
GALLOWAY, G. L., R. L. MALMBERG, and R. A. PRICE. 1998. Phylogenetic utility of the nuclear gene Arginine decarboxylase: an example from Brassicaceae. Mol. Biol. Evol. 15:1312–1320.
GAUT, B. S., B. R. MORTON, B. C. MCCAIG, and M. T. CLEGG. 1996. Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93:10274–10279.
GILBERT, D. G. 1995. Seqpup version 0.6f: a biosequence editor and analysis application. http://sunsite.sut.ac.jp/pub/academic/biology/molbio/seqpup/.
GOREMYKIN, V. V., S. HANSMANN, and W. F. MARTIN. 1997. Evolutionary analysis of 58 proteins encoded in six completely sequenced chloroplast genomes: revised molecular estimates of two seed plant divergence times. Plant Syst. Evol. 206:337–351.
HALLIDAY, G., and M. BEADLE. 1983. Flora Europaea. Cambridge University Press, Cambridge, England.
HAMRICK, J. L., and M. J. GODT. 1996. Effects of life history traits on genetic diversity in plant species. Philos. Trans. R. Soc. Lond. B Biol. Sci. 351:1291–1298.
HARRISON, B. J., and L. A. DARBY. 1955. Unilateral hybridization. Nature 176:982.
HERRMANN, V. H. 1973. Selbst- und kreuzungsinkompatibilität verschiedener Antirrhinum-arten. Biol. Zentralbl. 92:773–777.
HEYWOOD, V. H. 1985. Flowering plants of the world. Croom Helm Publishers, Oxford, England.
HUGHES, A., and M. YEAGER. 1998. Natural selection and the evolutionary history of the major histocompatibility complex loci. Front. Biosci. 3:510–516.
HUTTLEY, G. A., M. L. DURBIN, D. E. GLOVER, and M. T. CLEGG. 1997. Nucleotide polymorphism in the chalcone synthase-A locus and evolution of the chalcone synthase multigene family of common morning glory Ipomoea purpurea. Mol. Ecol. 6:549–558.
INGRAM, G. C., S. DOYLE, R. CARPENTER, E. A. SCHULTZ, R. SIMON, and E. S. COEN. 1997. Dual role for fimbriata in regulating floral homeotic genes and cell division in Antirrhinum. EMBO J. 16:6521–6534.
KRAMER, E. M., R. L. DORIT, and V. F. IRISH. 1998. Molecular evolution of genes controlling petal and stamen development: duplication and divergence within the APETALA3 and PISTILLATA MADS-box gene lineages. Genetics 149:765–783.
KUMAR, S., K. TAMURA, and M. NEI. 1994. MEGA: molecular evolutionary genetics analysis software for microcomputers. Comput. Appl. Biosci. 10:189–191.
LI, W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.


LIU, F-L., D. CHARLESWORTH, and M. KREITMAN. 1998. Joint influence of mating system and selection on nucleotide diversity at the phosphoglucose isomerase locus in the plant genus Leavenworthia. Genetics 151:343–357.
LIU, F-L., L. ZHANG, and D. CHARLESWORTH. 1998. Genetic diversity in Leavenworthia populations with different inbreeding levels. Proc. R. Soc. Lond. B Biol. Sci. 265:293–301.
LUO, D., R. CARPENTER, C. VINCENT, L. COPSEY, and E. COEN. 1995. Origin of floral asymmetry in Antirrhinum. Nature 383:794–799.
MATHER, K. 1947. Species crosses in Antirrhinum. I. genetic isolation of the species majus, glutinosum, and orontium. Heredity 1:175–186.
NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
REEVES, P. A., and R. G. OLMSTEAD. 1998. Evolution of novel morphological and reproductive traits in a clade containing Antirrhinum majus (Scrophulariaceae). Am. J. Bot. 85:1047–1056.
ROZAS, J., and R. ROZAS. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Appl. Biosci. 13:307–311.
ROTHMALER, W. 1956. Taxonomische monographie der gattung Antirrhinum. Feddes Rep. 136:1–134.
SANMIGUEL, P., A. TIKHONOV, Y. K. JIN et al. (11 co-authors). 1997. Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765–768.
SCHOEN, D. J., and A. H. D. BROWN. 1991. Intraspecific variation in population gene diversity and effective population size correlates with the mating system in plants. Proc. Natl. Acad. Sci. USA 88:4494–4497.
SMALL, R. L., J. A. RYBURN, and J. F. WENDEL. 1999. Low levels of nucleotide diversity at homoeologous Adh loci in allotetraploid cotton (Gossypium L.). Mol. Biol. Evol. 16:491–501.
STEPHAN, W., and C. H. LANGLEY. 1998. DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150:1585–1593.
SUN, G., D. L. DILCHER, S. ZHENG, and Z. ZHOU. 1998. In search of the first flower: a Jurassic angiosperm, Archaefructus, from northeast China. Science 282:1692–1695.
SUTTON, D. A. 1988. A revision of the tribe Antirrhineae. Oxford University Press, London.
THOMPSON, J. T., J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN, and D. G. HIGGINS. 1997. The ClustalX window interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.
VON WETTSTEIN, R. 1891. Antirrhinum. Pp. 3b, 37 in H. G. A. ENGLER and K. E. PRANTL, eds. Die natürlichen pflanzenfamilien. Vol. 4. Verlag W. Engelmann, Leipzig.
WANG, R. L., A. STEC, J. HEY, L. LUKENS, and J. DOEBLEY. 1999. The limits of selection during maize domestication. Nature 398:236–239.
WOLFE, A. D., and C. W. DEPAMPHILIS. 1998. The effect of relaxed functional constraints on the photosynthetic gene rbcL in photosynthetic and nonphotosynthetic parasitic plants. Mol. Biol. Evol. 15:1243–1258.
WOLFE, K. H., P. M. SHARP, and W.-H. LI. 1989. Rates of synonymous substitution in plant nuclear genes. J. Mol. Evol. 29:208–211.
XUE, Y., R. CARPENTER, H. G. DICKINSON, and E. S. COEN. 1996. Origin of allelic diversity in Antirrhinum S locus RNAse. Plant Cell 8:805–814.

PAMELA SOLTIS, reviewing editor
Accepted July 21, 1999