Copyright © 2003 by the Genetics Society of America

Duplication-Dependent CG Suppression of the Seed Storage Protein Genes of Maize

Gertrud Lund,*,1 Massimiliano Lauria,* Per Guldberg† and Silvio Zaina‡

*Plant Biochemistry Laboratory, Department of Plant Biology, The Royal Veterinary and Agricultural University, DK-1871 Frederiksberg C, Denmark, †Institute of Cancer Biology, Danish Cancer Society, DK-2100 Copenhagen, Denmark and ‡Experimental Cardiovascular Research, Wallenberg Laboratory, Department of Medicine, University of Lund, 205 02 Malmø, Sweden

Manuscript received February 28, 2003
Accepted for publication June 13, 2003

ABSTRACT

This study investigates the prevalence of CG and CNG suppression in single- vs. multicopy DNA regions of the maize genome. The analysis includes the single- and multicopy seed storage proteins (zeins), the miniature inverted-repeat transposable elements (MITEs), and long terminal repeat (LTR) retrotransposons. Zein genes are clustered on specific chromosomal regions, whereas MITEs and LTRs are dispersed in the genome. The multicopy zein genes are CG suppressed and exhibit large variations in CG suppression. The variation observed correlates with the extent of duplication each zein gene has undergone, indicating that gene duplication results in an increased turnover of cytosine residues. Alignment of individual zein genes confirms this observation and demonstrates that CG depletion results primarily from polarized C:T and G:A transition mutations from a less to a more extensively duplicated gene. In addition, transition mutations occur primarily in a CG or CNG context, suggesting that CG suppression may result from deamination of methylated cytosine residues. Duplication-dependent CG depletion is likely to occur at other loci, as duplicated MITEs and LTR elements, or elements inserted into duplicated gene regions, also exhibit CG depletion.

In many organisms, nuclear DNA is methylated at cytosine residues, resulting in 5-methylcytosine (5mC). In plants, symmetrical 5′-CpG-3′ (CG) and 5′-CpNpG-3′ (CNG) are the most frequent targets of cytosine methylation, whereas in mammals 90% of methylation is restricted to the CG dinucleotide (Gruenbaum et al. 1981, 1982). However, the degree and ratio of CG and CNG methylation can vary considerably between plant species (Jeddeloh and Richards 1996; Kovarik et al. 1997). For example, in maize, ~28% of cytosine residues are methylated compared to only 6% in Arabidopsis (Leutwiler et al. 1984; Matassi et al. 1992; Montero et al. 1992). Furthermore, analysis of the rRNA genes from maize has shown that the external cytosine is twofold less methylated compared to the internal cytosine residue of the 5′-CCG-3′ sequence (Kovarik et al. 1997), indicating that CG methylation occurs more frequently than CNG methylation. CG suppression, or depletion, refers to the underrepresentation of the CG dinucleotide compared to an estimated value based on the G + C content of the sequence investigated. CG suppression is especially evident in the mammalian genome where the frequency of the CG dinucleotide can be up to fivefold lower

1 Corresponding author: Plant Biochemistry Laboratory, Department of Plant Biology, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark. E-mail: [email protected]

Genetics 165: 835–848 (October 2003)

than the expected value (Bird 1980). In contrast, both monocot and dicot plant genes are only slightly CG depleted (an average of 75–80% of expected values), and CNG suppression is lacking or less severe than CG depletion (McClelland 1983; Gardiner-Garden et al. 1992; Ashikawa 2001). The most commonly explained mechanism of CG depletion relates to the tendency of 5mC to undergo spontaneous deamination to thymidine, resulting in C:T or G:A transition mutations (Couloundre et al. 1978). Interestingly, the mutability of CG has been shown to be one of the most important causes of germline point mutations in human genetic diseases and is a frequent occurrence in somatic mutations leading to cancer (Cooper and Krawczak 1989; Jones et al. 1992; Holstein et al. 1994). In addition to the mutability of 5mC, recent evidence has shown that cytosine deamination also contributes to CG suppression (Fryxell and Zuckerkandl 2000). Although the majority of methylation in plants is associated with repetitive DNA sequences such as transposons, duplicated gene regions can also be methylated (Bianchi and Viotti 1988; Flavell et al. 1988; Bennetzen et al. 1994; Flavell 1994; Bender and Fink 1995; Ronchi et al. 1995; Rabinowicz et al. 1999). In Neurospora crassa, duplicated sequences are efficiently targeted by methylation, and a large number of C:T transition mutations are introduced following duplication [hence the name repeat-induced point mutation (RIP;


G. Lund et al.

Cambereri et al. 1989; Selker 1990)]. Similarly, in Ascobolus immersus a process referred to as methylation induced premeotically (MIP) results in de novo methylation of a DNA sequence upon duplication (Goyon and Faugeron 1989). The observed consequences of de novo methylation in RIP and MIP include gene inactivation and a reduction in the frequency of recombination (Barry et al. 1993; Rountree and Selker 1997; Maloisel and Rossignol 1998). Similar roles of duplication-induced DNA methylation have been proposed to occur in plants (Flavell 1994; Bender 1998). In mammals, duplicated genes are more CG suppressed compared to single-copy genes. This observation has led to the suggestion that duplicated genes have a history of methylation and subsequent mutation of methylated residues (Kricker et al. 1992). Likewise, in plants, the multigene families of 5S rRNA genes from Arabidopsis and rRNA genes from maize show elevated levels of transition mutations that are consistent with deamination of 5mC, in particular the nonfunctional members of these gene families (Edward et al. 1996; Matieu et al. 2002). However, neither study can confirm if CG loss results from spontaneous deamination of methylated residues over time or whether depletion is the consequence of an active mechanism linked to duplication. We have analyzed the CG dinucleotide and CNG trinucleotide content of the large zein gene family, which encodes the seed storage proteins of maize. Due to differences in gene copy number of each subfamily, the zein genes provide an ideal model system to analyze the effect of gene duplication on CG suppression. In addition, the highly abundant LTR-retrotransposons and MITEs have also been analyzed for evidence of CG suppression. The zeins constitute 50–60% of total endosperm protein and can be divided into two major fractions, zein-1 and zein-2, on the basis of solubility characteristics (Esen 1986). 
The zein-1 fraction consists of the 19- and 22-kD polypeptides that are encoded by a large gene family, the α-zeins. On the basis of DNA sequence identity and hybridization characteristics, this family can be further divided into four subfamilies, z1A, z1B, z1C, and z1D (Heindecker and Messing 1986; Rubenstein and Geraghty 1986). Genes belonging to the z1A, z1B, and z1D subfamilies encode the 19-kD zeins, whereas the 22-kD zeins are encoded mainly by the z1C subfamily. The 19-kD genes have an estimated copy number of 56 per haploid genome, whereas the 22-kD zein genes are presumed to be present in 15 copies per haploid genome (Hagen and Rubenstein 1981; Wilson and Larkins 1984). However, the exact copy number of the α-zein genes can show considerable variation among different inbred lines (Llaca and Messing 1998; Song and Messing 2002). In addition, within the 19-kD zein gene family, the z1A subfamily has the highest copy number, followed by z1B and z1D (Wilson and Larkins 1984; Heindecker and Messing 1986; Song and Messing 2002). In contrast, the 10-, 15-, 16-, and 27-kD proteins that represent the zein-2 fraction are encoded by one or two genes that show limited sequence similarity to the α-zeins (Prat et al. 1985; Kirihara et al. 1988; Swarup et al. 1995).

The majority of 22-kD zein genes form a dense gene cluster, ~168 kb in size, on chromosome 4 of maize (Llaca and Messing 1998), whereas the 19-kD zein genes are distributed on five unlinked genomic locations on maize chromosomes 1, 4, and 7 (Soave et al. 1981, 1982; Wilson et al. 1989; Woo et al. 2001; Song and Messing 2002). Phylogenetic analysis of the α-zein genes has revealed that the 19- and 22-kD zein genes share a common ancestor (Song and Messing 2002). Given that only the 22-kD zein genes have been identified in Coix lacryma-jobi, an ancestor of maize (Leite et al. 1990), it is probable that in maize the 19-kD zein genes have derived from the 22-kD zein genes. Interestingly, it has been estimated that the amplification of the α-zein gene family in maize occurred within the last 3–4 million years (Song et al. 2001; Song and Messing 2002).

In contrast to the clustered zein genes, MITEs and LTR elements are dispersed in the genome. MITEs are frequently associated with promoter and 3′ regulatory regions of genes, whereas the larger LTR-retrotransposons are typically found in intergenic regions (Kumar and Bennetzen 1999). The copy numbers of MITEs and LTR elements range from 3000 to 10,000 copies and from a few to 50,000 copies, respectively (Bureau and Wessler 1992, 1994; SanMiguel et al. 1996; Zhang et al. 2000). Similar to the amplification of the zein gene family, a majority of LTR-retrotransposons have colonized the maize genome during the last 5 million years (SanMiguel et al. 1998).

Our analysis shows that duplicated zein genes are CG suppressed and that the degree of suppression correlates with the copy number of each subfamily.
Likewise, within the 19- and 22-kD zein gene families the extent of CG depletion correlates with the number of duplications each individual gene has undergone. Despite their high copy number, most MITEs and LTR elements are not CG suppressed, except when duplicated or located in a duplicated DNA sequence. This suggests that the process leading to CG depletion is activated upon duplication or is a consequence of the duplication process itself. We discuss the possible role of duplication-dependent CG depletion in the evolution of the GC-poor isochores in which the zein genes are located.

MATERIALS AND METHODS

Sequences: The di- and trinucleotide composition of the coding region of 32 zein genes belonging to the zein-1 fraction and 6 genes belonging to the zein-2 fraction was analyzed. All 22-kD zein genes were derived from the inbred line BSSS53 (af090447), whereas the 19-kD gene sequences were derived from the B73 inbred line (af546187, af546188; af546189, af546190; Song et al. 2001; Song and Messing 2002). Only full-length genomic and cDNA clones, including genes with in-frame stop codons, were analyzed. Clones in which the open reading frame was disrupted by insertions or deletions were omitted from the analysis. One exception was Z492M16-5, which was included as it is expressed despite a deletion in the open reading frame. The correct open reading frame was determined by the use of GenBank's annotations, and nucleotide compositions were generated using the Genetics Computer Group (GCG) analysis software package. Sequences of MITEs and LTR-retrotransposons were extracted from gene sequences according to the annotations of the authors (Bureau and Wessler 1992, 1994; SanMiguel et al. 1996; Tikhonov et al. 1999; Zhang et al. 2000). Only the LTR region and primer-binding site of each LTR-retrotransposon were analyzed.

CG analysis: To measure the extent of suppression of a given di- or trinucleotide, a score ρ was calculated by the formula ρ = O/E, where O and E denote the observed and expected counts, respectively. Overall expected counts for di- and trinucleotides were calculated by multiplying the observed counts of each nucleotide and dividing the product by the total number of nucleotides found in the sequence. Position-dependent expected counts were calculated assuming the absence of any codon bias. The positions of the 5′ and 3′ di- or trinucleotides relative to codon triplets are indicated by roman numerals; e.g., I-II denotes a dinucleotide including the first two nucleotides of a codon triplet. For the CG dinucleotide, position-dependent expected counts were calculated as follows: I-II = 2/3 of arginine-specifying codons; II-III = the sum of 1/6 of serine-, 1/4 of proline-, threonine-, and alanine-specifying codons; III-I = NNC × GNN/T, where N represents any nucleotide and T represents the total number of triplets in the sequence.
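The overall score defined above can be illustrated with a minimal Python sketch. It is not the GCG routine used in the study; the function names `cg_rho` and `cg_by_codon_position` are hypothetical, and the sketch assumes an ungapped sequence over A, C, G, and T, with the coding sequence starting in frame for the position-dependent counts. Only observed counts are split by codon position; the codon-specific expected counts are not reproduced here.

```python
from collections import Counter


def cg_rho(seq: str) -> dict:
    """Overall CG score rho = O/E for one sequence (illustrative only)."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    # Observed count: number of CG dinucleotides in the sequence.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    # Expected count as described in the text: product of the observed
    # C and G counts divided by the total number of nucleotides.
    expected = counts["C"] * counts["G"] / n if n else 0.0
    # rho is undefined when the expected count is zero.
    rho = observed / expected if expected else None
    return {"O": observed, "E": expected, "rho": rho}


def cg_by_codon_position(cds: str) -> dict:
    """Observed CG counts by codon position (I-II, II-III, III-I).

    Assumes `cds` starts at the first base of a codon (hypothetical helper).
    """
    cds = cds.upper()
    labels = ("I-II", "II-III", "III-I")
    observed = dict.fromkeys(labels, 0)
    for i in range(len(cds) - 1):
        if cds[i:i + 2] == "CG":
            # i % 3 is the codon position of the C in the CG dinucleotide.
            observed[labels[i % 3]] += 1
    return observed
```

As a check against the published values, the single-copy Tourist element in Table 3 has O = 10 and E = 6.86, so O/E rounds to the reported ρ of 1.46.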
In the case of the CNG trinucleotide, position-dependent expected counts were calculated as follows: I-III = the sum of 1/6 of arginine- and leucine-, 1/4 of proline-, and 1/2 of glutamine-specifying codons; II-I = NCN × GNN/T; and III-II = NNC × NGN/T.

Alignment between members of the α-zein gene family: Pairwise alignments of individual expressed members of the 19- and 22-kD zein genes were conducted by GAP analysis (GCG analysis software package). For each alignment, the percentage of C:T and G:A transition mutations was compared to the total number of single-base-pair point mutations. To establish if transition mutations occur in a polarized fashion, i.e., from a younger to an older duplication, or from a gene that has undergone fewer to one that has undergone more duplications, transition mutations of each gene were counted in individual alignments. Importantly, only transition mutations occurring in a CG or CNG context were considered.

Statistical analysis: Differences between O and E values were tested using chi-square analysis. To test for differences in ρ-values between groups, the Mann-Whitney U-test was employed. All statistical tests were performed using the STATISTICA software package for Macintosh (StatSoft, Tulsa, OK).

Bisulfite analysis: Genomic DNA was extracted from young leaf tissue of the inbred lines W64A and W22 using the DNeasy kit (QIAGEN, Valencia, CA). The W64A and W22 inbred lines contain the Tourist element located in the 5′ flanking region of the single-copy or duplicated 27-kD zein gene, respectively (Das and Messing 1987). Between 1 and 2 μg of DNA were treated with bisulfite as described by Zeschnigk et al. (1997). For PCR analysis, 1/10 vol of bisulfite-treated DNA was employed in a standard PCR reaction. The primer pairs employed for amplification of the Tourist element from bisulfite-treated DNA were as follows: W64A, 5′-TAGGTATATGATTAGTGGTAATTTAATATT-3′ and 5′-ATTCTTAAAACTTTACATACCAATACATAA-3′; W22, 5′-GGGTATATAATTAGTGTAATTTAATATATG-3′ and 5′-ATTCTTAAAACTTTACATACCAATACATAA-3′. The resulting PCR products were cloned by TOPO cloning and 16 individual clones were sequenced at MWG Biotech (Ebersberg, Germany). To confirm the previously published MITE sequences, the Tourist element was also amplified from genomic DNA employing the following primer pairs: W64A, 5′-CCTTGGTTGTTGGCTCATAAT-3′ and 5′-CAGATGAGTATGATCTCGGCA-3′; W22, 5′-ATAAGTGTTCTGGATATTGGTTGTT-3′ and 5′-TCAGATGAGTATGATCTCGCA-3′. These primers were also tested on bisulfite-treated DNA and failed to give a product of the expected size. To ensure that the observed patterns of methylation did not result from incomplete strand separation during the bisulfite reaction, the Tourist element from the W64A inbred was cloned, bisulfite treated, and amplified with the bisulfite primers. This element contains only one cytosine that can be methylated by the Escherichia coli dcm methylase (i.e., the internal cytosine residue of the CCWGG sequence). As expected, sequence analysis of 10 independent clones showed that this cytosine remained unmodified. All the remaining cytosine residues of the Tourist element had undergone modification to thymidine.
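The logic of the bisulfite control described above can be sketched in silico. The following Python sketch is only an idealized model, assuming complete conversion, top-strand reads, and clone sequences already aligned to the reference without gaps; the function names `bisulfite_convert` and `call_methylation` are hypothetical and do not correspond to any software used in the study.

```python
def bisulfite_convert(seq: str, methylated: frozenset = frozenset()) -> str:
    """Return the top strand after idealized bisulfite treatment and PCR.

    Unmethylated cytosines read as T; cytosines at the 0-based indices in
    `methylated` are protected and remain C.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, clone: str) -> list:
    """List 0-based reference cytosine positions that stayed C in a clone."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be aligned to the same length")
    reference, clone = reference.upper(), clone.upper()
    return [
        i
        for i, (ref_base, clone_base) in enumerate(zip(reference, clone))
        if ref_base == "C" and clone_base == "C"
    ]


# Toy reference containing one CCAGG (CCWGG) site at indices 1-5; the
# internal cytosine (index 2) is modeled as dcm methylated.
toy_reference = "ACCAGGTCGA"
toy_clone = bisulfite_convert(toy_reference, frozenset({2}))
protected_positions = call_methylation(toy_reference, toy_clone)
```

In this toy case only the internal cytosine of the CCWGG site is reported as protected, which mirrors the outcome expected for the plasmid control: one dcm-methylated cytosine retained and every other cytosine converted.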

RESULTS

CG and CNG analysis of the zein genes: Table 1 shows ρ-values (observed/expected) of CG dinucleotides and CNG trinucleotides of genes belonging to the zein-1 and zein-2 fractions. All 19-kD zein genes analyzed were isolated from the B73 inbred line, whereas the 22-kD zein genes were derived from the BSSS53 inbred line (Song et al. 2001; Song and Messing 2002). A ρ-average was calculated for both zein fractions, for the 19- and 22-kD zein genes, and for each of the three 19-kD zein subfamilies, z1A, z1B, and z1D. The zein-1 fraction, representing the multicopy α-zeins, has a ρ-average CG of 0.40 (P < 0.001), whereas the ρ-average of the single-copy zein genes of the zein-2 fraction is 0.75 (P < 0.001). Indeed, the zein-1 fraction is more suppressed than the zein-2 fraction (P < 0.001). Furthermore, the average GC content of the zein-1 fraction is 48% compared to 66% for the zein-2 fraction (results not shown), indicating that CG suppression is accompanied by a decrease in G + C (GC) content. In contrast, neither zein fraction is suppressed at the CNG trinucleotide. It can also be observed that the degree of suppression varies between subfamilies of the zein-1 fraction. The more abundant 19-kD zein genes are more CG suppressed than the less abundant 22-kD zein genes (0.34 and 0.49, respectively; P < 0.001) and, likewise, within the 19-kD gene family the degree of suppression is associated with the copy number of each subfamily. The z1A subfamily, which has the highest copy number, is more suppressed than the less abundant z1B and z1D subfamilies (P < 0.009 and P < 0.027, respectively). We also analyzed 18 19-kD genes from different genetic backgrounds and found no differences in the average CG and CNG scores (results not shown). The amino acid content of zein-1 and zein-2 fractions

TABLE 1

Overall CG and CNG zein score

Zein-1 fraction (multicopy α-zeins)

19-kD zein gene family^a

Subfamily   Gene            bp     ρCG       ρCNG
z1A         Z448F14-2       804    0.30      1.26
            Z448F14-3       705    0.34      1.21
            Z448F14-4       705    0.34      1.29
            Z448F14-5       705    0.30      1.30
            Z448F14-6       705    0.30      1.24
            Z448F14-7       705    0.32      1.27
            Z350DO7-1       702    0.29      1.34
            Z350DO7-2       702    0.21      1.24
            Z350DO7-3^b     702    0.33      1.22
            Average:               0.30      1.12
z1B         Z492M16-1       726    0.38      1.26
            Z492M16-2       726    0.37      1.29
            Z492M16-4       723    0.37      1.25
            Z492M16-5       722    0.29      1.29
            Z492M16-6       723    0.39      1.18
            Z531H07-1^b     726    0.34      0.93
            Z531H07-2^b     723    0.40      1.03
            Average:               0.36      1.18
z1D         Z513H09-1       726    0.43      1.04
            Z57A02-2        723    0.35      0.88
            Average:               0.39      0.96
19-kD average:                     0.34***   1.20 NS
SD                                 0.05      0.13

22-kD zein gene family^a

Subfamily   Gene            bp     ρCG       ρCNG
z1C         azs22;4         801    0.52      1.11
            azs22;8         801    0.45      1.08
            azs22;10        801    0.49      1.10
            azs22;12        801    0.58      1.09
            azs22;14        801    0.57      0.98
            zp22/6          801    0.50      1.11
            zp22/D87        714    0.44      1.18
            azs22;2^b       801    0.48      1.03
            azs22;5^b       801    0.50      1.05
            azs22;6^b       807    0.50      1.16
            azs22;11^b      801    0.43      1.07
            azs22;15^b      801    0.45      1.08
            azs22;20^b      798    0.44      1.11
            azs22;21^b      801    0.48      1.06
22-kD average:                     0.49***   1.09 NS
SD                                 0.05      0.05
Zein-1 average:                    0.40***   1.15 NS
SD                                 0.09      0.12

Zein-2 fraction (single-copy β-, γ-, and σ-zeins)

10-, 15-, 16-, and 27-kD zein genes

Acc. no.                    bp     ρCG       ρCNG
m12147                      543    0.70      1.36
m16460                      552    0.82      1.47
m16218                      630    0.82      1.40
x53514                      672    0.82      1.36
x02230                      612    0.81      1.44
m23537                      453    0.53      0.86
Zein-2 average:                    0.75***   1.32***
SD                                 0.12      0.23

NS, not significant; bp, base pairs; ρ = O/E (O, observed; E, expected). *P < 0.05; **P < 0.01; ***P = 0.001.
^a See materials and methods for accession numbers.
^b Nonexpressed gene.


Duplication-Dependent CG Depletion

differs considerably, which could potentially influence overall CG and CNG scores. For example, the α-zeins are particularly rich in glutamine, leucine, proline, and alanine, whereas the single-copy γ-zeins have a high content of methionine. To address this problem, CG and CNG frequencies were analyzed in a position-dependent context (Table 2). Position-dependent frequencies correct for differences in amino acid content but not for amino acid codon bias. Essentially, position-dependent frequencies of the zein genes largely reflect overall CG and CNG frequencies. The multicopy α-zeins are suppressed at positions II-III and III-I (Table 2; ρ = 0.41 and 0.55, respectively; P < 0.001); in contrast, the single-copy zein genes are not CG suppressed at any position. This implies that the low overall CG score observed for the 10-kD zein gene, m23537 (Table 1), is related to the amino acid content of this gene. Within the zein-1 fraction, the 19-kD zein genes are CG suppressed at positions II-III and III-I (P < 0.001). The z1A subfamily is more suppressed than the z1B subfamily at position III-I, whereas the opposite is true of position II-III (P < 0.002 and P < 0.025, respectively). The 22-kD zein genes are also CG suppressed at position II-III (ρ = 0.52; P < 0.001), but to a lesser degree than the 19-kD zein genes (ρ = 0.33; P < 0.001). The CNG trinucleotide is suppressed only at position I-III of the zein-1 fraction (ρ = 0.60; P < 0.009; results not shown) and, again, the 19-kD zein genes are more suppressed than the 22-kD zein genes at this position (P < 0.001).

A recent analysis of the B73 and BSSS53 inbred lines has shown that only a relatively small number of α-zein genes are expressed (Song et al. 2001; Woo et al. 2001). Most of the nonexpressed genes contain in-frame stop codons or insertions/deletions (Spena et al. 1983; Liu and Rubenstein 1992; Llaca and Messing 1998; Song et al. 2001). Analysis of the average overall CG content of seven expressed vs. seven nonexpressed 22-kD genes (marked with a superscript b in Table 1) showed no significant differences (0.51 vs. 0.47, respectively; P = 0.225). However, position-dependent frequencies indicated that the inactive genes are more CG suppressed at position II-III (P = 0.025). Due to small sample size, a similar calculation for the 19-kD genes was not undertaken.

The relative contributions of the expressed α-zein genes have been assessed in the B73 inbred line (Woo et al. 2001). The 19-kD zein genes are the most highly expressed, whereas the 22-kD zein genes are expressed at a threefold lower level. An estimated five genes encode most 19-kD zein gene transcripts, and four genes account for most 22-kD zein gene transcripts. Interestingly, the average CG values of these 19- and 22-kD zein genes correlate inversely with expression (P = 0.49; ρ = 0.71). This is also true within the 22-kD zein gene family, and a similar tendency was observed for the 19-kD zein gene family (results not shown). This indicates that highly expressed genes are more CG suppressed than genes expressed at a low level.

To understand if CG suppression is a general effect of a specific chromosomal region, the CG content of intergenic regions of the 22-kD zein gene cluster located on chromosome 4S was analyzed. The lengths of the 22-kD zein intergenic regions varied from 2517 to 14,438 bp. The ρ-average CG of the intergenic regions was 0.68, which is significantly higher than the ρ-average of 0.50 of the 22-kD genes (P < 0.001).

CG analysis of MITEs and LTR-retrotransposons: The observed copy-number-dependent variation in CG suppression prompted us to investigate whether the high-copy-number LTR-retrotransposons and MITEs exhibited similar behaviors. Three MITE families (Tourist, Stowaway, and Heartbreaker) and three groups of LTR-retrotransposons (Ty1-copia, Ty3-gypsy, and an unclassified group), differing in element copy number between and within each group, were analyzed for evidence of CG suppression. The ρ-value and G + C content were calculated for MITEs and LTR elements in different sequence contexts (Tables 3 and 4, respectively). Despite the high copy number of MITEs and LTR elements, no association was found between the degree of suppression and element copy number. Most transposons were not, or were only slightly, CG suppressed. For example, the ρ-averages of the Tourist and Heartbreaker families were 0.66 and 0.60 (P < 0.041 and P < 0.018), respectively, whereas no suppression was observed for the Stowaway family or LTR elements. However, large differences in CG suppression were observed for both MITEs and LTR elements (ρ ranging from 0.16 to 1.46 and 0.26 to 1.12, respectively). We found that the copy number of the insertion site could largely explain the variation in ρ; i.e., elements inserted into multicopy gene regions were more CG suppressed than elements inserted into single-copy genes. This is nicely illustrated by the ρ-values of Stowaway found 3′ of the single-copy 10- and 27-kD zein genes and the multicopy 22-kD zein genes (0.66, 0.96, and 0.25, respectively; Table 3B; Bureau and Wessler 1994). Two ρ-values of Tourist and Stowaway elements located 5′ and 3′, respectively, of the 27-kD zein genes are indicated (x53514 and x56118; Table 3, A and B). x56118 represents an allele of a tandem duplication of a 27-kD zein gene (Das et al. 1991), whereas x53514 (zc2) represents a single copy of a 27-kD zein gene (Reina et al. 1990). For both Tourist and Stowaway elements, CG suppression is more severe upon insertion in the duplicated allele. In addition, suppression of the Stowaway element is accompanied by a 2% decrease in G + C content. Severe CG suppression was also observed for tandemly duplicated MITEs. An example of this can be seen by comparing the ρ-value of a single-copy or duplicated Tourist element located in the 3′ region of the Adh1 locus (x17556 and x04049, respectively; Table 3A; Bureau and Wessler 1992). No suppression is observed for the single-copy Tourist element (ρ = 1.46),

TABLE 2

Position-dependent CG and CNG zein scores

Zein-1 fraction (multicopy α-zeins)

19-kD zein gene family^a

Subfamily   Gene            I-II      II-III    III-I
z1A         Z448F14-2       0.37      0.39      0.37
            Z448F14-3       NC        0.49      0.26
            Z448F14-4       NC        0.53      0.17
            Z448F14-5       NC        0.37      0.27
            Z448F14-6       NC        0.37      0.27
            Z448F14-7       NC        0.44      0.25
            Z350DO7-1       NC        0.27      0.40
            Z350DO7-2       NC        0.16      0.32
            Z350DO7-3^b     NC        0.38      0.32
            Average:        0.37      0.38      0.29
z1B         Z492M16-1       NC        0.21      0.67
            Z492M16-2       NC        0.21      0.72
            Z492M16-4       0.37      0.26      0.54
            Z492M16-6       0.50      0.31      0.61
            Z531H07-1^b     1.50      0.28      0.54
            Z531H07-2^b     1.12      0.21      0.42
            Average:        0.87      0.25      0.58
z1D         Z513H09-1       0.33      0.29      0.67
            Z57A02-2        0.50      0.42      0.33
            Average:        0.42      0.36      0.50
19-kD average:              0.67 NS   0.33***   0.42***
SD                          0.46      0.11      0.17

22-kD zein gene family^a

Subfamily   Gene            I-II      II-III    III-I
z1C         azs22;4         0.50      0.57      0.68
            azs22;8         1.00      0.45      0.64
            azs22;10        0.50      0.60      0.69
            azs22;12        0.50      0.62      0.81
            azs22;14        0.75      0.63      0.75
            zp22/6          0.50      0.59      0.70
            zp22/D87        0.75      0.49      0.63
            azs22;2^b       0.50      0.49      0.69
            azs22;5^b       0.50      0.53      0.72
            azs22;6^b       0.50      0.50      0.83
            azs22;11^b      0.75      0.40      0.63
            azs22;15^b      0.50      0.44      0.72
            azs22;20^b      1.50      0.42      0.67
            azs22;21^b      0.75      0.49      0.73
22-kD average:              0.68 NS   0.52***   0.71 NS
SD                          0.28      0.08      0.06
Zein-1 average:             0.68 NS   0.41***   0.55***
SD                          0.34      0.13      0.20

Zein-2 fraction (single-copy β-, γ-, and σ-zeins)

10-, 15-, 16-, and 27-kD zein genes

Acc. no.                    I-II      II-III    III-I
m13507                      0.90      1.16      1.14
m12147                      0.90      1.12      1.13
x53515                      0.50      1.92      1.32
m16218                      0.90      2.16      1.10
x53514                      0.75      2.06      1.17
x02230                      0.90      2.18      1.12
m23537                      NC        0.92      0.89
Zein-2 average:             0.81 NS   1.65***   1.12 NS
SD                          0.16      0.55      0.13

I-II, CG dinucleotide in position 1 of open reading frame; II-III, CG in position 2 of open reading frame; III-I, CG in position 3 of open reading frame; I-III, CNG in position 1 of open reading frame; II-I, CNG in position 2 of open reading frame; III-II, CNG in position 3 of open reading frame; NC, not calculated; other symbols are as in Table 1.


TABLE 3

CG scores of MITE families

A. Tourist (10,000 copies)

Acc. no.    Gene             IS       Copy no. of IS   bp     O     E       CG ρ      %G + C
x17556^a    Adh-1Cm          3′       1                126    10    6.86    1.46      47
x04049^b    Adh-1S           3′       1                137    1     6.26    0.16      43
x07940^a    Bz-McC           3′       1                136    1     2.24    0.45      26
S48688^a    Wx-B2            Exon     1                128    2     7.27    0.28      48
x53514^a    27-kD zein       5′       1                132    7     5.49    1.27      41
x56118^b    27-kD zein       5′       2                131    5     5.72    0.87      42
j05212      Oleosin, KD18    3′       3–4              142    1     4.75    0.21      37
x15406      Pseudo-Gpa1      5′       ~10              130    0     5.17    NC        40
x15407      Pseudo-Gpa2      5′       ~10              137    3     5.12    0.59      39
Tourist average:                                                            0.66*     40

B. Stowaway (copy number not reported)

Acc. no.    Gene             IS       Copy no. of IS   bp     O     E       CG ρ      %G + C
z11879^a    P-gene           Intron   1                80     4     3.20    1.25      40
m23537^a    10-kD zein       3′       1                153    2     3.07    0.66      28
x53514^a    27-kD zein       3′       1                154    2     2.08    0.96      23
x56118^b    27-kD zein       3′       2                163    0     1.72    NC        21
x73152^a    Gpc4             Intron   4                157    4     3.92    1.02      32
x61085^b    22-kD zein       3′       ~15              156    1     3.98    0.25      32
Stowaway average:                                                           0.83 NS   29

C. Heartbreaker (3,000–4,000 copies)

Acc. no.    Gene             IS       Copy no. of IS   bp     O     E       CG ρ      %G + C
af203730    NA               NA       1                314    9     14.54   0.62      44
af203733    NA               NA       1–2              314    9     14.01   0.64      43
af203731    NA               NA       2                314    9     14.36   0.63      44
af203729    NA               NA       2–4              313    10    15.11   0.66      45
af203732    NA               NA       >10              314    6     13.75   0.44      43
Heartbreaker average:                                                       0.60*     44

IS, insertion site; NA, not available; Adh-1S, alcohol dehydrogenase 1S allele; Adh-1Cm, alcohol dehydrogenase 1Cm allele; Bz-McC, UDP glucose flavonoid glucosyl transferase; Wx, ADP glucose glucosyl-transferase; Gpa1 and Gpa2, pseudogene of glyceraldehyde-3-phosphate dehydrogenase; P, anthocyanin gene; Gpc4, glyceraldehyde-3-phosphatase; other symbols are as in Table 1.
^a Insertion into single-copy sequence.
^b Insertion in clustered multicopy gene region or duplicated element.

whereas the tandemly duplicated element is severely CG suppressed (ρ = 0.16). Again, CG suppression of the duplicated element is accompanied by a 4% decrease in G + C content compared to the single-copy insertion. LTR regions also exhibit severe CG suppression upon duplication or if located in a duplicated DNA region (Table 4). For example, CG suppression is observed for x58700, a Hopscotch-like transposon inserted in the promoter region of a multicopy 19-kD zein gene (White et al. 1994), and for u68406, a tandem duplication of an element, Kake-1 (SanMiguel et al. 1996).

An average ρ-value was calculated for selected data points representing elements inserted into single-copy gene regions vs. elements inserted into multicopy gene regions or tandemly duplicated elements (Tables 3 and 4; labeled with superscript a and b, respectively). The criterion for selection of data points was knowledge of the copy number of both element and insertion sequence. In addition, only clustered multicopy genes were represented in the multicopy group. Therefore, the Gpc4 gene, which belongs to a small dispersed multigene family, was analyzed as a single-copy insertion (Russell and Sachs 1991), and some elements inserted in multicopy sequences were not included as it is unknown whether these genes are clustered or dispersed (e.g., Gpa pseudogenes and oleosins). CG suppression was not observed for LTR elements and MITEs located in single-copy genic regions (ρ = 0.94), whereas tandemly duplicated elements or gene sequences were suppressed (ρ = 0.33; P < 0.020). Furthermore, the ρ-average of MITEs and LTR elements inserted in single-copy gene regions was higher than the average value of tandemly duplicated elements and elements inserted into duplicated DNA regions (P < 0.004). Simply analyzing ρ-values of all MITEs and LTR elements located in single- vs. multicopy gene regions produced results similar to those of the selected data set, although with lower significance (P < 0.009).

CG depletion as a function of time or gene duplication: If CG suppression results from passive deamination of methylated residues, the extent of depletion should


TABLE 4
CG values of LTR-retrotransposons

              Element                      Location                                       CG
Acc. no.     Name         Copy no.   Gene         IS              IS copy no.   Bp     O     E        ρ         %GC

Ty1-copia group
u12626^a     Hopscotch    2–6        wx-K         Exon 12         1             231    11    11.22    0.98      44
x58700^b     Hopscotch    2–6        19-kD zein   5′              ~56           147    2     4.97     0.40      43
af082134^a   Stonor       30–40      wx-Stonor    Intron5/exon6   1             560    37    38.02    0.97      52
u68401       Fourf        ~100       334B7.4      Exon 1          NA            1162   64    63.94    1.00      47
u68410       Victim       ~100       Intergenic                   NA            100    1     3.84     0.26      40
u68408       Opie-2       >30,000    334B7.4      3′              NA            1271   69    80.22    0.86      50
af090447     Prem-2       NA         Intergenic                   NA            1424   82    104.09   0.79      54
u68405       Ji-3         50,000     Intergenic                   NA            1176   58    72.62    0.80      50
Average:                                                                                              0.76 NS   48

Ty3-gypsy group
af015269^a   Magellan     4–8        Pl           Exon 1          1             336    19    19.54    0.97      49
U68409       Reina        <10        Intergenic                   NA            323    15    20.43    0.73      51
U68404       Huck-2       ~100       334B7.4      Exon 1          NA            1644   162   170.66   0.95      65
U68403       Grande-zm1   >1300      Intergenic                   NA            645    39    43.00    0.90      52
U68402       Cinful       20,000     Intergenic                   NA            605    20    23.00    0.86      39
af090447     Zeon-1       20,000     Intergenic                   NA            669    25    23.93    1.04      38
U11059^a     Zeon-1       20,000     27-kD zein   5′              1             649    21    21.40    0.98      37
af090447     NA           NA         Intergenic                   NA            669    137   157.52   1.04      44
Average:                                                                                              0.93 NS   47

Nonclassified group
U68407       Milt         ~100       334B7.4      3′              NA            742    75    66.47    1.12      60
U68406^b     Kake-1       ~100       Intergenic                   NA            182    2     6.53     0.27      40
Average:                                                                                              0.70 NS   50

Symbols are as in Tables 1 and 3.
a Insertion into single-copy sequence.
b Insertion in clustered multicopy gene region or duplicated element.
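As a reading aid for the O, E, and ρ columns of Table 4, the following minimal sketch computes an observed/expected CG ratio from a nucleotide sequence. It assumes the common convention E = (number of C × number of G) / sequence length; the definition actually used for Table 4 is the one referred to by "Symbols are as in Tables 1 and 3" and may differ. The function name `cg_ratio` and the toy sequence are hypothetical and serve only as illustration.

```python
def cg_ratio(seq: str) -> tuple[int, float, float]:
    """Return (observed CG count, expected CG count, observed/expected ratio).

    Assumption: expected = (#C * #G) / length, a widely used convention that
    may not be the exact definition behind Table 4.
    """
    s = seq.upper()
    n = len(s)
    observed = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = s.count("C") * s.count("G") / n if n else 0.0
    rho = observed / expected if expected else float("nan")
    return observed, expected, rho


# Toy sequence (hypothetical, not taken from the paper)
toy = "ACGTCGGA"
obs, exp, rho = cg_ratio(toy)

# Consistency check on a published row: x58700 has O = 2 and E = 4.97
table_rho = round(2 / 4.97, 2)
```

The last line only confirms that the tabulated ratio for x58700 follows from its O and E entries; it does not reproduce how E itself was obtained.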

reflect the number of years a sequence has been methylated. Table 5A shows a comparison between ρ, the estimated time of insertion, and the number of duplications of the seven expressed 22-kD zein genes clustered on chromosome 4S. Similarly, the estimated times of retrotransposon insertion at the Adh1-F locus have been compared to CG frequencies (Table 5B). Neither CG nor CNG suppression correlated with time of insertion of the 22-kD zein genes or of the LTR-retrotransposons. However, the 22-kD zein genes showed an inverse correlation between CG content and the number of duplications each gene has undergone (r = 0.85; P < 0.014). This was found to be specific to the CG dinucleotide. Similarly, analysis of the expressed 19-kD zein genes isolated from the B73 cluster also showed that the degree of CG suppression correlated with the extent of duplication (r = 0.52; P < 0.05; n = 14; results not shown).

Sequence divergence of duplicated DNA sequences: To identify the fate of CG dinucleotides, pairwise alignments of the seven expressed zein genes were conducted (Table 5). For each of the 21 alignments, the total number of C:T and G:A transition mutations and the total number of transition mutations occurring in a CG or CNG context were counted and compared to the total number of point mutations (results not shown). In addition, transition mutations occurring in a CG or CNG context were counted for each gene in the pairwise alignments. Transition mutations were the most common point mutations observed, representing on average 61% of total point mutations, and the vast majority occurred in a CG or CNG context (average 74%). In 7/21 alignments performed, it was possible to distinguish whether CG depletion resulted from the time or extent of duplication, whereas the remaining alignments were noninformative. The informative alignments included zp22/D87 and azs22;8, azs22;10 and azs22;4, azs22;10 and azs22;14, azs22;12 and zp22/6, zp22/6 and azs22;14, azs22;12 and azs22;10, and zp22/6 and azs22;14. Four of these alignments showed that transition mutations occurring in a CG context were consistent with CG suppression resulting from duplication, whereas none were consistent with the time of amplification. For example, 11 transition mutations (occurring in a CG context)

TABLE 5
CG scores of 22-kD zein genes and LTR regions compared to insertion time and duplication

A. 22-kD gene   Insertion (MYR)^a   No. of duplications   CG     CNG
azs22;12        0.6                 4                     0.58   1.09
azs22;10        0.6                 6                     0.49   1.10
zp22/6          0.6                 6                     0.50   1.11
azs22;14        1.4                 6                     0.57   0.98
azs22;4         1.4                 6                     0.52   1.11
zp22/D87        1.6                 8                     0.44   1.18
azs22;8         2.3                 7                     0.45   1.08

B. LTR          Insertion (MYR)^b   CG
Cinful          0.26                0.90
Grande-Zm1      0.12                0.90
Opie-1          0.18                0.90
Fourf           1.39                1.00
Milt            1.56                1.10
Ji-3            1.86                0.80
Reina           2.08                0.73
Huck-1          2.26                0.90
Victim          2.42                0.30
Zeon-1          2.75                1.00

a Insertion/duplication as estimated by Song et al. (2001).
b Insertion time as estimated by SanMiguel et al. (1998).
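The relationship between duplication number and CG score reported for the 22-kD zein genes can be rechecked directly from the values in Table 5A. The sketch below computes Pearson's correlation coefficient between the two columns; that the authors used Pearson's r is an assumption, and the variable and function names are illustrative only. With the tabulated values the coefficient is negative (an inverse correlation), with a magnitude close to the reported 0.85.

```python
from math import sqrt

# Values transcribed from Table 5A (seven expressed 22-kD zein genes)
genes = ["azs22;12", "azs22;10", "zp22/6", "azs22;14", "azs22;4", "zp22/D87", "azs22;8"]
duplications = [4, 6, 6, 6, 6, 8, 7]
cg_score = [0.58, 0.49, 0.50, 0.57, 0.52, 0.44, 0.45]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Assumption: Pearson's r; the paper does not state the coefficient used here
r = pearson_r(duplications, cg_score)
print(f"r = {r:.2f}")
```

Replacing `duplications` with the insertion times from the same table gives the corresponding check for the time-dependence hypothesis.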

were observed from azs22;12 to zp22/6, whereas only 7 were observed in the opposite direction. This is in agreement with duplication-dependent CG suppression as zp22/6 has undergone a larger number of duplications compared to azs22;12 (six vs. four, respectively). If CG suppression were the result of time-dependent deamination of cytosine residues, an equal number of polarized transition mutations would have been expected given the fact that these genes have both amplified 0.6 MYA. In contrast, no association between transition mutations occurring at the CNG trinucleotide and duplication was found for the informative alignments. Interestingly, 5 of the informative alignments did show a time-dependent decrease in CNG content. For the noninformative alignments, the majority of transition mutations observed at CG dinucleotides (12/14) also occurred in a polarized fashion from a less to a more duplicated gene (or from a younger to an older duplication). However, only 6/14 alignments showed a similar behavior at the CNG trinucleotide. These results support the prior observation that only CG suppression correlates with the extent of duplication. Alignments were also performed between genes belonging to the 19-kD zein subfamilies, z1A, z1B, and z1D, which have undergone an average of eight, six, and four duplications, respectively. Two expressed genes representative of the z1A subfamily (Z448F14-3


and Z448F14-4) were aligned to all the expressed genes of z1B and z1C. In 16/18 alignments, a higher number of transition mutations at CG dinucleotides was observed from the less duplicated z1B and z1C genes to the more extensively duplicated z1A genes. The same could be concluded from alignments of two z1B genes (Z492M16-1 and Z492M16-4) to the two expressed z1D genes. Of the 18 alignments performed, transition mutations represented on average 57% of total single base pair mutations and, of these, 48% occurred in a CG or CNG context. Similar conclusions were reached by aligning the duplicated 27-kD zein gene, x56118, to the single-copy 27-kD zein gene, zc2 (x53514). These genes exhibit 97% sequence similarity. Only nine point mutations were observed, six being C:T or G:A transition mutations. Again, transition mutations occurred in a polarized fashion from the single copy to the duplicated gene (four vs. two, respectively). However, only transition mutations observed from the zc2 to the x56118 allele were in a CG context (3/4). This pattern of mutation was also mirrored by alignment of the Tourist and Stowaway elements located in the 5′ and 3′ flanking region of the single-copy and duplicated 27-kD zein gene, respectively. Finally, alignment of the single-copy and duplicated Tourist element 3′ of the Adh1 locus confirmed that C:T and G:A transition mutations are polarized (i.e., occur from the single to the duplicated MITE sequence). The degree of sequence identity between individual MITEs varies between 46 and 88% (Bureau and Wessler 1992). Interestingly, the average degree of similarity between duplicated MITEs and MITEs inserted in duplicated gene regions is higher than the average degree of similarity of MITEs inserted in single-copy regions (66 vs. 63%, respectively; P = 0.0431). This supports our observations that linked duplicated sequences evolve more similarly compared to single or dispersed, multicopy sequences.

Methylation status of single- vs. 
multicopy genic regions: C:T and G:A transition mutations are the presumed products resulting from deamination of methylated cytosines. Therefore, a possible explanation of the enhanced turnover of CG dinucleotides might be that duplicated sequences exhibit qualitative or quantitative differences in methylation compared to single-copy genes. To this end, we analyzed the methylation status of the Tourist element located in the single-copy or tandem duplication of the 27-kD zein gene (zc2 and x56118) by bisulfite sequencing. These elements were chosen because they exhibit small differences in CG and G + C content (see Table 3). The results showed that in both sequence contexts the MITE was hypermethylated at all CG dinucleotides, in addition to a proportion of methylated cytosines in a nonsymmetrical sequence context (see Figure 1). All 29 cytosine residues of the MITE located 5′ of the zc2 coding region were methylated compared to 25/30 cytosine residues of the element inserted in the tandem duplication. In addition, quantitative analysis of 16 independent clones indicated that the MITE located in the duplicated 27-kD zein gene showed 19 and 37% reduction in symmetrical and asymmetrical methylation, respectively, compared to the element located in the single-copy gene.

Figure 1. Methylation state of a Tourist element in the single-copy or duplicated 27-kD zein gene. Bisulfite sequencing of a Tourist element in the 5′ region of the single or duplicated 27-kD zein gene. M, methylation of symmetrical CG and CNG sequences; M, methylation of nonsymmetrical sequences. The methylation state of 16 independent clones is indicated, and each letter represents two observations.

DISCUSSION

Our results show that during a short evolutionary time span, individual α-zein genes have accumulated large variations in CG content. In particular, within the 19- and 22-kD zein gene families, extensively duplicated genes are more CG depleted compared to less duplicated genes. In addition, the 19-kD zein genes are more suppressed than the 22-kD zein genes, indicating that the former have undergone a greater number of gene

duplications. This is supported by the fact that the 19-kD genes are more abundant compared to the 22-kD genes (Hagen and Rubenstein 1981; Wilson and Larkins 1984; Heindecker and Messing 1986).

Duplication-dependent depletion of the α-zein genes results largely from C:T and G:A transition mutations in a CG or CNG context. Similarly, elevated levels of transition mutations have also been identified in other multigene families in plants such as the GAPA and rDNA gene families of maize and the 5S RNA genes from Arabidopsis (Quigley et al. 1989; Edward et al. 1996; Matieu et al. 2002). For these gene families, a higher level of C:T (or G:A) transition mutations was observed for the nontranscribed genes, and it was argued that CG depletion resulted from relaxation of selective constraints at the transcriptional level (Quigley et al. 1989; Matieu et al. 2002). This explanation is, however, inadequate for the variation in CG suppression observed for


the α-zein gene family, as the most highly expressed genes are the most CG depleted. We speculate that the high expression levels of the most CG-suppressed α-zeins are caused by the lack of CG dinucleotides available for methylation, thus relieving methylation-mediated transcriptional repression. Indeed, the fact that α-zein genes are methylated in both the coding and noncoding regions and exhibit an inverse relation between CG methylation and expression lends support to this idea (Bianchi and Viotti 1988; Lund et al. 1995; Sturaro and Viotti 2001). In addition, differences in the extent of methylation could explain the large variations in expression levels of individual zein genes (Woo et al. 2001; Song and Messing 2002).

The in vivo rate of deamination of methylated cytosine residues in plants is not known, whereas in mammals the estimated half-life of a cytosine residue is between 24 and 60 million years (Yang et al. 1996). However, the average plant gene is less depleted in CG compared to the average mammalian gene (average CG score is 0.68 and 0.22, respectively), perhaps indicative of a decreased mutability of CG dinucleotides in plants (Gardiner-Garden and Frommer 1987; Gardiner-Garden et al. 1992). On an evolutionary time scale, many transposons represent recent insertions in the maize genome. For example, a large number of LTR-retrotransposons that map to the Adh1-F locus have inserted during the last 5 million years (SanMiguel et al. 1998). Although these LTR regions exhibit a twofold increase in transition mutations compared to nonmethylated intronic regions (SanMiguel et al. 1998), we found no correlation between CG suppression and insertion time of these sequences. Given that spontaneous deamination of 5mC is a very slow process, this suggests that the observed low CG values of duplicated elements or of elements inserted into duplicated gene regions also result from enhanced turnover of CG as a result of gene duplication.
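To give a feel for how slow such a process would be, the back-of-the-envelope sketch below applies simple first-order decay, using the mammalian half-life range quoted above (24 to 60 million years; Yang et al. 1996), to insertion ages of the order listed in Table 5. Both the exponential model and the use of a mammalian half-life for maize are assumptions made purely for illustration, since the in vivo rate in plants is not known; the function name is hypothetical.

```python
def fraction_remaining(age_myr: float, half_life_myr: float) -> float:
    """Fraction of methylated cytosines not yet deaminated, assuming first-order decay."""
    return 0.5 ** (age_myr / half_life_myr)


# Ages taken from Table 5: youngest and oldest 22-kD zein genes, oldest LTR
ages_myr = [0.6, 2.3, 2.75]
# Assumption: mammalian half-life estimates applied to maize for illustration only
half_lives_myr = [24.0, 60.0]

for age in ages_myr:
    for half_life in half_lives_myr:
        remaining = fraction_remaining(age, half_life)
        print(f"age {age} MYR, half-life {half_life} MYR: {remaining:.3f} remaining")

# Largest loss in this grid: oldest element combined with the shortest half-life
fastest_loss = fraction_remaining(2.75, 24.0)
```

Under these assumptions more than 90% of methylated cytosines would remain even in the oldest element considered, which is the sense in which time-dependent deamination alone appears too slow to account for the strongly reduced CG scores of duplicated sequences.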
If duplicated sequences are methylated compared to their single-copy counterparts, an increase in methylation-related deaminations might be expected, subsequently resulting in a reduction in CG content. However, bisulfite analysis of a MITE inserted in the single-copy or duplicated 27-kD zein gene locus showed that the element was methylated in both sequence contexts. In addition, as most transposons are hypermethylated (Rabinowicz et al. 1999; Tompa et al. 2002), it is probable that the majority of LTR regions analyzed in this study are methylated. Indeed, the observation that LTR regions of retrotransposons that map to the Adh1-F locus exhibit a twofold increase in transition mutations compared to nonmethylated intronic regions strongly suggests that these single-copy insertions are methylated (SanMiguel et al. 1998). However, despite indirect evidence that these elements are methylated, most elements were not CG suppressed. Likewise, although both single- and multicopy zein genes have been shown to


be hypermethylated in plant tissue (Bianchi and Viotti 1988; Lund et al. 1995), only suppression of the multicopy zeins was observed. Taken together, the data suggest that methylation status per se is not sufficient to explain differences in CG suppression. However, duplicated sequences may exhibit a specific methylation pattern that alters the mutability of 5mC compared to a single-copy methylated sequence. The Tourist element inserted in the 5′ region of the single-copy 27-kD zein gene showed a quantitative difference in methylation compared to the same element inserted in the duplicated 27-kD zein gene. The significance of these findings in relation to CG suppression can only be speculated upon. However, in rodents neither CG density nor the methylation status could explain the observed mutation frequencies of CG dinucleotides of a transgene (Skopek et al. 1998; Monroe et al. 2001). Likewise, the methylation status of the expressed and nonexpressed 5S RNA genes in Arabidopsis failed to account for the elevated levels of C:T and G:A transition mutations observed for the nonexpressed genes (Matieu et al. 2002).

An alternative explanation for the observed differences in CG content is that selective pressures differ between chromosomal regions, which could result in different mutation rates of CG dinucleotides. For example, this might explain why the 22-kD zein genes, which map to chromosome 4S, are suppressed, whereas no suppression of DNA sequences that map to the Adh1-F chromosomal region was observed. However, analysis of intergenic and LTR regions of retrotransposons, located in the 340-kb 22-kD zein gene cluster, showed that CG suppression was localized to zein gene regions and was not a common feature of this chromosomal region.
That CG suppression is independent of chromosomal content is also supported by the fact that many sequences analyzed exhibit large variations in CG despite being located in identical genomic regions, e.g., Tourist and Stowaway elements located in the noncoding regions of the single-copy or duplicated 27-kD zein gene or the single or duplicated Tourist element located in the 3′ region of the Adh1 locus.

A study of 101 maize genes has shown that 40% of codon-usage variation is due to a bias toward G or C vs. A or U ending codons (Fennoy and Bailey-Serres 1993). The bias toward C or G in the third position (GC3) is larger for the single-copy zeins, whereas the α-zeins have low GC3 values. We found that CG dinucleotides of the α-zeins were suppressed at positions II-III and III-I, whereas the single-copy genes showed an excess of CG at these positions. This is interesting as CG suppression at position III-I seems to be specific to methylating species, whereas nonmethylating species show an excess of CG at this position (Schorderet and Gartler 1992). Therefore, we argue that the differences in the GC3 bias between the single- and multicopy α-zein genes reflect the fact that the α-zein genes have undergone



extensive gene duplication and are not due to a bias in codon usage between the single- and multicopy zein genes.

Mammalian and plant genomes are made up of large regions of relatively homogeneous base composition known as isochores (Bernardi 2000), and the debate is ongoing as to whether this mosaic structure is caused by mutation bias, natural selection, or biased gene conversion (reviewed by Eyre-Walker and Hurst 2001). Of particular interest in this context is the finding that cytosine deamination plays a primary role in the evolution of isochores (Fryxell and Zuckerkandl 2000). In maize, most genes are confined to isochores with a narrow GC range, with the exception of the α-zeins and ribosomal genes that are located in GC-poor and GC-rich fractions, respectively (Carels et al. 1995). We have previously argued that the low GC3 content of the α-zeins could be explained by duplication-dependent CG depletion. Given that the GC content and, in particular, the GC3 content of a gene is highly correlated to the overall GC content of the isochore in which it is located (Bernardi et al. 1985; Clay et al. 1996; Eyre-Walker and Hurst 2001), duplication-dependent CG loss may, in part, explain the evolution of this particular GC-poor isochore.

We have shown that duplicated zein genes, LTR elements, and MITEs undergo specific changes in nucleotide sequence. These changes have been observed in CG dinucleotides and result in C:T and G:A transition mutations and a net reduction in GC content. Such a process is reminiscent of RIP in N. crassa, where duplications are de novo methylated and riddled with point mutations (Selker 1990). Unfortunately, this study cannot discern whether transition mutations have occurred immediately upon duplication or result from subsequent mitoses. This question has been addressed in Arabidopsis by sequence analysis of multicopy insertions of transgenes after three sexual generations (Mittelstein-Scheid et al. 1994). 
None of the transition mutations characteristic of RIP were found, and it was argued that if RIP occurs in plants it occurs at a much lower frequency. However, this experiment does not necessarily exclude the possibility of a RIP-like mechanism in plants. Indeed, if transition mutations are linked to the duplication process, merely inserting a multicopy locus into the Arabidopsis genome would, expectedly, fail to recover any transition mutations. From the detailed analysis of the 22-kD zein genes it is clear that the rate of CG depletion of duplicated sequences is enhanced compared to the average deamination of methylated residues of dispersed multicopy sequences. Further analysis will reveal if this mechanism is common to other duplicated gene families. Suppression of MITEs and LTR elements inserted in duplicated gene regions indicates that this might be the case. Interestingly, rapid responses to genome-wide duplication of genomes have been shown to occur in wheat and

Arabidopsis allotetraploids. These changes include nonrandom alterations in methylation and gene silencing caused by methylation or gene loss (Kashkush et al. 2002; Madlung et al. 2002). Although it is unclear whether the observed effects result from chromosome doubling or from the hybridization of different genomes, this suggests that specific mechanisms are activated in plants in response to DNA amplification that presumably function to maintain genome stability.

The authors thank Mik Noordeweir for assisting in part of the sequence analysis, Angelo Viotti and Vincenzo Rossi for critical reading of the manuscript, and E. Linton for estimates of insertion time of the 22-kD zein genes. This work was supported by a grant from the Danish National Research Foundation.

LITERATURE CITED

Ashikawa, I., 2001 Gene-associated CpG islands in plants as revealed by analyses of genomic sequences. Plant J. 26: 617–625.
Barry, C., G. Faugeron and J. L. Rossignol, 1993 Methylation induced premeiotically in Ascobolus: coextension with DNA repeat lengths and effect on transcription elongation. Proc. Natl. Acad. Sci. USA 90: 4557–4561.
Bender, J., 1998 Cytosine methylation of repeated sequences in eukaryotes; the role of DNA pairing. Trends Biochem. Sci. 23: 252–256.
Bender, J., and G. R. Fink, 1995 Epigenetic control of an endogenous gene family is revealed by a novel blue fluorescent mutant of Arabidopsis. Cell 83: 725–734.
Bennetzen, J. L., K. Schrick, P. S. Springer, W. E. Brown and P. SanMiguel, 1994 Active maize genes are unmodified and flanked by diverse classes of modified, highly repetitive DNA. Genome 37: 565–576.
Bernardi, G., 2000 Isochores and the evolutionary genomics of vertebrates. Gene 241: 3–17.
Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas et al., 1985 The mosaic genome of warm blooded vertebrates. Science 228: 953–958.
Bianchi, M. W., and A. Viotti, 1988 DNA methylation and tissue-specific transcription of the storage protein genes of maize. Plant Mol. Biol. 11: 203–214.
Bird, A. P., 1980 DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8: 1499–1504.
Bureau, T. E., and S. R. Wessler, 1992 Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4: 1283–1294.
Bureau, T. E., and S. R. Wessler, 1994 Stowaway: a new family of inverted repeat elements associated with the genes of monocotyledonous and dicotyledonous plants. Plant Cell 6: 907–916.
Cambereri, E. B., B. C. Jensen, E. Schabtach and E. Selker, 1989 Repeat-induced G-C to A-T mutations in Neurospora. Science 244: 1571–1575.
Carels, N., A. Barakat and G. Bernardi, 1995 The gene distribution of the maize genome. Proc. Natl. Acad. Sci. USA 92: 11057–11060.
Clay, O., S. Caccio, Z. Zoubak, D. Mouchiroud and G. Bernardi, 1996 Human coding and noncoding DNA: compositional correlations. Mol. Phylogenet. Evol. 5: 2–12.
Cooper, D. N., and M. Krawczak, 1989 Cytosine methylation and the fate of CpG dinucleotides in vertebrate genomes. Hum. Genet. 83: 181–188.
Couloundre, C., J. H. Miller, P. J. Farabaugh and W. Gilbert, 1978 Molecular basis of base substitution hotspots in Escherichia coli. Nature 274: 775–780.
Das, O. P., and J. Messing, 1987 Allelic variation and differential expression of the 27-kDa zein locus in maize. Mol. Cell. Biol. 7: 4490–4497.
Das, O. P., K. Ward, S. Ray and J. Messing, 1991 Sequence variation between alleles reveals two types of copy correction at the 27 kDa zein locus of maize. Genomics 11: 849–856.
Edward, S., I. V. Bukler and T. P. Holtsford, 1996 Zea mays ribosomal repeat evolution and substitution patterns. Mol. Biol. Evol. 14: 623–632.
Esen, A., 1986 Separation of alcohol-soluble proteins (zeins) from maize into three fractions by differential solubility. Plant Physiol. 80: 623–627.
Eyre-Walker, A., and L. D. Hurst, 2001 The evolution of isochores. Nat. Genet. Rev. 2: 549–555.
Fennoy, S. L., and J. Bailey-Serres, 1993 Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C-ending and G-ending codons. Nucleic Acids Res. 23: 5294–5300.
Flavell, R. B., 1994 Inactivation of gene expression in plants as a consequence of specific sequence duplication. Proc. Natl. Acad. Sci. USA 91: 3490–3496.
Flavell, R. B., M. O’Dell and W. F. Thompson, 1988 Regulation of cytosine methylation in ribosomal DNA and nucleolus organizer expression in wheat. J. Mol. Biol. 204: 523–534.
Fryxell, K. J., and E. Zuckerkandl, 2000 Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol. Biol. Evol. 17: 1371–1383.
Gardiner-Garden, M., and M. Frommer, 1987 CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261–282.
Gardiner-Garden, M., J. A. Sved and M. Frommer, 1992 Methylation sites in angiosperm genes. J. Mol. Evol. 34: 219–230.
Goyon, C., and G. Faugeron, 1989 Targeted transformation of Ascobolus immersus and de novo methylation of the resulting duplicated DNA sequences. Mol. Cell. Biol. 9: 2818–2827.
Gruenbaum, Y., T. Naveh-Many, H. Cedar and A. Razin, 1981 Sequence specificity of methylation in higher plant DNA. Nature 292: 860–862.
Gruenbaum, Y., H. Cedar and A. Razin, 1982 Substrate and sequence specificity of a eukaryotic DNA methylase. Nature 295: 620–622.
Hagen, G., and I. Rubenstein, 1981 Complex organization of zein genes in maize. Gene 13: 239–249.
Heindecker, G., and J. Messing, 1986 Structural analysis of plant genes. Annu. Rev. Plant Physiol. 37: 439–466.
Holstein, M., M. S. Greenblatt, K. Rice, T. Soussi, R. Fuchs et al., 1994 Database of p53 gene somatic mutations in human tumors and cell lines. Nucleic Acids Res. 22: 3551–3555.
Jeddeloh, J. A., and E. J. Richards, 1996 mCCG methylation in angiosperms. Plant J. 9: 579–586.
Jones, P. A., W. M. Rideout, J. C. Shen, C. H. Spruck and Y. C. Tsai, 1992 Methylation, mutation and cancer. Bioessays 14: 33–36.
Kashkush, K., M. Feldman and A. A. Levy, 2002 Gene loss, silencing and activation in a newly synthesized wheat allotetraploid. Genetics 160: 1651–1659.
Kirihara, J., J. B. Petri and J. Messing, 1988 Isolation and sequence of a gene encoding a methionine rich 10-kDa zein protein from maize. Gene 71: 359–370.
Kovarik, A., R. Matyasek, A. Leitch, B. Gazdova, J. Fulnecek et al., 1997 Variability in CpNpG methylation in higher plant genomes. Gene 201: 25–33.
Kricker, M. C., J. W. Drake and M. Radman, 1992 Duplication-targeted DNA methylation and mutagenesis in the evolution of eukaryotic chromosomes. Proc. Natl. Acad. Sci. USA 89: 1075–1079.
Kumar, A., and J. L. Bennetzen, 1999 Plant retrotransposons. Annu. Rev. Genet. 33: 479–532.
Leite, A., L. M. M. Ottoboni, M. L. P. N. Targon, M. J. Silva, S. R. Turcinelli et al., 1990 Phylogenetic relationship of zein and coixins as determined by immunological cross-reactivity and Southern blot analysis. Plant Mol. Biol. 14: 743–751.
Leutwiler, L. S., B. R. Hough-Evans and E. M. Meyerowitz, 1984 The DNA of Arabidopsis thaliana. Mol. Gen. Genet. 194: 15–23.
Liu, C. N., and I. Rubenstein, 1992 Molecular characterization of two types of 22 kilodalton α-zein genes in a gene cluster in maize. Mol. Gen. Genet. 234: 244–253.
Llaca, V., and J. Messing, 1998 Amplicons of maize genes are conserved within genic but expanded and constricted in intergenic regions. Plant J. 15: 211–220.
Lund, G., P. Ciceri and A. Viotti, 1995 Maternal-specific demethylation and expression of specific alleles of zein genes in the endosperm of Zea mays L. Plant J. 8: 571–581.
Madlung, A., R. W. Masuelli, B. Watson, S. H. Reynolds, J. Davidson et al., 2002 Remodeling of DNA methylation and phenotypic and transcriptional changes in synthetic Arabidopsis allotetraploids. Plant Physiol. 129: 733–746.
Maloisel, L., and J. L. Rossignol, 1998 Suppression of crossing-over by DNA methylation in Ascobolus. Genes Dev. 12: 1381–1389.
Matassi, G., R. Melis, K. C. Kuo, G. Macaya, C. W. Gehrke et al., 1992 Large-scale methylation patterns in the nuclear genomes of plants. Gene 122: 239–245.
Matieu, O., Y. Yukawa, M. Sugiura, G. Pikard and S. Tourmente, 2002 5S rRNA genes expression is not inhibited by DNA methylation in Arabidopsis. Plant J. 29: 313–323.
McClelland, M., 1983 The frequency and distribution of methylatable DNA sequences in leguminous plant protein coding genes. J. Mol. Evol. 19: 346–354.
Mittelstein-Scheid, O., K. Afsar and J. Paszkowski, 1994 Gene inactivation on Arabidopsis thaliana is not accompanied by an accumulation of repeat-induced point mutations. Mol. Gen. Genet. 244: 325–330.
Monroe, J. J., M. G. Manjanatha and T. R. Skopek, 2001 Extent of CpG methylation is not proportional to the in vivo spontaneous mutation frequency at transgenic loci in Big Blue rodents. Mutat. Res. 476: 1–11.
Montero, L. M., J. Filipski, P. Gil, J. Capel, J. M. Martinez-Zapater et al., 1992 The distribution of 5-methylcytosine in the nuclear genome of plants. Nucleic Acids Res. 20: 3207–3210.
Prat, S., J. Cortadas, P. Puigdomenech and J. Palau, 1985 Multiple variability in the sequence of a family of maize endosperm proteins. Nucleic Acids Res. 13: 1493–1504.
Quigley, F., H. Brinkmann, W. F. Martin and R. Cerff, 1989 Strong functional GC pressure in a light regulated maize gene encoding subunit GAPA of chloroplast glyceraldehyde-3-phosphate dehydrogenase: implications for the evolution of GAPA pseudogenes. J. Mol. Evol. 29: 412–421.
Rabinowicz, P. D., K. Schutz, N. Dedhia, C. Yordan, L. D. Parnell et al., 1999 Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat. Genet. 23: 305–308.
Reina, M., P. Guillen, I. Ponte, A. Boronat and J. Palau, 1990 Sequence analysis of a genomic clone encoding a Zc2 protein from Zea mays W64A. Nucleic Acids Res. 18: 6425.
Ronchi, A. K., K. Petroni and C. Tonelli, 1995 The reduced expression of endogenous duplications (REED) in the maize R gene family is mediated by DNA methylation. EMBO J. 14: 5318–5328.
Rountree, M. R., and E. U. Selker, 1997 DNA methylation inhibits elongation but not initiation of transcription in Neurospora crassa. Genes Dev. 11: 2383–2395.
Rubenstein, I., and D. E. Geraghty, 1986 The genetic organization of zeins, pp. 297–315 in Advances in Cereal Science and Technology, edited by Y. Pomeranz. American Association of Cereal Chemists, St. Paul.
Russell, D. A., and M. M. Sachs, 1991 The maize cytosolic glyceraldehyde 3-phosphate dehydrogenase gene family: organ specific expression and genetic analysis. Mol. Gen. Genet. 229: 219–228.
SanMiguel, P., A. Tikhonov, Y. K. Jin, N. Motchoulskaia, D. Zakharov et al., 1996 Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 737–738.
SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima and J. L. Bennetzen, 1998 The paleontology of intergene retrotransposons of maize. Nat. Genet. 20: 43–45.
Schorderet, D. F., and S. M. Gartler, 1992 Analysis of CpG suppression in methylated and non-methylated species. Proc. Natl. Acad. Sci. USA 89: 957–961.
Selker, E. U., 1990 Premeiotic instability of repeated sequences in Neurospora crassa. Annu. Rev. Genet. 24: 579–613.
Skopek, T., D. Marino, K. Kort, J. Miller, M. Trumbauer et al., 1998 Effect of target gene CpG content on spontaneous mutation in transgenic mice. Mutat. Res. 400: 77–88.
Soave, C., R. Reggiani, N. Difonzo and F. Salamini, 1981 Clustering of genes for 20 kd zein subunits in the short arm of maize chromosome 7. Genetics 97: 363–377.
Soave, C., R. Reggiani, N. Difonzo and F. Salamini, 1982 Genes for zein subunits on maize chromosome 4. Biochem. Genet. 20: 1027–1038.
Song, R., and J. Messing, 2002 Contiguous genomic DNA sequence comprising the 19-kD gene family from maize. Plant Physiol. 130: 1626–1635.
Song, R., V. Llaca, E. Linton and J. Messing, 2001 Sequence, regulation, and evolution of the maize 22-kD α-zein gene family. Genome Res. 11: 1817–1825.
Spena, A., A. Viotti and V. Pirotta, 1983 Two adjacent genomic zein sequences: structure, organization and tissue-specific restriction pattern. J. Mol. Biol. 169: 799–811.
Sturaro, M., and A. Viotti, 2001 Methylation of the Opaque2 box in zein genes is parent-dependent and affects O2 DNA binding activity in vitro. Plant Mol. Biol. 46: 549–560.
Swarup, S., M. C. P. Timmermans, S. Chaudhuri and J. Messing, 1995 Determinants of the high-methionine trait in wild and exotic germplasm may have escaped selection during early cultivation of maize. Plant J. 8: 35–40.
Tikhonov, A. P., P. J. SanMiguel, Y. Nakajima, N. M. Gorenstein and J. L. Bennetzen, 1999 Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc. Natl. Acad. Sci. USA 96: 7409–7414.
Tompa, R., C. M. McCallum, J. Delrow, J. G. Henikoff, B. van Steensel et al., 2002 Genome-wide profiling of DNA methylation reveals transposon targets of CHROMOMETHYLASE3. Curr. Biol. 12: 65–68.
White, S. E., L. F. Habera and S. R. Wessler, 1994 Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc. Natl. Acad. Sci. USA 91: 11792–11796.
Wilson, C. M., G. F. Sprague and T. C. Nelson, 1989 Linkage among zein genes determined by isoelectrical focusing. Theor. Appl. Genet. 77: 217–226.
Wilson, D. R., and B. A. Larkins, 1984 Zein gene organization in maize and related grasses. J. Mol. Evol. 20: 330–340.
Woo, Y. M., D. W. Hu, B. A. Larkins and R. Jung, 2001 Genomics analysis of genes expressed in maize endosperm identifies novel seed proteins and clarifies patterns of zein gene expression. Plant Cell 13: 2297–2317.
Yang, A. S., M. L. Gonzalgo, J. Zingg, R. P. Miller, J. Buckley et al., 1996 The rate of CpG mutation in Alu repetitive elements within the p53 tumor suppressor gene in the primate germline. J. Mol. Biol. 258: 240–250.
Zeschnigk, M., C. Lich, K. Buiting, W. Doerfler and B. Horsthemke, 1997 A single-tube PCR test for the diagnosis of Angelman and Prader-Willi syndrome based on allelic methylation differences at the SNRPN locus. Eur. J. Hum. Genet. 5: 94–98.
Zhang, Q., J. Arbuckle and S. R. Wessler, 2000 Recent, extensive, and preferential insertion of members of the miniature inverted-repeat transposable element family Heartbreaker into genic regions of maize. Proc. Natl. Acad. Sci. USA 97: 1160–1165.

Communicating editor: J. Birchler