
Proc. Natl. Acad. Sci. USA Vol. 81, pp. 3148-3152, May 1984

Genetics

Expandable var1 gene of yeast mitochondrial DNA: In-frame insertions can explain the strain-specific protein size polymorphisms [gene conversion/(G-C) site clusters]

MICHAEL E. S. HUDSPETH*, ROBERT D. VINCENT†‡, PHILIP S. PERLMAN†, DEBORAH S. SHUMARD*, LAURELEE O. TREISMAN*, AND LAWRENCE I. GROSSMAN*§

*Department of Cellular and Molecular Biology, Division of Biological Sciences, University of Michigan, Ann Arbor, MI 48109; and †Molecular, Cellular and Developmental Biology Program, Department of Genetics, Ohio State University, Columbus, OH 43210

Communicated by Horace W. Davenport, December 19, 1983

ABSTRACT The var1 locus of yeast mitochondrial DNA encodes a protein of the small mitochondrial ribosome subunit, denoted var1 protein. The size of var1 protein was previously shown to exhibit a strain-dependent polymorphism, determined by various combinations of at least three genetic elements. We report here that the var1 gene is itself polymorphic and that the six forms of this gene examined here differ by various combinations of three in-frame insertions into the coding region of the smallest allele. These insertions, which appear to be the molecular basis for the genetic elements, could increase the size of var1 protein by 8, 10, 16, 24, or 26 amino acid residues, accounting for the observed protein polymorphisms. Furthermore, we have characterized three additional sources of sequence variation located outside of the coding region but within major transcripts of the var1 gene.

The var1 locus of yeast mitochondrial DNA (mtDNA) has been identified by a number of alleles that affect the relative electrophoretic mobility of var1 (1, 2), a protein of mitochondrial ribosomes (3-5). Although associated with particular forms of var1 protein, the alleles do not have any clear effect on cell growth or respiration. Recent studies, however, describe point mutations within the var1 region that interfere with mitochondrial protein synthesis (6). The mutants, unlike the alleles found in nature, do affect cell growth, indicating that, although the particular allele present is not of primary importance for normal mitochondrial function, active var1 protein is necessary. A second unusual property of var1 is that, in certain crosses, some progeny arise that synthesize nonparental sizes of var1 protein (2, 7). The genetic data have been interpreted to be a consequence of the reassortment of two genetic determinants, termed a and b, each either present or absent in a given strain. Furthermore, two alternative and partially overlapping forms of the b element, termed b and bp (partial b element), have been distinguished. These nonparental var1 species arise with characteristic frequencies and represent the preferential conversion of smaller to larger alleles. Thus, defining the molecular basis of the var1 size polymorphism should provide new insight to the efficient recombinational phenomenon present in the var1 gene that formally resembles asymmetric gene conversion.

It was recently shown that the var1 region contains the structural gene for var1 protein (8) and the complete sequences in the smallest known allele of the coding region (8) and its flanking sequences (9) have been reported. Here we report studies of the var1 region in strains exhibiting six different forms of var1 and the identification of a number of sequence variations. These variations fall into two groups. One group consists of three short in-frame insertions into the gene for var1 protein relative to the smallest allele. These insertions were found in several combinations that could increase the size of the protein by up to 26 amino acids, allowing var1 to vary in size up to 7% without apparent functional consequences. These insertions appear to be the molecular basis of the allelic determinants previously defined by genetic analysis (2). The second group consists of three sequence variations outside of the coding region but within the primary transcript (ref. 10; H. P. Zassenhaus, N. C. Martin, and R. A. Butow, personal communication). Two of these latter variations are GC-rich insertions and the third is an altered restriction site. At least two of these flanking variations, however, do not affect the size of var1 protein.

MATERIALS AND METHODS

Growth of cells, isolation of DNA, end labeling, and DNA sequencing and restriction mapping were carried out as described (8, 11, 12), as was DNA sequencing by a modification (8) of the method of Maxam and Gilbert (13). The sequencing strategy for the various inserts was as follows: x was sequenced from the Dde I site within the tRNASer gene; a, from Ava II and Hpa II sites within a; b1, from the Bcl I site; b2, from the first and second Mbo I sites; the variable BamHI site, from the BamHI site; and z, from the BamHI site. The x, b1, and z labelings were carried out after both digestions were complete. Labeled ends were separated by either a second restriction digestion or strand separation.

RESULTS

Fine-Structure Restriction Site Map of Petite Mutant A17-10 (var1[40.0]). The entire var1 determinant region has been shown to be present on an 1800-base portion of the mitochondrial genome of petite A17-10 (11), the smallest available genome (ca. 5300 bases) capable of transmitting by recombination or specifying in trans the var1[40.0] allele.¶ We have now extended the original restriction map of A17-10 with 10 additional enzymes and focus in Fig. 1a on the relevant portion, HincII fragment 10 of the wild-type map (11).

Characterization of Petite Mutant Strains with Different var1 Alleles. Short insertions have been deduced to be present in the var1 region (7, 11). To locate and characterize these insertions, we isolated petite mutants retaining the

¶Alleles are designated as var1[n] or [n], where n is the apparent size in kilodaltons of the var1 protein present in that allele. The protein specified by the [40.0] (smallest) allele has a calculated molecular weight of 46,786 (8).
‡Present address: Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO 63110.
§To whom reprint requests should be addressed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


var1 region from strains having the var1[41.8], [42.0], [42.2], [43.8], and [44.0] alleles. Each petite used (Table 1) was shown by mtDNA restriction analysis to retain the var1 region (Fig. 1b) and by other tests to contain and be able to express in trans all the genetic elements necessary for determining its allelic form. We constructed fine-structure maps of each petite genome with the same set of restriction enzymes used with mtDNA from petite A17-10. Fragments that differed (Fig. 1a, boxes) when compared with A17-10 were analyzed further by nucleotide sequencing.

Insertions Within the Reading Frame

The a Element Is a GC-Rich Insert. Strains expressing var1 alleles [41.8], [43.8], and [44.0] contain a second Hha I site located 230 bases from the (common) Hha I site present in all strains. Since the common Hha I site is part of a GC-rich site cluster also containing Sst II, Hpa II, and Ava II sites (Fig. 1a), we analyzed the above genomes for the presence of the entire cluster. Our results, summarized in Fig. 2a, show that the second Hha I site is part of a second GC-rich site cluster similar to the common one (cf. also ref. 16) but containing the sites in reverse order. All of the sites present in the second site cluster were absent from mtDNA of the petite mutants carrying the alleles var1[40.0], [42.0], and [42.2].

The nucleotide sequence of the second cluster was determined in mtDNA from petite mutant 1-20 (var1[41.8]) (Fig. 2b). As suggested by the restriction analysis, that cluster contains an exact inverted repeat of the 46-base GC-rich sequence (8); in addition, it contains two flanking adenosine residues, to yield a total insert of 48 bases. This insert begins after nucleotide 390 from the start of the reading frame of the [40.0] allele; it preserves the reading frame and predicts its lengthening by 16 codons, consistent with the observed 1.8-kilodalton increase in size of var1 protein (W. M. Ainley, P. Hensley, and R. A. Butow, personal communication). Based on an absolute correlation in both parental and recombinant strains, we conclude that this GC-rich insert is the a element (Table 1).

A similar GC-cluster was reported outside the var1 region between the oli1 and tRNASer genes in strain D273-10B (14, 17) that is absent in most of our strains. Although those workers did not provide the sequence of the 7 bases between the two Hpa II sites, the 39 bases reported are identical to, and in the same orientation as, the common GC-cluster in var1. On analyzing the flanking sequences at each location, we found in all cases the same sequence of 20 A+T residues (Fig. 2c). However, the orientations of the GC-rich clusters relative to the 20 bases of flanking sequences are not identical in all three cases. The sequences flanking both the common var1 cluster and the a insert are identical and of opposite orientation to the oli1 flanking sequences. These polarity relationships are summarized in Fig. 2d. We note that the oli1 and a clusters can represent perfect inverted repeats of 68 bases separated by 1571 bases.

The b and bp Elements Are AT-Rich Inserts. There are two additional small insertions present only in strains designated b+ or bp (var1[42.0], [42.2], [43.8], and [44.0]) (Fig. 1a; Table 1). We have sequenced the insert labeled b2 in Fig. 1a in the var1 alleles [42.2] and [44.0] and find it to contain the same 18-base sequence shown in Fig. 3 and previously noted to be present in [42.0] (8); we designate it b2(18). The b2(18) insert is in the reading frame of var1[40.0] and extends an asparagine cluster in the var1 gene by six additional asparagine residues (Fig. 3). In contrast, the insert between Hha I and Bcl I (Fig. 1a) has been observed in two forms in strains containing b elements. We found the shorter form in bp strain 3-8 and note that it is also present in bp strain DS401 (14). Like b2(18), it is inserted into the reading frame of var1[40.0]; it extends a cluster of eight asparagines by two (Fig. 3). When we sequenced the same region in strain 3-26 (var1[42.2]), we found a longer form of this insert that extends the original cluster of eight by an additional four asparagine residues. The coding increments of these insertions, which we designate b1(6) and b1(12), are equivalent to the size increase observed for the proteins in var1[42.0] and [42.2]. Since all strains analyzed here either contain b1(6) or b1(12) together with b2(18), or have none of these insertions (Table 1), we conclude that the b element (2) is b1(12) + b2(18) while the bp element of strain D273-10B is b1(6) + b2(18).

[Fig. 1 graphic not reproduced. Panel a: restriction map of the var1 region with a 1-kb scale, the tRNA gene, the ORF, and the 19S and 16S transcripts. Panel b: retained portions of petite strains DS401 (42.0), A17-10 (40.0), 3-26 (42.2), 1-20 (41.8), 1-46 (44.0), 3-10 (43.8), 2-33 (44.0), and 3-8 (42.0).]

FIG. 1. The var1 region and retained portions of petite mutants used. (a) Restriction map and summary of the var1 region in petite mutant A17-10. The two HincII sites define fragment 10 of the wild-type map (11). The tRNASer(UCN) gene contains the HincII site at the left edge of HincII fragment 10 (14). ORF, reading frame for var1 protein (8). The boxes represent insertions found in various alleles; the unlabeled box near z represents the optional BamHI site. The wavy lines represent the two largest transcripts of this region (ref. 10; H. P. Zassenhaus, N. C. Martin, and R. A. Butow, personal communication). (b) Portions of the wild-type genome retained in petite strains transmitting the var1 alleles reported here. The rightmost end, determined by restriction mapping, is shown for each indicated allele with reference to the map in a. The wavy vertical lines indicate petite mutants that extend past the map region shown.

Table 1. Genetic and physical variation among var1 alleles
[Table body not recoverable cell by cell. Columns: Allele*; Petite strain; Wild-type strain; Genetic determinant (a, b, bp); Sequence variation (a, b1(6), b1(12), b2, BamHI site, x, z, oli1†); No. of strains. Rows: alleles 40.0, 41.8, 42.0, 42.2, 43.8, and 44.0. Petite strains listed: A17-10, 1-20, DS401, 3-8, 3-26, 3-10, 2-33, 1-46. Wild-type strains listed include D273-10B, NP1-1B, D6, and 5DSS. Numbers of strains as printed: 8, 3, 7, 4, 1, 2.]
This table contains data described in this paper, as well as for additional wild-type strains. +, Present; -, absent; +/-, present in some strains and absent from others.
*The size estimates used to designate the var1 alleles are those used originally in ref. 7.
†Refers to the GC cluster near oli1.

[Fig. 2 graphic not reproduced. Panel b, sequence of the a element with its derived amino acid sequence:]
asn pro ala gly pro ile arg trp thr thr gly leu ala pro ala gly
AAC CCC GCG GGA CCA ATC CGG TGA ACA ACC GGA TTG GCG CCC GCG GGG

FIG. 2. The a element and its flanking sequences. (a) Restriction map of the var1 region showing fine-structure detail for the GC-rich cluster present in all strains (left) and the insert present only in a+ strains (right). (b) Nucleotide sequence of the a element. Pairs of arrows indicate inverted repeats within the sequence. The derived amino acid sequence is also shown. (c) Flanking sequences. The 20 nucleotides flanking the lower bracket are found in var1[40.0] (a-) at the a element "acceptor" position. The a element consists of the 46-base GC cluster plus two adenosines. (d) Polarity of GC-rich clusters and their flanking sequences. The top line represents the orientations of the common GC-rich cluster (solid arrow) and the 20 bases of flanking sequence (dashed arrows) shown in c. The next two lines show the relative orientations of the a element and the GC-rich cluster in the oli1 region (14) with the same flanking sequence.
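The in-frame character of the a element can be checked directly from the sequence printed in Fig. 2b. The following is a minimal sketch and not part of the original analysis. It assumes that the 48-base sequence is transcribed correctly (the first codon is taken to be AAC, in line with the asn residue and the two leading adenosines), and it uses the yeast mitochondrial genetic code (NCBI translation table 3), in which TGA encodes tryptophan. The names `A_ELEMENT`, `CODON_TABLE`, and `translate` are illustrative only.

```python
# Sketch: translate the a element with the yeast mitochondrial genetic code.

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (NCBI translation table 3).
# Relative to the standard code: TGA = Trp, ATA = Met, CTN = Thr.
YEAST_MITO_AAS = (
    "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    b1 + b2 + b3: YEAST_MITO_AAS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# a element as printed in Fig. 2b (assumption: transcribed exactly).
A_ELEMENT = (
    "AAC CCC GCG GGA CCA ATC CGG TGA "
    "ACA ACC GGA TTG GCG CCC GCG GGG"
).replace(" ", "")


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    print(len(A_ELEMENT), "bases,", len(A_ELEMENT) // 3, "codons")
    print(translate(A_ELEMENT))
```

Under these assumptions the insert is 48 bases (two adenosines followed by 46 bases), contains no stop codon in this frame, and translates to the 16 residues shown in Fig. 2b, with tryptophan at the TGA codon.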

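[Fig. 3 graphic not reproduced. Sequences as printed, by allele and genotype; numbers are the nucleotide positions given on the map.]

b1 region
[40.0], b1-: ile asn asn asn / asn asn asn asn asn tyr
ATT AAT AAT AAT (478-489) / AAT AAT AAT AAT AAT TAT (from 490)
[42.0], bp: ile asn asn asn asn asn / asn asn asn asn asn tyr
ATT AAT AAT AAT AAT AAT (478-495) / AAT AAT AAT AAT AAT TAT (from 496)
[42.2], b: ile asn asn asn asn asn asn asn asn asn asn asn asn tyr
ATT AAT AAT AAT AAT AAT AAT AAT AAT AAT AAT AAT AAT TAT (from 478)

b2 region
[40.0], b2-: leu leu asn asn asn / asn asn ile
TTA TTA AAT AAT AAT (to 840) / AAT AAT ATT (from 841)
[42.0], [42.2], [44.0], b2+: leu leu asn asn asn asn asn asn asn asn asn asn asn ile
TTA TTA AAT AAT AAT AAT AAT AAT AAT AAT AAT AAT AAT ATT (from 826)

FIG. 3. b insertions. The restriction map includes the var1 coding region (thickened line). The numbers below each site are nucleotide positions shown 5' to 3' on the noncoding strand, starting with the first codon (8). For each b+ allele, the nucleotide and amino acid sequences are shown in the region that differs from b-. Numbers refer to nucleotide positions indicated on the map.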
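The size increments quoted in the text (8, 10, 16, 24, or 26 residues) follow from the insert lengths by simple arithmetic. The sketch below works that out under stated assumptions: the a element is 48 bases, the b element is b1(12) + b2(18), the bp element is b1(6) + b2(18), and b and bp are treated as alternative forms that can each occur with or without a. The dictionary names and the list of genotypes are illustrative choices, not notation from the paper.

```python
# Sketch: amino acid residues added by each combination of in-frame inserts.

# Insert lengths in bases, as given in the text.
INSERT_BASES = {"a": 48, "b1(6)": 6, "b1(12)": 12, "b2(18)": 18}

# Physical composition of each genetic element, as concluded in the text.
ELEMENT_INSERTS = {
    "a": ["a"],
    "bp": ["b1(6)", "b2(18)"],
    "b": ["b1(12)", "b2(18)"],
}

# Assumed set of genotypes: b and bp are alternatives, with or without a.
GENOTYPES = {
    "bp": ["bp"],
    "b": ["b"],
    "a": ["a"],
    "a + bp": ["a", "bp"],
    "a + b": ["a", "b"],
}


def added_residues(elements):
    """Return the number of codons added by the given genetic elements."""
    total_bases = sum(
        INSERT_BASES[insert]
        for element in elements
        for insert in ELEMENT_INSERTS[element]
    )
    if total_bases % 3 != 0:
        raise ValueError("combination would shift the reading frame")
    return total_bases // 3


ADDED = {name: added_residues(elements) for name, elements in GENOTYPES.items()}

if __name__ == "__main__":
    for name, residues in ADDED.items():
        print(f"{name}: +{residues} residues")
```

Each combination is a multiple of three bases, so the reading frame is preserved, and the five totals match the increments listed in the abstract and Discussion.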

Insertions Flanking the Reading Frame

In addition to the coding inserts that expand the reading frame, we detected three inserts in flanking regions. Recent mapping of var1 transcripts (9, 10) indicates that these flanking inserts are transcribed, with the latter two present in the 3' untranslated portion of the most abundant transcript (16S) of the var1 region.

The x Insertion. In the bp strain studied (Table 1) there is a 63-base insertion starting 178 bp 3' from the end of the tRNASer gene. We determined the sequence of this insertion (denoted x) by comparing the published sequence for a var1[42.0] (14) strain with our sequence of var1[40.0] (Fig. 4A). It is GC-rich (71%) but unrelated to the common site cluster and lacking internal symmetry. Since it is not present in the 16S RNA, presumably the var1 mRNA, it is unlikely to play a role in protein size determination.

The Variable BamHI Site. Strains containing the var1[42.0] and [43.8] alleles lack the single BamHI site located beyond the 3' end of the var1 reading frame and instead contain a new HinfI site generated (Fig. 4B) by a single base change. A second base change is also present 5 bases away. We have obtained a recombinant that shows that the BamHI site variation is not an allelic determinant. To generate this recombinant, cob mutants of strains D273-10B ([42.0]) and ID41-6/161 ([40.0]) were crossed and three wild-type progeny were selected. One of these progeny strains was also found to be recombinant in the var1 region, having var1[40.0] but lacking the BamHI site (data not shown).

The z Insertion. At least some strains containing the [40.0], [41.8], [42.0], [42.2], and [44.0] alleles contain an insertion 3' from the reading frame. We examined this insertion (denoted z) in strain 1-20 (var1[41.8]) and find it to be a 28-base GC-rich (89.3%) cluster starting 72 bases past the variable BamHI recognition site (Figs. 1a and 4C). Comparison of the z region in strain 1-20 with that in A17-10 (var1[40.0], z-) shows that, in addition to the insertion, the first base following the insertion in the z- strain is absent in strain 1-20 (Fig. 4C). The z sequence is unrelated to either the GC-rich common site cluster in var1 or to x. Indeed, a search of published yeast mitochondrial sequences indicates that z is unique. Interestingly, the subsequence G-G-G-T-C-C-G-C-C-C-C-G-C is found four times (and a fifth as the 12-mer starting at position 2) in the vicinity of coding sequences. Despite the lack of apparent relationship between z and the common GC-cluster, z is flanked on its 3' side by the same 11 bases that flank the GC-cluster near oli1 (Fig. 2d); the 5' flanking sequence is unrelated. z is not an allele-specific determinant since strains carrying the [40.0], [41.8], [42.0], and [42.2] alleles have been observed with z optionally present (Table 1).

[Fig. 4 graphic not reproduced. Sequences as printed:]
(A) [42.0] x+ compared with [40.0] x-: sequence not legible.
(B) [40.0]: CTTTTGGATCCTATTT (BamHI)
    [42.0]: CTTTGGGATTCTATTT (HinfI)
(C) [41.8] z+ insert: CCCTCGGGTCCGCCCCGCAGGGGCCGGC (HaeIII, HpaII, and AvaII sites marked)
    [40.0] z- acceptor: ATATATTACTATTATAATTATT (1572)

FIG. 4. Sequence variation outside the var1 coding region. (A) The x insertion. The region of var1[40.0] shown is numbered back to the first base of the coding region. (B) The variable BamHI site. The sequence in the region of the BamHI site is compared for the [40.0] and [42.0] alleles. Starred bases differ. (C) The z insertion. The acceptor sequence in var1[40.0] is numbered from the first base of the coding region (8, 9). The adenosine at position 1582 in [40.0] is absent in the z+ allele.
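Two of the statements about the flanking variations can be verified from the short sequences printed in Fig. 4: the exchange of the BamHI site for a HinfI site, and the length and GC content of z. The sketch below is illustrative and rests on the assumption that the Fig. 4B and 4C sequences are transcribed exactly. The recognition sequences used (BamHI, GGATCC; HinfI, GANTC) are the standard ones, and all function and variable names are hypothetical.

```python
# Sketch: check the BamHI/HinfI site change (Fig. 4B) and the z insert (Fig. 4C).
import re

# Sequences as printed in Fig. 4 (assumption: transcribed exactly).
BAMHI_REGION = {
    "40.0": "CTTTTGGATCCTATTT",
    "42.0": "CTTTGGGATTCTATTT",
}
Z_INSERT = "CCCTCGGGTCCGCCCCGCAGGGGCCGGC"
Z_SUBSEQUENCE = "GGGTCCGCCCCGC"  # 13-mer quoted in the text

# Standard recognition sequences written as regular expressions (N = any base).
RECOGNITION_SITES = {"BamHI": "GGATCC", "HinfI": "GA[ACGT]TC"}


def sites_present(seq):
    """Return the names of the recognition sites found in seq."""
    return {
        name
        for name, pattern in RECOGNITION_SITES.items()
        if re.search(pattern, seq)
    }


def differing_positions(seq_a, seq_b):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b), start=1) if x != y]


def gc_percent(seq):
    """Return the percentage of G and C residues in seq."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for allele, seq in BAMHI_REGION.items():
        print(allele, sorted(sites_present(seq)))
    print(differing_positions(BAMHI_REGION["40.0"], BAMHI_REGION["42.0"]))
    print(len(Z_INSERT), round(gc_percent(Z_INSERT), 1))
```

Under these assumptions the [40.0] sequence carries only the BamHI site and the [42.0] sequence only a HinfI site, the two sequences differ at two positions that lie 5 bases apart, and the z insert is 28 bases long with 25 G or C residues (89.3%). The 13-mer quoted in the text occurs within the printed z sequence.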

DISCUSSION

We have analyzed six alleles of the var1 locus on yeast mtDNA, each of which produces an electrophoretically distinguishable form of var1 protein. We find a total of six sequence variations among the alleles. Three of the variations are within the reading frame of the gene, and they consist of in-frame insertions relative to the smallest allele. The insertions, in various combinations, can increase the protein size by 8, 10, 16, 24, or 26 amino acids, accounting for the protein polymorphisms addressed in this study (Table 1). The remaining three sequence variations are outside the reading frame of the var1 gene but within its major transcripts.

The in-frame insertions are unusual. Most insertion events previously studied are transpositions that lead to loss of gene function. By contrast, var1 alleles retain full function and are indistinguishable in terms of growth and respiratory capacity despite the approximately 7% change in size between the smallest and largest allelic products. The var1 insertions are reflected in the protein products, based on their decreased electrophoretic mobilities (2, 7) and on direct analysis of var1 proteins (W. M. Ainley, P. Hensley, and R. A. Butow, personal communication).

Variation in gene length has been observed in other instances. In yeast mitochondria the 15S rRNA gene contains a 40-base GC-rich sequence in the central part of the gene in some strains but not in others (15); when present, it is part of the 15S rRNA transcript. The large (21S) rRNA gene can contain, in addition to the optional intron (18), a 66-base GC-rich insert in ω+ strains that is also transcribed when present (19). Nuclear examples are the silk fibroin gene (20, 21), in which both gene and product are polymorphic in size in different inbred stocks of Bombyx mori, and sgs4, a glue polypeptide gene in Drosophila melanogaster, which varies in size in different strains by the number of tandemly repeated units it contains in-frame of a 21-base sequence (22, 23).

Correlation Between Physical Mapping and Genetic Studies in var1. The physical basis for the genetic elements reported here (Table 1) confirms and extends the model presented by Butow and co-workers (2, 7) based on genetic analyses. All a+ strains contain a GC-rich cluster absent from a- strains. All b+ and bp strains have two inserts absent from all b- strains. Strains that are b+ or bp have one of these inserts in common (b2(18)) and differ in the size of the other.

In-Frame Insertions in var1. The mechanism by which the a and b elements are transferred in a cross from a genome containing them to a genome lacking them is unknown. The a element is shown here to be a 48-base GC-rich cluster that contains a 46-base inverted repeat of a region starting 204 bases upstream from it on the coding strand and present in all strains. In addition, the 20 bases surrounding these clusters are the same. Both this GC-cluster and its flanking AT-rich sequences are present near var1 (between oli1 and tRNASer) at least one more time in some strains. Previous

genetic analysis has shown that the a element behaves as a unit in crosses (2, 7). Moreover, in a+ x a- crosses, the a+ allele is preferentially recovered compared with other markers (7). Thus, the recombination behavior of the a element resembles asymmetric gene conversion. The preferential conversion of the short (intronless) form of the yeast mitochondrial large rRNA gene to the long form, which contains a 1.1-kilobase intron, is formally an analogous event to the a insertion (18, 24-26). However, the mechanistic similarities of the two events are yet to be determined. The events at the 21S rRNA gene involve and were first detected by associated flanking marker effects (co-conversion of flanking point mutations on both sides of the insertion) (27). Although the only available marker flanking the a element is the b1 insertion starting 91 bases away, the recombinant type a+b1(12)+b2(18)- was never observed in crosses of the type a+b+ x a-b-.

An alternative interpretation of the genetic and physical data is that the preferential conversion of a- to a+ results from a transposition (28) of the 46-base common GC-rich cluster, with attendant duplication of two additional adenosine residues in the region that receives the a element. However, although transposition may well explain the original formation of an a+ allele, it would require special assumptions compared with gene conversion to explain why a+ alleles are never seen in a- x a- crosses.

Whereas the a element is either present or absent, the behavior of the b element is more complex. It consists physically of two DNA sequences found about 350 bases apart (Figs. 1a and 3). Despite this physical separation of its parts, the b element behaves as a unit most of the time (2, 7), and it is transmitted preferentially in crosses of the form a-b- x a+b+, albeit at 1/15th the frequency of the a element. These events involving the b element are called preferential gene conversion rather than crossing-over [between a and b1(12)] because the two recombination types (a+b- and a-b+) arise, even without selection, with different frequencies; in addition, in the cross a+b- x a-b+, the recombinant type a-b- that would arise by a simple crossover between a and b1(12) was not found. Strausberg et al. (2) also noted more infrequent cases (