Proc. Natl. Acad. Sci. USA Vol. 89, pp. 9759-9763, October 1992 Biochemistry

Sequence-specific recognition of DNA by zinc-finger peptides derived from the transcription factor Sp1 (mobility shift/primer extension/GC box/binding isotherm/competition assay)

RICHARD W. KRIWACKI†, STEVE C. SCHULTZ‡§, THOMAS A. STEITZ†, AND JOHN P. CARADONNA†¶
Departments of †Chemistry and ‡Molecular Biophysics and Biochemistry and the Howard Hughes Medical Institute, Yale University, P.O. Box 6666, New Haven, CT 06511

Contributed by Thomas A. Steitz, July 6, 1992

ABSTRACT We have overexpressed and purified two peptide fragments of Sp1 that contain the three "zinc-finger" domains necessary for specific Sp1 DNA binding. These peptides assume a stable, folded conformation in solution in the presence of Zn2+, as shown by DNA binding assays and NMR spectroscopy. Mobility-shift assays demonstrate that the Sp1 peptides recognize a number of different Sp1 DNA binding sites (GC boxes, with the core sequence GGGCGG). The dissociation constant for a 92-amino acid peptide binding to the GGGGCGGGGC sequence (Kd ≈ 10 nM) and the relative affinities for several other DNA sequences definitively demonstrate Sp1-like binding properties. The thermodynamic binding site for Sp1-Zn92 has been mapped using the primer-extension/mobility-shift assay, revealing that the 5' portion of the GC box DNA sequence (GGG GCG) contributes more strongly to the total binding energy than the 3' portion (GGGC). These findings are interpreted in the context of the Sp1 amino acid sequence in comparison with the structurally characterized Zif-268/DNA complex. A model is proposed that offers a structural explanation for the ability of Sp1 to recognize a diverse array of DNA sequences in terms of the individual (and different) DNA binding properties of each of the three zinc-finger domains.

The "zinc-finger" motif (1, 2) has been identified within the amino acid sequences of a large number of DNA binding proteins, including a wide variety of eukaryotic transcription factors in which this motif mediates sequence-specific recognition of DNA. Transcription factor Sp1 (3), which contains three contiguous Cys2His2 zinc-finger domains, binds to some, but not all, DNA sequences that contain the asymmetric GGGCGG hexanucleotide core (GC box) within a large number of cellular and viral promoters. Early studies demonstrated that Sp1-responsive promoters typically contain multiple binding sites of different DNA sequence and Sp1 binding affinity (4-6). A survey of high-affinity Sp1 DNA binding sites (Kd ≈ 10^-9 M) reveals that a large number of substitutions within the generalized Sp1 GC box sequence are tolerated while high binding affinity is maintained (7, 8). These observations raise questions about the molecular basis of the Sp1/DNA recognition process and the role played by multiple Sp1 binding sites of different affinity with regard to the enhancement of transcriptional activation.

We are interested in understanding the molecular details of DNA recognition by Sp1 and with this aim have developed a system for studying the underlying intermolecular interactions at the structural level. In this report we describe the overexpression, purification, and characterization of the DNA binding properties of two peptides, 121 and 92 amino acids in length, that contain the three zinc-fingers of Sp1. We have employed various techniques based on gel electrophoresis to demonstrate that these Sp1 peptides recognize the same DNA sequences as wild-type Sp1. Considering the three zinc-fingers as individual DNA binding domains and, analogously, considering the Sp1 binding site to be composed of three individual subsites [as first proposed for Zif-268 by Nardelli et al. (9)], our electrophoresis studies reveal that the binding affinities associated with the three pairwise zinc-finger/DNA subsite interactions are not equal: the affinity of the second and third fingers for their respective DNA subsites within the 5' portion of the GC box is significantly greater than that of the first Sp1 finger for its DNA subsite within the 3' portion of the GC box. These findings are shown to be consistent with (i) the high conservation of the GGG GCG motif at the 5' end of GC box sequences and (ii) the relatively low conservation within the 3' end of these sequences. We compare the Sp1 peptide sequence with that of Zif-268 (10) and propose a model for the binding of Sp1 to the consensus GC box.

Abbreviations: SV40, simian virus 40; HIV, human immunodeficiency virus; Mt, metallothionein.
§Present address: Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309.
¶To whom reprint requests should be addressed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

Expression and Purification of Sp1 Peptides. Several zinc-finger-containing Sp1 peptides of different length were overexpressed during the course of studies designed to determine the minimal peptide length required for Sp1-like DNA binding; we present here results obtained with two of these peptides. Standard cloning techniques (11-14) were used to construct expression vectors that code for residues 521-639 and residues 533-623 of Sp1 (pSp1-Zn121 and pSp1-Zn92, respectively). The amino acid sequences for Sp1-Zn121 and Sp1-Zn92 are given in Table 1. These peptides were overexpressed in Escherichia coli strain BL21(DE3) (13) using standard protocols (13, 14). Purification procedures were performed at 4°C unless otherwise noted. E. coli cells (50 g) in which either Sp1-Zn121 or Sp1-Zn92 was overexpressed were resuspended and lysed in buffer L [50 mM Hepes, pH 8.0/50 mM NaCl/10 mM dithiothreitol (DTT)/1 mM EDTA/1 mM phenylmethylsulfonyl fluoride]. Solid urea was added to the supernatant to 5 M. The peptides were isolated using cation-exchange chromatography (S Sepharose and Mono S, Pharmacia) with buffers composed of 5 M urea/50 mM Hepes, pH 8.0/10 mM DTT/1 mM EDTA using an NaCl gradient. Pooled fractions were reduced at 70°C for 30 min (100 mM DTT) and purified by reverse-phase HPLC (C4 resin, Vydac, Hesperia, CA) using solutions composed of H2O/0.1% trifluoroacetic acid (TFA) and CH3CN/0.1% TFA. The peptides were lyophilized and stored in an argon atmosphere (22°C). Isolated yield ≈10 mg per 1 g of wet cells.

Table 1. Amino acid sequences for Sp1-Zn121 and Sp1-Zn92 (Sp1-Zn92 is the segment between brackets) in the N-terminal to C-terminal direction

    MKDSEGRGSGDPG [(M)KK
    KQHICHIQGCGKVYGKTSHLRAHLRWHTGE
    RPFMCTWSYCGKRFTRSDELQRHKRTHTGE
    KKFAC--PECPKRFMRSDHLSKHIKTHQNK
    K] GGPGVALSVGTLPLDS

Key residues discussed in the text are underlined, and the Cys2His2 residues are in large, bold type. Numbering is for Sp1-Zn92.

Gel Mobility-Shift DNA Binding Assay. See refs. 15 and 16.

Six-site oligonucleotide. The plasmid pRVS35' (ref. 17; M. Biggin, personal communication) was digested with Pst I and Xho I to generate an ≈80-base-pair (bp) fragment containing three tandem repeats of the GC box; the fragment was gel purified and 3' end-labeled. Binding reaction mixtures contained the 3' 32P-end-labeled DNA fragment (≈100 pM), 10 mM Tris (pH 8.0), 50 mM NaCl, 100 µM ZnSO4, and 200 ng of poly(dA/dT) per µl (Pharmacia). Sp1-Zn121 was added last, followed by equilibration (22°C, 20 min) and electrophoresis in 6% polyacrylamide gels (75 acrylamide:1 bisacrylamide) preequilibrated in 1:2 Tris/borate. Additional binding reaction mixtures included a fixed Sp1-Zn121 concentration (≈4 µM) and increasing concentrations of two unlabeled competitor oligonucleotides: Mt IIa-1/2 (Mt) and CAP-30-1/2 (CAP) (Table 2).

Single-site oligonucleotides. The G-rich strands for the 20-bp single GC box oligonucleotide series were 5' end-labeled and annealed with their complementary strands. Binding reactions (as above, including 0.5% Nonidet P-40) with these 20-mers (4-6 nM) were performed with (i) increasing Sp1-Zn92 concentration and (ii) fixed Sp1-Zn92 concentration and increasing concentrations of unlabeled 20-mers. Additional binding reaction mixtures included 10 mM EDTA to remove Zn2+. Samples were then applied to 10% polyacrylamide gels (75:1) and electrophoresed as above. The dried gels were quantitated using a Betascope 603 blot analyzer (Betagen, Waltham, MA).

Table 2. Oligonucleotides used in this work

    Mobility-shift assays
      Single-site
        Mt IIa-1/2           GCCGGGGCGGGGCTTCTGCA
        HIV-LTR III-1/2      GCCGAGGCGTGGCTTCTGCA
        SV40 III-1/2         GCCTGGGCGGAGTTTCTGCA
      Single-site mutants
        Mt-T3                GCCGGTGCGGGGCTTCTGCA
        Mt-T4                GCCGGGTCGGGGCTTCTGCA
      Nonspecific site
        CAP-30-1/2           CAATTAATGTGAGTTAGCTCACTCATTAGG
    Primer-extension/mobility-shift assay
        DHFR-44a/DHFR-12ap   CGCGGCGGGCCTTGGTGGGGGCGGGGCCTAAGCTGCGCAAGTGG
        DHFR-44b             GCGCCGCCCGGAACCACCCCCGCCCCGGATTCGACGCGTTCACC

Top strands are listed 5' → 3', bottom strands are listed 3' → 5', and complementary strands are not listed.

The fraction of labeled DNA bound by Sp1-Zn92 ($\theta_b$) was calculated using the equation $\theta_b = I_b/(I_b + I_f)$, where $I_b$ is the intensity of the Sp1-Zn92-bound DNA band and $I_f$ is the intensity of the free DNA band. The values of $\theta_b$ for protein titrations with the oligonucleotides metallothionein [GC box within the human Mt IIa promoter (6)], human immunodeficiency virus long terminal repeat [HIV-LTR, site 3 (7)], and simian virus 40 [SV40 early promoter, site 3 (6)] (Table 2) without competitors were fit (KaleidaGraph program, Abelbeck Software) using the binding isotherm equation

$\theta_b = [A - (A^2 - B)^{1/2}]/(2 K_a [{}^{*}\mathrm{DNA}]_t)$,

where $A = K_a([{}^{*}\mathrm{DNA}]_t + [\mathrm{Sp1\text{-}Zn92}]_t) + 1$ and $B = 4 K_a^2 [{}^{*}\mathrm{DNA}]_t [\mathrm{Sp1\text{-}Zn92}]_t$ (18). The values of $\theta_b$ for experiments in which the Sp1-Zn92 concentration was fixed and the concentration of competitor oligonucleotide was varied were analyzed using the equation

$\theta_b = [A - (A^2 - B)^{1/2}]/(2 [{}^{*}\mathrm{DNA}]_t)$,

where $A = K_a^{-1} + r[\mathrm{cDNA}]_t + [\mathrm{Sp1\text{-}Zn92}]_t + [{}^{*}\mathrm{DNA}]_t$, $B = 4 [{}^{*}\mathrm{DNA}]_t [\mathrm{Sp1\text{-}Zn92}]_t$, and $r = K_c/K_a$ (18). In these equations $K_a$ and $K_c$ are the equilibrium constants for Sp1-Zn92 binding to the 32P-labeled and unlabeled competitor oligonucleotides, respectively, $[{}^{*}\mathrm{DNA}]_t$ and $[\mathrm{cDNA}]_t$ are the total concentrations of 32P-labeled and unlabeled oligonucleotides, and $[\mathrm{Sp1\text{-}Zn92}]_t$ is the total concentration of Sp1-Zn92 peptide.

Primer-Extension/Mobility-Shift Assay. The primer-extension/mobility-shift assay was performed as reported by Liu-Johnson et al. (19) and Gartenberg et al. (20) with only minor modifications. The oligonucleotides (Table 2) contain a single GC box flanked at either end by 17 bp from the dhfr promoter, site I (7). The Sp1-Zn92-bound and free oligonucleotides were separated in 10% polyacrylamide gels (75:1). Isolated DNA fractions from bound and free bands were separated by electrophoresis in 20% polyacrylamide (20:1)/7 M urea gels. Dried gels were quantitated as before.
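The quantitation and fitting equations described in Materials and Methods can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' KaleidaGraph fitting procedure: the function names and the numerical values of Ka and the concentrations are assumptions chosen for demonstration. Both isotherms follow from 1:1 mass-action binding and are normalized by total labeled DNA, as the definition of the fraction of labeled DNA bound requires; the competition form additionally assumes that free competitor is approximately equal to total competitor.

```python
import math


def fraction_bound(i_bound: float, i_free: float) -> float:
    """theta_b = I_b / (I_b + I_f), from the bound and free band intensities."""
    return i_bound / (i_bound + i_free)


def theta_direct(ka: float, dna_total: float, peptide_total: float) -> float:
    """Fraction of labeled DNA bound for a 1:1 complex without competitor.

    ka is in M^-1; concentrations are in M.
    """
    a = ka * (dna_total + peptide_total) + 1.0
    b = 4.0 * ka**2 * dna_total * peptide_total
    return (a - math.sqrt(a * a - b)) / (2.0 * ka * dna_total)


def theta_competition(ka: float, r: float, dna_total: float,
                      peptide_total: float, competitor_total: float) -> float:
    """Fraction of labeled DNA bound in the presence of unlabeled competitor.

    r = Kc / Ka is the relative affinity of the competitor site.
    Assumes free competitor ~ total competitor (competitor in excess).
    """
    a = 1.0 / ka + r * competitor_total + peptide_total + dna_total
    b = 4.0 * dna_total * peptide_total
    return (a - math.sqrt(a * a - b)) / (2.0 * dna_total)


# Hypothetical illustration values (not taken from the paper's raw data):
KA = 1e8        # M^-1, i.e. Kd = 10 nM
DNA = 5e-9      # 5 nM labeled 20-mer
PEPTIDE = 1e-8  # 10 nM peptide

for competitor in (0.0, 1e-8, 1e-7, 1e-6):
    theta = theta_competition(KA, 1.0, DNA, PEPTIDE, competitor)
    print(f"[competitor] = {competitor:.0e} M  theta_b = {theta:.3f}")
```

With no competitor the competition expression reduces to the direct binding isotherm, which is a quick consistency check on the two equations.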

RESULTS

DNA Binding Properties of Sp1 Peptides. Initial mobility-shift DNA binding experiments using the 32P-labeled six-site oligonucleotide showed a ladder of up to six Sp1-Zn121-bound bands, indicating that Sp1-Zn121 binds specifically to the six GC box sites within the DNA fragment. The specificity of binding was further challenged by competition experiments in which comparable amounts of two unlabeled oligonucleotides, (i) a 20-mer containing a single GC box (Mt) or (ii) a 30-mer lacking a GC box (CAP), were added to the binding reactions (Fig. 1). The fraction of labeled six-site oligonucleotide bound at a fixed concentration of Sp1-Zn121 in these experiments is reduced by Mt and is unaffected by CAP. The Zn2+ dependence of DNA binding was established by mobility-shift DNA binding experiments using labeled Mt in the presence and absence of Zn2+; binding is observed in the presence of Zn2+ and is eliminated in the presence of 10 mM EDTA (data not shown). Additionally, preliminary NMR studies show that Sp1-Zn92, in the presence of a stoichiometric amount of Zn2+, assumes a stable, folded conformation and, further, that this conformation is abolished in the absence of Zn2+ (data not shown). Mobility-shift experiments performed in the absence of competitor DNA [poly(dA/dT)] indicate that Kd ≈ 10 nM. This value compares well with results reported by Harrington et al. (21) for native Sp1 (Kd 1-3 nM), demonstrating that Sp1-Zn92 contains all those residues essential for tight binding.

Variation in DNA affinity among members of the tight binding class was surveyed in a series of DNA binding experiments using three duplex oligonucleotides containing different GC-box binding sites (Mt, HIV, and SV40) nested within identical flanking sequences (Table 2). The relative Ka values for Sp1-Zn92 binding to each of the three sites were obtained from mobility-shift competition studies using labeled and unlabeled oligonucleotides (Fig. 2). These experiments were performed for all nine combinations of pairs of labeled and unlabeled 20-mers; the relative values for Ka (Table 3) show that the affinity of Sp1-Zn92 for the Mt and HIV sites is comparable and that the affinity for the SV40 site is about 3- to 5-fold lower. Additionally, the ability of Sp1-Zn92 to discriminate between the Mt IIa site and two single-base mutants of this site was determined in competition mobility-shift assays using labeled Mt and unlabeled Mt, Mt-T3, and Mt-T4 (Table 2). The Mt-T3 and Mt-T4 variants involve G [conserved among high-affinity Sp1 binding sites (7, 22)] to T nucleotide changes at positions 3 and 4 of the GGG GCG GGGC Mt IIa sequence. As expected, Ka is significantly reduced for the mutant sites: 10-fold and 5-fold for Mt-T3 and Mt-T4, respectively (Table 3).

FIG. 1. Results of mobility-shift assay using 32P-labeled six-site oligonucleotide in the presence of (a) unlabeled Mt competitor DNA and (b) unlabeled CAP DNA. [Gel image labels: increasing competitor DNA concentration across lanes; wells; Sp1-Zn121-bound DNA with the number of molecules bound (1-6); free DNA.]

Table 3. Relative Ka values determined using mobility-shift assay in the presence of oligonucleotide competition

                    Unlabeled competitor site, relative Ka value
    Labeled site    Mt     HIV    SV40   Mt-T3  Mt-T4
    Mt              1.0    0.8    0.2    0.1    0.2
    HIV             1.0    1.0    0.2
    SV40            3.4    2.4    1.0

Oligonucleotide Length Required for Maximal Binding. We have used the primer-extension/mobility-shift assay (19, 20) to determine the limits of the thermodynamic DNA binding site for Sp1-Zn92 using 44-bp oligonucleotides containing a single high-affinity GC box derived from the mouse dhfr promoter (Table 2). This experiment yields the dependence of relative Ka values on DNA length as an oligonucleotide is lengthened in one-base increments through the binding site to the terminus of the template strand. A plot of relative Ka values as a function of oligonucleotide length for extension through the GC box in the two directions (Fig. 3) reveals that, at the 5' end of the GC box, five or six residues beyond the generalized "consensus" binding site are required to achieve full binding affinity. For the 3' end of the binding site a different result is observed: only the GGG GCG portion of the binding site is required for moderate affinity, and the affinity increases as bases are added through the GGGC portion of the site. This result does not depend on the fraction of DNA bound by Sp1-Zn92 as controlled by variation of the peptide concentration within the range ≈200 nM-2.00 µM (data not shown).
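The fold-reductions in Ka reported above for the mutant sites can be converted into free-energy differences with the standard relation ΔΔG = -RT ln(relative Ka). The sketch below is illustrative only: the function name is hypothetical, and the temperature of 22°C is an assumption taken from the equilibration conditions of the binding reactions. Because the ratios in Table 3 are rounded to one significant figure, the values computed from them are only approximate.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_kcal_per_mol(relative_ka: float, temp_celsius: float = 22.0) -> float:
    """Loss of binding free energy implied by a relative association constant.

    relative_ka = Ka(variant site) / Ka(reference site); a value below 1
    gives a positive result (weaker binding than the reference).
    """
    temp_kelvin = temp_celsius + 273.15
    return -R_KCAL * temp_kelvin * math.log(relative_ka)


# Rounded relative Ka values for the two Mt mutants (Table 3).
for name, rel_ka in (("Mt-T3", 0.1), ("Mt-T4", 0.2)):
    print(f"{name}: ~{ddg_kcal_per_mol(rel_ka):.2f} kcal/mol")
```

A 10-fold and a 5-fold reduction correspond to roughly 1.35 and 0.94 kcal/mol at 22°C, which is the same order as the loss of one or two hydrogen bonds.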

FIG. 2. Competition mobility-shift assay using 32P-labeled Mt in competition with unlabeled Mt, HIV, and SV40 DNA. The experimental data are given as discrete points (Mt, HIV, SV40, and CAP, each plotted with a distinct symbol) and the theoretical data are given as continuous curves [Mt (solid curve), HIV (dashed curve), and SV40 (dotted curve)]. [Plot: vertical axis from 0.0 to 1.0; horizontal axis, [unlabeled competitor DNA], from 10^-9 M to 10^-5 M.]

FIG. 3. Primer-extension/mobility-shift assay giving the dependence of relative Ka values on oligonucleotide length. Two separate experiments are represented: extension from annealed oligonucleotides DHFR-44a/DHFR-12ap in the 3' → 5' direction (with respect to the G-rich strand of the GC box) (●) and extension from annealed oligonucleotides DHFR-44b/DHFR-12bp in the 5' → 3' direction (○). (a) Extension over the entire length of the template 44-mers. (b) Expansion of the region in a contained within the dashed box. The GC box sequence in b positions the binding site along the x axis. [Plot: vertical axis, relative Ka from 0.0 to 1.0; horizontal axis, position within 44-mer sequence; GC-box region TGGTG GGGGCGGGGC CTAA.]

DISCUSSION

The Sp1 Peptides Mimic Wild-Type Sp1 DNA Binding Properties. We have overexpressed and purified two peptide fragments of Sp1 that contain the three zinc-finger domains necessary for specific Sp1 DNA binding in quantities sufficient for structural studies. Mobility-shift experiments show that both Sp1 peptides bind with specificity and in a zinc-dependent manner to DNA sequences containing GC boxes. In the case of Sp1-Zn121, the mobility-shift assay with the six-site fragment reveals from one to six shifted bands corresponding to the binding of one to six peptide molecules per DNA fragment. This band pattern is altered or eliminated by competitor DNA containing a single GC box but not by nonspecific DNA. Under the experimental binding conditions, there is no evidence for cooperative binding of Sp1-Zn121 to adjacent GC box sequences. The lack of cooperativity was also reported for the binding of intact Sp1 with the 21-bp repeat elements of the SV40 promoter that contain three tandem copies of the GC box (6). Using single-site oligonucleotides, binding to Sp1-Zn92 is reduced or eliminated by GC-box-containing competitor oligonucleotides but not by nonspecific competitor DNA. These results clearly show that for both Sp1 peptides only oligonucleotides containing an Sp1 binding site are effective competitors and directly demonstrate specificity for the GGG GCG GGGC binding site.

To further demonstrate the fidelity of DNA sequence recognition for our peptides, we have examined the binding of Sp1-Zn92 to a series of wild-type "high-affinity" GC boxes from the human Mt IIa, HIV long terminal repeat, and SV40 promoters, as well as binding to two mutants of the Mt IIa site. The affinity of Sp1-Zn92 for the three high-affinity sites can be ranked, according to our results, in the order Mt ≈ HIV > SV40 (Table 3). The series of sites studied here exemplifies the characteristic DNA sequence diversity among Sp1 binding sites; examination of "high"- to "low-affinity" Sp1 binding sites (Table 4) shows that a number of single and multiple substitutions are tolerated within the 10-bp site, yielding the consensus sequence G(T)GG GCG GG(A)G(A)C(T) (7, 22). The HIV and SV40 sequences represent those sites with the greatest number of substitutions with respect to the Mt IIa G-rich site. The fact that these substitutions are tolerated in our experiments strongly suggests that the mode of DNA recognition for the Sp1-Zn92 fragment mimics that for wild-type Sp1. The third and fourth positions within the GC box are conserved as G residues among high-affinity Sp1 binding sites (Table 4; ref. 22), and we expected that mutation of these positions to T residues within the Mt IIa sequence would diminish Sp1-Zn92 binding affinity.
The competition experiments with these mutants indicate that Sp1-Zn92 has lower affinity for the mutant sites and that the decreases in relative Ka values (Table 3) correspond to decreases in ΔG of binding of 1.5 and 1.1 kcal/mol (1 kcal = 4.18 kJ) for Mt-T3 and Mt-T4, respectively. In the context of the intermolecular interactions that exist between the peptide and DNA, these differences could correspond to the loss of one or two hydrogen bonds between Sp1-Zn92 and the highly conserved G residues that are found in the wild-type sequence. By showing that binding affinity is reduced for Sp1-Zn92 with DNA sequences that are not Sp1 binding sites, these results further demonstrate that the mode of DNA binding for Sp1-Zn92 and Sp1 is very similar.

Table 4. Sp1 DNA binding sites

    High-affinity sites
      GGG GCG GGGC    HSV IE-3 (V), DHFR (I,III), MT-IIA, CH-TK INTRON
      TGG GCG GGGC    HSV IE-3 (III,IV)
      TGG GCG GAGT    SV40 (III,V)
      GGG GCG GAGC    DHFR (II,IV)
      GGG GAG TGGC    HIV-LTR (I)
      TGG GCG GGAC    HIV-LTR (II)
      GAG GCG TGGC    HIV-LTR (III)
    Medium-affinity sites
      GGG GCG GGG?    HSV IE-3 (I)
      GGG GCG GGG?    HSV IE-3 (II)
      TGG GCG GGGT    HSV TK (II)
      TGG GCG GAAC    SV40 (II)
      GGG GCG GGAT    SV40 (IV)
      GGG GCG GGAC    SV40 (VI)
    Low-affinity sites
      GGG GCG GAGA    SV40 (I)
      GGG GCG G?G?    HSV-TK (I)

See refs. 7 and 8.

Delineation of the GC Box Binding Site Within the dhfr Promoter Sequence. The primer-extension/mobility-shift experiments provide direct evidence that Sp1-Zn92 recognizes the dhfr promoter, site I GC box and suggest that the 5' portion of the binding site contributes more significantly to the binding energy of this protein/DNA interaction than the 3' portion. The results show that for extension in the 3' → 5' direction (Fig. 3, closed figures) the complete binding site plus three bases is required before appreciable binding to Sp1-Zn92 (relative Ka > 0.2) is observed, whereas for extension in the 5' → 3' direction (Fig. 3, open figures), appreciable binding is observed after the addition of the GGG GCG G portion of the binding site. Alternatively, Sp1-Zn92 may bind to one of two other potential binding sites generated by shifting the 10-base GC box "frame" either one or two bases in the 5' direction (with respect to the G-rich strand), giving rise to the sequences GGG GGC GGGG and TGG GGG CGGG, respectively. These alternate sites, however, diverge at several key positions from the consensus sequence G(T)GG GCG GG(A)G(A)C(T) (positions 5, 6, and 10 for the first and positions 5, 7, and 10 for the second) and do not provide properly spaced recognition elements for Sp1 binding; they are therefore ruled out on this basis. The requirement at the 5' end of the GC box for several nucleotides beyond the end of the consensus sequence for maximal binding affinity may be due to (i) the existence of non-sequence-specific peptide/DNA interactions or (ii) the effect of changing local DNA structure on peptide binding affinity as the DNA is lengthened in the 5' direction.
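The frame-shift argument above can be checked mechanically by sliding a 10-base window along the G-rich strand and listing the positions at which each frame departs from the consensus G(T)GG GCG GG(A)G(A)C(T). The following is a minimal sketch: the helper names are hypothetical, and the 12-base segment is the stretch of the dhfr site I sequence that spans the GC box and the two bases 5' of it.

```python
# Allowed bases at each of the 10 positions of G(T)GG GCG GG(A)G(A)C(T).
CONSENSUS = ["GT", "G", "G", "G", "C", "G", "G", "GA", "GA", "CT"]


def mismatch_positions(site: str) -> list[int]:
    """Return 1-based positions where a 10-base site departs from the consensus."""
    if len(site) != len(CONSENSUS):
        raise ValueError("site must be exactly 10 bases long")
    return [
        pos
        for pos, (base, allowed) in enumerate(zip(site.upper(), CONSENSUS), start=1)
        if base not in allowed
    ]


def scan_frames(sequence: str) -> dict[str, list[int]]:
    """Map every 10-base frame in the sequence to its consensus mismatches."""
    width = len(CONSENSUS)
    return {
        sequence[i:i + width]: mismatch_positions(sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    }


# G-rich strand around the dhfr site I GC box, 5' -> 3'.
for frame, mismatches in scan_frames("TGGGGGCGGGGC").items():
    print(frame, mismatches)
```

Only the GGGGCGGGGC frame matches the consensus at every position; each shifted frame departs from it at three positions, including the central C or G of the GCG triplet.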
The designation of possible peptide/DNA contacts outside the consensus DNA binding site as non-sequence-specific in nature is supported by a statistical analysis of all known Sp1 binding sites (22), which fails to show significant sequence preference outside the decanucleotide consensus site. Our current experiments, however, do not allow us to distinguish between or confirm these hypotheses.

Comparison of Sp1 and Zif-268 Binding to DNA. The high-resolution crystal structure of the Zif-268/DNA complex (10) provides exquisite insight into the recognition of DNA by the Cys2His2 class of zinc-finger proteins. The DNA binding sites for Zif-268 (GcG GGG GcG = A B A) (23) and for Sp1 (GGG GcG GGGc = B A B) are closely related and can be thought of as rearrangements of one another (A B A versus B A B). An analysis of the Sp1 sequence, considering those residues involved in DNA binding in the Zif-268/DNA structure, reveals striking similarities between Zif and Sp1 that parallel those within the DNA binding sites (Fig. 4). The first and third fingers of Zif contact the third and first GcG triplets, respectively (5' → 3' direction), within the Zif DNA binding site, and a pair of Arg residues emanating from the α-helix of each finger is responsible for sequence-specific recognition of guanine (Arg-18 and Arg-24 of finger 1; Arg-74 and Arg-80 of finger 3). The second finger of Zif contacts the central GGG triplet within the binding site, with these contacts mediated by Arg-46 and His-49 within the α-helix of this finger. The similarities between the Zif and Sp1 amino acid sequences (Fig. 4) suggest the following: (i) finger 3 of Sp1 may contact the first GGG triplet of a GC box through interactions mediated by Arg-77 and His-80 (Sp1-Zn92 numbering system; in analogy with finger 2 of Zif), (ii) finger 2 of Sp1 may contact the central GcG triplet through Arg-49 and Arg-55 (in analogy with fingers 1 and 3 of Zif), and (iii) finger 1 of Sp1 departs from the Zif model, with Lys-19 in the place of Arg-46 (Zif, finger 2) while His-22 mimics His-49 (Zif, finger 2). The amino acids for the three zinc domains of Sp1 can be symbolized as follows: (i) Sp1 finger 3, RH- for GGG recognition, (ii) Sp1 finger 2, R-R for GcG recognition, and (iii) Sp1 finger 1, KH- for GGGc recognition.

FIG. 4. Similarities between the Sp1 and Zif peptide sequences and binding site DNA sequences. (a) Sequence similarities in the context of the amino acid sequences (C-terminal to N-terminal direction). (b) Individual fingers grouped according to target DNA sequences. In a, amino acid residues that are bold and underlined are either involved in DNA recognition in the Zif/DNA structure or proposed to be involved in DNA recognition by Sp1. The His2 residues of the Cys2His2 motif are given in large, bold type. The data for Zif are based on the three-dimensional structure (10) and the data for Sp1 are based on the arguments developed in the text. [Panel a aligns Zif-3, Zif-2, and Zif-1 over 5'-GcG GGG GcG-3' and Sp1-3, Sp1-2, and Sp1-1 over 5'-GGG GcG GGGc-3'.]

A Model for Sp1/DNA Binding. The comparison between Zif and Sp1 binding can be further elaborated considering known Sp1 binding sites (Table 4). The Zif crystal structure indicates that the fingers with R-R recognition elements (fingers 1 and 3) contact only the G residues of the GcG triplet, while the finger with the RH- element (finger 2) contacts only the second and third G residues of the GGG triplet. If, in analogy to Zif, Sp1 recognizes DNA through RH- (finger 3), R-R (finger 2), and KH- (finger 1) elements that interact with the GGG (triplet 1), GcG (triplet 2), and GGGc (triplet 3) subunits of the DNA binding site, respectively, then the specific bases contacted in the Zif binding site should also be contacted in the Sp1 binding site and therefore should be highly conserved among Sp1 binding sites. This requirement holds true, with (i) positions 2 and 3 of triplet 1 conserved as G in all but one high-affinity Sp1 binding site (Table 4) and (ii) positions 1 and 3 within triplet 2 conserved as G in all Sp1 binding sites. The third subunit within the Sp1 binding site tolerates a variety of substitutions; this fact suggests that a novel recognition process exists here with respect to the Zif system and is consistent with the divergence of the third Sp1 recognition element (finger 1, KH-) from the Zif model. These suggestions are supported by the experiments with the Mt mutants, since the altered sites (GGG to GGT for Mt-T3 and GcG to TcG for Mt-T4) correspond to key positions in the above model and, correspondingly, DNA binding affinity is decreased with these mutants. Our model provides an explanation for the asymmetry within the plot of Ka versus DNA length (Fig. 3).
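The conservation requirement stated above lends itself to a small worked check against the sites used in the binding experiments. The sketch below is illustrative: the dictionary and function names are hypothetical, and the assignment of site positions 2 and 3 to finger 3 (RH-) and positions 4 and 6 to finger 2 (R-R) simply restates the proposed model rather than any structural data for Sp1.

```python
# Site positions (1-based, within the 10-base GC box) that the model
# proposes are contacted as G, with the finger proposed to make the contact.
MODEL_CONTACTS = {
    2: "finger 3 (RH-)",  # triplet 1, position 2
    3: "finger 3 (RH-)",  # triplet 1, position 3
    4: "finger 2 (R-R)",  # triplet 2, position 1
    6: "finger 2 (R-R)",  # triplet 2, position 3
}

# Ten-base GC boxes of the oligonucleotides used in the mobility-shift assays.
SITES = {
    "Mt": "GGGGCGGGGC",
    "HIV": "GAGGCGTGGC",
    "SV40": "TGGGCGGAGT",
    "Mt-T3": "GGTGCGGGGC",
    "Mt-T4": "GGGTCGGGGC",
}


def lost_contacts(site: str) -> list[int]:
    """Return the model contact positions at which the site does not carry G."""
    return [pos for pos in sorted(MODEL_CONTACTS) if site[pos - 1] != "G"]


for name, site in SITES.items():
    lost = lost_contacts(site)
    detail = ", ".join(f"{p}: {MODEL_CONTACTS[p]}" for p in lost) or "none"
    print(f"{name:6s} {site}  contacts lost -> {detail}")
```

In this sketch the Mt and SV40 sites retain G at all four proposed contact positions, the HIV site is the single exception at position 2 of triplet 1, and each Mt mutant removes exactly one proposed contact.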
The Zif/DNA structure (10) reveals two types of sequence-specific contacts between the zinc-fingers and DNA: (i) bidentate hydrogen bonds between the guanidinium group of Arg residues and guanine bases and (ii) a single hydrogen bond between the imidazole ring of a His residue and guanine. We envision similar contacts between Sp1 fingers 3 and 2 and DNA due to the peptide sequence and DNA binding site similarities pointed out above (Fig. 4). Sp1 finger 1 differs from the Zif system and may contact its DNA subsite through hydrogen bonds arising from Lys-19 and His-22. The nature of the side-chain/DNA interactions and the greater conformational flexibility of KH- versus RH- or R-R contacts with DNA may partly account for the sequence diversity that is tolerated within the 3' portion of the Sp1 binding site. This analysis parallels the results of Fig. 3, with the 5'-GGG GCG portion of the site vital for binding (mediated by RH- and R-R contacts) and the 3'-GGGC portion (mediated by KH- contacts) not vital. This analysis is also consistent with the finding that a peptide containing only fingers 2 and 3 of Sp1 binds strongly to the Mt GC box (unpublished results) through putative contacts with the 5'-GGG GCG motif. Although this analysis vastly simplifies the nature of the interaction between Sp1-Zn92 (and Sp1) and GC box DNA sites and ignores the multitude of other contacts that must exist between the two, it does account for the striking asymmetry of binding affinity across the binding site. In addition, this model offers a structural basis for the observed ability of Sp1, in contrast to Zif, to accommodate considerable sequence variation within its cognate DNA binding site. Additional molecular biological and structural studies designed to challenge this hypothesis are necessary.

We thank Dr. Donald M. Crothers for critical comments on the manuscript, Drs. Jason D. Kahn and Kevin M. Weeks for advice on DNA binding experiments, Mr. Paul Raccuia for technical assistance during protein purification, and Dr. Mark Biggin for providing the GC-box-containing plasmid. This work was supported in part by a Camille and Henry Dreyfus Grant for Distinguished New Faculty in Chemistry (to J.P.C.), Predoctoral Biophysical Fellowship 5 T32 GM08293-04 (to R.W.K.) from the National Institutes of Health, and the New Graduate Assistance Program in Chemistry, Grant P200A00228-91 (to R.W.K.).

1. Miller, J., McLachlan, A. D. & Klug, A. (1985) EMBO J. 4, 1609-1614.
2. Brown, R. S., Sander, C. & Argos, P. (1985) FEBS Lett. 186, 271-274.
3. Kadonaga, J. T., Carner, K. R., Masiarz, F. R. & Tjian, R. (1987) Cell 51, 1079-1090.
4. Dynan, W. S. & Tjian, R. (1983) Cell 32, 669-680.
5. Gidoni, D., Dynan, W. S. & Tjian, R. (1984) Nature (London) 312, 409-413.
6. Gidoni, D., Kadonaga, J. T., Barrera-Saldana, H., Takahashi, K., Chambon, P. & Tjian, R. (1985) Science 230, 511-517.
7. Kadonaga, J. T., Jones, K. & Tjian, R. (1986) Trends Biochem. Sci. 11, 20-23.
8. Jones, K. A., Kadonaga, J. T., Luciw, P. A. & Tjian, R. (1986) Science 232, 755-759.
9. Nardelli, J., Gibson, T. J., Vesque, C. & Charnay, P. (1991) Nature (London) 349, 175-178.
10. Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-817.
11. Yanisch-Perron, C. J., Vieira, J. & Messing, J. (1985) Gene 33, 103-115.
12. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold Spring Harbor, NY).
13. Studier, F. W. & Moffatt, B. A. (1986) J. Mol. Biol. 189, 113-130.
14. Studier, F. W., Rosenberg, A. H., Dunn, J. J. & Dubendorff, J. W. (1990) Methods Enzymol. 185, 60-89.
15. Fried, M. G. & Crothers, D. M. (1981) Nucleic Acids Res. 9, 6505-6525.
16. Revzin, A., Ceglarek, J. A. & Garner, M. M. (1986) Anal. Biochem. 153, 172-177.
17. Courey, A. J. & Tjian, R. (1988) Cell 55, 887-898.
18. Lin, S. & Riggs, A. D. (1972) J. Mol. Biol. 72, 671-690.
19. Liu-Johnson, H. N., Gartenberg, M. R. & Crothers, D. M. (1986) Cell 47, 995-1005.
20. Gartenberg, M. R., Ampe, C., Steitz, T. A. & Crothers, D. M. (1990) Proc. Natl. Acad. Sci. USA 87, 6034-6038.
21. Harrington, M. A., Jones, P. A., Imagawa, M. & Karin, M. (1988) Proc. Natl. Acad. Sci. USA 85, 2066-2070.
22. Bucher, P. (1990) J. Mol. Biol. 212, 563-578.
23. Christy, B. A., Lau, L. F. & Nathans, D. (1988) Proc. Natl. Acad. Sci. USA 85, 7857-7861.