Bushweller Letter - Duke Computer Science


© 1999 Nature America Inc. • http://structbio.nature.com

letters

Solution structure of core binding factor β and map of the CBFα binding site

Xuemei Huang1, Jeff W. Peng2, Nancy A. Speck3 and John H. Bushweller1


1Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22906, USA. 2Protein NMR Group, Vertex Pharmaceutical Incorporated, Cambridge, Massachusetts 02139, USA. 3Department of Biochemistry, Dartmouth Medical School, Hanover, New Hampshire 03755, USA.

The core binding factor β subunit (CBFβ) is the non-DNA binding subunit of the core-binding factors, transcription factors essential for multiple developmental processes including hematopoiesis and bone development. Chromosomal translocations involving the human CBFB gene are associated with a large percentage of human leukemias. The N-terminal 141 amino acids of CBFβ contain the heterodimerization domain for the DNA-binding CBFα subunits, and are sufficient for CBFβ function in vivo. Here we present the high-resolution solution structure of the CBFβ heterodimerization domain. It is a novel α/β structure consisting of two three-stranded β-sheets packed on one another in a sandwich arrangement, with four peripheral α-helices. The CBFα binding site on CBFβ has been mapped by chemical shift perturbation analysis.

The core binding factor β subunit (CBFβ) is the non-DNA binding subunit of the heterodimeric transcription factor complexes called core binding factors, or CBFs1,2. CBF subunits are encoded by four genes in mammals. CBFA1, CBFA2 (AML1), and CBFA3 encode DNA binding CBFα subunits, and CBFB encodes the CBFβ subunit1-6. CBFβ heterodimerizes with each of the CBFα subunits in vitro, although an in vivo requirement for this association has been demonstrated only for CBFα2. CBFα can bind DNA in the absence of CBFβ in vitro, but with lower affinity than the CBFα/β complex1,2. Homozygous disruption of either the Cbfa2 or the Cbfb genes in mice results in essentially identical phenotypes: midgestation embryonic lethality accompanied by extensive hemorrhaging and a profound block at the fetal liver stage of hematopoiesis7,8.

In humans, chromosomal rearrangements that disrupt the CBFA2 and CBFB genes are associated with a variety of leukemias9. All of these translocations result in the synthesis of chimeric proteins, two of which have been directly demonstrated to block CBF function in a transdominant manner10-12. The inversion on chromosome 16 involving the CBFB gene, inv(16)(p13;q22), is associated with 10% of acute myeloid leukemias3. This translocation results in the production of a chimeric protein that contains the N-terminal 165 amino acids of CBFβ fused to the coiled-coil region of a smooth muscle myosin heavy chain protein3; it therefore retains the heterodimerization domain of CBFβ whose structure is described here. A knock-in of this fusion protein results in the identical phenotype observed for homozygous disruptions of the Cbfa2 and Cbfb genes10.

The primary structures of CBFβ and its Drosophila homologs are not similar to those of any other proteins, and the mechanism by which CBFβ stabilizes the CBFα–DNA complex is unusual, in that contacts to DNA are not substantially altered. The CBFβ subunit is an essential component of the CBF complex and is mutated in a substantial percentage of human leukemias, making it an interesting and important target for structural studies. We have recently characterized the fold of the CBFα Runt domain in its DNA-bound state as an s-type Ig fold and identified a putative CBFβ binding site on the CBFα Runt domain13. Here, we describe the high-resolution solution structure of the CBFβ heterodimerization domain (residues 1–141), and map the binding site for CBFα on CBFβ.

Fig. 1 Stereoview of the 20 conformers representing the solution structure of CBFβ(141). Green indicates the β-strand regions, red the helical regions, and blue the loop regions of the structure.

Fig. 2 Ribbon representation of CBFβ(141) produced with the program MOLMOL. The helices are colored red and yellow, the β-strands cyan, and other segments gray. The secondary structural elements are labeled as described in the text.

nature structural biology • volume 6 number 7 • july 1999

Structure determination

The heterodimerization domain in CBFβ has been localized to its N-terminal 135 amino acids, which corresponds to a region of significant homology between CBFβ and its two Drosophila homologs, Brother (Bro) and Big Brother (Bgb)14,15. A truncated CBFβ protein containing amino acids 1–141 [CBFβ(141)] binds to CBFα in vitro with the same affinity as a full length isoform of CBFβ, CBFβ(187)16. The isolated heterodimerization domain, CBFβ(141), also appears to display essentially the same fold as it does in the context of the full length CBFβ(187) protein16, consistent with their similar in vitro biochemical behavior. In addition, the isolated heterodimerization domain is sufficient to rescue definitive hematopoiesis of Cbfb deficient mouse embryonic stem cells (Miller et al., unpublished results). For these reasons, our structural studies have focused on this 141 amino acid region of CBFβ.

CBFβ(141) contains two additional amino acids at its N-terminus (Gly and Ser) encoded by the restriction endonuclease site used to fuse the CBFβ cDNA to sequences encoding the bacterial glutaredoxin protein16. These two residues remain on CBFβ after it is cleaved from glutaredoxin16. All numbering in this paper starts from the N-terminal Met of the CBFβ sequence; thus the numbering of the PDB file will be offset by two relative to this numbering.

Backbone and side chain assignments for the protein have been described17. A total of 4,254 cross-peaks were assigned and integrated in the 60 ms 3D 15N- and 13C-edited NOESY spectra of CBFβ(141), which yielded a total of 1,614 meaningful upper distance constraints (Table 1) after processing with DYANA. A total of 361 dihedral angle constraints derived from J-coupling and cross-correlation data were also employed in the structure calculations. Stereospecific assignments were obtained for 75 pairs of diastereotopic substituents. Fig. 1 depicts the backbones of the 20 energy-minimized conformers utilized to represent the solution structure of CBFβ(141). The atomic root mean square (r.m.s.) deviation about the mean coordinates for the 20 conformers for all residues is 0.70 ± 0.10 Å for the backbone nuclei and 1.28 ± 0.13 Å for all heavy atoms (Table 1). These values drop to 0.56 ± 0.09 Å and 1.09 ± 0.10 Å, respectively, for residues 1–68 and 83–141 (excluding a mobile loop region connecting the two domains of the protein). A check of the 20 conformers using PROCHECK-NMR shows 98.5% of the residues in allowed regions of the Ramachandran plot.
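The r.m.s. deviation about the mean coordinates quoted for the conformer ensemble can be illustrated with a minimal sketch. It assumes the conformers have already been superimposed (the fitting step is omitted) and that each conformer is a list of (x, y, z) tuples in Å for the selected atoms. The function names are hypothetical, and this is not the software used for the reported statistics.

```python
import math
from statistics import mean, stdev


def mean_structure(conformers):
    """Average each atom's coordinates over all conformers."""
    n_conf = len(conformers)
    n_atoms = len(conformers[0])
    return [
        tuple(sum(conf[i][k] for conf in conformers) / n_conf for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized atom lists."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(conformers):
    """Return (mean, standard deviation) of each conformer's RMSD to the mean structure."""
    reference = mean_structure(conformers)
    values = [rmsd(conf, reference) for conf in conformers]
    return mean(values), stdev(values)


# Toy input (made-up coordinates, two conformers of two atoms each).
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
avg, spread = ensemble_rmsd(toy)
print(f"{avg:.2f} ± {spread:.2f} Å")
```

Restricting the atom list to residues 1–68 and 83–141 before calling `ensemble_rmsd` would correspond to the second pair of values quoted above, again under the assumption that superposition was done on the same selection.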

Structure description

A ribbon representation of the structure of CBFβ(141) is shown in Fig. 2. As predicted from CD spectra16, CBFβ is an α/β protein. CBFβ is divided into two non-autonomously folded regions that are separated by a loop with increased flexibility. The N-terminal region starts with α-helices 1 and 2, which are contiguous to one another with a break at Glu 15. After the kink at Glu 15, normal α-helical hydrogen bonding is observed and the second helix is C-terminally capped by a hydrogen bond between the backbone NH of Arg 23 and the side chain O of Ser 22. This is followed by β-strand 1 (residues 25–29), which is followed by a short 3₁₀-helix that was identified in 18 out of the 20 conformers. The characteristic i to i+3 hydrogen bonding pattern of a 3₁₀-helix was observed between the CO of residues 31 and 32 and the NH of residues 34 and 35, respectively, in the majority of the 20 conformers. This is followed by the third α-helix extending from residue 41 to 50. β-strands 2 and 3 (residues 53–57 and 63–66, respectively) follow with a type 1 β-turn between them. β-strands 1, 2, and 3 combine to form an anti-parallel β-sheet on one face of the protein.

The N-terminal folded region (ending at β-strand 3) is followed by an extended loop from residue 68 to 93. The N-terminal portion of this loop (residues 68–83) displays lower resolution than the C-terminal portion (residues 83–93) (Fig. 1). 15N-1H heteronuclear NOE measurements (Fig. 3) show a clear depression for the N-terminal residues of this loop, indicative of increased motion in this region of the protein. The identity between CBFβ and its Drosophila homolog Brother is 57% for residues in the heterodimerization domain; however, the extended loop region between the two domains displays almost no identity whatsoever, as expected for a flexible connecting loop.

The C-terminal folded region initiates with β-strands 4 (residues 94–102), 5 (residues 107–115), and 6 (residues 120–127), which combine to form another antiparallel β-sheet on the other face of the protein. Strands 4 and 5 are connected by a twisted turn that does not fit any of the classical categories but is stabilized by a hydrogen bond between the NH of Val 106 and the CO of Leu 103, whereas strands 5 and 6 are connected by a classical type 1 β-turn. The structure then terminates with α-helix 4 (residues 129–138), which is capped at its C-terminal end by hydrogen bonds. The two sheets in the protein combine to form a β-sandwich that is actually a continuous twisted β-sheet, with hydrogen bonds between residues 25 and 27 of β1 and residues 121 and 123 of β6 providing the contacts between the N-terminal and C-terminal β-sheets.

The C-terminal end of the protein is in close proximity to the N-terminus of the protein. The N-terminal residues of the protein are also in contact with the turn connecting β-strands 4 and 5. These residues combine to form an independent hydrophobic core involving Val 4, Val 5 (N-terminus), Phe 17, Phe 18 (α-helix 2), Ile 102 (β-strand 4), Leu 103, Val 106 (turn between β-strands 4 and 5), and Phe 127 (β-strand 6) that is also stabilized by several long-range hydrogen bond and electrostatic interactions.

Fig. 3 Plot of heteronuclear NOE as a function of the resolved backbone NH groups in CBFβ(141). Secondary structure elements are indicated in boxes at the top. Heteronuclear NOE values were determined from spectra recorded in the presence and absence of a proton presaturation period of 3 s within a total recycle delay of 5 s between acquisitions. The values were calculated from the ratio of peak heights in the spectra recorded with and without proton saturation. The standard deviation of the NOE value was determined based on the measured background noise levels.

Fig. 4 (Top) 15N-1H HSQC spectra of a, uncomplexed and b, complexed CBFβ(141) recorded at 500 MHz. c, Mapping of chemical shift changes in backbone amide N and NH and Trp side chain N and NH nuclei of 15N-labeled CBFβ(141) upon binding of CBFα(41–214)–DNA versus the primary sequence of the protein. Solid boxes correspond to 15N chemical shift changes and open boxes correspond to NH chemical shift changes. Secondary structure elements are indicated in boxes at the top.

The CBFα binding site on CBFβ

It has been elegantly demonstrated that binding sites on proteins for small molecules, other proteins or nucleic acids can be determined by mapping the chemical shift changes in 15N-1H HSQC spectra upon binding18. We mapped the binding site for CBFα on CBFβ by comparing the backbone amide NH and tryptophan side chain NH chemical shifts for 15N-labeled CBFβ alone and for 15N-labeled CBFβ in a ternary complex with unlabeled CBFα2 Runt domain and DNA. Fig. 4 shows the results of this analysis. It should be mentioned that this analysis cannot distinguish chemical shift changes resulting from direct contacts with the protein from chemical shift changes resulting from conformational changes induced as a result of that binding. In addition, since we have added a CBFα Runt domain–DNA complex to CBFβ, there is the possibility that the DNA could be inducing some of the observed chemical shift changes. However, since it has been shown by alkylation interference that CBFβ does not alter the footprint of CBFα on the DNA2, it is assumed that there are no close CBFβ–DNA interactions that could cause large chemical shift changes in the protein.

There are a large number of shifts possibly indicative of a conformational change in the protein; however, the largest changes are clustered in several distinct regions in the sequence. A map of the significant (>150 Hz) perturbations on the three-dimensional structure of the protein is shown in Fig. 5. These changes are localized to a specific region of the three-dimensional structure including β-strand 3, residues in the loops preceding and following β-strand 3, β-strand 4, β-strand 5, the loop connecting β-strand 4 and β-strand 5, and three residues in β-strand 6.

Both residues 3 and 4 show large perturbations upon binding. Deletion mutagenesis studies have shown that even very short (5–6 residue) deletions from the N-terminus of the Drosophila and mouse proteins abrogate binding to CBFα14,15. The deletion mutations of this region could destabilize the protein by disrupting the small hydrophobic core involving the N- and C-termini and residues in the loop between β-strand 4 and β-strand 5 mentioned above. The residues in β-strand 3 show an alternating pattern of large chemical shift change followed by a small chemical shift change (Fig. 4), which is consistent with a localized interaction with CBFα. Several residues in the N-terminal end of the extended loop that connects the two domains of the protein also show substantive changes.

There is a cluster of perturbed residues beginning with Ala 99 in the middle of β-strand 4 and extending to residue 110 in the middle of β-strand 5. A pattern of alternating large and small shift perturbations is observed for Met 101–Val 106 (Fig. 4), where the more perturbed residues (Met 101, Leu 103, Gly 105) all have their NHs pointing in towards the region between strands 4 and 5 rather than out, again providing a clear picture of the binding site for CBFα. The side chain NH of Trp 110 shows the most dramatic perturbation observed, with a 551 Hz shift, whereas the side chain NH of Trp 113 is unperturbed by binding to CBFα. Three residues in β6 (Met 122–Cys 124) also show perturbations upon CBFα binding, consistent with their spatial proximity to the loop leading into β3 that is also perturbed by CBFα binding.
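The thresholding step used in this mapping (perturbations larger than 150 Hz count as significant) can be sketched as follows. This is an illustrative outline only. Apart from the 551 Hz value for the Trp 110 side chain NH, the numbers in the dictionary are placeholders invented for the example, the zero for Trp 113 is simply one way to encode "unperturbed", and the helper names are hypothetical. The Hz to ppm conversion takes the observing frequency of the nucleus as a parameter because the text does not state at which field the differences were measured.

```python
def hz_to_ppm(delta_hz, nucleus_mhz):
    """Convert a shift difference in Hz to ppm for a nucleus observed at nucleus_mhz."""
    return delta_hz / nucleus_mhz


def significant(perturbations_hz, threshold_hz=150.0):
    """Return the labels whose absolute perturbation exceeds the threshold."""
    return [
        label
        for label, shift in perturbations_hz.items()
        if abs(shift) > threshold_hz
    ]


# Example input: only the Trp 110 value comes from the text; the rest are placeholders.
example = {
    "Trp 110 side chain NH": 551.0,   # reported in the text
    "Trp 113 side chain NH": 0.0,     # "unperturbed", encoded here as zero (assumption)
    "hypothetical residue A": 200.0,  # made-up value
    "hypothetical residue B": 40.0,   # made-up value
}

print(significant(example))
print(hz_to_ppm(150.0, 500.0))  # 150 Hz for 1H observed at 500 MHz
```

Expressing the cut-off in ppm rather than Hz would make perturbations comparable across spectrometers, which matters if free and complexed spectra are recorded at different fields.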

Interestingly, there are three Cys residues (25, 107, 124) located in or in the vicinity of the CBFα binding site. CBFα is very sensitive to oxidation of its critical Cys residues19,20. CBFβ has been shown to protect CBFα against cysteine oxidation and the concomitant reduction in DNA binding activity, so the presence of these cysteines in the binding site may be of significance to CBFβ's ability to modulate the redox behavior of CBFα.

Note added in proof: An independent structure determination of this protein is reported in another paper in this issue of Nature Structural Biology21.

Fig. 5 Map of the chemical shift perturbation data on a ribbon representation of CBFβ(141). Yellow indicates large chemical shift changes (>150 Hz) upon binding of CBFα to CBFβ(141).

Methods


Sample preparation. CBFβ(141) was expressed and purified according to the procedure described by Huang et al.16. Celtone media (Martek, Inc.) was employed for isotopic enrichment of the protein with 13C and 15N. Fractionally 13C-labeled protein was prepared using M9 minimal medium containing 15% 13C glucose.

NMR spectroscopy. NMR measurements were carried out at 20 °C on a Varian UnityPlus 500 MHz NMR spectrometer. For collection of NOE upper distance constraints, 3D 13C-edited and 15N-edited NOESY spectra were recorded with a 60 ms mixing time on the 1.5 mM 13C/15N-labeled sample in 95% H2O/5% 2H2O. Values of the 3JHNHα, 3JHαHβ, and 3JNHβ vicinal coupling constants were measured as described22–24. Constraints on the ψ angle were obtained based on cross-correlated relaxation between 1Hα-13Cα dipolar and 13C' chemical shift anisotropy mechanisms25. Constraints on the χ2 angle for Leu and Ile were obtained from 3JCαCδ26. Stereospecific assignments of the methyl groups of the valine and leucine residues were obtained from an analysis of a 3D HCCH-TOCSY spectrum recorded on a fractionally 13C-labeled sample27.
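How a measured vicinal coupling constrains a dihedral angle can be illustrated with a Karplus-type relation for 3JHNHα. The sketch below is an assumption-laden example, not the procedure used here: the text does not state which parametrization was applied, so the coefficients are one commonly quoted literature set, and the grid scan is a deliberately simple stand-in with hypothetical function names.

```python
import math

# Karplus coefficients in Hz for 3J(HN-Halpha); a commonly quoted literature set.
# Assumption: the source does not say which parametrization was used.
A, B, C = 6.51, -1.76, 1.60


def karplus_hn_ha(phi_deg, a=A, b=B, c=C):
    """Predicted 3J(HN-Halpha) in Hz for a backbone phi angle given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def phi_candidates(j_obs_hz, tol_hz=0.5):
    """Scan phi on a 1 degree grid and keep angles consistent with the measured coupling."""
    return [
        phi
        for phi in range(-180, 180)
        if abs(karplus_hn_ha(phi) - j_obs_hz) <= tol_hz
    ]


# A large coupling is compatible with extended (beta-like) phi values,
# a small coupling with helical phi values.
print(round(karplus_hn_ha(-120.0), 2))
print(round(karplus_hn_ha(-60.0), 2))
print(phi_candidates(9.5))
```

Because the relation is multivalued, a single coupling usually leaves several candidate angle ranges, which is one reason couplings are combined with NOE and other data when dihedral constraints are generated.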

Mapping of CBFα2 binding site. Unlabeled CBFα(41–214) was prepared as described28 and complexed to an 18 base pair duplex DNA containing a core site sequence. A 20% excess of the CBFα2(41–214)–DNA complex was added to a sample of 0.65 mM 15N-labeled CBFβ(141) in a buffer of 10 mM potassium phosphate, pH 6.5, 1 mM EDTA, 0.2 mg ml-1 NaN3, 5 mM DTT and 5% 2H2O. Initial NMR spectroscopy was carried out at 35 °C on a Varian UnityPlus 500 MHz NMR spectrometer. 15N-1H HSQC and 3D 15N-edited [1H, 1H] TOCSY spectra were recorded on the CBFβ(141) sample prior to complexation. For the ternary complex containing 15N-labeled CBFβ(141), 15N-1H HSQC and 3D 15N-edited [1H, 1H] NOESY spectra were recorded at 35 °C on a Bruker DRX 800 MHz NMR spectrometer.

Determination of the three-dimensional structure. Structure calculations were carried out via torsion angle dynamics using the program DYANA29. The input for the DYANA calculations consisted of upper distance limits derived from NOESY cross-peak intensities using the program CALIBA and dihedral angle constraints from the program HABAS. Following the torsion angle dynamics calculations, the 20 conformers with the lowest target function values were subjected to energy minimization using the AMBER force field implemented in the program OPAL30. The resulting 20 energy minimized conformers were used to represent the solution structure of CBFβ(141).

Coordinates. The structure has been deposited in the Protein Data Bank (accession number 2jhb).

Table 1 Structural statistics for CBFβ(141)

Experimental NMR constraints
  NOE distance constraints                                   1,614
    intraresidue                                               546
    sequential                                                 412
    medium range                                               193
    long range                                                 463
  Angle constraints                                            361
    φ                                                          136
    ψ                                                          136
    χ1                                                          86
    χ2 (Ile)                                                     5
    χ2 (Leu)                                                     8

NMR constraint violations
  NOE constraint violations
    Sum (Å)                                                  20.47 ± 0.48 (19.83 ... 21.28)
    Maximum (Å)                                              0.10 ± 0.00 (0.10 ... 0.11)
  Dihedral angle constraint violations
    Sum (°)                                                  41.71 ± 5.93 (29.81 ... 58.30)
    Maximum (°)                                              2.31 ± 0.22 (1.91 ... 2.78)
  AMBER energy (kcal mol-1)                                  -7072.08 ± 81.04 (-7174.49 ... -6898.30)

Root mean squared deviation from the mean structure (Å)
  Backbone atoms of all residues                             0.70 ± 0.10
  All heavy atoms of all residues                            1.28 ± 0.13
  Backbone atoms of residues 1–68 and 83–141                 0.56 ± 0.09
  All heavy atoms of residues 1–68 and 83–141                1.09 ± 0.10
  Backbone atoms of regular secondary structure elements     0.41 ± 0.07
  All heavy atoms of regular secondary structure elements    0.84 ± 0.09

Ramachandran statistics analyzed using PROCHECK-NMR
  Residues in allowed regions                                98.6%
  Residues in disallowed regions                             1.4%

Acknowledgments
This work was supported by grants from the United States Public Health Service (NIH). NAS is a Leukemia Society of America Scholar.

Correspondence should be addressed to J.H.B. email: [email protected]

Received 22 March, 1999; accepted 14 May, 1999.

1. Ogawa, E. et al. Virology 194, 314–331 (1993).
2. Wang, S. et al. Mol. Cell. Biol. 13, 3324–3339 (1993).
3. Liu, P. et al. Science 261, 1041–1044 (1993).
4. Bae, S.C. et al. Gene 159, 245–248 (1995).
5. Levanon, D. et al. Genomics 23, 425–432 (1994).
6. Ogawa, E. et al. Proc. Natl. Acad. Sci. USA 90, 6859–6863 (1993).
7. Wang, Q. et al. Proc. Natl. Acad. Sci. USA 93, 3444–3449 (1996).
8. Wang, Q. et al. Cell 87, 697–708 (1996).
9. Meyers, S. & Hiebert, S.W. Crit. Rev. Eukaryo. Gene. Expression. 5, 365–383 (1995).
10. Castilla, L.H. et al. Cell 87, 687–696 (1996).
11. Yergeau, D.A. et al. Nature Genet. 15, 303–306 (1997).
12. Okuda, T. et al. Blood 91, 3134–3143 (1998).
13. Berardi, M. et al. Structure, in the press (1999).
14. Golling, G., Li, L.H., Pepling, M., Stebbins, M. & Gergen, J.P. Mol. Cell. Biol. 16, 932–942 (1996).
15. Kagoshima, H., Akamatsu, Y., Ito, Y. & Shigesada, K. J. Biol. Chem. 271, 33074–33082 (1996).
16. Huang, X. et al. J. Biol. Chem. 273, 2480–2487 (1998).
17. Huang, X., Speck, N.A. & Bushweller, J.H. J. Biomol. NMR 12, 459–460 (1998).
18. Weber, C. et al. Biochemistry 30, 6563–6574 (1991).
19. Kurokawa, M. et al. J. Biol. Chem. 271, 16870–16876 (1996).
20. Akamatsu, Y. et al. J. Biol. Chem. 272, 14497–14500 (1997).
21. Goger, M. et al. Nature Struct. Biol. 6, 620–623 (1999).
22. Kuboniwa, H., Grzesiek, S., Delaglio, F. & Bax, A. J. Biomol. NMR 4, 871–878 (1994).
23. Grzesiek, S., Kuboniwa, H., Hinck, A.P. & Bax, A. J. Am. Chem. Soc. 117, 5312–5315 (1995).
24. Düx, P., Whitehead, B., Boelens, R., Kaptein, R. & Vuister, G.W. J. Biomol. NMR 10, 301–306 (1997).
25. Yang, D., Gardner, K.H. & Kay, L.E. J. Biomol. NMR 11, 213–220 (1998).
26. Bax, A., Max, D. & Zax, D. J. Am. Chem. Soc. 114, 6923–6925 (1992).
27. Neri, D., Szyperski, T., Otting, G., Senn, H. & Wüthrich, K. Biochemistry 28, 7510–7516 (1989).
28. Crute, B.E., Lewis, A.F., Wu, Z., Bushweller, J.H. & Speck, N.A. J. Biol. Chem. 271, 26251–26260 (1996).
29. Güntert, P., Mumenthaler, C. & Wüthrich, K. J. Mol. Biol. 273, 283–298 (1997).
30. Luginbühl, P., Güntert, P., Billeter, M. & Wüthrich, K. J. Biomol. NMR 8, 136–146 (1996).