ARTICLES

© 2006 Nature Publishing Group http://www.nature.com/naturemethods

A new strategy for structure determination of large proteins in solution without deuteration

Yingqi Xu1,2, Yu Zheng1,2, Jing-Song Fan1 & Daiwen Yang1

So far high-resolution structure determination by nuclear magnetic resonance (NMR) spectroscopy has been limited to proteins <30 kDa, although global fold determination is possible for substantially larger proteins. Here we present a strategy for assigning backbone and side-chain resonances of large proteins without deuteration, with which one can obtain high-resolution structures from 1H-1H distance restraints. The strategy uses information from through-bond correlation experiments to filter intraresidue and sequential correlations from through-space correlation experiments, and then matches the filtered correlations to obtain sequential assignment. We demonstrate this strategy on three proteins ranging from 24 to 65 kDa for resonance assignment and on maltose binding protein (42 kDa) and hemoglobin (65 kDa) for high-resolution structure determination. The strategy extends the size limit for structure determination by NMR spectroscopy to 42 kDa for monomeric proteins and to 65 kDa for differentially labeled multimeric proteins without the need for deuteration or selective labeling.

NMR spectroscopy is a powerful technique for determining structures of proteins in solution at atomic resolution1. At present, ~15% of protein structures whose coordinates have been deposited in the Protein Data Bank were determined by NMR spectroscopy, but only ~1% of the NMR structures and global folds were generated for proteins larger than 25 kDa (ref. 2). Many proteins ranging from 30 to 60 kDa that cannot be crystallized await high-resolution structure determination by NMR spectroscopy. With the introduction of deuterium labeling3–5 and transverse relaxation optimized spectroscopy (TROSY)6, it is now possible to obtain assignments of backbone resonances of monomeric proteins up to 100 kDa (refs. 7–9). One can model crude global folds of large proteins based on backbone assignment, amide-amide distance restraints available in perdeuterated proteins10, residual dipolar couplings (RDCs)11 and protein database information12,13. To improve the quality of protein global folds, reintroduction of some methyl protons into otherwise deuterated proteins is necessary because methyl groups are often involved in hydrophobic cores and provide many long-range distance restraints14,15. But no matter how good the quality of the global fold is, the arrangements of many side chains cannot be defined precisely. Moreover, the preparation of such deuterated samples is always costly and time-consuming. Additionally, some proteins must be unfolded to accelerate the exchange of amide 2H to 1H, and then must be refolded. This unfolding-refolding process is nontrivial for most proteins. Although high-quality structures can be obtained with specifically labeled samples2, the extremely high cost of such samples impedes the application of this labeling approach.

Because only 1H spins produce distance restraints that dominate the quality of solution structures, the simplest and cheapest samples used for obtaining high-resolution structures are nondeuterated proteins. In contrast, most triple-resonance experiments for establishing resonance assignments do not work for uniformly 13C,15N-labeled large proteins without deuteration, except for nuclear Overhauser effect (NOE) spectroscopy (NOESY) and multiple-quantum 13C total correlation spectroscopy (MQ-CCH-TOCSY) experiments16–18. Therefore, the development of new NMR techniques is necessary for solving solution structures of proteins >30 kDa.

Here we present a new strategy to assign backbone and side-chain resonances of large proteins without the use of deuterium and specific labeling. On the basis of the assignments, we determined high-resolution structures from distance restraints derived from NOEs and dihedral restraints derived from chemical shifts. We demonstrated the strategy on three samples: Ca2+-dependent cell adhesion protein (DdCAD-1, 214 residues, ~24 kDa, 3% α helices, 46% β strands), maltose binding protein (MBP, 370 residues, ~42 kDa, 42% α helices, 16% β strands) and human normal adult hemoglobin in the carbonmonoxy form (HbCO A, chain-specifically 13C,15N-labeled, 141 residues for α chain, 146 residues for β chain, ~65 kDa tetramer, 77% α helices).

RESULTS

General strategy for sequential assignment

The strategy consists of five steps.
First, clusters are formed by grouping HC-NH NOE and Cα-NH (HNCA) correlations that have identical NH chemical shifts. Second, spin systems are identified by separating out intraresidue and sequential HC-NH NOE correlations from other inter-residue NOEs observed in a four-dimensional (4D) 13C,15N-edited NOESY spectrum with the use of HNCA and MQ-CCH-TOCSY spectra (Fig. 1a). Third, spin systems are classified by residue type based on 1H and 13C chemical shifts. Fourth, fragments are established from clusters by matching the intraresidue spin system of one cluster with the sequential spin system of another cluster. Fifth, fragments are mapped onto the protein sequence in a manner similar to the traditional triple-resonance approach19.

1Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543. 2These authors contributed equally to this work. Correspondence should be addressed to D.Y. ([email protected]).

RECEIVED 2 MARCH; ACCEPTED 8 AUGUST; PUBLISHED ONLINE 23 OCTOBER 2006; DOI:10.1038/NMETH938

NATURE METHODS | VOL.3 NO.11 | NOVEMBER 2006 | 931

Peak clusters

We constructed clusters using three-dimensional (3D) TROSY-HNCA and 4D 13C,15N-edited NOESY spectra (Supplementary Methods online). Most amide groups could be unambiguously distinguished in the two spectra for each sample, although the TROSY heteronuclear single-quantum correlation (HSQC) spectra were crowded in some regions (Supplementary Fig. 1 online). For the three proteins studied here, only a few clusters contained cross-peaks from more than one but fewer than four residues (Table 1). DdCAD-1 and MBP had 2 and 9 such clusters, respectively, and each HbCO A chain had 3 clusters. Most clusters contained both intraresidue and sequential HNCA correlations (Supplementary Fig. 2 online), whereas ~10–15% of clusters contained only one HNCA correlation.

Spin-system identification and amino-acid type determination

An example of identifying spin systems from a given cluster is described below. First, we extracted the intraresidue and sequential Cα-NH peaks to build two initial spin systems from the HNCA slice defined by an amide in a cluster (Fig. 1b). Second, we found that two HαCα-NH NOE peaks from the NOESY slice located at the amide matched the two Cα-NH peaks in Cα chemical shifts (Fig. 1c). Third, we listed the TOCSY slices defined by the CH spin pairs of individual HC-NH NOEs in Figure 1c (Fig. 1d). According to TOCSY correlations of Cα-CkHk or Ck-CαHα, where superscript k denotes the kth side-chain atom, we assigned three and five NOE peaks into the two spin systems, respectively. The I348 Cγ1Hγ12 slice displayed a lot of noise as a result of the overlap of intense signals from lysine CδHδs.
Nevertheless, its correlations with Cγ2 and Cδ1 were strong and easily recognized. We determined the two spin systems as valine and isoleucine, respectively, according to the chemical shifts of spins allocated to the spin systems (Supplementary Methods).

Using this procedure, we constructed many spin systems (Table 1). For DdCAD-1 (tumbling time 12.5 ns at 30 °C), almost all spin systems contained one or more CH spin pairs. For larger proteins such as MBP (~20 ns) and HbCO A (~30 ns), however, ~10% of spin systems contained no NOE correlations, and ~20% of spin systems had no side-chain NOE correlations (Table 1). This resulted from the poorer sensitivities of the 4D NOESY and 3D MQ-CCH-TOCSY toward larger proteins. Additionally, the MQ-CCH-TOCSY for MBP was very crowded in some regions, such as those corresponding to lysine residues, and we identified <55% of the expected TOCSY correlations. Nevertheless, many spin systems had enough characteristic chemical-shift information, which allowed us to determine amino-acid types. For MBP and HbCO A, we typed ~60–70% of the spin systems (Table 1 and Supplementary Fig. 3 online).

As a result of resonance degeneracy, this procedure led to a few medium- and long-range HC-NH NOEs being included erroneously as spin-system members; this happened in 8 of the initial MBP spin systems. Additionally, we initially assigned several spin systems to wrong types because of missing characteristic peaks or inclusion of incorrect peaks (Table 1). As demonstrated below, these errors did not affect the correctness of the final assignments.

Figure 1 | Identification of spin systems. (a) Strategy used to generate spin systems from one cluster defined by an amide. Spin pairs (shown in rounded boxes) and spin Cα within each aliphatic side chain form a spin system indicated by a solid box. The spins in the same spin system correlate with each other via a through-bond experiment (MQ-CCH-TOCSY, bidirectional arrows). Amide NiHi (i denotes residue number) correlates with Cα spins and CH spin pairs in both intraresidue and sequential spin systems via through-bond (HNCA; unidirectional solid arrows) and through-space (13C,15N-edited NOESY; dashed arrows) interactions. NiHi may also correlate with other inter-residue CjHj spin pairs through medium- or long-range NOEs. However, CjHj does not correlate with Cαi or Cαi–1. (b–d) Representative slices taken from TROSY-HNCA (b), 4D 13C,15N-edited NOESY (c) and 3D MQ-CCH-TOCSY (d) spectra recorded on MBP, shown for V347 and I348. Green peaks in slices in b and c are aliased by 19.5 and 22 p.p.m. in the 13C dimension, respectively. The actual Cα chemical shifts in the slice in b are indicated on the right side of the slice. The F2 frequencies for slices in b and d are indicated on the top of each slice. The peaks displayed in green in d are aliased by 26 p.p.m. in F2.

Assembly and mapping of connectivity fragments

When two spin systems best matched each other in HC chemical shifts, we generated one dipeptide segment from them. In most cases, such a segment corresponded to one dipeptide fragment in the protein sequence. However, very few of such segments might result from the connection of nonadjacent residues, as found in the study of MBP.
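The matching of spin systems in HC chemical shifts can be illustrated with a minimal sketch. It assumes that each cluster holds an intraresidue and a sequential spin system, each stored as a list of (13C, 1H) shift pairs in p.p.m. The names `Cluster`, `match_score` and `best_predecessor`, the tolerances and the minimum number of matches are hypothetical choices made for this illustration and are not taken from the paper, whose actual procedure is given in its Supplementary Methods. The shift values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (13C shift, 1H shift) of one CH spin pair, in p.p.m.


@dataclass
class Cluster:
    name: str
    intra: List[Peak]       # intraresidue spin system (residue i)
    sequential: List[Peak]  # sequential spin system (residue i-1)


def match_score(a: List[Peak], b: List[Peak],
                tol_c: float = 0.3, tol_h: float = 0.03) -> int:
    """Count peaks of `a` that have a partner in `b` within the tolerances.

    The tolerance values are assumptions, not values from the paper.
    """
    return sum(
        any(abs(c1 - c2) <= tol_c and abs(h1 - h2) <= tol_h for c2, h2 in b)
        for c1, h1 in a
    )


def best_predecessor(cluster: Cluster, others: List[Cluster],
                     min_matches: int = 2) -> Optional[str]:
    """Name of the cluster whose intraresidue spin system best matches the
    sequential spin system of `cluster`, or None if there is no clear winner."""
    scores = sorted(
        ((match_score(cluster.sequential, o.intra), o.name)
         for o in others if o.name != cluster.name),
        reverse=True,
    )
    if not scores or scores[0][0] < min_matches:
        return None  # too few matching peaks to propose a segment
    if len(scores) > 1 and scores[1][0] == scores[0][0]:
        return None  # ambiguous: two candidates match equally well
    return scores[0][1]


# Invented shifts for three hypothetical clusters A, B and C.
a = Cluster("A", intra=[(62.0, 3.50), (32.0, 1.90), (21.0, 0.95)], sequential=[])
b = Cluster("B", intra=[(61.0, 3.80), (38.0, 1.70)],
            sequential=[(62.1, 3.51), (32.1, 1.91), (21.1, 0.96)])
c = Cluster("C", intra=[(55.0, 4.30), (30.0, 2.10)], sequential=[])

# A dipeptide segment pairs the predecessor cluster with the current one.
segment = (best_predecessor(b, [a, b, c]), b.name)
```

In this sketch a segment is proposed only when one candidate clearly wins. A tie returns `None`, which mirrors the idea that ambiguous connectivities are left for a later stage instead of being forced.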
Using all dipeptide segments, we established several fragments (covering >48% of the residues) and uniquely mapped many of them to protein sequences (Table 1 and Supplementary Fig. 3). Owing to the presence of two wrong dipeptide segments for MBP, one mapped fragment initially contained one incorrect assignment at the carboxy terminus of the fragment.

Table 1 | Summary of clusters, spin systems, dipeptide segments and assignments

| | DdCAD-1 214; 202; 200a | MBP 370; 348; 329a | HbA α-chain 141; 134; 122a | HbA β-chain 146; 139; 138a |
|---|---|---|---|---|
| Clusters | 197 + 2(2)b | 332 + 8(2) + 1(3)b | 118 + 3(2)b | 131 + 2(2) + 1(3)b |
| Clusters lacking one HNCA peak | 19 | 50 | 16 | 15 |
| Clusters lacking one spin system | 3 | 26 | 13 | 12 |
| Spin systems identified | 399 | 676 | 231 | 258 |
| Spin systems without NOEs | 4 | 51 | 10 | 24 |
| Spin systems with only one HαCα spin pair | 30 | 126 | 79 | 66 |
| Spin systems typed asc: Gly | 26; 0; 28 | 55; 0; 56 | 13; 0; 14 | 24; 0; 26 |
| Spin systems typed asc: Ala | 11; 0; 12 | 83; 1; 88 | 34; 0; 42 | 29; 0; 30 |
| Spin systems typed asc: Thr | 31; 0; 35 | 37; 0; 41 | 15; 0; 17 | 6; 0; 11 |
| Spin systems typed asc: Val | 42; 2; 42 | 40; 1; 40 | 21; 1; 26 | 26; 3; 36 |
| Spin systems typed asc: Ile | 17; 0; 17 | 34; 0; 42 | 0; 0; 0 | 0; 0; 0 |
| Spin systems typed asc: Leu | 16; 1; 16 | 47; 0; 58 | 13; 0; 35 | 13; 0; 36 |
| Spin systems typed asc: Others | 214; 3; 245 | 161; 0; 352 | 45; 4; 126 | 57; 7; 132 |
| Dipeptide segments | 164 | 217, (2)d | 70 | 71 |
| Residues assigned from fragments unique in connectivity and mapping | 174 | 176, (1)e | 89 | 81 |
| Residues assigned based on uniquely mapped fragments | 26 | 163, (1)f | 28 | 54 |
| Uniquely assigned residues | 200 | 339 | 117 | 135 |
| Ambiguously assigned residues | 0 | 0 | 5 | 0 |
| Unassigned residues | 2 | 9 | 12 | 4 |

aTotal residue number; expected amide correlations; amides assigned with triple-resonance approach. bThe second (third) term: number of the clusters containing information from two (three) amides or residues. cNumbers of spin systems correctly typed; wrongly typed; expected. dNumber of dipeptide segments formed by nonadjacent residues. eNumber of residues wrongly assigned in the first stage. The correctness of the assignments was assessed based on published results obtained with the triple-resonance approach. fNumber of the wrong assignments in the first stage that were corrected at the second stage.

Here we describe the cause, identification and correction of the initial wrong assignment. The amide of Asp58 (cluster X209) displayed two long-range NOEs from its interactions with V8 Hγ1 and V8 Hβ (Fig. 2a). We initially identified these as sequential NOEs and incorrectly assigned them as the sequential spin-system members of Asp58, because coincidently, Val8 and Pro57 have identical Cα chemical shifts (Fig. 2a,b). Subsequently we incorrectly typed the spin system as valine. The intraresidue spin system of Val8 matched the spin system of Asp58 better than that of Ile9 (Fig. 2a–c). Additionally, cluster X209 was initially assigned as Ile9 (note that Lys6-Leu7-Val8 was an assigned fragment). We did not detect the error immediately because X209 lacked intraresidue NOEs and was located at the fragment terminus. By checking the cluster of Ile9 (X11), we found that X11 contained two spin systems definitely corresponding to valine and isoleucine. In the protein sequence, there are only two Val-Ile or Ile-Val segments: Val8-Ile9 and Val347-Ile348. Because we had already assigned Val347 and Ile348, the previous assignment for cluster X209 was very likely incorrect. Moreover, the cluster of Ile9 could be connected with Trp10 and then extended to Asn12. Thus, the correct assignment for Ile9 was determined to be cluster X11. This example confirms that errors made in the

Figure 2 | Resolution of ambiguous connectivity between clusters. (a–f) Representative F1F2 slices taken from the 4D 13C,15N-edited NOESY spectra recorded on MBP (a–c) and the β chain of HbCO A (d–f). Each plane is labeled with its 15N and 1HN chemical shifts and the corresponding residue. The cluster number is also labeled inside each panel. All green peaks were aliased by 22 p.p.m. for MBP or 20 p.p.m. for HbCO A in the 13C dimension.

[Figure 2 panels: (a) X209 (D58), (b) X29 (V8), (c) X11 (I9), (d) X92 (L14), (e) X136 (M55), (f) X6 (G56); axes 1H (p.p.m.) and 13C (p.p.m.).]
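The sequence check used to resolve the Ile9 assignment asks where a typed dipeptide such as Val-Ile can occur in the sequence, discards positions that are already assigned, and then tests whether extending the fragment leaves a single site. It can be sketched as follows. This is only an illustration under stated assumptions: the function `map_fragment`, its `taken` argument and the 16-residue `toy_sequence` are hypothetical, and the toy sequence is not the MBP sequence. Untyped spin systems are represented by `None` and match any residue.

```python
from typing import FrozenSet, List, Optional, Sequence


def map_fragment(types: Sequence[Optional[str]], sequence: str,
                 taken: FrozenSet[int] = frozenset()) -> List[int]:
    """Return the 1-based start positions at which a typed fragment fits.

    `types` holds one-letter residue types along the fragment (None means
    the spin system could not be typed). `taken` lists residue numbers that
    are already assigned and therefore unavailable.
    """
    n = len(types)
    hits = []
    for start in range(len(sequence) - n + 1):
        positions = range(start + 1, start + n + 1)  # 1-based residue numbers
        if any(p in taken for p in positions):
            continue  # overlaps residues that are already assigned
        if all(t is None or t == sequence[p - 1]
               for t, p in zip(types, positions)):
            hits.append(start + 1)
    return hits


# Hypothetical sequence containing two Val-Ile pairs (residues 4-5 and 14-15).
toy_sequence = "GKLVIWANDPDAKVIE"

both_sites = map_fragment(["V", "I"], toy_sequence)
after_exclusion = map_fragment(["V", "I"], toy_sequence, taken=frozenset({14, 15}))
after_extension = map_fragment(["V", "I", "W"], toy_sequence)
```

A mapping is unique when the returned list has exactly one entry. In the toy case the Val-Ile pair alone fits two sites, and either excluding the assigned pair or extending the fragment by one typed residue leaves a single site. For a cluster in which the order of the two types is unknown, the same call can be repeated with the reversed pattern.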

[Histogram; vertical axis: Occurrence (%).]