NMR of modular proteins - Duke Computer Science

NMR supplement

NMR of modular proteins

Iain D. Campbell and A. Kristina Downing

NMR studies of domains, dissected from large modular proteins, are described. Particular emphasis is placed on modules from the extracellular proteins fibrillin-1 and fibronectin.

That NMR is now an effective method for determining the structure of proteins is illustrated by the appearance of about 100 newly determined structures per year1. The main advantages of NMR over diffraction methods (for example, see ref. 2 and articles in this supplement) are: (i) the method operates in the solution state; (ii) there is no ‘phase problem’; and (iii) protein dynamics, ligand binding and environmental effects (pH, salt, temperature) can be measured relatively easily. There are, however, significant disadvantages, of which the major one is the size limit of about 30,000 Mr for ab initio structure determination. Fortunately, biology has chosen to construct many proteins from modular units or domains3 of a size that is usually well within the limits of the NMR method. This means that single or multiple domains can be produced using recombinant expression methods and their structures determined. This ‘dissection’ approach is now commonplace in structural studies of modular proteins. Well-known examples include domains from intracellular signaling proteins4,5, nucleic acid binding proteins6, cell surface receptors7 and extracellular matrix proteins8,9.

In this brief review we will not attempt to catalog the large number of modular proteins that have been studied by NMR; rather, we wish to illustrate some recurring features of such studies. In applying the dissection approach, some general issues are: (i) identifying sequences that correspond to independently folded domains; (ii) determining domain linkage and assembly; (iii) relating the properties of single or tandem domains to those of the intact protein; and (iv) assessing the advantages and disadvantages of NMR methods in such studies. We will henceforth use the word ‘module’ to refer to a subset of domains that are made up from a contiguous amino-acid sequence and are used repeatedly in different proteins. We will also discuss the structure determinations of certain extracellular modular proteins to illustrate our points, since dissection, used to correlate structure with function, is the only viable approach to obtaining atomic-resolution structural information about large glycosylated proteins of this kind.
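The size argument behind the dissection approach can be made concrete with a rough calculation. The sketch below is only illustrative: it assumes the common rule-of-thumb average of about 110 Da per amino-acid residue, and the module lengths used in the usage note are hypothetical round numbers rather than values taken from this review.

```python
# Rough check of whether a fragment falls under the ~30,000 Mr limit quoted above.
# ASSUMPTION: ~110 Da per residue is a rule-of-thumb average, not a value from the text.
AVG_RESIDUE_MASS = 110.0
NMR_SIZE_LIMIT = 30_000.0  # approximate ab initio limit quoted in the text


def approx_mass(n_residues):
    """Estimate the relative molecular mass of a polypeptide from its length."""
    return n_residues * AVG_RESIDUE_MASS


def within_nmr_limit(n_residues, limit=NMR_SIZE_LIMIT):
    """Return True if the estimated mass does not exceed the size limit."""
    return approx_mass(n_residues) <= limit


def max_tandem_modules(residues_per_module, limit=NMR_SIZE_LIMIT):
    """Largest number of identical-length tandem modules that stays under the limit."""
    return int(limit // approx_mass(residues_per_module))
```

For a hypothetical 45-residue module, `approx_mass(45)` gives roughly 5,000, well inside the limit, whereas a hypothetical 3,000-residue intact protein is far outside it. This is the practical reason for expressing single modules or short tandem fragments.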

Typical examples of modular extracellular matrix proteins, fibrillin-1 and fibronectin, are shown in Fig. 1. The human fibrillin-1 gene encodes a 350,000 Mr modular glycoprotein10 that contains mainly epidermal growth factor-like (EGF) modules, interspersed with transforming growth factor-β binding protein-like (TB) modules. Fibrillin-1 is a major component of 10–12 nm connective tissue microfibrils11, and over 140 mutations have been identified in patients with Marfan syndrome (MFS) (reviewed in ref. 12), a heritable disease of connective tissues. To date no clear genotype-phenotype relationship has been derived. Since fibrillin-1 maintains connective tissue microfibril architecture, knowledge of its structure and assembly may help elucidate the molecular basis of the MFS disease. Fibronectin is a large extracellular glycoprotein, involved in adhesion and migration events in many physiological processes including embryogenesis, wound healing, hemostasis, and thrombosis13. Each monomer is composed almost entirely of three different types of protein modules (F1, F2 and F3). The biological

Fig. 1 Module organization of fibrillin-1 (ref. 10) and fibronectin (ref. 8). White symbols denote rare modules only found in fibrillin-like proteins.

nature structural biology • NMR supplement • july 1998


Fig. 2 Multiple sequence alignment of TB modules from human fibrillin-1. This alignment was produced using CLUSTALX27. Sequence similarity is plotted as a function of residue number below the alignment. Color coding is as follows: yellow = C; cyan = 80% E/D; green = 80% K/R; magenta = 80% G; red = 80% hydrophobic (W/L/V/C/I/M/A/F/Y/P) for all but P and C; orange = P if hydrophobic. TB modules 1–7 are encoded by exons 9, 16–17, 24, 37–38, 41–42, 50–51 and 57, respectively10.
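The 80% thresholds in the color key can be read as a per-column rule applied to the alignment. The following is a minimal sketch of such a rule, not the CLUSTALX implementation: the residue groupings follow the caption, cysteine and proline are not given their separate colors, and the function name and toy sequences are hypothetical.

```python
# Residue classes taken from the Fig. 2 colour key (simplified: C and P are
# counted as hydrophobic here and are not coloured separately).
CLASSES = {
    "acidic": set("ED"),
    "basic": set("KR"),
    "glycine": set("G"),
    "hydrophobic": set("WLVCIMAFYP"),
}


def column_classes(alignment, threshold=0.8):
    """For each alignment column, list the classes whose members make up
    at least `threshold` of the residues in that column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*alignment):
        hits = [
            name
            for name, members in CLASSES.items()
            if sum(res in members for res in column) / len(column) >= threshold
        ]
        result.append(hits)
    return result


# Toy alignment (hypothetical sequences, not fibrillin-1 TB modules).
toy = ["CEKGL", "CDRGV", "CSKGI", "CTKGA", "CAKGL"]
print(column_classes(toy))
```

Columns that return an empty list correspond to positions with no dominant residue class, which is the kind of low-similarity region discussed for the C-terminal part of the TB module.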

functions of fibronectin are also dissectable in that the binding sites for a wide range of molecules have been shown to be associated with various fragments. Perhaps the best known of these functional regions is one that binds to integrins on cell surfaces and includes an RGD peptide sequence in the tenth F3 module.

The structures of all five module types present in fibrillin and fibronectin are now known (see Fig. 1). The structure of a TB module was, however, obtained only very recently14 and our knowledge about structural features of each module type is improving as the database continues to expand. F3 and F2 have clearly identifiable hydrophobic cores but for F1, EG and TB, other stabilizing features, such as disulfides and calcium, seem to be required to produce stable modules.

In extracellular proteins, identification of independently folding modules is simplified because exon boundaries and module boundaries often coincide. In intracellular proteins, however, more sophisticated methods, including enzyme digests and multiple sequence alignment, are often required to identify modules. In general, if the module boundaries are identified incorrectly, expression levels will be poor and/or the module may not fold properly. In a multiple sequence alignment, amino acids playing key structural roles will often be identical or conservatively substituted.

However, regions of low sequence homology are often important for determining the properties of an intact protein. Consider, for example, the alignment of TB modules from human fibrillin-1 shown in Fig. 2. Near the C-terminus of the module, there is less similarity amongst the sequences. Structure determination14 and analysis of backbone relaxation data15 for the sixth TB module from human fibrillin-1 reveal that this region is not well-ordered on the NMR time scale (ps–ns) (Fig. 3). It is likely that this flexibility is important for the properties of 10–12 nm connective tissue microfibrils containing fibrillin-1.

Fig. 3 Superposition of the TB6 family of NMR structures14. This figure was rendered28 from MOLSCRIPT29 input.

While the growing structure database of protein modules is allowing fairly good predictions of module homologs9, prediction of how modules are assembled is much more difficult. There are at least two reasons for this: one is that module interfaces do not usually involve any recognizable secondary structure elements, which makes modeling difficult, and another is that module–module interfaces in an evolving system can accommodate amino acid changes relatively easily compared to changes within a module. Thus the interface regions between modules can evolve fairly rapidly and give rise to many different types of module connection. For example, in the structure of four F3 modules (7–10) the ways in which the modules are connected (tilt and twist angles and buried surface area at the interface) are highly variable16. This kind of variation seems particularly common for modules with a well-defined hydrophobic core such as the F3 module.

While theoretical prediction of the relative orientation of F3 modules in fibronectin seems very difficult, studies of a pair of calcium binding (cb) EGF modules from human fibrillin-1 suggest that some types of pairwise module interactions may be generalized. The cbEGF module is the most common type of module in human fibrillin-1, and it is organized mainly as multiple tandem repeats (Fig. 1). The NMR structure determination of the cbEGF 32–33 pair shows that the two modules are organized into a rigid, rod-like arrangement (Fig. 4) with a calcium atom stabilizing the inter-module linkage17. In a multiple sequence alignment of all cbEGF module pairs from fibrillin, the length of the linker between the two modules is constant, and it has been predicted that all tandem repeats of cbEGF modules in fibrillin-1 are held in the extended conformation by calcium17.

A model of the organization of fibrillin-1 monomers in connective tissue microfibrils has been constructed in which they adopt a parallel, staggered conformation with 33–50% overlap17. This model was derived from knowledge of the NMR structure of a cbEGF pair extrapolated to all-tandem cbEGF (35 repeats; see Fig. 1) together with the results of monoclonal antibody binding studies18. Fibrillin-containing microfibrils have a characteristic beaded appearance when visualized by rotary shadowing electron microscopy19, and these studies have shown that monoclonal antibodies raised against fibrillin bind with a single periodicity between the


Fig. 4 Schematic ribbon drawing of fibrillin cbEGF 32–33 (ref. 17). Calcium atoms are shown in red. This figure was rendered28 from MOLSCRIPT29 input.

beads. The model for microfibril architecture can be tested experimentally. For example, the NMR structure predicts destabilization of the cbEGF pairs in the absence of calcium, and electron micrographs show that when calcium is completely removed from the microfibril, it becomes diffuse in appearance19. With the model providing background, NMR can be used to probe the consequences of disease-causing mutations in fibrillin-1. For example, a domain containing the N2144S mutation, which is associated with a relatively serious form of MFS, was expressed and its properties compared to the wild-type domain using NMR, showing that the mutation correlated with an approximately five-fold decrease in calcium affinity20. A consideration of all known point mutations in cbEGF modules associated with MFS shows that most of them are predicted to result in a distinct structural defect in terms of the model and the structure of the cbEGF 32–33 module pair. In particular, a subset of these mutations is likely to disrupt the pairwise cbEGF module interactions. Thus the NMR studies of fibrillin modules, taken together with electron micrographs of connective tissue microfibrils and site-specific mutations that correspond to disease states, all suggest quite strongly that, in the case of dissected fragments from human fibrillin-1, the properties of each construct are directly relevant to those of the intact protein.

F3 modules can be used to illustrate the complementary nature of X-ray and NMR structural methods. The structure of the RGD-containing tenth F3 module was first determined by NMR21 and the RGD peptide was shown to be relatively flexible. Later, the structures of modules 7–10 were determined by diffraction16. In the X-ray structure there was no evidence for relative motion between modules and the RGD sequence was well defined, with B-factors similar to those of residues in the structured regions of the molecule in the crystal. In contrast, when two NMR groups studied the ninth and tenth modules from human22 and mouse23, it was found from 15N relaxation data that the RGD sequence was mobile in solution compared to the structured regions of the molecule. In addition, there were relatively few observed NOEs between these two modules, suggesting a weak interaction, and the relative orientation of the two modules was bent more than 20° compared to the crystal structure23. It is interesting that the relative orientation of the 9–10 module pair is biologically important since efficient binding to integrins requires not only the RGD site on the tenth F3 module but also a PHSRN site on the ninth F3 module (see Fig. 5). There is a practical difficulty in defining the structure of module pairs (or larger fragments) by NMR because NOE-based structure calculations depend on short range (