Protein Engineering vol.12 no.9 pp.737–746, 1999

Functional diversity of the phosphoglucomutase superfamily: structural implications

Sergei Levin, Steven C.Almo² and Birgit H.Satir¹

Departments of Anatomy and Structural Biology and of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA

¹To whom correspondence should be addressed

Three-dimensional structural models of three members of the phosphoglucomutase (PGM) superfamily (parafusin, phosphoglucomutase-related protein and sarcoplasmic reticulum phosphoglucomutase) were constructed by homology modeling based on the known crystal structure of rabbit muscle phosphoglucomutase. Parafusin, phosphoglucomutase-related protein and sarcoplasmic reticulum phosphoglucomutase each have 50% or more identity with rabbit muscle phosphoglucomutase at the amino acid level, and all are reported to exhibit no or only minor phosphoglucomutase activity. There are four major insertions and two deletions in the parafusin sequence relative to PGM, all of which are located in surface-exposed loops connecting secondary structural elements. The remaining amino acid substitutions are distributed throughout the sequence and are not predicted to alter the polypeptide fold. Parafusin contains a putative protein kinase C site on a surface loop in domain II that is not present in the homologs. Although the general domain structure and the active site of rabbit muscle phosphoglucomutase are preserved in the model of phosphoglucomutase-related protein (PGM-RP), a major structural difference is likely to occur in domain I owing to the absence of 55 amino acid residues in PGM-RP. This deletion predicts the loss of three α-helices and one β-strand from an anti-parallel β-sheet in this domain as compared with rabbit muscle phosphoglucomutase.

Keywords: aciculin/parafusin/phosphoglucomutase/protein superfamily

Introduction

Phosphoglucomutase (PGM) (EC 5.4.2.2) is an evolutionarily conserved and well-characterized enzyme that catalyzes the interconversion of glucose 1-phosphate and glucose 6-phosphate via a glucose 1,6-diphosphate intermediate, making it a key enzyme in both glycolysis and gluconeogenesis.
The amino acid sequence of the rabbit muscle enzyme (RB-PGM) was published in 1983 (Ray et al., 1983). The three-dimensional structure was later determined at 2.7 Å resolution (Dai et al., 1992) and subsequently extended to 2.4 Å resolution. RB-PGM is a monomer of molecular weight 61 600 Da with a unique four-domain architecture. Formation of the glucose 1,6-diphosphate intermediate is facilitated by Ser116, which carries a phosphate group in the active phosphoenzyme. A metal ion, Mg2+, is coordinated by the phosphate group and by three aspartates (residues 319, 321 and 323) (Wierenga et al., 1981; Lin et al., 1986) and is involved in catalysis of the phosphate exchange.

© Oxford University Press

We have identified parafusin (PFUS), a 63 kDa protein from the ciliate Paramecium that has 50.7% identity with RB-PGM (Subramanian et al., 1994). We have suggested that PFUS is a member of the PGM superfamily, which contains a number of different proteins with high sequence identity to RB-PGM, as detailed below. Like some of the other members of the superfamily, PFUS isolated from Paramecium has little or no PGM activity. Nevertheless, these cells possess normal PGM activity, which is probably attributable to a different protein species (Andersen et al., 1994). Interestingly, PFUS is a glycoprotein that carries a short chain of O-linked mannoses on one or more serine residues. PFUS undergoes two kinds of covalent modification. The first is phosphorylation/dephosphorylation. The second, more unusual one is phosphoglucosylation/dephosphoglucosylation, in which Glc-1-P is transferred to the mannose chain by a Glc-1-P phosphotransferase or removed by a Glc-1-P phosphodiesterase. Both phosphorylation and phosphoglucosylation are associated with serine residue(s) (Murtaugh et al., 1987; Satir et al., 1990; Subramanian and Satir, 1992; Subramanian et al., 1994). Confocal microscopy shows that PFUS localizes to the membranes of the dense core secretory vesicles and to the exocytic site of the cell membrane in Paramecium. Upon stimulation of exocytosis, PFUS is dephosphoglucosylated (Subramanian and Satir, 1992) and dissociates from the membranes (Zhao and Satir, 1998). Together these results have led to the suggestion that PFUS may represent a new type of cytosolic signal transduction molecule in which carbohydrate cycling is hypothesized to be of physiological importance.

In addition to PFUS, other well-characterized members of the PGM superfamily include:

1. PGM-SR: a 60 kDa protein cloned and characterized from rabbit skeletal muscle sarcoplasmic reticulum (SR) (Lee et al., 1992).
This protein (96.5% identical with RB-PGM) associates with the SR membranes, and its PGM activity is reported to be reduced to 3% of that of RB-PGM. The only amino acid substitutions found in PGM-SR relative to RB-PGM are located within the 77 N-terminal amino acids. This protein is a substrate for calmodulin-dependent protein kinase, which has been suggested to regulate Ca2+ release from the SR (Narayanan and Xu, 1997). PGM-SR therefore probably functions in the regulation of Ca2+ release from the SR.
2. PGM-RP: using a series of monoclonal antibodies raised against human uterine smooth muscle, Belkin and Burridge (1994) identified a doublet protein band (Mr 60/63 kDa) that localized to focal contacts and adherens junctions in muscle. This protein, formerly called aciculin, is now called PGM-RP. Isolated PGM-RP has no detectable PGM enzymatic activity. Its deduced amino acid sequence (70% identical with RB-PGM) is shorter (506 amino acids) than that of RB-PGM (561 amino acids), with an apparent deletion of 55 amino acids at the N-terminus. The function of PGM-RP is not known. It has, however, been shown to co-immunoprecipitate and co-localize by immunostaining with the dystrophin–utrophin complex, part of an assembly that links the actin cytoskeleton to the plasma membrane and the extracellular matrix at cell–matrix and cell–cell adhesions (Belkin and Burridge, 1995a,b; Moiseeva et al., 1996). It has been suggested that PGM-RP is part of this dystrophin–utrophin complex.

It is likely that within the PGM superfamily the overall polypeptide fold has been preserved, and that specific insertions and deletions have permitted some members of the superfamily to acquire different functions. Using the coordinates of RB-PGM, we have applied homology modeling to build three-dimensional models of PFUS, PGM-RP and PGM-SR. These models locate the differences in primary sequence within the three-dimensional structure. They also indicate whether changes in the spatial organization of the active site might account for the loss of PGM activity. In addition, the models highlight structural changes that might be involved in the new functions associated with these members of the superfamily.

Materials and methods

Homology modeling, also termed comparative modeling or knowledge-based modeling, builds a three-dimensional model of a protein based only on its amino acid sequence and on the known three-dimensional structures of homologous proteins. All modeling and structural analysis were performed with the InsightII software package (Biosym Technologies, San Jose, CA) running on a Silicon Graphics workstation. The atomic coordinates of rabbit muscle phosphoglucomutase, refined to 2.4 Å, were obtained from the Brookhaven Protein Data Bank. The sequences of the PGM superfamily members PFUS, PGM-RP and PGM-SR were obtained from a SWISS-PROT database search.
The multiple sequence alignment of these phosphoglucomutase homologs was performed using a standard alignment algorithm (Needleman and Wunsch, 1970). Higher alignment weights were given to regions important for phosphoglucomutase activity, such as the active site loop and the Mg2+ binding site (Wierenga et al., 1981). Regions of high homology were assigned using the program POLINA (Protein Organization LINear Analysis) (Levin and Satir, 1998). Cα atoms of the residues from the PFUS, PGM-SR and PGM-RP sequences in the regions of high homology were superimposed on the template structure of RB-PGM. For construction of the PFUS model, all modified loops were generated ab initio. These included four major insertions and two major deletions (three amino acids or larger). The model was then subjected to energy minimization by the conjugate gradient method of Powell, as implemented in the XPLOR software package (Brunger et al., 1990). Initially, 500 steps of conjugate gradient minimization with the CHARMM non-bonded potential were performed while Cα atoms in strongly conserved regions were constrained. The distances between Mg2+ and its coordinating ligands were also constrained. This relaxed the side chain positions and alleviated sub-optimal van der Waals contacts in the initial model, while keeping the backbone close to the template. An additional 500 conjugate gradient steps with the CHARMM non-bonded potential and no constraints on the Cα atoms followed. The slow cooling procedure (Brunger et al., 1990) was then applied with an initial temperature of 1000 K, a time step of 0.5 fs and a temperature decrement of 25 K per 25 fs. After the system had reached 300 K, 500 steps of conjugate gradient minimization were performed as before. As a control, we performed the energy minimization on the template structure of RB-PGM. Its structure did not change significantly (backbone r.m.s.d. of 0.023 Å).

Confirmatory subcloning and re-sequencing of the PFUS gene were performed according to Subramanian et al. (1994). The region of deletion II was amplified by PCR from Paramecium cDNA. Primers were based on the sequence of insertion II (Figure 2) and on a conserved sequence from the C-terminus of the molecule. The amplified region was subcloned, purified and sequenced manually using the Sequenase V2.0 chain termination sequencing kit.

For construction of the PGM-RP model, Cα atoms of the residues from the PGM-RP sequence were superimposed directly on the template structure of RB-PGM. This was possible because there are no insertions or deletions other than the deletion of the first 55 amino acids. The model was then subjected to 500 conjugate gradient steps with the CHARMM non-bonded potential. The quality of the model structures was evaluated using Procheck (Morris et al., 1992). All models were found to be well within the quality limits for crystal structures at the resolution of the template (2.4 Å).

Results

A BLAST search identified 34 proteins (Figure 1), using the active site and magnesium binding site sequences, as well as the complete sequence of RB-PGM, as bait. Of these, 16 that display 20% or more identity with RB-PGM were selected (Figure 2).
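The alignment and selection steps above can be sketched in a few lines. The block below is a minimal pairwise Needleman–Wunsch global alignment followed by a percent-identity calculation and a 20% cut-off. It assumes a flat match/mismatch score and a linear gap penalty in place of a substitution matrix. It does not reproduce the higher weighting of the active site loop and Mg2+ binding site, nor the multiple-alignment step. The names `needleman_wunsch`, `percent_identity` and `select_homologs` are hypothetical, and the toy sequences are not real PGM sequences.

```python
# Sketch only: flat match/mismatch scores and a linear gap penalty stand in
# for a substitution matrix and for the region-specific weights in the paper.
GAP = "-"


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aln_a, aln_b):
    """Identical residues as a percentage of columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_homologs(template, candidates, threshold=20.0):
    """Map name -> identity (%) for candidates at or above the threshold."""
    kept = {}
    for name, seq in candidates.items():
        aln_t, aln_c, _ = needleman_wunsch(template, seq)
        ident = percent_identity(aln_t, aln_c)
        if ident >= threshold:
            kept[name] = ident
    return kept


if __name__ == "__main__":
    toy = {"near": "ACDEFGHIKM", "gapped": "ACDFGHIKL", "unrelated": "WWWWWWWWWW"}
    print(select_homologs("ACDEFGHIKL", toy))  # toy sequences, not real PGMs
```

Identity in this sketch is counted over columns where neither sequence has a gap, which is why the toy sequence with a single deletion scores 100%. Other conventions divide by the shorter sequence length or by the full alignment length, so identity figures are only comparable when the denominator is stated.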
Fig. 1. The phylogenetic tree was generated with the program METREE (Rzhetsky and Nei, 1994) using a conservation matrix that takes into account the deletions and insertions in the protein sequences. The proteins identified by the BLAST search are, in alphabetical order: AC-PGM from Acetobacter, AG-PGM Agrobacterium phosphoglucomutase (U38977), AR-PGM Arabidopsis phosphoglucomutase (AC002311), BR-PGM Borrelia PGM (U45763), CE-PGM C.elegans PGM (AC34553), CH-PGM Chlamydia PGM (U74533), DT-PGM Dictyostelium phosphoglucomutase (U61984), EC-PGM1 and EC-PGM2 sequences from E.coli (D90707 and U08369), EC-PGMPMM E.coli PGM/PMM (D38457), HU-PGM human PGM1 (M83088), IP-PGM ice plant phosphoglucomutase (U84888), MA-PGM1 and MA-PGM-2 corn phosphoglucomutases (U89341 and U89342), MTB-PGM1 Mycobacterium tuberculosis (AL123456) and MTB-PGM2 Mycobacterium tuberculosis (AL178640), NG-PGM Neisseria gonorrhoeae PGM (GU077172), NM-PGM Neisseria meningitidis PGM (GU98231), PC-PGM Pyrococcus PGM (K09656), PFUS (P3457) Paramecium parafusin, PGM-RP PGM-related protein from human uterus (L40933), RB-PGM rabbit muscle phosphoglucomutase, the template structure (M97664), RT-PGM rat phosphoglucomutase (L11694), SP-PGM spinach PGM (U345723), SR-PGM rabbit muscle sarcoplasmic reticulum PGM (M97663), SY-PGM PGM from Synechocystis, TE-PGM Tetrahymena PGM (AF020726), TP-PGM Aquifex (thermophilic bacterium) PGM (T34563), TR-PGM Treponema pallidum PGM (AE001219), XT-PGM Xanthomonas PGM (AL87545), YC-PGM1 and YC-PGM2 Saccharomyces cerevisiae phosphoglucomutases (U09498 and U09499), YP-PGM Schizosaccharomyces pombe phosphoglucomutase (U48601), YP-PGMPMM yeast PGM/PMM (P457434).

From these 16 proteins, those whose enzymatic activity differs significantly from that of phosphoglucomutase were selected for homology modeling. They are PGM-RP, which plays a structural role in adherens junctions; PFUS, which is implicated in calcium-regulated exocytosis; and PGM-SR, which is involved in calcium dynamics. Proteins exhibiting dual activity (both phosphoglucomutase and phosphomannomutase), such as PGM from yeast (GenBank ID: P33401), have not been modeled in detail but are discussed below. The alignment of the amino acid sequences of PGM-SR, PFUS and PGM-RP (formerly aciculin) (Moiseeva et al., 1996) with RB-PGM is shown in Figure 3. The main differences between PFUS and RB-PGM are an extended N-terminus in PFUS (16 residues), four insertions (I1–I4), two deletions (D1 and D2) and a putative protein kinase C (PKC) site (residues 223–225) (Subramanian et al., 1994). A similar comparison of PGM-SR and PGM-RP with RB-PGM shows that the major differences lie at the N-terminus. PGM-RP has a 55-residue deletion relative to RB-PGM, and PGM-SR has a five-residue addition. All four proteins have very similar C-termini. Despite reports that the PGM homologs have considerably reduced or undetectable PGM enzymatic activity (Andersen et al., 1994; Belkin and Burridge, 1994), the amino acid sequence at the active site of PGM appears largely conserved.

For homology modeling, we initially mapped the aligned sequence of PFUS onto the backbone of the 2.7 Å structure of RB-PGM (Dai et al., 1992). Examination of this model showed that deletion II mapped directly onto β-strand 3 in the β-sheet of domain IV of the RB-PGM molecule (Figure 4a). The homology in the neighboring regions is high and the deletion is rather large (10 amino acids), so it could not be accommodated in the short loops connecting the β-strands in domain IV. The result was the deletion of a β-strand from the middle of a five-strand β-sheet, which was unlikely to yield a folded polypeptide chain. Furthermore, the connectivity of domain IV in the 2.7 Å structure of RB-PGM was unusual. It had a mixed anti-parallel configuration of '2, 1, 3, 4, 5, 6' (Figure 4c) that has not been found in any other protein structure (Dai et al., 1992). This mapping of the deletion could not be reconciled with both the sequence and the structural information. Accordingly, we subcloned the region by PCR from Paramecium. The sequence of insertion I-2 in PFUS served as the starting primer, and the conserved last 10 amino acids of domain IV as the termination primer. Manual sequencing re-confirmed the sequence of PFUS through the D-2 region (data not shown). The RB-PGM structure has recently been extended to 2.4 Å and a modified chain tracing reported (Liu et al., 1997; Ray et al., 1997) (Figure 4b). The connectivity of domain IV was redefined as a completely anti-parallel configuration of '6, 1, 5, 4, 3, 2'. In the current structure, deletion D-2 is well accommodated by a long loop connecting helix IVα1 (residues 434–449) to β-strand IVβ2 (residues 469–472) (Figure 4d). The revised 2.4 Å structure of RB-PGM allowed the completion of a satisfactory homology model of PFUS. The VERIFY3D (Luthy et al., 1992) environment plot (Figure 5) suggests that the mistracing of the structure has been corrected.

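The refinement protocol in Materials and methods can be illustrated on a toy problem. That protocol is conjugate gradient minimization with the conserved Cα atoms restrained, followed by slow cooling from 1000 K to 300 K.

This sketch is not XPLOR or CHARMM. It assumes a hypothetical one-dimensional chain of points joined by harmonic bonds, with harmonic restraints standing in for the constrained Cα atoms. Because that energy is quadratic, plain linear conjugate gradient applies. Powell's restart variant, used for the real non-quadratic force field, is not reproduced. The cooling helper only enumerates the stages implied by the stated parameters, assuming each 25 fs stage is run before the temperature is lowered. All function names are hypothetical.

```python
import math


def build_chain(n, bond_k, bond_len, restraints):
    """Quadratic energy E(x) = 0.5*x.A.x - b.x (+ const) for a toy 1-D chain.

    Bonds i--(i+1) contribute 0.5*bond_k*(x[i+1] - x[i] - bond_len)**2.
    restraints maps atom index -> (target, k) and adds 0.5*k*(x[i] - target)**2,
    standing in for Calpha atoms held near the template.
    """
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n - 1):
        A[i][i] += bond_k
        A[i + 1][i + 1] += bond_k
        A[i][i + 1] -= bond_k
        A[i + 1][i] -= bond_k
        b[i] -= bond_k * bond_len
        b[i + 1] += bond_k * bond_len
    for i, (target, k) in restraints.items():
        A[i][i] += k
        b[i] += k * target
    return A, b


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def energy(A, b, x):
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def conjugate_gradient(A, b, x0, max_steps=500, tol=1e-10):
    """Linear CG: minimizes 0.5*x.A.x - b.x for symmetric positive-definite A."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual = -gradient
    p = list(r)
    rs = dot(r, r)
    steps = 0
    while steps < max_steps and math.sqrt(rs) > tol:
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)  # exact line search for a quadratic
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        steps += 1
    return x, steps


def cooling_schedule(t_start=1000.0, t_stop=300.0, t_step=25.0, stage_fs=25.0, dt_fs=0.5):
    """List of (temperature_K, md_steps) stages; assumes cool-after-each-stage."""
    steps_per_stage = round(stage_fs / dt_fs)
    stages, t = [], t_start
    while t > t_stop:
        stages.append((t, steps_per_stage))
        t -= t_step
    return stages


if __name__ == "__main__":
    template = [0.0, 1.0, 2.0, 3.0, 4.0]
    # Restrain the two end "atoms" (stand-ins for conserved Calpha) to the template
    A, b = build_chain(5, 1.0, 1.0, {0: (0.0, 10.0), 4: (4.0, 10.0)})
    start = [0.0, 1.3, 2.5, 2.9, 4.4]
    x, steps = conjugate_gradient(A, b, start)
    print(steps, [round(v, 6) for v in x])
    stages = cooling_schedule()
    print(len(stages), sum(s for _, s in stages))
```

With these parameters the schedule has 28 stages (1000 K down to 325 K) of 50 steps each. That is 700 fs in total before the system reaches 300 K. A different convention for when the decrement is applied would shift this by one stage.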
Models of PGM superfamily proteins

Figure 6a and b show the 3-D models of PFUS and PGM-RP, respectively, superimposed on the structure of RB-PGM. As expected from the homology modeling method and the high overall sequence similarity, the overall structures of the proteins are predicted to be conserved. The four domains described in RB-PGM are present in all cases: domain I (blue), domain II (pink), domain III (green) and domain IV (yellow). The stereochemical parameters of these models show that they are in reasonable energetic conformations while remaining very similar in overall structure to RB-PGM (Table II). The backbone r.m.s. deviations (r.m.s.d.s) of the models are shown in Table I and indicate that the models are within 2.3 Å of each other (Figure 7a and b). The backbone structure of PGM-SR is identical with that of RB-PGM (r.m.s.d. <0.1 Å; data not shown). For PFUS, the backbone r.m.s.d. grows significantly if the generated loops are included in the superimposition on the backbone of the template (Figure 7a). The N-terminal residues of PFUS and PGM-SR that extend beyond the N-terminus of RB-PGM are not modeled in this procedure but are discussed below.

Structure of PFUS

As expected, the loops are the regions of greatest structural difference from the template structure. All insertions and deletions found in PFUS are located on surface-exposed loops of the molecule (Figure 6, highlighted in red). All insertions and one deletion (D-1) are located in domains I–III. The major PFUS deletion (D-2) is located in a surface loop of domain IV. The residues of the putative PKC phosphorylation site (residues 223–225) are localized to a surface loop in domain II. The positions of the active-site residues are conserved in the PFUS model (Table III). The majority of residues that interact or form hydrogen bonds with catalytically important residues are also conserved.
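The 'All' versus 'Core' distinction in Table I comes down to which Cα positions enter the r.m.s.d. average. The sketch below assumes the two traces have already been superimposed and are index-matched. The superposition itself, for example a least-squares fit on core Cα atoms, is not shown. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b, positions=None):
    """Root-mean-square deviation (Å) between matched, pre-superimposed Cα traces.

    coords_a, coords_b: sequences of (x, y, z) tuples in the same residue order.
    positions: optional subset of indices, e.g. the structurally conserved core.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("traces must be index-matched")
    idx = list(range(len(coords_a))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    sq = sum(math.dist(coords_a[i], coords_b[i]) ** 2 for i in idx)
    return math.sqrt(sq / len(idx))


if __name__ == "__main__":
    # Hypothetical 4-residue traces: residue 2 sits in an inserted loop.
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    model = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 3.0, 0.0), (11.4, 0.5, 0.0)]
    core = [0, 1, 3]
    print("All: %.2f Å, Core: %.2f Å" % (rmsd(template, model), rmsd(template, model, core)))
```

In this toy case a single displaced loop residue roughly triples the deviation relative to the core alone. This is the same effect the text describes for the generated PFUS loops.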
Structure of PGM-RP

Although the general domain structure of RB-PGM is preserved in the model of PGM-RP, a major structural difference is found in domain I owing to the 55 missing amino acid residues. This deletion removes segments that correspond to three α-helices and one β-strand from an anti-parallel β-sheet in this domain (Figure 8b), as compared with the template structure (Figure 8a). The missing elements disrupt the hydrophobic core of domain I, exposing 978.4 Å² of previously buried surface to the solvent. Of this, 764.3 Å² is hydrophobic surface, including contributions from the backbone. The value was calculated as the combined exposed surface of the protein model and the deleted elements minus the exposed surface of the PGM template.

Fig. 2. Alignment of selected members of the PGM superfamily with 20% or higher identity to RB-PGM (rabbit muscle phosphoglucomutase). The regions important for phosphoglucomutase activity are shown in bold. The proteins are: TE-PGM Tetrahymena PGM (AF020726), PFUS (–) Paramecium parafusin, MA-PGM1 and MA-PGM-2 corn phosphoglucomutases (U89341 and U89342), AR-PGM Arabidopsis phosphoglucomutase (AC002311), IP-PGM ice plant phosphoglucomutase (U84888), DT-PGM Dictyostelium phosphoglucomutase (U61984), RT-PGM rat phosphoglucomutase (L11694), RB-PGM rabbit muscle phosphoglucomutase, the template structure (M97664), HU-PGM human PGM1 (M83088), SR-PGM rabbit muscle sarcoplasmic reticulum PGM (M97663), PGM-RP PGM-related protein from human uterus (L40933), AG-PGM Agrobacterium phosphoglucomutase (U38977), YC-PGM1 and YC-PGM2 Saccharomyces cerevisiae phosphoglucomutases (U09498 and U09499), YP-PGM Schizosaccharomyces pombe phosphoglucomutase (U48601). There are a number of deletions and insertions in the proteins as compared with the template structure. CPI-1, CPI-3 and CPI-4 are insertions present only in ciliate and plant phosphoglucomutase superfamily members. CI1 is an insertion present only in ciliates (Tetrahymena and Paramecium), while PD1 is present exclusively in Paramecium. YD-1 is a deletion specific to yeast proteins, and M(–), D-2 is present in ciliate, yeast and Agrobacterium superfamily members.

Fig. 3. Alignment of the amino acid sequences of the modeled members of the PGM superfamily: PFUS Paramecium parafusin, PGM-SR rabbit sarcoplasmic reticulum phosphoglucomutase, PGM rabbit muscle phosphoglucomutase, PGM-RP human uterus phosphoglucomutase-related protein (formerly aciculin). Regions important for PGM activity are highlighted on the alignment. The PFUS sequence shows two deletions (D1 and D2) and four insertions (I1–I4) relative to the rabbit muscle PGM template, as well as the putative PKC site. The 55 missing N-terminal amino acid residues of PGM-RP are marked in the PGM-RP sequence.

Fig. 4. (a) Ribbon diagram of domain IV of the PGM molecule traced at 2.7 Å resolution. The β-strand that would be deleted in PFUS is marked in dark gray. (b) Ribbon diagram of domain IV of the PGM molecule re-traced from 2.4 Å resolution data. (c) and (d) Connectivity diagrams of the intermediate (2.7 Å) and high (2.4 Å) resolution structures of domain IV of rabbit muscle PGM.

Table I. The r.m.s.d.s (Å) of the Cα traces of the models as superimposed on the template structure of rabbit muscle phosphoglucomutase (PGM) and on each other after energy minimization of the models

            PGM            PFUS           PGM-RP
            All    Core    All    Core    All    Core
PGM         0      0       2.26   0.89    0.74   0.74
PFUS        2.26   0.89    0      0       2.10   0.75
PGM-RP      0.74   0.74    2.10   0.75    0      0

If the inserted loops are included in the superimposition ('All') as opposed to core regions only ('Core'), the r.m.s.d.s grow significantly. The average deviation for complete models was 1.7 Å.

Fig. 5. VERIFY3D plot of the 2.4 Å PGM structure (solid line) and the 2.7 Å PGM structure (dotted line). The vertical axis represents the average 3D–1D profile score (Luthy et al., 1992) in a 10-residue sliding window, the center of which is at the sequence position indicated on the horizontal axis. The scores for the lower-resolution PGM structure drop significantly in the regions that were traced incorrectly. Scores for the first three and the final three sequence positions have no meaning.
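The per-position curve in the VERIFY3D plot (Figure 5) is a centred sliding-window average of per-residue 3D–1D scores, with positions near the chain ends left undefined. In the sketch below, the per-residue scores are assumed to come from an external profile program. The figure caption quotes a 10-residue window, but a window centred on a residue has odd length, so the half-width is a parameter here. The default of 3 residues is an assumption chosen so that the first three and last three positions are undefined, as the caption notes. The zero cut-off for flagging possibly mistraced regions is hypothetical.

```python
def window_average(scores, half_width=3):
    """Centred sliding-window mean of per-residue scores.

    Positions with fewer than half_width neighbours on either side return
    None, because no full window can be centred on them.
    """
    n = len(scores)
    width = 2 * half_width + 1
    smoothed = [None] * n
    for i in range(half_width, n - half_width):
        smoothed[i] = sum(scores[i - half_width : i + half_width + 1]) / width
    return smoothed


def low_scoring_positions(smoothed, cutoff=0.0):
    """Indices whose smoothed score falls below cutoff (hypothetical threshold)."""
    return [i for i, s in enumerate(smoothed) if s is not None and s < cutoff]


if __name__ == "__main__":
    # Hypothetical profile: a 5-residue stretch that fits its environment poorly.
    scores = [0.5] * 5 + [-0.5] * 5 + [0.5] * 5
    smoothed = window_average(scores)
    print(low_scoring_positions(smoothed))  # prints [5, 6, 7, 8, 9]
```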

Fig. 6. (a) Ribbon diagram of the PFUS model. The domains I–IV are colored blue, pink, green and yellow, respectively. Insertions and the sites of deletions are colored red and the functionally important Ser138 and protein kinase C (PKC) site are marked. (b) Ribbon diagram of the PGM-RP model. Note the missing parts of the hydrophobic core in domain I.
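The 978.4 Å² figure for PGM-RP is a difference of solvent-accessible surfaces: the truncated model plus the isolated deleted elements, minus the intact template. The sketch below shows that bookkeeping with a Shrake–Rupley point-sampling surface.

This is an assumption, not the authors' procedure. The paper does not state which surface algorithm InsightII applied. The sketch treats atoms as bare spheres with a 1.4 Å probe, and the radii and coordinates are hypothetical. It also treats the template as the kept atoms plus the deleted atoms in their original positions, and does not model the hydrophobic/polar split (764.3 Å²).

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def sasa(atoms, probe=1.4, n_points=960):
    """Shrake-Rupley solvent-accessible surface area in Å².

    atoms: list of (x, y, z, radius). A surface point of atom i counts as
    accessible if it lies outside every other atom's probe-expanded sphere.
    """
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        accessible = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            if all(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            ):
                accessible += 1
        total += 4.0 * math.pi * big_r ** 2 * accessible / n_points
    return total


def newly_exposed_area(kept_atoms, deleted_atoms):
    """SASA(kept) + SASA(deleted) - SASA(kept + deleted): the surface that the
    deleted elements used to bury, counted on both sides of the interface."""
    return sasa(kept_atoms) + sasa(deleted_atoms) - sasa(kept_atoms + deleted_atoms)


if __name__ == "__main__":
    # Hypothetical: one "kept" atom and one "deleted" atom 3.0 Å apart, radius 1.5 Å.
    kept = [(0.0, 0.0, 0.0, 1.5)]
    deleted = [(3.0, 0.0, 0.0, 1.5)]
    print(round(newly_exposed_area(kept, deleted), 1))
```

For two overlapping spheres the result should approach the analytical value of two spherical caps (about 51 Å² here). The sampling error shrinks as `n_points` grows.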


Discussion

The PGM molecule is a single polypeptide chain (63 kDa) composed of four α/β domains. The molecule contains over 40 secondary structural elements, and the domains form a compact heart-shaped structure with a deep crevice between the two lobes. The active site of the molecule is located at the bottom of the crevice on the surface of domain I. Domains I, II and III all have an α3–β3–α1 configuration, in which the anti-parallel strand order of the β-sheet is '2, 1, 3, 4'. This configuration has not been described previously in any other crystal structure and is unique to phosphoglucomutase (Liu et al., 1997).

All 34 proteins displaying a PGM/PMM phosphoserine signature, as identified by a BLAST search, are considered in this paper (Figures 1 and 7a and b). The sequences of 16 of these proteins were mapped onto the backbone of RB-PGM. This confirmed that all the deletions and insertions fell in variable loop regions and that no major structural alterations had occurred (Figure 2). Only three members were selected for homology modeling: PFUS, PGM-RP and PGM-SR. These have been biochemically well characterized and display a significant decrease in phosphoglucomutase activity. Their three-dimensional models were constructed based on the known crystal structure of RB-PGM. All of these homologs have 50% or more identity with RB-PGM at the amino acid level. The high degree of identity at the primary structure level and the conservation of the active site loop region and Mg2+ binding site suggest a common overall three-dimensional structure, making successful homology modeling likely.

Table II. Quality parameters generated by Procheck (Morris et al., 1992) compared with the template structure

Parameter                                   PGM     PFUS    PGM-RP   PGM-SR
Resolution (Å)                              2.4     N/A     N/A      N/A
Number of residues                          561     587     501      567
Ramachandran plot, core regions (%)         84.6    89.2    87.1     84.9
Ramachandran plot, disallowed regions (%)   0.2     0       0        0
Overall G-factor                            0.22    -0.3    -0.13    0.21
Quality of packing                          4.52    4.65    –a       4.56

After the energy minimization procedure, the structures are of similar quality to the template. aQuality of packing was not measured for PGM-RP because of missing parts of the hydrophobic core.

Table III. Conservation of the catalytically important residues and second-shell residues of RB-PGM in other members of the PGM superfamily

Suggested catalytic function                    PGM                      PGM-RP    PFUS
Domain I phosphoserine                          Ser116                   CR        CR
Domain II Mg2+ binding site                     Asp287                   CR        CR
                                                Gly288                   Ala       CR
                                                Asp289                   CR        Ala
                                                Gly290                   CR        CR
                                                Asp291                   CR        CR
Domain III specificity loop                     Glu376                   CR        CR
                                                Ser377                   CR        CR
Domain IV phosphate binding site                Arg426                   CR        CR
                                                Glu430 (backbone amide)  CR        CR
H-bond interactions with PSer116 Oε1 and Oε3    Asp262                   CR        CR
(non-bridging oxygens)                          His117                   CR        CR
                                                Asn118                   Cys       CR
                                                Pro261                   CR        CR
                                                Arg22                    Deleted   CR
                                                Arg292                   CR        CR
                                                Thr114                   CR        CR
H-bond interactions with PSer116 Oε2 and        Asn389                   CR        CR
active-site Mg2+                                Asn293                   Tyr       CR
                                                Pro261                   CR        CR
                                                Lys388                   CR        CR
                                                Ser94                    CR        CR
                                                Thr95                    CR        CR
                                                Thr380                   CR        CR

CR denotes conserved residue. Changes in any of the residues involved in catalysis are not observed in any of the members. However, Asn118 and Asn293, which form H-bonds to the phosphoserine, are replaced by Cys and Tyr, respectively, in PGM-RP. These replacements might explain the absence of PGM function in this protein, especially considering that Asn118 is conserved in all other members of the PGM superfamily. None of the changes in PFUS obviously explains the change in enzymatic function.

Fig. 7. (a) and (b) Superimpositions of the Cα traces of the models onto the template structure of 2.4 Å rabbit muscle PGM (thin line in both panels): Cα trace of the PFUS model (thick gray line) (a) and Cα trace of PGM-RP (b).

From the phylogenetic tree (Figure 1) of the PGM superfamily, it appears that three major branches diverged early from the common ancestor. The first branch (1) contains 12 members from both eukaryotic and prokaryotic organisms. Branch 3 contains only five members, mainly bacterial proteins. The exception is a yeast protein that exhibits both PGM and phosphomannomutase activity. The remaining sequences in this branch were obtained from genome sequencing projects and have been assigned their function based on sequence similarity to the PGM superfamily. Branch 2 contains the 17 most evolutionarily conserved proteins, mostly from eukaryotic organisms, the one exception being Agrobacterium tumefaciens. Within this group, yeast (both Schizosaccharomyces pombe and Saccharomyces cerevisiae) diverged early from the main group. Agrobacterium appears to separate from the remaining members later than yeast. The remaining members eventually separate into three groups. One contains sequences from ciliates (PFUS from Paramecium tetraurelia and PGM from Tetrahymena thermophila), one contains mammalian PGM-like sequences, and a third contains PGM sequences from plants, Dictyostelium and C.elegans. All of the proteins modeled in this paper are contained in this branch. It is interesting to note that human PGM-RP is more distant from human PGM than are the PGMs from other mammalian organisms.

Given the large amount of information available for protein superfamilies, it is often difficult to decide which proteins are most representative for applications such as structural genomics, homology modeling and mutagenesis. Phylogenetic trees are not adequate for this task, so we have developed a method for the hierarchical representation of protein superfamilies. The algorithm uses a known pair-wise similarity matrix to derive a superfamily representation map and tree (Levin and Satir, 1998). The representation tree (Figure 9b) shows the conservation of proteins from different species and reveals two major sub-families of sequences (S1 and S2).

Fig. 9. NyTree (S.Levin and B.H.Satir, in preparation) representative hierarchy tree of the superfamily. The representation tree for the PGM superfamily was generated from the same similarity matrix as that used to generate the alignment of the superfamily. The proteins were grouped into three representative subgroups. Group 1 included most of the eukaryotic PGM superfamily members, such as RB-PGM, RT-PGM, SR-PGM, PGM-RP, HU-PGM1, Paramecium PFUS, YC-PGM1, YC-PGM2, YP-PGM, TE-PGM, DT-PGM, MA-PGM1, MA-PGM2, IP-PGM and AG-PGM, as well as yeast PGM/PMM (P457434). Group 2 included the PGM sequence from Mycobacterium tuberculosis (AL123456), PGM from Synechocystis and Acetobacter and two PGM sequences from E.coli (D90707 and U08369). Group 3 included the other sequence from Mycobacterium tuberculosis (AL178640) and spinach PGM (U345723), Treponema pallidum PGM (AE001219), Pyrococcus PGM (K09656), Chlamydia PGM (U74533), Borrelia PGM (U45763), C.elegans PGM (AC34553), Aquifex PGM (T34563), Xanthomonas PGM (AL87545), Neisseria meningitidis PGM (GU98231), Neisseria gonorrhoeae PGM (GU077172) and E.coli PGM/PMM (D38457).

A number of tools exist that can be used to validate and check the tracing of protein structures derived from crystallographic data, such as the 3D–1D profile method (Luthy et al., 1992). In addition to these methods, this work illustrates another validation approach applicable to low- and medium-resolution structures, for example those derived from electron diffraction. As exemplified in this paper, backbone tracing can be checked by alignment and homology modeling with moderately homologous proteins (30–60% identity). This check can be performed as soon as the secondary structure elements can be assigned.

There are four major insertions and two deletions in the PFUS sequence, all of which are located in surface-exposed loops connecting secondary structure elements. The remaining changes are point mutations distributed evenly throughout the sequence. They do not appear to alter the secondary structure elements or the overall fold of the protein. Some of the same deletions and insertions are found in PGM molecules of other species, such as PGM1 and PGM2 from yeast and PGM from Agrobacterium (Figure 2). These changes are unlikely to affect the enzymatic function of the proteins. A specific covalent modification site in a surface-exposed region, however, could add a new function to the protein. The putative PKC site found only in PFUS is located on a surface loop of domain II. This may be important for PFUS function, since calcium-dependent phosphorylation of PFUS has been observed in in vitro studies (Subramanian and Satir, 1992). The region containing the PKC site in PFUS is highly variable among the other members of the superfamily. The PKC site itself, however, is unique to PFUS, which is currently the only member of the superfamily known to undergo serine phosphorylation/dephosphorylation cycling. From this, it can be speculated that highly variable regions of proteins (e.g. loop regions) can evolve separate functionality. There are similar examples among pairs of homologous proteins. For instance, of two histidine phosphatases, one contains functional Asn glycosylation sites while the other does not (Mullaney and Ullah, 1998). This suggests that highly variable surface loops may be important in divergent evolution within protein families (Chiou et al., 1995, 1998).

Membrane association has been described for two members of the PGM superfamily, PFUS (Zhao and Satir, 1998) and PGM-SR (Lee et al., 1992). The extended N-termini [16 amino acids in PFUS and eight amino acids (residues 1–8) in PGM-SR] have a high content of hydrophobic amino acids, in particular Leu. They are therefore likely to form amphipathic α-helices, similar to the synthetic fusion peptides and viral fusion proteins that have been shown to partially incorporate into lipid bilayers (Hui and May, 1995).

Fig. 8. (a) Ribbon diagram of domain I of the PGM molecule traced at 2.4 Å resolution. The secondary structure elements that would be deleted in PGM-RP are marked in dark gray. (b) Ribbon diagram of domain I of the PGM-RP molecule as modeled from the template structure. Note the unpacked secondary structure elements. (c) Connectivity diagram of domain I of rabbit muscle PGM, with the secondary structure elements that would be deleted in PGM-RP highlighted. (d) Space-filling model of PGM-RP.

Although the general domain structure and the catalytically important active-site residues of PGM are preserved in the model of PGM-RP (Table III), a major structural difference is found in domain I owing to the 55 missing amino acid residues. This substantial deletion results in the loss of three α-helices and one β-strand from a small anti-parallel β-sheet in this domain (Figure 8b), as compared with PGM (Figure 8a). The major changes in domain I are especially evident in the space-filling model (Figure 8d). PGM-RP has been repeatedly cloned and sequenced from both genomic DNA and cDNA (Moiseeva and Critchley, 1997), so the possibility of a sequencing or cloning artifact is small. The four missing secondary structural elements in domain I (Figure 8c) expose 978.4 Å² of buried surface to the solvent, of which 764.3 Å² is hydrophobic surface, including contributions from the backbone. This change probably implies major structural rearrangements in domain I of PGM-RP as compared with the template structure. PGM-RP is the only protein with such a major alteration in domain I. It is notable, however, that the N-terminus is the most variable region in all of the sequences in the superfamily. It is plausible that this N-terminal variability could provide for novel function(s) such as sub-cellular localization signals. The function of PGM-RP is currently not clearly defined. It can be immunoprecipitated as a complex containing dystrophin/utrophin and actin, and has therefore been suggested to be involved in the organization of membrane domains, specifically focal contacts and adherence plaques. One of the changes in this protein is a substitution in a single residue (PGM Asn118

to PGM-RP Cys62), which is thought to be responsible for the absence of PGM activity (Moiseeva et al., 1996). This residue is highly conserved throughout the superfamily (Figure 1), suggesting its functional importance. Since the protein is known to associate with dystrophin in adherens junctions, it is possible that the large exposed hydrophobic patch in domain I serves as a binding surface for the additional protein–protein interactions. Both the fold and the amino acid sequence appear to be strongly conserved in the PGM superfamily. According to our current knowledge there are only a few regions in the functionally distinct members of the family where novel function(s) can be accommodated. There is variability in the definitions of the protein superfamilies. For example, proteins of the enolase superfamily are defined by both sequence homology and similar enzymatic activity, whereas the human crystallin (homologous to alcohol dihydrogenase) superfamily is defined exclusively on the basis of sequence homology. In the enolase superfamily proteins homologous at the sequence level present a number of related but distinct folds, which correlate with the functional differences between the proteins (Babbitt et al., 1996). Even though the proteins have different functions, their enzymatic activities share a high activation energy step of proton abstraction (Gulick et al., 1998). 745


Because of this shared chemistry, newly discovered proteins such as the D-glucarate dehydratase-related protein from E.coli can already be assigned a suggested function involving a proton abstraction step (Hubbard et al., 1998). In contrast, the crystallins represent a family in which the folds and the sequences are conserved to a very large extent, but the proteins perform totally unrelated functions. One example is alcohol dehydrogenase and human crystallin, which share 96% identity at the amino acid level; however, crystallin has no detectable dehydrogenase activity (Chiou et al., 1998) even though all of its active-site residues are conserved. We therefore conclude that sequence homology alone is not an indicator of identical protein function. High sequence similarity in functionally important regions has traditionally served as an indicator of similarity in protein function; however, recent results have shown that some proteins with very high sequence homology can have drastically different functions. It is important to remember that none of PFUS, PGM-RP or PGM-SR has been shown to possess phosphoglucomutase activity (Lee et al., 1992; Andersen et al., 1994; Moiseeva et al., 1996), even though all three retain high sequence similarity to the canonical phosphoglucomutases. It is possible that these proteins lose enzymatic activity and assume novel functions as a result of post-translational modification, such as the glycosylation that is evidently significant for PFUS function. It is not clear from the present modeling which structural characteristics alter the function of these molecules. However, it appears that whatever evolutionary changes are responsible for the different functions of the members of the PGM superfamily, they do not involve alterations of the overall protein fold. It even appears that the structure of the regions important for PGM activity is conserved among the members of the superfamily. At present, a mechanistic explanation for the different catalytic activities is not available.
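The argument above rests on how similar two sequences are, usually expressed as percent identity (for example, the 96% figure for alcohol dehydrogenase and crystallin, or the 50% or more identity of PFUS, PGM-RP and PGM-SR to rabbit muscle PGM). As an illustration only, the minimal sketch below computes percent identity from an existing pairwise alignment, such as one produced by the Needleman–Wunsch method. It assumes that '-' marks gaps and divides by the number of gap-free aligned columns; other conventions (dividing by the alignment length or by the length of the shorter sequence) give different values, and the convention behind the figures quoted here is not stated. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes both strings are rows of the same pairwise alignment
    (hypothetical helper for illustration, not the authors' method).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both residues are present.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment, not real PGM superfamily sequences.
    # 7 gap-free columns, 6 identical (I/L differs).
    print(f"{percent_identity('MKT-AYIA', 'MKTQAYLA'):.1f}%")
```

Run as a script, this prints `85.7%`. Excluding gapped columns keeps insertions and deletions, such as the loop insertions in PFUS or the 55-residue deletion in PGM-RP, from lowering the identity score; the price is that a short, heavily gapped alignment can report a deceptively high identity.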
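The putative PKC site on the domain II surface loop of PFUS is a short sequence motif. A minimal sketch of how such candidate sites can be located is shown below, assuming the PROSITE PKC phosphorylation-site pattern PS00005, [ST]-x-[RK]; the text does not state which pattern or tool was actually used. Because this pattern is only three residues long, it matches frequently by chance, which is one reason such sites remain putative until phosphorylation is demonstrated experimentally. The function name and the sequence are hypothetical, not PFUS.

```python
import re

# Assumed pattern: PROSITE PS00005 (PKC_PHOSPHO_SITE), [ST]-x-[RK].
# The zero-width lookahead reports overlapping matches (e.g. "SSRK" has two).
PKC_PATTERN = re.compile(r"(?=([ST].[RK]))")


def find_pkc_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the S/T acceptor, matched tripeptide) per hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in PKC_PATTERN.finditer(seq)]


if __name__ == "__main__":
    toy = "MASKRTLKGS"  # hypothetical sequence, not PFUS
    for pos, motif in find_pkc_sites(toy):
        print(f"putative PKC site at residue {pos}: {motif}")
```

For the toy sequence this reports candidate sites at residues 3 (SKR) and 6 (TLK). A motif match says nothing about whether the residue is surface-exposed; that is why the structural model, which places the PFUS site on a surface loop, matters for judging whether such a site could be used.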
References
Andersen,A.P., Wyroba,E., Reichman,M., Zhao,H. and Satir,B.H. (1994) The activity of parafusin is distinct from that of phosphoglucomutase in the unicellular eukaryote Paramecium. Biochem. Biophys. Res. Commun., 200, 1353–1358.
Babbitt,P.C. et al. (1996) The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids. Biochemistry, 35, 16489–16501.
Belkin,A. and Burridge,K. (1994) Expression and localization of the phosphoglucomutase-related cytoskeletal protein, aciculin, in skeletal muscle. J. Cell Sci., 107, 1993–2003.
Belkin,A. and Burridge,K. (1995a) Association of aciculin with dystrophin and utrophin. J. Biol. Chem., 270, 6328–6337.
Belkin,A. and Burridge,K. (1995b) Localization of utrophin and aciculin at sites of cell–matrix and cell–cell adhesion in cultured cells. Exp. Cell Res., 221, 132–140.
Brunger,A.T., Krukowski,A. and Erickson,J.W. (1990) Slow-cooling protocols for crystallographic refinement by simulated annealing. Acta Crystallogr. A, A46, 585–593.
Chiou,S.H., Yu,C.W., Lin,C.W., Pan,F.M., Lu,S.F., Lee,H. and Chang,G.G. (1995) Octopus S-crystallins with endogenous glutathione S-transferase (GST) activity: sequence comparison and evolutionary relationships with authentic GST enzymes. Biochem. J., 309, 793–800.
Chiou,S.H., Pan,F.M., Peng,H.W., Chao,Y. and Chang,W.C. (1998) Characterization of gammaS-crystallin isoforms from a catfish: evolutionary comparison of various gamma-, gammaS- and beta-crystallins. Biochem. Biophys. Res. Commun., 252, 412–419.
Dai,J.B., Liu,Y., Ray,W.J.,Jr and Konno,M. (1992) The crystal structure of muscle phosphoglucomutase refined at 2.7-ångstrom resolution. J. Biol. Chem., 267, 6322–6337.
Gulick,A.M., Palmer,D.R., Babbitt,P.C., Gerlt,J.A. and Rayment,I. (1998) Evolution of enzymatic activities in the enolase superfamily: crystal structure of (D)-glucarate dehydratase from Pseudomonas putida. Biochemistry, 37, 14358–14368.


Hubbard,B.K., Koch,M., Palmer,D.R., Babbitt,P.C. and Gerlt,J.A. (1998) Evolution of enzymatic activities in the enolase superfamily: characterization of the (D)-glucarate/galactarate catabolic pathway in Escherichia coli. Biochemistry, 37, 14369–14375.
Hui,C. and May,R.J. (1995) Synthetic peptides catalyze liposome fusion reaction. Biophys. J., 73, Suppl. 2176.
Lee,Y.S., Marks,A.R., Gureckas,N., Lacro,R., Nadal-Ginard,B. and Kim,D.H. (1992) Purification, characterization and molecular cloning of a 60-kDa phosphoprotein in rabbit skeletal sarcoplasmic reticulum which is an isoform of phosphoglucomutase. J. Biol. Chem., 267, 21080–21088.
Levin,S. and Satir,B.H. (1998) POLINA: evaluation of single amino acid substitutions in protein superfamilies. Bioinformatics, 14, 310–312.
Lin,Z., Konno,M., Abad-Zapatero,C., Wierenga,R., Murthy,M.R., Ray,W.J.,Jr and Rossmann,M.G. (1986) The structure of rabbit muscle phosphoglucomutase at intermediate resolution. J. Biol. Chem., 261, 264–274.
Liu,Y., Ray,W.J.,Jr and Baranidharan,S. (1997) Structure of rabbit muscle phosphoglucomutase refined at 2.4 Å resolution. Acta Crystallogr., D53, 392–401.
Luthy,R., Bowie,J.U. and Eisenberg,D. (1992) Assessment of protein models with three-dimensional profiles. Nature, 356, 83–85.
Moiseeva,E. and Critchley,D.R. (1997) Characterisation of the promoter which regulates expression of a phosphoglucomutase-related protein, a component of the dystrophin/utrophin cytoskeleton predominantly expressed in smooth muscle. Eur. J. Biochem., 248, 634–643.
Moiseeva,E., Belkin,A.M., Spurr,N.K., Koteliansky,V.E. and Critchley,D.R. (1996) A novel dystrophin/utrophin-associated protein is an enzymatically inactive member of the phosphoglucomutase superfamily. Eur. J. Biochem., 235, 103–113.
Morris,A., MacArthur,M.W., Hutchinson,E.G. and Thornton,J.M. (1992) Stereochemical quality of protein structure coordinates. Proteins, 12, 345–364.
Mullaney,E.J. and Ullah,A.H. (1998) Identification of a histidine acid phosphatase (phyA)-like gene in Arabidopsis thaliana. Biochem. Biophys. Res. Commun., 251, 252–255.
Murtaugh,T.J., Gilligan,D.M. and Satir,B.H. (1987) Purification of and production of an antibody against a 63 000 Mr stimulus-sensitive phosphoprotein in Paramecium. J. Biol. Chem., 262, 15734–15739.
Narayanan,N. and Xu,A. (1997) Phosphorylation and regulation of Ca2+-pumping ATPase in cardiac sarcoplasmic reticulum by calcium/calmodulin kinase. Basic Res. Cardiol., 1, 25–35.
Needleman,S.B. and Wunsch,C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443–453.
Ray,W.J.,Jr, Hermodson,M.A., Puvathingal,J.M. and Mahoney,W.C. (1983) The complete amino acid sequence of rabbit muscle phosphoglucomutase. J. Biol. Chem., 258, 9166–9177.
Ray,W.J.,Jr, Baranidharan,S. and Liu,Y. (1997) Enhanced diffractivity of phosphoglucomutase crystals. Use of an alternative cryo-crystallographic procedure. Acta Crystallogr., D53, 385–392.
Rzhetsky,A. and Nei,M. (1994) METREE: a program package for inferring and testing minimum-evolution trees. Comput. Appl. Biosci., 10, 409–412.
Satir,B.H., Srisomsap,C., Reichman,M. and Marchase,R.B. (1990) Parafusin, an exocytic-sensitive phosphoprotein, is the primary acceptor for the glucosylphosphotransferase in Paramecium tetraurelia and rat liver. J. Cell Biol., 111, 901–907.
Subramanian,S.V. and Satir,B.H. (1992) Carbohydrate cycling in signal transduction: parafusin, a phosphoglycoprotein and possible Ca2+-dependent transducer molecule in exocytosis in Paramecium. Proc. Natl Acad. Sci. USA, 89, 11297–11301.
Subramanian,S.V., Wyroba,E., Andersen,A.P. and Satir,B.H. (1994) Cloning and sequencing of parafusin, a calcium-dependent exocytosis-related phosphoglycoprotein. Proc. Natl Acad. Sci. USA, 91, 9832–9836.
Wierenga,R., Lewis,D.G., Rossmann,M.G. and Ray,W.J.,Jr (1981) The structure determination of rabbit phosphoglucomutase. Philos. Trans. R. Soc. London, Ser. B: Biol. Sci., 293, 205–208.
Zhao,H. and Satir,B.H. (1998) Parafusin is a membrane and vesicle associated protein that cycles at exocytosis. Eur. J. Cell Biol., 75, 46–53.

Received September 6, 1998; revised March 10, 1999; accepted March 22, 1999