
Journal of General Virology (2000), 81, 257–266. Printed in Great Britain

The ‘30K’ superfamily of viral movement proteins

Ulrich Melcher

Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078, USA

Relationships among the amino acid sequences of viral movement proteins related to the 30 kDa (‘30K’) movement protein of tobacco mosaic virus, the 30K superfamily, were explored. Sequences were grouped into 18 families. A comparison of secondary structure predictions for each family revealed a common predicted core structure flanked by variable N- and C-terminal domains. The core consisted of a series of β-elements flanked by an α-helix on each end. Consensus sequences for each of the families were generated and aligned with one another. From this alignment an overall secondary structure prediction was generated and a consensus sequence that can recognize each family in database searches was obtained. The analysis led to criteria that were used to evaluate other virus-encoded proteins for possible membership of the 30K superfamily. A rhabdoviral and a tenuiviral protein were identified as 30K superfamily members, as were plant-encoded phloem proteins. Parsimony analysis grouped tubule-forming movement proteins separately from the others. Establishment of the alignment of residues of diverse families facilitates comparison of mutagenesis experiments done on different movement proteins and should serve as a guide for further such experiments.

Author for correspondence: Ulrich Melcher. Fax +1 405 744 7799. e-mail u-melcher-4@alumni.uchicago.edu

0001-6504 © 2000 SGM

Introduction

Plasmodesmal transport of macromolecules (Lucas et al., 1993) occurs for selected molecules, such as phloem proteins, synthesized in companion cells and deposited in sieve elements (Ding, 1998), and viral genomes, which spread throughout a plant after establishing infection in a single cell (Carrington et al., 1996; Nelson & van Bel, 1998). Most plant viral genomes contain a movement protein (MP) gene or genes required for spread of the infection (Leisner, 1999). Recently, a phloem protein cross-reacting with an antiserum against a viral MP was identified, shown to move from cell to cell and to carry RNA to the next cell (Xoconostle-Cazares et al., 1999). Recognized MPs fall into four superfamilies: the products of the triple gene block of potexviruses and related viruses; the tymoviral MPs; a series of small polypeptides, less than 10 kDa, encoded by carmo-like viruses and some geminiviruses; and the ‘30K’ superfamily, related to the 30 kDa tobacco mosaic virus (TMV) MP.

A variety of activities has been demonstrated for 30K superfamily MPs (Leisner, 1999), including abilities to bind nucleic acids, to increase the size-exclusion limit of plasmodesmata, to localize to and accumulate in plasmodesmata, to move to neighbouring cells on microinjection, to facilitate movement of RNA to neighbouring cells on microinjection, to form tubular structures and to interact with cytoskeletal elements. Which of these activities is relevant to movement of infection is unclear. At least two mechanisms have been proposed for the 30K superfamily MPs. In one, typified by the TMV MP, the MP expands the plasmodesmal connections between cells, allowing the viral genome, bound to protein, to pass into the neighbouring cell. In the second, typified by comoviruses, MP-containing tubules are established between cells and serve as conduits for particles.

Despite their common function, 30K superfamily members have but few conserved motifs in their amino acid sequences (Melcher, 1990, 1993; Koonin et al., 1991; Mushegian & Koonin, 1993). An LXDX50–70G motif (X50–70 denoting a run of 50 to 70 residues) was the only conserved feature noted in a study of a limited number of sequences (Melcher, 1990). A hydrophobic sequence resides just N-terminal of the LXDX50–70G motif (Koonin et al., 1991). The poor sequence conservation suggests that the 30K superfamily members may have a common three-dimensional structure obscured by divergence in sequence. To discover secondary structure elements contributing to three-dimensional structures, 30K MP sequences were aligned and the alignments used to predict secondary structures. Similarities to phloem proteins and to heat shock proteins of the hsp70 class were evaluated.
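As a rough illustration of what the LXDX50–70G motif means in practice, the sketch below scans a sequence for a Leu, any residue, an Asp, a run of 50 to 70 arbitrary residues, and then a Gly. The function name `find_motif` and the toy sequence are hypothetical and chosen only for this example; the regular expression is one reading of the motif notation, not a search procedure taken from the paper.

```python
import re

# One reading of LXDX50-70G: Leu, any residue, Asp, 50-70 residues, Gly.
MOTIF = re.compile(r"L.D.{50,70}G")


def find_motif(seq):
    """Return (start, end) of the first motif match, or None if absent."""
    match = MOTIF.search(seq)
    return (match.start(), match.end()) if match else None


# Synthetic toy sequence (not a real movement protein):
# 'LAD', 60 alanines, then 'G', with two flanking residues on each side.
toy = "MK" + "LAD" + "A" * 60 + "G" + "KK"
print(find_motif(toy))

# Too short a spacer between the Asp and the Gly: no match is reported.
print(find_motif("LAD" + "A" * 10 + "G"))
```

Because so few positions are constrained, a pattern this loose will also match many unrelated proteins, which fits the text's point that the motif alone is weak evidence and that structural comparison is needed.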


Methods

Results

Groupings of similar MP amino acid sequences were designated ‘families’, in a non-taxonomic sense. For each virus MP family, available sequences were identified by BLAST search (Altschul et al., 1990) of a non-redundant compilation of protein sequences, using the MP of one family member as query. When sequences of two or three isolates of the same virus were available, one sequence was arbitrarily chosen as representative. When sequences of more than three isolates were available, a consensus was first generated, using Profilemake (GCG) (Gribskov et al., 1989), and the consensus was used in the alignment. Sequences within each family were preliminarily aligned using Pileup (GCG). The Pileup alignment was inspected visually for unlikely gaps, which were then removed or consolidated. The positioning of gaps was refined by gap translation (Melcher, 1993) using the SCM2 structural correlation matrix (Niefind & Schomburg, 1991). The multiple sequence alignments (or single sequences in cases of one-member families) were used as input to PHD protein secondary structure prediction (Rost et al., 1994). The PHD subset prediction for three-state secondary structure (expected average accuracy greater than 82%) was used. A consensus was generated via Profilemake and modified at positions occupied by blanks in 50% or more of the sequences by replacing consensus residues with blanks. A Pileup preliminary alignment of the resulting 18 family sequences was subjected to a phylogenetic analysis by parsimony using PROTPARS in the Phylip package (Felsenstein, 1989). For this analysis, the less reliably alignable N- and C-terminal one-thirds of the sequences were ignored. The pattern of relatedness revealed by the most parsimonious tree was used to refine the alignment using a progressive alignment strategy (Feng & Doolittle, 1987).
This consisted of first aligning the most closely related sequences and then adding those pairs to other related pairs or sequences, building up, pair by pair, to the alignment of the two most distant groups. Alignment used the gap translation strategy and the SCM2 matrix. The cycle of protein parsimony followed by realignment was repeated until no further changes in the tree resulted (Berti & Storer, 1995). The reliability of relationships inferred by parsimony was tested by analysis of 100 bootstrap replicates using SEQBOOT, PROTPARS and CONSENSE (Phylip package). Branches not supported in over 80% of the bootstrap trees were collapsed. PROTDIST (Phylip package) was used to calculate distances. Before construction of plots of similarity, residue positions occupied in three or fewer sequences were removed. To differentiate peaks in similarity due to alignment from those due to stretches of similar amino acids, similarity plots were also calculated with one set offset by one position relative to the other. A 10 position window size at each position and the SCM2 scoring matrix (Niefind & Schomburg, 1991) were used. SCM2 was also used in the determination of significance scores for alignment of pairs of multiple sequence alignments (or single sequences) by 30 randomizations of the order of residues as previously described (Melcher, 1993). A single 30K superfamily consensus sequence was generated using Profilemake and BLAST. Initially, each family sequence had equal weight in consensus construction. Weights for families identified with the lowest expect scores in BLAST 2.0 were reduced, while those for families not identified by BLAST were increased. Consensus sequence generation, BLAST search and weighting adjustment were repeated until the BLAST search identified at least one member of each of the input families using default parameters. Relationships among families were also explored using PSI-BLAST (Altschul et al., 1997) seeded with family consensus sequences.
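The similarity plots and their offset control described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: a simple identity score stands in for the SCM2 matrix (which is not reproduced here), two aligned sequences stand in for the multiple sequence alignments, and all function names are hypothetical.

```python
def identity_score(x, y, gap="-"):
    """Toy stand-in for SCM2: 1 for identical non-gap residues, else 0."""
    return 1 if x == y and x != gap else 0


def window_sums(values, window=10):
    """Sum per-position scores over every full sliding window."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    sums = [total]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        sums.append(total)
    return sums


def similarity_profiles(a, b, score=identity_score, window=10):
    """Return (aligned, offset) window sums for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [score(x, y) for x, y in zip(a, b)]
    # Control: shift one sequence by a single position before scoring.
    offset = [score(x, y) for x, y in zip(a[:-1], b[1:])]
    return window_sums(aligned, window), window_sums(offset, window)


# Distinct residues: similarity comes from the alignment itself,
# so it disappears once one sequence is shifted by one position.
distinct = "ACDEFGHIKLMNPQ"
print(similarity_profiles(distinct, distinct))

# A run of one residue: similarity survives the shift, so the peak
# reflects composition rather than correct alignment.
repeat = "A" * 12
print(similarity_profiles(repeat, repeat))
```

The second case is the situation the offset control is meant to expose: a peak that appears in both profiles cannot be attributed to the alignment.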
All alignments with sources of the sequences used are available at http://opbs.okstate.edu/virevol/index.html.
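The randomization-based significance score used in the Methods (an alignment score expressed in standard deviations above the mean of scores for scrambled sequences, from 30 randomizations) can be illustrated with the sketch below. It is a simplified stand-in, not the published procedure: it counts identities instead of using SCM2, shuffles one single sequence rather than a multiple sequence alignment, and the names `alignment_score` and `significance_sd` are hypothetical.

```python
import random
import statistics


def alignment_score(a, b, gap="-"):
    """Toy score: number of identical, non-gap aligned positions."""
    return sum(1 for x, y in zip(a, b) if x == y and x != gap)


def significance_sd(a, b, n_shuffles=30, seed=0):
    """Observed score in SD units above the mean of shuffled controls."""
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    observed = alignment_score(a, b)
    null_scores = []
    for _ in range(n_shuffles):
        shuffled = list(b)
        rng.shuffle(shuffled)  # randomize residue order, keep composition
        null_scores.append(alignment_score(a, shuffled))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; score is undefined")
    return (observed - mean) / sd


# Synthetic sequence (each of the 20 amino acids twice), aligned to itself.
seq = "ACDEFGHIKLMNPQRSTVWY" * 2
print(round(significance_sd(seq, seq), 1))
```

Shuffling preserves composition while destroying order, so a high score indicates that the ordering of residues, not merely their frequencies, is shared between the aligned sequences.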

Group alignment and secondary structure prediction


Previously, 20 groups of MPs were identified as 30K superfamily members (Koonin et al., 1991; Melcher, 1990, 1993; Mushegian & Koonin, 1993; Mushegian, 1994): the MPs of bromo-, cucumo-, alfamo-, ilar-, diantho-, tombus-, tobamo-, tobra-, umbra-, begomo-, como-, nepo-, tospo-, sequi-, capillo-, tricho-, furo-, idaeo-, caulimo- and badnaviruses. For the present analysis, sequences of the alfamo- and ilarviral MPs were grouped as one family because of the close similarity of those sequences and the few alfamoviral sequences available. For similar reasons, tricho- and capilloviral MPs were grouped as one family. Of the furoviruses, only the soilborne wheat mosaic virus MP was used, because other sequenced furoviral RNAs encode MPs of the triple gene block superfamily. The nepoviral MPs were divided into two families (A and B) because of only distant similarities between members of the two (Mushegian, 1994).

Members of the 30K superfamily contain a ‘common core’ domain consisting of two α-helices separated by a series of β-elements. Within each of the 19 30K families, the sequences were aligned with one another. The family alignments (or single sequences for furo-, idaeo- and sequiviruses) were used for secondary structure prediction. Predictions (Fig. 1) for all but one of the families contained, consistent with earlier predictions (Melcher, 1990, 1993), a series of β-elements. The N-terminal region of the sequiviral polyprotein, thought to contain movement function (Mushegian & Koonin, 1993), was predicted to be predominantly helical, and may thus not be a superfamily member. The smallest of the 30K MPs, those of the tombusviruses, should have one α-helical element about 15 residues from the C terminus, preceded by a series of more than five β-elements. An equivalent series of β-elements in the 17 other predictions was, in most cases, flanked on the C-terminal side by an α-helical segment. In all but the bromoviral MPs an α-helical segment was also predicted to flank the β-structure-rich region on the N-terminal side.

The common core was flanked by additional sequences on N-terminal and C-terminal sides. The N-terminal flanking region was of variable length. The region was longer for those MPs known to form virion-carrying tubules and for the idaeoviral MP. These had numerous predicted α-helical segments. Loop structure was also commonly predicted for this region. The predicted structures on the C-terminal sides were predominantly loops except for the probable extensively helical tospo- and furoviral MP C termini. The C-terminal region also varied in length among the families, with the tombusviral MPs having only 15 residues in this region.

Multiple sequence alignment

To examine the relationship of the similarity in predicted secondary structure to sequence similarity, a multiple sequence alignment of 30K families was performed. To simplify alignment, consensus sequences (Gribskov et al., 1989) were first generated for each family with multiple sequences. The sequiviral sequence was not included in the overall alignment since its predicted secondary structure excluded it from the 30K superfamily. An alignment of the resulting 18 sequence families was generated by an iterative progressive alignment–parsimony analysis strategy (Berti & Storer, 1995). The alignment and parsimony were repeated until realignment did not alter the most parsimonious tree obtained (22 cycles).

The alignment of family consensus sequences was also used to produce an overall secondary structure prediction (Fig. 1, lowest line). The central region of the overall prediction consisted of seven β-elements and four α-helical segments. The outer α-helical segments, α-A and α-D, corresponded to those noted in the predictions for individual families. α-B corresponded only to a helical segment predicted in the nepoviral A, badnaviral, comoviral and tobraviral families. α-C was not predicted for any of the constituent families individually. The β-elements β-1, β-2, β-5 and β-6 corresponded to β-elements in most of the family predictions. Variable structures were found in the family predictions for the region between β-2 and β-5 and around β-7. The C-terminal region was predicted to be a predominantly random coil. The N-terminal extension specific to tubule-forming 30K superfamily members was predicted to be predominantly helical.

Examination of the aligned sequences for conserved sequence motifs (Fig. 2) failed to reveal any absolutely conserved residues. Five regions of moderate sequence conservation were noted. The first, characterized by a PLX(P/D) motif, preceded α-A. The second spanned β-1 and β-2 and contained the LXD motif at the C terminus of β-2. The β-3–β-4 region contained positions rich in aromatic and proline residues. The fourth region contained the almost completely conserved Gly residue in the N terminus of β-6. The last region occurred unexpectedly near the C termini of 14 of the families and consisted of variations of a SIS tripeptide.

Fig. 1. Predicted secondary structures for acknowledged 30K movement protein superfamily members. C, B,  and – represent α-helical structure, β-elements, loop structure and no prediction, respectively. Significance scores for the alignment of sequences of a family with the 17 consensus sequences of the other families are given to the right of the predictions. Secondary structures predicted for the overall alignment are shown on the bottom and are identified by letters for α-helices and by numbers for β-elements.

30K superfamily consensus and similarity plots

An overall consensus sequence generated from the alignment of the family consensuses (Fig. 3) was used in a BLAST search of protein databases and recognized at least one member of each family that contributed to the consensus. This recognition suggested that a satisfactory alignment of related sequences had been obtained. As a further test of the overall accuracy of the alignment, each family multiple sequence alignment was tested for significance relative to the superfamily alignment. For this assessment, the consensus for the family being evaluated was removed from the superfamily alignment. For most families, the alignment scores were between 7.0 and 13.0 standard deviations (SD) above the mean for scrambled comparisons (Fig. 1). Nepoviral and badnaviral MPs scored lower. A self-comparison of the 18 family sequences (positive control) yielded 41.3 SD, while the average of 30 offset self-comparisons (negative control) gave 1.0 SD.

The degree of sequence similarity varied along the alignment (Fig. 4). The β-1 and β-2 elements were the most strongly similar. The loop between them (L12) was less conserved. The degree of similarity of L12 was close to that of the loop preceding α-A (LA), L34 and L56. The α-D segment and the region containing the SIS motif also yielded similarity peaks. Similarity in the α-D region was due, in part, to contiguous similar residues, as reflected in a peak in the control similarity calculation using a dataset pair offset by one residue. The proximity of the similarity profiles for aligned and offset pairs at the N and C termini suggests that the alignment in these regions was not reliable.

Iterative searching of databases using PSI-BLAST (Altschul et al., 1997) provides an alternative to the consensus approach for exploring relationships among 30K superfamily members. Searches were seeded with the consensus sequences for each of the acknowledged 30K superfamily families and reiterated until no further MP sequences were retrieved. The search did not find previously unrecognized candidates for 30K superfamily membership. The consensuses for five families did not identify other groups even though reciprocal searches identified members of these five families. Neither of the two nepoviral sequences identified any MPs outside the family nor were nepoviral MPs identified in searches with other sequences. In contrast, the tobraviral consensus sequence identified MPs belonging to 12 families, including bromo- and furoviruses, and umbraviral MPs were identified by searches initiated with eight family consensus sequences including those of cucumo-, furo- and dianthoviruses. These results strongly suggest sequence similarity between MPs of the tripartite viruses and MPs such as those of umbra- and tobraviruses that are related to tobamoviral MPs.

Fig. 2. Excerpts from the alignment of 30K MP superfamily family sequences. The upper block shows the section around the beginning of α-A on the left and the β-1 and β-2 section on the right. The lower block shows from left to right: β-4 to L45; β-6 and surrounding residues; the SIS motif. Residues in inverse font are those in which four or more families had the same consensus residue at that position, while shaded residues are those where four or more residues were similar. !, *, O and – indicate consensus aliphatic, hydrophobic, aromatic and charged positions, respectively. The complete alignment is available at http://opbs.okstate.edu/virevol/index.html.

Candidates for 30K superfamily membership

MPs of rhabdo-, waika-, tenui- and closteroviruses have not been identified or have not been assigned to an MP superfamily. MPs of non-acknowledged 30K superfamily members were not identified in the database search with the overall consensus sequence or the PSI-BLAST search. Nevertheless, secondary structures predicted for candidate MPs were examined to identify those with β-elements flanked by α-helices. For Sonchus yellow net rhabdovirus, protein sc4 had a series of β-elements flanked by α-helical segments at the separation expected for 30K superfamily members. The sc4 protein sequence was thus aligned with previously recognized 30K superfamily members (Fig. 3). The alignment had a significance score of 8.0 SD. A similarity plot (Fig. 5A) had peaks above control corresponding to LA, β-1, β-2, L34 and L56, features also found in the plot of the previously recognized 30K superfamily members (Fig. 4). A small peak was in the SIS region. An additional region of similarity was between β-2 and L34. Thus, sc4 probably functions as a rhabdoviral MP, a conclusion supported by identification of the sc4 gene as the only gene not present in animal-infecting rhabdoviruses (Scholthof et al., 1994).

Fig. 3. Alignment of candidates for membership in the 30K superfamily of MPs with an overall 30K consensus sequence. Residues similar or identical to those in the consensus are in inverse font. PHD secondary structure and accessibility predictions for the 30K multiple sequence alignment are shown in the SEC and ACC lines, respectively. Periods (.) indicate prediction uncertainty and dashes (–) indicate gaps inserted to facilitate alignment.

Fig. 4. Self-similarity plot for alignment of 30K MP families. Scores from a structural correlation matrix were summed over sliding 10 residue windows in a comparison of the multiple sequence alignment of the superfamily against itself (upper curve) and against the same alignment offset by one residue (lower curve).

The N-terminal region of waikaviral polyproteins, a region not having an equivalent in the related picornaviral polyproteins (Reddick et al., 1997), aligned with the set of 30K superfamily sequences (Fig. 3) with a score of only 4.9 SD. There were several inconsistencies with the 30K superfamily in predicted secondary structure. Yet, a similarity plot (Fig. 5E) revealed some regions of sequence similarity, particularly in the β-1–β-2 region and the SIS tail. ORFs 1 and 3 of Southern bean mosaic sobemovirus have cell-to-cell movement function (Sivakumaran et al., 1998), but their predicted secondary structures or the positions of conserved residues were not shared by the 30K superfamily, making membership of the 30K superfamily unlikely.

Tenuiviral genomes are multipartite with ambisense RNAs for which no MP genes are known (Toriyama et al., 1998). Of the four ORFs (2V, 3V, 4V and 4C) with no probable function assigned, 4C had a predicted secondary structure that corresponded well throughout its length with that of the 30K superfamily. The amino acid sequence was added to the alignment of previously recognized superfamily members (Fig. 3), with a significance score of 7.9. A similarity plot (Fig. 5C) had a major peak, relative to control, corresponding to L12. Additional areas of above background similarity were near the N terminus to α-A, around β-5 and loops LB3 and L7D. Thus, the tenuiviral 4C protein probably belongs to the 30K MP superfamily.

Table 1. PAM250 distances of closest interfamily neighbours of 30K superfamily movement proteins

Distances were determined using PROTDIST with the PAM250 scoring matrix.

MP family          Closest distance   Neighbour family
Furo-              2.89               Diantho-
Diantho-           2.89               Furo-
Alfamo-/Ilar-      3.48               Diantho-
Bromo-             1.34               Cucumo-
Cucumo-            1.34               Bromo-
Tenui-*            5.61               Umbra-
Waika-*            9.38               Capillo-/Tricho-
Clostero-*         7.12               Gemini-
Rhabdo-*           5.03               Idaeo-
Tobra-             2.66               Tobamo-
Tobamo-            2.66               Tobra-
Gemini-            4.05               Tombus-
Tombus-            3.19               Tobra-
Umbra-             4.16               Furo-
Nepo- (B)          8.25               Gemini-
Tospo-             4.55               Umbra-
Idaeo-             4.55               Diantho-
Caulimo-           4.06               Nepo- (A)
Badna-             4.08               Caulimo-
Nepo- (A)          3.91               Como-
Como-              3.91               Nepo- (A)
Capillo-/Tricho-   4.11               Caulimo-
Phloem*            4.77               Bromo-
Mean ± SD          3.67 ± 1.49

* Distances not included in calculation of the mean.

An MP function was recently identified in a closterovirus-encoded protein (Agranovsky et al., 1998) previously known as a homologue of hsp70 proteins. These proteins include the Escherichia coli DnaK protein for which X-ray crystallographic information is available (Zhu et al., 1996). Secondary structure prediction for the closteroviral hsp70 homologues revealed that the C-terminal half of the proteins had an arrangement of β-elements and α-helical segments similar to those predicted for 30K superfamily members. However, the length of the similarly arranged section was longer for the closteroviral hsp70 homologues. To achieve alignment with the 30K superfamily members (Fig. 3), one pair of β-elements had to be deleted from the hsp70 homologue. The closteroviral hsp70 homologues aligned with a score of 4.8 SD. The DnaK–hsp70 homologue alignment (Agranovsky et al., 1998) was used to align DnaK to the 30K superfamily set, yielding a score of only 2.4 SD. The N terminus of the DnaK protein-binding domain aligned to the MP β-2 element. Prominent peaks in the similarity plots of the hsp70 homologues against the 30K superfamily (Fig. 5D) were at L12, β-1 and L56. Similar peaks, but less pronounced, occurred in the similarity plot of DnaK compared to the 30K superfamily (data not shown).

Fig. 5. Similarity plots of candidate 30K MP superfamily members: (A) rhabdoviral sc4; (B) phloem proteins related to pCm-PP16; (C) tenuiviral 4C; (D) closteroviral hsp70 homologue; (E) waikaviral N terminus. Similarity scores of the candidate set with 18 aligned family consensus sequences were determined for a sliding window of 10 residues using the SCM2 scoring matrix. Predicted secondary structure elements corresponding to peaks are labelled as in Fig. 1.

Antigenic cross-reaction suggests that some phloem proteins may be 30K superfamily members (Xoconostle-Cazares et al., 1999). The secondary structure predicted from a multiple sequence alignment of the cross-reactive CmPP16 and related sequences (Xoconostle-Cazares et al., 1999) was predominantly β-structure with one α-helical segment near the C terminus. An alignment of the phloem sequences with the 30K superfamily (Fig. 3) resulted in a score of 6.5 SD. The alignment began at the β-2 segment, the segment that marked the beginning of the DnaK peptide-binding domain (Zhu et al., 1996) alignment. The similarity plot (Fig. 5B) had a prominent peak in the β-2 element and an additional peak at β-5. Knotted1 and homologues (P24345, JQ2379, Q43484, AAB81079.1, CAA96511.1, Q41330, P46639, BAA76904.1, Q41853, AAC32818.1, AAD00692), PP1 (U66277) and PP2 (JQ1731) phloem fibril proteins also traffic between cells. Their predicted structures easily distinguished them from 30K MPs.

Fig. 6. Relationships among putative 30K MP superfamily members determined by bootstrapped parsimony. Branches with less than 80% support (100 bootstrap replicates) were collapsed. 0, RT, N, A, I, II and III represent the type of polymerase encoded by the viruses: none, RNA-dependent DNA polymerase, negative-strand virus, ambisense-strand virus and positive-strand virus, supergroups I, II, III RNA-dependent RNA polymerases, respectively. The thin-lined polygon encloses those MPs known to form virion-bearing tubules.

Distance measures also tested candidacy for 30K superfamily membership. Family consensuses were generated for the waika-, tenui- and closteroviral proteins and the phloem proteins. Pairwise distances were calculated between all pairs of the 23 family sequences using the evolutionarily derived PAM250 scoring matrix. For each of the 18 families of acknowledged superfamily members, the smallest distance to another family member was identified and the values averaged (Table 1). The mean smallest distance was 3.7 ± 1.5. The distance between the nepoviral B family and its closest 30K superfamily member was more than 2 SDs above this value, thus casting doubt on its membership in the 30K superfamily. The phloem family was within 2 SDs of the mean smallest distance for six families (bromo-, cucumo-, tobamo-, gemini-, umbra- and tospoviruses) with bromoviral MPs being the closest at 4.8. The rhabdoviral sc4 was within 2 SDs of the smallest mean distance for four families (tombus-, tospo-, idaeo- and badnaviruses) with the idaeoviral MP being the closest at 5.0. Only one acknowledged consensus MP, that of umbraviruses, was within the 2 SD range for tenuiviral 4C. The waika- and closteroviral families did not have a closely related family in the 30K superfamily.

Parsimony-derived relationships among 30K superfamily members

Fig. 6 shows the relationships deduced by parsimony among the acknowledged 30K superfamily members, the putative tenuiviral and rhabdoviral MPs and the phloem proteins. Relationships among the acknowledged members alone were analysed similarly (data not shown) and differed from those shown only in the clustering of tombus- and begomoviral MPs. Fig. 6 suggests division of the 30K superfamily into three clusters, two of which radiate from common points without much subdefinition of relationships. The third, containing capillo-/tricho-, nepo-, como-, caulimo-, tospo-, badna- and idaeoviral MPs, had significant substructure. This cluster contained all the MPs thought to induce movement by synthesis of trans-wall tubules. Since it has been proposed that viruses be classified based on the sequences of their RNA or DNA polymerases (Koonin, 1991), the diagram also indicates the type of polymerase encoded by the viruses. No correlation between diagram position and polymerase type was noted, except that all supergroup I viruses were on the tubule branch.

Distance analysis (Felsenstein, 1989) (Table 1 and data not shown) suggests relationships among the proteins that differ from those derived by parsimony. For example, the closest MP to the begomoviral family is the idaeoviral protein, which itself is closest to the dianthoviral family. Yet, these families are scattered on the parsimony tree (Fig. 6).
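The summary row of Table 1 and the 2 SD distance test applied to the candidate families can be checked with a few lines of arithmetic. The sketch below uses the closest-neighbour distances transcribed from Table 1; the use of the sample standard deviation and the variable names are assumptions made for this illustration (the sample form is the one that reproduces the tabulated 1.49).

```python
import statistics

# Closest interfamily distances for the 18 acknowledged families (Table 1).
acknowledged = {
    "Furo": 2.89, "Diantho": 2.89, "Alfamo/Ilar": 3.48, "Bromo": 1.34,
    "Cucumo": 1.34, "Tobra": 2.66, "Tobamo": 2.66, "Gemini": 4.05,
    "Tombus": 3.19, "Umbra": 4.16, "Nepo (B)": 8.25, "Tospo": 4.55,
    "Idaeo": 4.55, "Caulimo": 4.06, "Badna": 4.08, "Nepo (A)": 3.91,
    "Como": 3.91, "Capillo/Tricho": 4.11,
}
# Candidate families, excluded from the mean (asterisked in Table 1).
candidates = {
    "Tenui": 5.61, "Waika": 9.38, "Clostero": 7.12,
    "Rhabdo": 5.03, "Phloem": 4.77,
}

values = list(acknowledged.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample SD (n - 1 denominator)
cutoff = mean + 2 * sd          # upper bound of the 2 SD range

within = sorted(name for name, d in candidates.items() if d <= cutoff)
doubtful = sorted(name for name, d in acknowledged.items() if d > cutoff)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, cutoff = {cutoff:.2f}")
print("candidates within 2 SD:", within)
print("acknowledged families beyond 2 SD:", doubtful)
```

On these numbers the phloem, rhabdoviral and tenuiviral candidates fall inside the 2 SD range, the waika- and closteroviral candidates fall outside it, and nepoviral family B is the only acknowledged family beyond the cutoff, in line with the text.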

Discussion

Amino acid sequences that are 20–25% identical are generally believed to lie in the twilight zone of relationships, where conclusions about their common ancestry are questionable. The relationships explored among 30K MPs, though generally of sequences with sequence identities in or below the twilight zone, nevertheless produced several important insights. Previous reports identifying the 30K superfamily (Koonin et al., 1991; Melcher, 1990, 1993; Mushegian, 1994) have not encountered universal acceptance (Giesman-Cookmeyer et al., 1995; Mise et al., 1993). The work reported here supports the concept of a 30K superfamily as a meaningful grouping of proteins related in sequence and structure. Significance scores for membership of sequences in the family were generally well above the range of scores considered questionable. The member families had a similar arrangement of predicted secondary structure elements in the common core. A consensus generated from their alignment identified members of all constituent families in database searches. Though conserved motifs were few, a few residue positions were occupied by the same amino acid in most MPs. These include the LXDX50–70G motif previously recognized (Melcher, 1993). A new relatively conserved motif resembling SIS was found near the C termini of most family members.

The criteria for membership of candidate sequences in the 30K superfamily of movement proteins thus include: a common core of predicted secondary structure elements consisting of a series of β-elements between two α-helices; additional N-terminal and C-terminal sequences (not obligatory, such as in the tombusviral MPs); a distance from at least one previously recognized superfamily member that is within the range of distances between close relatives within the superfamily; and a significance score in alignment with acknowledged family members of 5.5 or greater. The presence of LXDX50–70G and SIS motifs gives additional support for membership, as does recognition in BLAST searches using the derived consensus sequence. However, failure to be recognized by the consensus sequence is not sufficient for exclusion from the family, since tenuiviral and rhabdoviral MPs and the phloem proteins were not recognized by such a search. These criteria allowed the exclusion of one sequence previously thought to belong, that of the sequiviral N-terminal domain. They also suggested that some nepoviruses might have MPs not related to the 30K superfamily and that waikaviral and closteroviral MPs are not 30K superfamily members.

It is intriguing that the currently known small set of phloem proteins related to CmPP16 had sequences and a predicted structure similar to those of the 30K superfamily proteins. Thus, the knowledge gained through the study of MPs can be used in assessing a possible role in trafficking by plant-encoded proteins. In addition, 30K superfamily membership for these phloem proteins means that tools developed for study of MP function can be applied to intercellular traffic of the plant proteins. The similarities suggest evolutionary homology, a possibility that may lead to hypotheses on the evolutionary origin of viral MPs and thus of plant viruses.

Discrepancies were encountered between positions of 30K superfamily members on a tree derived by parsimony (Fig. 6) and distance (Table 1) analyses. The parsimony tree was derived from an alignment obtained by repeated optimization of relationships by parsimony. A different set of relationships could have been revealed had alignment been optimized using distances. Distances were not used because the distances of most family members from each other were large. The values suggested too many positions at which multiple substitutions occurred to accurately estimate evolutionary relatedness. When distances in bifurcating evolutionary trees are long, long branch attraction occurs (Lyons-Weiler & Hoelzer, 1997). Long branch attraction describes branch positions on inferred phylogenetic trees that do not reflect their ontogeny. The discrepancies encountered between the parsimony tree (Fig. 6) and distance calculations (Table 1) probably reflect long branch attraction. Perhaps in the course of evolution, MP genes of plant viruses explored most combinations of residues that produce optimally active MPs. These combinations represent fitness peaks in an MP sequence landscape. Despite doubt that the parsimony tree (Fig. 6) accurately represents evolutionary relationships, it accurately reflects a subclassification of 30K superfamily members based on hypothesized mechanism of movement.
Three sub-superfamilies were distinguished, two as clusters and one as an extended branch. Those MPs (badna-, nepo-, tospo-, caulimo- and comoviral MPs) for which tubule formation is thought to provide a conduit for virus particles in cell-to-cell movement (Cheng et al., 1998; Wieczorek & Sanfacon, 1993; Storms et al., 1995; Kasteel et al., 1996) lie on the extended branch of the tree. Conversely, MPs that are not thought to use such a mechanism were found in the clusters. The tubule-forming branch also included idaeo-, capillo- and trichoviral MPs. It would therefore be interesting to determine whether these also form tubules important for movement. A common sequence feature of the tubule-forming sub-superfamily was the extra length of the N-terminal domain. Thus this domain may be responsible for polymerization into tubules and other functions specifically associated with this mode of movement of viral infections. Indeed, deletions in the N-terminal, but not in the C-terminal, region of the cauliflower mosaic virus MP were recently reported to destroy its ability to form tubules in cultured insect cells (Thomas & Maule, 1999). However, MPs lacking the long N-terminal domain can form tubules in protoplasts (Zheng et al., 1997; Kasteel et al.,
1997), suggesting that the N-terminal domain is not required for polymerization.

The C-terminal domain, predicted not to assume a definable structure readily, does not perform an essential function for several MPs. Both C-terminal truncation (Gafny et al., 1992; Berna et al., 1991) and fusion of the C terminus to reporter proteins such as green fluorescent protein (Heinlein et al., 1995; Itaya et al., 1997) fail to interfere noticeably with function. The likelihood that the common core domain contains a nucleic acid binding domain (Citovsky et al., 1992; Fujita et al., 1998; Giesman-Cookmeyer & Lommel, 1993; Schoumacher et al., 1994; Thomas & Maule, 1995) raises the possibility that the C-terminal domain may be a flexible tail which regulates access to the binding cavity of the common core domain. The variations on the SIS motif near the C terminus could represent sites for regulation by phosphorylation. Indeed, phosphorylation of tobamoviral and trichoviral MPs has been demonstrated (Watanabe et al., 1992; Citovsky et al., 1993; Sato et al., 1995).

Increasingly, protein domains are found to belong to a limited number of 'fold' groupings unified by similar three-dimensional arrangements of secondary structure elements. Thus, the profile of MP secondary structure elements (Fig. 1) may lead to fold recognition. Previously, it was suggested, based on more limited MP sequences and available folds, that the MPs may fold like aspartic proteases, in particular the lentiviral proteinases (Melcher, 1993). The current work suggested some similarity of the 30K superfamily members to the closteroviral MPs (Agranovsky et al., 1998), known homologues of DnaK, whose three-dimensional structure has been determined (Zhu et al., 1996). Further analysis could not firmly reject the suggestion, leaving open the possibility that 30K MPs have an hsp70 peptide-binding fold.
A third possibility is an OB fold (Horvath et al., 1998), a fold characteristic of many proteins that bind single-stranded nucleic acids. It shares with the 30K MP prediction a similar array of β-elements flanked by one or two α-helices, an analogous placement of regions variable in secondary structure, and a high frequency of aromatic residues in selected loop regions (L34 and L56, Fig. 2). That the RNA-binding domains of several MPs map to the central domain is consistent with the region having an OB fold. On the other hand, that the phloem proteins are missing the first β-element argues against an OB fold for 30K MPs.

Regardless of the true fold assumed by the common core of the 30K MPs, the alignments and predicted structures presented here should aid the interpretation of molecular genetic studies in which mutations in the MP genes were created without knowledge of the structures mutated (Giesman-Cookmeyer & Lommel, 1993; Kahn et al., 1998). These studies have been performed with a variety of genes from different families, making results from mutagenesis of one family impossible to compare with results from a different family. Further, the predicted structures enable better choices of residues as targets for future mutagenesis.
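To illustrate how an alignment of diverse families allows a mutation studied in one MP to be related to the equivalent position in another, the following minimal Python sketch maps a residue number in one aligned sequence to the residue occupying the same alignment column in a second sequence. The two aligned rows are made-up toy strings, not real MP sequences, and the function names are hypothetical. The sketch assumes both rows come from the same alignment, so they have equal length and use '-' for gaps.

```python
from typing import Dict, Optional

GAP = "-"  # assumed gap character in the aligned rows


def residue_to_column(aligned: str) -> Dict[int, int]:
    """Map 1-based residue numbers of one aligned row to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned):
        if char != GAP:
            residue += 1
            mapping[residue] = column
    return mapping


def equivalent_residue(aligned_a: str, aligned_b: str, residue_a: int) -> Optional[int]:
    """Return the residue number in row B aligned with residue_a of row A.

    Returns None if residue_a does not exist in A or faces a gap in B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("rows must come from the same alignment (equal length)")
    column = residue_to_column(aligned_a).get(residue_a)
    if column is None:
        return None
    # Invert B's residue-to-column map to look up the residue at that column.
    column_to_residue_b = {col: res for res, col in residue_to_column(aligned_b).items()}
    return column_to_residue_b.get(column)


# Toy aligned rows (hypothetical, for illustration only).
family_a = "MK--LVDGS"
family_b = "-KATLV-GS"

print(equivalent_residue(family_a, family_b, 3))  # 4
print(equivalent_residue(family_a, family_b, 5))  # None (faces a gap in B)
```

In this toy case residue 3 of the first row corresponds to residue 4 of the second, whereas residue 5 lies opposite a gap and has no counterpart. Applied to a real alignment, the same lookup would indicate whether mutations made in two different MPs affect equivalent positions.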

This work was supported by the Oklahoma Agricultural Experiment Station and is being published with the approval of its Director.

References

Agranovsky, A. A., Folimonov, A. S., Folimonova, S., Morozov, S., Schiemann, J., Lesemann, D. & Atabekov, J. G. (1998). Beet yellows closterovirus HSP70-like protein mediates the cell-to-cell movement of a potexvirus transport-deficient mutant and a hordeivirus-based chimeric virus. Journal of General Virology 79, 889–895.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology 215, 403–410.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402.

Berna, A., Gafny, R., Wolf, S., Lucas, W. J., Holt, C. A. & Beachy, R. N. (1991). The TMV movement protein: role of the C-terminal 73 amino acids in subcellular localization and function. Virology 182, 682–689.

Berti, P. J. & Storer, A. C. (1995). Alignment/phylogeny of the papain superfamily of cysteine proteases. Journal of Molecular Biology 246, 273–283.

Carrington, J. C., Kasschau, K. D., Mahajan, S. K. & Schaad, M. C. (1996). Cell-to-cell and long-distance transport of viruses in plants. Plant Cell 8, 1669–1681.

Cheng, C. P., Tzafrir, I., Lockhart, B. E. & Olszewski, N. E. (1998). Tubules containing virions are present in plant tissues infected with Commelina yellow mottle badnavirus. Journal of General Virology 79, 925–929.

Citovsky, V., Wong, M. L., Shaw, A. L., Prasad, B. V. & Zambryski, P. (1992). Visualization and characterization of tobacco mosaic virus movement protein binding to single-stranded nucleic acids. Plant Cell 4, 397–411.

Citovsky, V., McLean, B. G., Zupan, J. R. & Zambryski, P. (1993). Phosphorylation of tobacco mosaic virus cell-to-cell movement protein by a developmentally regulated plant cell wall-associated protein kinase. Genes & Development 7, 904–910.

Ding, B. (1998). Intercellular protein trafficking through plasmodesmata. Plant Molecular Biology 38, 279–310.

Felsenstein, J. (1989). Phylogeny inference package. Cladistics 5, 164–166.

Feng, D. F. & Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25, 351–360.

Fujita, M., Mise, K., Kajiura, Y., Dohi, K. & Furusawa, I. (1998). Nucleic acid-binding properties and subcellular localization of the 3a protein of brome mosaic bromovirus. Journal of General Virology 79, 1273–1280.

Gafny, R., Lapidot, M., Berna, A., Holt, C. A., Deom, C. M. & Beachy, R. N. (1992). Effects of terminal deletion mutations on function of the movement protein of tobacco mosaic virus. Virology 187, 499–507.

Giesman-Cookmeyer, D. & Lommel, S. A. (1993). Alanine scanning mutagenesis of a plant virus movement protein identifies three functional domains. Plant Cell 5, 973–982.

Giesman-Cookmeyer, D., Silver, S., Vaewhongs, A. A., Lommel, S. A. & Deom, C. M. (1995). Tobamovirus and dianthovirus movement proteins are functionally homologous. Virology 213, 38–45.

Gribskov, M., Lüthy, R. & Eisenberg, D. (1989). Profile analysis. Methods in Enzymology 183, 146–159.

Heinlein, M., Epel, B. L., Padgett, H. S. & Beachy, R. N. (1995). Interaction of tobamovirus movement proteins with the plant cytoskeleton. Science 270, 1983–1986.

Horvath, M. P., Schweiker, V. L., Bevilacqua, J. M., Ruggles, J. A. & Schultz, S. C. (1998). Crystal structure of the Oxytricha nova telomere end binding protein complexed with single strand DNA. Cell 95, 963–974.

Itaya, A., Bao, Y., Nelson, R. & Ding, B. (1997). Cell-to-cell trafficking of cucumber mosaic virus movement protein:green fluorescent protein fusion produced by biolistic gene bombardment in tobacco. Plant Journal 12, 1223–1230.

Kahn, T. W., Lapidot, M., Heinlein, M., Reichel, C., Cooper, B., Gafny, R. & Beachy, R. N. (1998). Domains of the TMV movement protein involved in subcellular localization. Plant Journal 15, 15–25.

Kasteel, D. T. J., Perbal, M. C., Boyer, J. C., Wellink, J., Goldbach, R. W., Maule, A. J. & van Lent, J. W. M. (1996). The movement proteins of cowpea mosaic virus and cauliflower mosaic virus induce tubular structures in plant and insect cells. Journal of General Virology 77, 2857–2864.

Kasteel, D. T., van der Wel, N. N., Jansen, K. A., Goldbach, R. W. & van Lent, J. W. (1997). Tubule-forming capacity of the movement proteins of alfalfa mosaic virus and brome mosaic virus. Journal of General Virology 78, 2089–2093.

Koonin, E. V. (1991). The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. Journal of General Virology 72, 2197–2206.

Koonin, E. V., Mushegian, A. R., Ryabov, E. V. & Dolja, V. V. (1991). Diverse groups of plant RNA and DNA viruses share related movement proteins that may possess chaperone-like activity. Journal of General Virology 72, 2895–2903.

Leisner, S. M. (1999). Genetic basis of virus transport in plants. In Molecular Biology of Plant Viruses, Chapter 7, pp. 161–182. Edited by C. L. Mandahar. Boston: Kluwer.

Lucas, W. J., Ding, B. & van der Schoot, C. (1993). Plasmodesmata and the supracellular nature of plants. New Phytologist 125, 435–476.

Lyons-Weiler, J. & Hoelzer, G. A. (1997). Escaping from the Felsenstein zone by detecting long branches in phylogenetic data. Molecular Phylogenetics and Evolution 8, 375–384.

Melcher, U. (1990). Similarities between putative transport proteins of plant viruses. Journal of General Virology 71, 1009–1018.

Melcher, U. (1993). HIV-1 proteinase as structural model of intercellular transport proteins of plant viruses. Journal of Theoretical Biology 162, 61–74.

Mise, K., Allison, R. F., Janda, M. & Ahlquist, P. (1993). Bromovirus movement protein genes play a crucial role in host specificity. Journal of Virology 67, 2815–2823.

Mushegian, A. R. (1994). The putative movement domain encoded by nepovirus RNA-2 is conserved in all sequenced nepoviruses. Archives of Virology 135, 437–441.

Mushegian, A. R. & Koonin, E. V. (1993). Cell-to-cell movement of plant viruses. Insights from amino acid sequence comparisons of movement proteins and from analogies with cellular transport systems. Archives of Virology 133, 239–257.

Nelson, R. S. & van Bel, A. J. E. (1998). The mystery of virus trafficking into, through and out of the vascular tissue. Progress in Botany 59, 476–533.

Niefind, K. & Schomburg, D. (1991). Amino acid similarity coefficients for protein modeling and sequence alignment derived from main-chain folding angles. Journal of Molecular Biology 219, 481–497.

Reddick, B. B., Habera, L. F. & Law, M. D. (1997). Nucleotide sequence and taxonomy of maize chlorotic dwarf virus within the family Sequiviridae. Journal of General Virology 78, 1165–1174.

Rost, B., Sander, C. & Schneider, R. (1994). PHD–An automatic mail server for protein secondary structure prediction. Computer Applications in the Biosciences 10, 53–60.

Sato, K., Yoshikawa, N., Takahashi, T. & Taira, H. (1995). Expression, subcellular location and modification of the 50 kDa protein encoded by ORF2 of the apple chlorotic leaf spot trichovirus genome. Journal of General Virology 76, 1503–1507.

Scholthof, K. B., Hillman, B. I., Modrell, B., Heaton, L. A. & Jackson, A. O. (1994). Characterization and detection of sc4: a sixth gene encoded by Sonchus yellow net virus. Virology 204, 279–288.

Schoumacher, F., Giovane, C., Maira, M., Poirson, A., Godefroy, C. T. & Berna, A. (1994). Mapping of the RNA-binding domain of the alfalfa mosaic virus movement protein. Journal of General Virology 75, 3199–3202.

Sivakumaran, K., Fowler, B. C. & Hacker, D. L. (1998). Identification of viral genes required for cell-to-cell movement of southern bean mosaic virus. Virology 252, 376–386.

Storms, M. M., Kormelink, R., Peters, D., Van Lent, J. W. & Goldbach, R. W. (1995). The nonstructural NSm protein of tomato spotted wilt virus induces tubular structures in plant and insect cells. Virology 214, 485–493.

Thomas, C. L. & Maule, A. J. (1995). Identification of the cauliflower mosaic virus movement protein RNA-binding domain. Virology 206, 1145–1149.

Thomas, C. L. & Maule, A. J. (1999). Identification of inhibitory mutants of cauliflower mosaic virus movement protein function after expression in insect cells. Journal of Virology 73, 7886–7890.

Toriyama, S., Kimishima, T., Takahashi, M., Shimizu, T., Minaka, N. & Akutsu, K. (1998). The complete nucleotide sequence of the rice grassy stunt virus genome and genomic comparisons with viruses of the genus Tenuivirus. Journal of General Virology 79, 2051–2058.

Watanabe, Y., Ogawa, T. & Okada, Y. (1992). In vivo phosphorylation of the 30-kDa protein of tobacco mosaic virus. FEBS Letters 313, 181–184.

Wieczorek, A. & Sanfacon, H. (1993). Characterization and subcellular localization of tomato ringspot nepovirus putative movement protein. Virology 194, 734–742.

Xoconostle-Cazares, B., Xiang, Y., Ruiz-Medrano, R., Wang, H. L., Monzer, J., Yoo, B. C., McFarland, K. C., Franceschi, V. R. & Lucas, W. J. (1999). Plant paralog to viral movement protein that potentiates transport of mRNA into the phloem. Science 283, 94–98.

Zheng, H. Q., Wang, G. L. & Zhang, L. (1997). Alfalfa mosaic virus movement protein induces tubules in plant protoplasts. Molecular Plant–Microbe Interactions 10, 1010–1014.

Zhu, X., Zhao, X., Burkholder, W. F., Gragerov, A., Ogata, C. M., Gottesman, M. E. & Hendrickson, W. A. (1996). Structural analysis of substrate binding by the molecular chaperone DnaK. Science 272, 1606–1614.

Received 14 June 1999; Accepted 24 September 1999