Journal of General Virology (1992), 73, 3203-3211. Printed in Great Britain
The nucleotide sequence of parsnip yellow fleck virus: a plant picorna-like virus

A. D. Turnbull-Ross, B. Reavy,* M. A. Mayo and A. F. Murant

Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, U.K.
The complete sequence of 9871 nucleotides (nts) of parsnip yellow fleck virus (PYFV; isolate P-121) was determined from cDNA clones and by direct sequencing of viral RNA. The RNA contains a large open reading frame between nts 279 and 9362 which encodes a polyprotein of 3027 amino acids with a calculated Mr of 336212 (336K). A PYFV polyclonal antiserum reacted with the proteins expressed from phage carrying cDNA clones from the 5' half of the PYFV genome. Comparison of the polyprotein sequence of PYFV with other viral polyprotein sequences reveals similarities to the putative NTP-binding and RNA polymerase domains of cowpea mosaic comovirus, tomato black ring nepovirus and several animal picornaviruses. The 3' untranslated region of PYFV RNA is 509 nts long and does not have a poly(A) tail. The 3'-terminal 121 nts may form a stem-loop structure which resembles that formed in the genomic RNA of mosquito-borne flaviviruses.
Introduction

Parsnip yellow fleck virus (PYFV) has a combination of properties unlike those of any other plant virus. Its particle constituents resemble those of the picornaviruses of vertebrates (Murant & Goold, 1968; Hemida & Murant, 1989a) and it depends on a helper virus, anthriscus yellows virus, for transmission in the semipersistent manner by the aphid Cavariella aegopodii (Murant & Goold, 1968; Elnagar & Murant, 1976). PYFV has therefore been placed in a new taxonomic group (Murant et al., 1987; Murant, 1988) provisionally called the 'parsnip yellow fleck virus group' (Murant, 1991). PYFV isolates fall into two major serotypes with somewhat different, though overlapping, host ranges (Hemida & Murant, 1989b). The serotypes are called the 'parsnip' and 'Anthriscus' serotypes after the hosts in which they occur.

PYFV particles are isometric, 30 nm in diameter (Murant & Goold, 1968; Hemida & Murant, 1989a) and contain three protein species of approximate Mr 31K, 26K and 22.5K (Hemida & Murant, 1989a). Sedimentation in sucrose gradients separates the particles into two fractions. The top component (60S) particles appear to be empty shells (Hemida & Murant, 1989a), whereas the bottom component (152S) particles contain a ssRNA estimated to be 9.9 kb by electrophoresis of glyoxylated RNA in agarose gels (Hemida & Murant, 1989a). PYFV RNA may also possess a genome-linked protein (VPg), judging by the loss of infectivity after treatment with pronase or proteinase K (Murant et al., 1987).

It is of considerable interest to know whether the similarity of PYFV to animal picornaviruses in respect of particle properties holds true at the level of genome organization. Here we report the complete nucleotide sequence of genomic RNA from PYFV isolate P-121 (Murant & Goold, 1968), which belongs to the parsnip serotype and is the type member of the parsnip yellow fleck virus group.

0001-1194 © 1992 SGM
Methods

Virus and RNA purification. PYFV isolate P-121 was propagated in spinach plants (var. Medania) and purified from infected leaves harvested 19 days post-inoculation as described by Hemida & Murant (1989a). Viral RNA was extracted by resuspending virus pellets in 10 mM-Tris-HCl pH 7.6, 0.1 mM-EDTA, 1.0% (w/v) SDS, heating at 65 °C for 5 min and then extracting with an equal volume of phenol. The aqueous phase was extracted twice more with phenol:chloroform:3-methyl-1-butanol (25:24:1 v/v/v) and the RNA was stored in 70% ethanol at -70 °C.
Escherichia coli strains and DNA manipulations. E. coli strain XL1-Blue (Stratagene) was used, except in the initial cloning of cDNA into pUC19 when E. coli strain DH5α (Bethesda Research Laboratories) was used. Transformations were essentially as described by Hanahan (1983). Plasmid DNA was prepared by alkaline lysis (Birnboim & Doly, 1979).

cDNA synthesis and screening of libraries. A cDNA synthesis kit (Pharmacia) was used for cDNA synthesis from viral RNA primed with oligo(dT), or with a mixture of oligo(dT) and random primers. NotI/EcoRI linkers were ligated onto the cDNA, which was then purified by chromatography in Sephacryl S-400 columns (Pharmacia) and ligated into pUC19 or λZAPII (Stratagene). The λZAPII ligation and in vitro packaging reactions were performed as described by the manufacturers (Stratagene, BCL). Recombinant phage were screened with a polyclonal antiserum raised against PYFV particles (Hemida & Murant, 1989b) as described by Huynh et al. (1985). Selected clones were excised in vivo from λZAPII into pBluescript, as described by the manufacturer, for further analysis. Dot blots (Buluwela et al., 1989) were probed with restriction fragments from selected cDNA clones to obtain additional clone coverage of the PYFV genome. The majority of the sequence was determined from clones in pUC19 or pBluescript by the dideoxynucleotide method (Sanger et al., 1977; Chi et al., 1988) with T7 DNA polymerase. The remainder of the sequence was obtained, after subcloning of the cDNA into M13, by using the Klenow fragment of DNA polymerase I and substituting deaza-dGTP for dGTP. The sequence was compiled and analysed using the GCG package (Devereux et al., 1984).

Determination of the RNA 5'-terminal sequence. The sequence of the 5'-terminal region was determined by direct sequencing of RNA with the aid of two synthetic 17-mer oligonucleotides (complementary to positions 47 to 63 and 138 to 154) and reverse transcriptase as described by Geliebeter (1987). Terminal deoxynucleotidyl transferase was used to add homopolymer tails to the reverse transcriptase products (DeBorde et al., 1986).

Determination of the RNA 3'-terminal sequence. Purified PYFV RNA was 3' end-labelled with [32P]pCp and T4 RNA ligase (England et al., 1980) and applied to 1.0 ml Sephadex G50 spun columns (Maniatis et al., 1982).
The 3'-terminal nucleotide was determined by digesting an aliquot of the labelled RNA with ribonuclease T2 (Donis-Keller et al., 1977) and separating the products by two-dimensional thin layer chromatography on 100 mm x 100 mm cellulose plates (Polygram Cel 300MN). The solvent systems were isobutyric acid-0.5 M-NH4OH (5:3 v/v) in the first dimension and isopropanol-HCl-water (70:15:15 v/v/v) in the second dimension (Saneyoshi et al., 1972). Additional sequence was obtained by partial digestion of the [32P]pCp-labelled RNA with base-specific ribonucleases as described by Natsuaki et al. (1991).
Results

The nucleotide sequence of PYFV
The entire PYFV genome was sequenced in both directions from cloned cDNA except for the 5'-terminal 68 nucleotides (nts), which were sequenced directly from the RNA. Over 80% of the sequence was determined from at least two independent cDNA clones and on average each position was sequenced 4.9 times. The complete genome of PYFV was found to be 9871 nts long (Fig. 1), which is in good agreement with an earlier estimate of 9900 ± 290 nts (Hemida & Murant, 1989a). Sequence heterogeneity was found at four positions and may have arisen from variation in the virus population, or from errors introduced by reverse transcriptase in the synthesis of the cDNA first strand, as reported by Lomonossoff & Shanks (1983) and Meyer et al. (1986). All the variation was located in the large open reading frame (ORF; see below). A transversion from U to A (nt 6149, in one of five clones) and a transition from U to C (nt 9350, in one of three clones) did not alter the amino acid encoded, but two C to U transitions resulted in changes from Thr to Ile (nt 3163, in one of four clones) and Leu to Phe (nt 4395, in one of three clones).

Determination of the 3'-terminal sequence of PYFV
The 22 3'-terminal nts of PYFV RNA were determined by partial ribonuclease digestion of the pCp-labelled RNA (Fig. 2). The sequence deduced from several repetitions was 5' GAAAGUAAAUAUUAAAUAAGGX 3'. The terminal nucleotide X could not be resolved in this system, but digestion of pCp-labelled RNA with ribonuclease T2 and chromatographic separation of the products revealed C to be labelled predominantly. A sequence identical to that deduced from the RNA was found at the 3' end of one of the cDNA clones.

ORFs in PYFV
The largest ORF occupies 92% of the genome and is in the plus viral (messenger RNA) sense (Fig. 1). The putative translation product is 3027 amino acids long and forms a polyprotein with a calculated Mr of 336212 (336K). The three next largest ORFs (588, 453 and 360 nts) are all encoded in the viral minus sense. The second longest ORF in the positive sense is only 303 nts long. The large ORF begins with the first AUG codon from the 5' end of the RNA (nts 279 to 281) and ends with an ochre codon (nts 9358 to 9360). The first AUG codon is located within a short sequence in which six out of nine nucleotides match the consensus (AACAAUGCC) proposed by Lütcke et al. (1987) for plant initiation codons. The only in-frame initiation codon in the first 1 kb with a better context is at nt 591, with seven out of nine matches to the consensus.

The first AUG codon is probably the initiation codon for the PYFV polyprotein because (i) this AUG has an A residue in position −3 which is unlikely to be skipped by a scanning ribosomal 40S subunit (Kozak, 1987); (ii) the base composition from nts 1 to 278 (34.2% U, 24.8% C, 13.3% G and 27.7% A) has a high U + A and low G residue content consistent with leader sequences in RNA of other plant viruses (Gallie et al., 1987), and if the 5' untranslated region (UTR) is extended to nt 590 the G content rises to 20.0%; and (iii) an oligopyrimidine tract precedes the first AUG codon. Oligopyrimidine tracts also precede the initiation codons of tomato black ring virus (TBRV) RNA 1 (Greif et al., 1988) and cowpea mosaic virus (CPMV) B RNA (Lomonossoff & Shanks, 1983), and are important in translation initiation of picornavirus RNA (reviewed by Agol, 1991).
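The figures quoted for the large ORF can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: it takes the ORF limits given in the abstract (nts 279 to 9362, stop codon included) and the genome length of 9871 nts, and the function names and the nine-nucleotide example context are hypothetical, not taken from the sequence in Fig. 1.

```python
GENOME_LENGTH = 9871            # nts, as reported in the text
ORF_START, ORF_END = 279, 9362  # ORF limits from the abstract (stop codon included)


def orf_summary(start, end, genome_length):
    """Derive codon count, genome fraction and UTR lengths from ORF limits."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return {
        "orf_nts": length,
        "amino_acids": codons - 1,       # the final codon is the terminator
        "genome_fraction": length / genome_length,
        "utr5_nts": start - 1,
        "utr3_nts": genome_length - end,
    }


def consensus_matches(context, consensus="AACAAUGCC"):
    """Count positions at which a 9-nt context agrees with the consensus."""
    if len(context) != len(consensus):
        raise ValueError("context and consensus must have the same length")
    return sum(a == b for a, b in zip(context.upper(), consensus))


summary = orf_summary(ORF_START, ORF_END, GENOME_LENGTH)
print(summary)

# Hypothetical context, chosen only to show a six-out-of-nine match.
print(consensus_matches("UUCAAUGGC"))
```

With these inputs the sketch gives 3027 amino acids, an ORF covering about 92% of the genome, a 278 nt 5' UTR and a 509 nt 3' UTR.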
[Fig. 1. Nucleotide sequence of PYFV RNA (nts 1 to 9871) with the deduced amino acid sequence of the putative polyprotein shown beneath; the sequence listing is not reproduced here.]
Fig. 2. Sequencing of PYFV RNA labelled with pCp. Samples of [32P]pCp 3' end-labelled PYFV RNA were treated with alkali for 10 min (lane 1) or 5 min (lane 2), or with RNases T1 (lane 3), U2 (lane 4), Phy M (lane 5) or Bacillus cereus (lane 6). The deduced sequence is shown on the right.
The STEMLOOP or FOLD programs did not reveal any stable secondary structures in the 5' UTR of PYFV, and this is typical for plant viruses (Lommel et al., 1988). However, a short repeat, UCUCUYY, occurs six times in the 5' UTR of PYFV (starting at nts 45, 71, 127, 225, 236 and 262) and UCUCUUG occurs once (starting at nt 57).
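A search for a degenerate repeat such as UCUCUYY can be sketched by expanding the IUPAC codes into a regular expression. This Python example is a minimal illustration, not the procedure used in the paper; the function name and the toy sequence are assumptions, and the toy sequence is not the PYFV 5' UTR.

```python
import re

# IUPAC RNA codes needed here; Y stands for either pyrimidine.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "[CU]", "R": "[AG]", "N": "[ACGU]",
}


def find_motif(sequence, motif):
    """Return 1-based start positions of all matches, overlapping ones included."""
    pattern = "".join(IUPAC_RNA[base] for base in motif.upper())
    # A lookahead reports every start position, even when matches overlap.
    return [m.start() + 1
            for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Hypothetical sequence for illustration only.
toy_utr = "GGAUCUCUCUAAGUCUCUUGAUCUCUUCA"
print(find_motif(toy_utr, "UCUCUYY"))
print(find_motif(toy_utr, "UCUCUUG"))
```

Positions are reported 1-based so that they can be compared directly with nucleotide numbers of the kind quoted in the text.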
Comparison of the putative PYFV polyprotein with those of other viruses

The derived polypeptide sequence of PYFV was compared to those of CPMV (Lomonossoff & Shanks, 1983; van Wezenbeek et al., 1983), TBRV (Meyer et al., 1986; Greif et al., 1988), tobacco etch potyvirus (TEV; Allison et al., 1986), poliovirus (Nomoto et al., 1982), hepatitis A virus (HAV; Najarian et al., 1985), human rhinovirus 14 (HRV14; Stanway et al., 1984), foot-and-mouth disease virus (FMDV; Forss et al., 1984), bovine viral diarrhoea pestivirus (BVDV; Collett et al., 1988) and yellow fever flavivirus (YFV; Rice et al., 1985). The dot matrix patterns showed no extensive regions of similarity between the polyprotein of PYFV and those of BVDV, YFV or TEV. The most extensive regions of similarity to the PYFV polyprotein were in the 200K polyprotein of CPMV (Fig. 3a, regions II and III) and the 254K polyprotein of TBRV (Fig. 3b). These regions of similarity were less extensive in picornaviruses (Fig. 3c and d). A third small region of similarity was detected in the comparison of the PYFV polyprotein with that of HRV14 (Fig. 3c, region I). The dot matrix comparisons also showed that the PYFV polyprotein contains a region at the C terminus that does not correspond to any region in picornavirus polyproteins (Fig. 3c and d).

Fig. 3. Dot plot comparison of the putative PYFV polyprotein with (a) CPMV 200K (Lomonossoff & Shanks, 1983), (b) TBRV 254K (Greif et al., 1988), (c) HRV14 (Stanway et al., 1984) and (d) HAV (Najarian et al., 1985) polyproteins. The COMPARE (window = 30, stringency = 17) and DOTPLOT programs from the GCG package were used. I, II and III denote the three regions of polyprotein sequence similarity between PYFV and the other viruses detected in the comparisons.

Fig. 1. Nucleotide sequence of PYFV. Sequence variation was found in some cDNA clones at positions 3163 (C to U), 4395 (C to U), 6149 (U to A) and 9350 (U to C). The deduced amino acid sequence of the putative polyprotein is shown beneath the nucleotide sequence. The termination codon is indicated by an asterisk.

Structural proteins
Recombinant λZAPII clones containing random-primed PYFV cDNA were screened with an anti-PYFV antiserum. A number of positively reacting clones were selected and excised in vivo to yield pBluescript clones. The largest clone selected in this way contained an insert which started at nt 1042 and terminated at nt 3165. This produced an in-frame fusion protein linked to the β-galactosidase protein encoded by pBluescript, with the PYFV moiety beginning at amino acid 256 and ending at amino acid 963. The PYFV polyprotein had no amino acid sequence similarity with the particle proteins of CPMV, TBRV or TEV. However, similarity was found between the PYFV polyprotein and HRV14 VP3 (Fig. 3c, region I). VP3 is the most highly conserved particle protein among picornaviruses (Acharya et al., 1989) and sequence conservation has been noted between HRV14 VP3 and regions of the capsid proteins of southern bean mosaic sobemovirus, tomato bushy stunt tombusvirus, carnation mottle carmovirus and turnip crinkle carmovirus (Vingron & Argos, 1991). The sequence similarity between PYFV (or HRV14) and these plant viruses is somewhat less than that between PYFV and HRV14.
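The relation between the insert limits (nts 1042 to 3165) and the polyprotein residues (256 to 963) follows from the reading frame that starts at nt 279. The Python sketch below shows one way to do the conversion; the function names are hypothetical and the only inputs are coordinates stated in the text.

```python
ORF_START = 279  # first nucleotide of the initiating AUG, as reported


def codon_number(nt, orf_start=ORF_START):
    """1-based number of the polyprotein codon that contains nucleotide nt."""
    if nt < orf_start:
        raise ValueError("nucleotide lies upstream of the ORF")
    return (nt - orf_start) // 3 + 1


def first_complete_codon(nt, orf_start=ORF_START):
    """First codon that lies wholly at or after nucleotide nt."""
    codon = codon_number(nt, orf_start)
    starts_on_codon_boundary = (nt - orf_start) % 3 == 0
    return codon if starts_on_codon_boundary else codon + 1


print(first_complete_codon(1042))  # first full PYFV codon in the insert
print(codon_number(3165))          # codon containing the last insert nucleotide
```

Under this numbering the insert begins inside codon 255, so the first complete PYFV codon is 256, and its last nucleotide falls in codon 963, in line with the residues quoted above.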
Non-structural proteins
The conservation of domains in the non-structural proteins of RNA viruses is well documented (Argos et al., 1984; Haseloff et al., 1984; Goldbach, 1986, 1987) and two of these domains were found in PYFV by dot matrix comparison. The longest regions of similarity corresponded to the 87K protein of CPMV (Wellink et al., 1986) and the 92K protein of TBRV proposed by Greif et al. (1988) (Fig. 3a and b, region III). The similarity in this region between PYFV and the 3D (RNA polymerase) proteins of picornaviruses was weaker but still readily detectable (Fig. 3c and d, region III). The extent of the similarity between PYFV and the NTP-binding proteins of CPMV (58K), TBRV (72K) and picornaviruses (2C) was similar for all the comparisons (Fig. 3a to d, region II). The conserved domains (G/A)XXGXGK(S/T) and D(D/E) were found in the polyprotein sequence of PYFV (amino acids 1467 to 1474 and 1518 to 1519).
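The two NTP-binding motifs can be written as regular expressions, which also makes the spacing between them easy to measure. This Python sketch is an illustration under stated assumptions: the function name is hypothetical and the toy protein is invented for the example, not taken from the PYFV polyprotein.

```python
import re

MOTIF_A = re.compile(r"[GA]..G.GK[ST]")  # (G/A)XXGXGK(S/T)
MOTIF_B = re.compile(r"D[DE]")           # D(D/E)


def find_ntp_motifs(protein):
    """Pair each A motif with the first downstream D(D/E) and report spacing."""
    hits = []
    for a in MOTIF_A.finditer(protein):
        b = MOTIF_B.search(protein, a.end())
        if b is None:
            continue
        hits.append({
            "a_start": a.start() + 1,         # 1-based, inclusive
            "a_end": a.end(),                 # 1-based, inclusive
            "b_start": b.start() + 1,
            "spacing": b.start() - a.end(),   # residues between the motifs
        })
    return hits


# Hypothetical sequence for illustration only.
toy_protein = "MKTGSPGVGKSLLAARNVDEQW"
print(find_ntp_motifs(toy_protein))
```

Applying the same definition of spacing to the coordinates reported above (1467 to 1474 and 1518 to 1519) gives 43 residues between the two motifs in PYFV.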
Discussion

The particle properties of PYFV have led to speculation that it is more closely related to animal picornaviruses than any other plant virus described to date. We have determined the complete sequence of PYFV RNA (isolate P-121) and found that the genome organization shows both similarities to and differences from those of picornaviruses. The genome of PYFV is a ssRNA, 9871 nts in length, and contains a large ORF encoding 3027 amino acids. The presence of a polyprotein is consistent with the array of large proteins produced during in vitro translation of PYFV RNA (M. A. Mayo & A. D. Turnbull-Ross, unpublished data). The length of the 5' UTR, the absence of stable secondary structure and the probable use of the first AUG codon from the 5' end of the RNA suggest that ribosomal initiation occurs by a scanning mechanism. This differs from initiation in picornaviral RNAs, where an internal ribosomal entry site is used (reviewed by Jackson et al., 1990).

In PYFV isolate P-121 there are three distinct particle proteins (31K, 26K and 22.5K) compared with the four found in most picornaviruses. Antibodies in a polyclonal antiserum to PYFV particles reacted with the translation products of transcripts of clones from the 5' half of the genome, suggesting that this region contains some of the coat protein epitopes. The largest reactive clone encoded PYFV amino acids 256 to 963. In addition, the dot matrix comparison showed similarity between the PYFV polyprotein (amino acids 653 to 798) and VP3 of HRV14. The location of the coat proteins within the polyprotein of PYFV is similar to that found in picornaviruses. However, it is not known whether any non-structural proteins are present on the amino-terminal side of the structural proteins of PYFV, as occurs in the polyproteins of comoviruses and nepoviruses (van Wezenbeek et al., 1983; Meyer et al., 1986).
Several alignments have been made between sequences of viral NTP-binding proteins (Gorbalenya & Koonin, 1989; Candresse et al., 1990). It has been suggested that the potyviruses form an outlying group
only distantly related to the picornavirus/comovirus/nepovirus cluster; this conclusion is based on the spacing of the conserved domains (G/A)XXGXGK(S/T) and D(D/E) and the presence of other conserved domains (Gorbalenya et al., 1988, 1989). The spacing between the domains in PYFV suggests that it belongs to the picornavirus/comovirus/nepovirus cluster rather than the potyvirus/flavivirus/pestivirus cluster.
Sequence conservation in the RNA polymerase domain of RNA viruses has been used to produce several alignments and phylogenetic trees (Kamer & Argos, 1984; Poch et al., 1989; Candresse et al., 1990; Koonin, 1991). The strongest region of homology shown by the PYFV polyprotein was with the RNA polymerase protein of CPMV (39.7% identity over 280 amino acids; Fig. 3a, region III). The carboxy-terminal location of the putative polymerase of PYFV is analogous to the location of the polymerase in the polyproteins of picornaviruses, comoviruses and nepoviruses and differentiates PYFV from potyviruses, in which the particle protein occupies the carboxy-terminal position in the polyprotein (Allison et al., 1985). In addition, the results of dot matrix comparisons showed that the putative polymerase of the PYFV polyprotein has a carboxy-terminal region with no counterpart in picornaviruses. A carboxy-terminal extension of the polymerase protein sequence is also found in CPMV and TBRV, but there is no sequence conservation among the plant viruses in this region (Fig. 3a and b).
The 3' UTR (509 nts) is longer than that of comovirus, nepovirus, potyvirus or picornavirus RNAs (35 to 350 nucleotides). In addition, no evidence was found in PYFV RNA for the poly(A) tail that is normal for the RNAs of picorna-like viruses. However, most of the oligo(dT)-primed clones were co-terminal and sequence analysis showed that the clones originated in an A-rich sequence of the 3' UTR (nts 9555 to 9568).
This sequence may have been responsible for the apparent binding of PYFV RNA to oligo(dT) columns reported previously (Murant, 1988). We could not detect binding of PYFV RNA to an oligo(dT) column under similar conditions, although we did detect binding of TBRV RNA. Analysis of potential secondary structures in the 3' UTR of PYFV revealed a possible stem-loop structure near the 3' terminus (Fig. 4). The free energy (ΔG°37) of the structure in PYFV, calculated by the method of Freier et al. (1986), is similar to those of the 3' UTR stem-loops of mosquito-borne flaviviruses (Grange et al., 1985; Brinton et al., 1986; Wengler & Castle, 1986; Hahn et al., 1987) and tick-borne encephalitis virus strain Hypr (Mandl et al., 1991). There are differences in the predicted structures of the loops and PYFV has a short 'tail' which is not found in flaviviruses. None of the conserved sequence motifs identified in mosquito-borne
Fig. 4. A possible secondary structure for the extreme 3'-terminal nucleotide sequence of PYFV.
flaviviruses (Hahn et al., 1987) were found in the 3' UTR of PYFV.
The structural protein coding regions of PYFV are positioned towards the 5' end, and those of the nonstructural proteins at the 3' end, of the genome, which is reminiscent of the organization in picornaviruses. Sequences in the part of the polyprotein containing PYFV coat proteins were similar to sequences in VP3 of HRV14, but no sequence similarity was detected to the coat proteins of CPMV. In contrast, the similarities in the RNA polymerase domain were greater between PYFV and CPMV than between PYFV and picornaviruses. PYFV thus has characteristics which place it taxonomically between picornaviruses and comoviruses. However, it is distinct from both in lacking a poly(A) tail. The unusual combination of particle properties and the genome organization described in this paper justify placing PYFV in a new taxonomic group. It remains to be seen whether other possible members of the group [PYFV (Anthriscus serotype) and dandelion yellow mosaic virus (Murant, 1988)] share the same genome organization. In particular it will be of interest to determine whether PYFV isolate P-121 is typical of the group in having a long 3' UTR with the potential to form a stem-loop structure.
This work was supported by a grant from the Scottish Office Agriculture and Fisheries Department (SOAFD) under the Increased Flexibility Scheme. We are grateful to the SERC Daresbury laboratory for use of computing facilities.
References
ACHARYA, R., FRY, E., STUART, D., FOX, G., ROWLANDS, D. & BROWN, F. (1989). The three-dimensional structure of foot-and-mouth disease virus at 2.9 Å resolution. Nature, London 337, 709-716.
AGOL, V. I. (1991). The 5'-untranslated region of picornaviral genomes. Advances in Virus Research 40, 103-180.
ALLISON, R. F., SORENSON, J. C., KELLY, M. E., ARMSTRONG, F. B. & DOUGHERTY, W. G. (1985). Sequence determination of the capsid protein gene and flanking regions of tobacco etch virus: evidence for synthesis and processing of a polyprotein in potyvirus genome expression. Proceedings of the National Academy of Sciences, U.S.A. 82, 3969-3972.
ALLISON, R., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
ARGOS, P., KAMER, G., NICKLIN, M. J. H. & WIMMER, E. (1984). Similarity in gene organization and homology between proteins of animal picornaviruses and a plant comovirus suggest common ancestry of these virus families. Nucleic Acids Research 12, 7251-7267.
BIRNBOIM, H. C. & DOLY, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Research 7, 1513-1523.
BRINTON, M. A., FERNANDEZ, A. V. & DISPOTO, J. H. (1986). The 3'-nucleotides of flavivirus genomic RNA form a conserved secondary structure. Virology 153, 113-121.
BULUWELA, L., FORSTER, A., BOEHM, T. & RABBITTS, T. H. (1989). A rapid procedure for colony hybridisation using nylon filters. Nucleic Acids Research 17, 452.
CANDRESSE, T., MORCH, M. D. & DUNEZ, J. (1990). Multiple alignment and hierarchical clustering of conserved amino acid sequences in the replication-associated proteins of plant RNA viruses. Research in Virology 141, 315-329.
CHI, H.-C., HSIEH, J.-C. & TAM, M. F. (1988). Modified method for double stranded DNA sequencing and synthetic oligonucleotide purification. Nucleic Acids Research 16, 10382.
COLLETT, M. S., LARSON, R., GOLD, C., STRICK, D., ANDERSON, D. K. & PURCHIO, A. F. (1988). Molecular cloning and nucleotide sequence of the pestivirus bovine viral diarrhea virus. Virology 165, 191-199.
DEBORDE, D. C., NAEVE, C. W., HERLOCHER, M. L. & MAASSAB, H. F. (1986). Resolution of a common RNA sequencing ambiguity by terminal deoxynucleotidyl transferase. Analytical Biochemistry 157, 275-282.
DEVEREUX, J., HAEBERLI, P. & SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research 12, 387-395.
DONIS-KELLER, H., MAXAM, A. M. & GILBERT, W. (1977). Mapping adenines, guanines and pyrimidines in RNA. Nucleic Acids Research 4, 2527-2538.
ELNAGAR, S. & MURANT, A. F. (1976). Relations of the semi-persistent viruses, parsnip yellow fleck and anthriscus yellows, with their vector Cavariella aegopodii. Annals of Applied Biology 84, 153-167.
ENGLAND, T. E., BRUCE, A. G. & UHLENBECK, O. C. (1980). Specific labeling of 3' termini of RNA with T4 RNA ligase. Methods in Enzymology 65, 65-74.
FORSS, S., STREBEL, K., BECK, E. & SCHALLER, H. (1984). Nucleotide sequence and genomic organization of foot-and-mouth-disease virus. Nucleic Acids Research 12, 6587-6601.
FREIER, S. M., KIERZEK, R., JAEGER, J. A., SUGIMOTO, N., CARUTHERS, M. H., NEILSON, T. & TURNER, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proceedings of the National Academy of Sciences, U.S.A. 83, 9373-9377.
GALLIE, D. R., SLEAT, D. E., WATTS, J. W., TURNER, P. C. & WILSON, T. M. A. (1987). A comparison of eukaryotic viral 5'-leader sequences as enhancers of mRNA expression in vivo. Nucleic Acids Research 15, 8693-8711.
GELIEBTER, J. (1987). Dideoxynucleotide sequencing of RNA and uncloned cDNA. Focus 9, 5-8.
GOLDBACH, R. W. (1986). Molecular evolution of plant RNA viruses. Annual Review of Phytopathology 24, 289-310.
GOLDBACH, R. (1987). Genome similarities between plant and animal RNA viruses. Microbiological Sciences 4, 197-202.
GORBALENYA, A. E. & KOONIN, E. V. (1989). Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Research 17, 8413-8440.
GORBALENYA, A. E., KOONIN, E. V., DONCHENKO, A. P. & BLINOV, V. M. (1988). A conserved NTP-motif in putative helicases. Nature, London 333, 22.
GORBALENYA, A. E., KOONIN, E. V., DONCHENKO, A. P. & BLINOV, V. M. (1989). Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Research 17, 4713-4730.
GRANGE, T., BOULOY, M. & GIRARD, M. (1985). Stable secondary structure at the 3' end of the genome of yellow fever virus (17D vaccine strain). FEBS Letters 188, 159-163.
GREIF, C., HEMMER, O. & FRITSCH, C. (1988). Nucleotide sequence of tomato black ring virus RNA-1. Journal of General Virology 69, 1517-1529.
HAHN, C. S., HAHN, Y. S., RICE, C. M., LEE, E., DALGARNO, L., STRAUSS, E. G. & STRAUSS, J. H. (1987). Conserved elements in the 3' untranslated region of flavivirus RNAs and potential cyclization sequences. Journal of Molecular Biology 198, 33-41.
HANAHAN, D. (1983). Studies on transformation of Escherichia coli with plasmids. Journal of Molecular Biology 166, 557-580.
HASELOFF, J., GOELET, P., ZIMMERN, D., AHLQUIST, P., DASGUPTA, R. & KAESBERG, P. (1984). Striking similarities in amino acid sequence among nonstructural proteins encoded by RNA viruses that have dissimilar genomic organization. Proceedings of the National Academy of Sciences, U.S.A. 81, 4358-4362.
HEMIDA, S. K. & MURANT, A. F. (1989a). Particle properties of parsnip yellow fleck virus. Annals of Applied Biology 114, 87-100.
HEMIDA, S. K. & MURANT, A. F. (1989b). Host ranges and serological properties of eight isolates of parsnip yellow fleck virus belonging to the two major serotypes. Annals of Applied Biology 114, 101-109.
HUYNH, T. V., YOUNG, R. A. & DAVIS, R. W. (1985). Constructing and screening cDNA libraries in λgt10 and λgt11. In DNA Cloning, A Practical Approach, vol. 1, pp. 49-78. Edited by D. M. Glover. Oxford: IRL Press.
JACKSON, R. J., HOWELL, M. T. & KAMINSKI, A. (1990). The novel mechanism of initiation of picornavirus RNA translation. Trends in Biochemical Sciences 15, 477-483.
KAMER, G. & ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282.
KOONIN, E. V. (1991). The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. Journal of General Virology 72, 2197-2206.
KOZAK, M. (1987). At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells. Journal of Molecular Biology 196, 947-950.
LOMMEL, S. A., WESTON-FINA, M., XIONG, Z. & LOMONOSSOFF, G. P. (1988). The nucleotide sequence and gene organization of red clover mosaic virus RNA-2. Nucleic Acids Research 16, 8587-8602.
LOMONOSSOFF, G. P. & SHANKS, M. (1983). The nucleotide sequence of cowpea mosaic virus B RNA. EMBO Journal 2, 2253-2258.
LÜTCKE, H. A., CHOW, R. C., MICKEL, F. S., MOSS, K. A., KERN, H. F. & SCHEELE, G. A. (1987). Selection of AUG initiation codons differs in plants and animals. EMBO Journal 6, 43-48.
MANDL, C. W., KUNZ, C. & HEINZ, F. X. (1991). Presence of poly(A) in a flavivirus: significant differences between the 3' noncoding regions of the genomic RNAs of tick-borne encephalitis virus strains. Journal of Virology 65, 4070-4077.
MANIATIS, T., FRITSCH, E. F. & SAMBROOK, J. (1982). In Molecular Cloning: A Laboratory Manual, pp. 464-466. New York: Cold Spring Harbor Laboratory.
MEYER, M., HEMMER, O., MAYO, M. A. & FRITSCH, C. (1986). The nucleotide sequence of tomato black ring virus RNA-2. Journal of General Virology 67, 1257-1271.
MURANT, A. F. (1988). Parsnip yellow fleck virus, type member of a proposed new virus group, and a possible second member, dandelion yellow mosaic virus. In The Plant Viruses, vol. 3. Polyhedral Virions with Monopartite RNA Genomes, pp. 273-288. Edited by R. Koenig. New York: Plenum Press.
MURANT, A. F. (1991). Parsnip yellow fleck virus group. In Classification and Nomenclature of Viruses. Fifth Report of the International Committee on Taxonomy of Viruses, pp. 318-319. Edited by R. I. B. Francki, C. M. Fauquet, D. L. Knudson & F. Brown. Vienna: Springer-Verlag.
MURANT, A. F. & GOOLD, R. A. (1968). Purification, properties and transmission of parsnip yellow fleck, a semi-persistent, aphid-borne virus. Annals of Applied Biology 62, 123-137.
MURANT, A. F., HEMIDA, S. K. & MAYO, M. A. (1987). Plant viruses that resemble picornaviruses. In Abstracts of the 7th International Congress of Virology, Edmonton, Canada, 1987, p. 183.
NAJARIAN, R., CAPUT, D., GEE, W., POTTER, S. J., RENARD, A., MERRYWEATHER, J., VAN NEST, G. & DINA, D. (1985). Primary structure and gene organization of human hepatitis A virus. Proceedings of the National Academy of Sciences, U.S.A. 82, 2627-2631.
NATSUAKI, T., MAYO, M. A., JOLLY, C. A. & MURANT, A. F. (1991). Nucleotide sequence of raspberry bushy dwarf virus RNA-2: a bicistronic component of a bipartite genome. Journal of General Virology 72, 2183-2189.
NOMOTO, A., OMATA, T., TOYODA, H., KUGE, S., HORIE, H., KATAOKA, Y., GENBA, Y., NAKANO, Y. & IMURA, N. (1982). Complete nucleotide sequence of the attenuated poliovirus Sabin 1 strain genome. Proceedings of the National Academy of Sciences, U.S.A. 79, 5793-5797.
POCH, O., SAUVAGET, I., DELARUE, M. & TORDO, N. (1989). Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO Journal 8, 3867-3874.
RICE, C. M., LENCHES, E. M., EDDY, S. R., SHIN, S. J., SHEETS, R. L. & STRAUSS, J. H. (1985). Nucleotide sequence of yellow fever virus: implications for flavivirus gene expression and evolution. Science 229, 726-733.
SANEYOSHI, M., OHASHI, Z., HARADA, F. & NISHIMURA, S. (1972). Isolation and characterization of 2-methyladenosine from Escherichia coli tRNA2Glu, tRNA1Asp, tRNA1His and tRNAArg. Biochimica et Biophysica Acta 262, 1-10.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
STANWAY, G., HUGHES, P. J., MOUNTFORD, R. C., MINOR, P. D. & ALMOND, J. W. (1984). The complete nucleotide sequence of a common cold virus: human rhinovirus 14. Nucleic Acids Research 12, 7859-7875.
VAN WEZENBEEK, P., VERVER, J., HARMSEN, J., VOS, P. & VAN KAMMEN, A. (1983). Primary structure and gene organisation of the middle component RNA of cowpea mosaic virus. EMBO Journal 2, 941-946.
VINGRON, M. & ARGOS, P. (1991). Motif recognition and alignment for many sequences by comparison of dot-matrices. Journal of Molecular Biology 218, 33-43.
WELLINK, J., REZELMAN, G., GOLDBACH, R. & BEYREUTHER, K. (1986). Determination of the proteolytic processing sites in the polyprotein encoded by the bottom-component RNA of cowpea mosaic virus. Journal of Virology 59, 50-58.
WENGLER, G. & CASTLE, E. (1986). Analysis of structural properties which possibly are characteristic for the 3'-terminal sequence of the genome RNA of flaviviruses. Journal of General Virology 67, 1183-1188.
(Received 29 June 1991; Accepted 18 August 1992)