Journal of General Virology (1992), 73, 3203-3211. Printed in Great Britain

The nucleotide sequence of parsnip yellow fleck virus: a plant picorna-like virus

A. D. Turnbull-Ross, B. Reavy,* M. A. Mayo and A. F. Murant

Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, U.K.

The complete sequence of 9871 nucleotides (nts) of parsnip yellow fleck virus (PYFV; isolate P-121) was determined from cDNA clones and by direct sequencing of viral RNA. The RNA contains a large open reading frame between nts 279 and 9362 which encodes a polyprotein of 3027 amino acids with a calculated Mr of 336212 (336K). A PYFV polyclonal antiserum reacted with the proteins expressed from phage carrying cDNA clones from the 5' half of the PYFV genome. Comparison of the polyprotein sequence of PYFV with other viral polyprotein sequences reveals similarities to the putative NTP-binding and RNA polymerase domains of cowpea mosaic comovirus, tomato black ring nepovirus and several animal picornaviruses. The 3' untranslated region of PYFV RNA is 509 nts long and does not have a poly(A) tail. The 3'-terminal 121 nts may form a stem-loop structure which resembles that formed in the genomic RNA of mosquito-borne flaviviruses.

Introduction

Parsnip yellow fleck virus (PYFV) has a combination of properties unlike those of any other plant virus. Its particle constituents resemble those of the picornaviruses of vertebrates (Murant & Goold, 1968; Hemida & Murant, 1989a) and it depends on a helper virus, anthriscus yellows virus, for transmission in the semipersistent manner by the aphid Cavariella aegopodii (Murant & Goold, 1968; Elnagar & Murant, 1976). PYFV has therefore been placed in a new taxonomic group (Murant et al., 1987; Murant, 1988) provisionally called the 'parsnip yellow fleck virus group' (Murant, 1991). PYFV isolates fall into two major serotypes with somewhat different, though overlapping, host ranges (Hemida & Murant, 1989b). The serotypes are called the 'parsnip' and 'Anthriscus' serotypes after the hosts in which they occur.

PYFV particles are isometric, 30 nm in diameter (Murant & Goold, 1968; Hemida & Murant, 1989a) and contain three protein species of approximate Mr 31K, 26K and 22.5K (Hemida & Murant, 1989a). Sedimentation in sucrose gradients separates the particles into two fractions. The top component (60S) particles appear to be empty shells (Hemida & Murant, 1989a), whereas the bottom component (152S) particles contain a ssRNA estimated to be 9.9 kb by electrophoresis of glyoxylated RNA in agarose gels (Hemida & Murant, 1989a). PYFV RNA may also possess a genome-linked protein (VPg), judging by the loss of infectivity after treatment with pronase or proteinase K (Murant et al., 1987).

It is of considerable interest to know whether the similarity of PYFV to animal picornaviruses in respect of particle properties holds true at the level of genome organization. Here we report the complete nucleotide sequence of genomic RNA from PYFV isolate P-121 (Murant & Goold, 1968), which belongs to the parsnip serotype and is the type member of the parsnip yellow fleck virus group.

0001-1194 © 1992 SGM

Methods

Virus and RNA purification. PYFV isolate P-121 was propagated in spinach plants (var. Medania) and purified from infected leaves harvested 19 days post-inoculation as described by Hemida & Murant (1989a). Viral RNA was extracted by resuspending virus pellets in 10 mM-Tris-HCl pH 7.6, 0.1 mM-EDTA, 1.0% (w/v) SDS, heating at 65 °C for 5 min and then extracting with an equal volume of phenol. The aqueous phase was extracted twice more with phenol:chloroform:3-methyl-1-butanol (25:24:1 v/v/v) and the RNA was stored in 70% ethanol at -70 °C.

Escherichia coli strains and DNA manipulations. E. coli strain XL1-Blue (Stratagene) was used, except in the initial cloning of cDNA into pUC19 when E. coli strain DH5α (Bethesda Research Laboratories) was used. Transformations were essentially as described by Hanahan (1983). Plasmid DNA was prepared by alkaline lysis (Birnboim & Doly, 1979).

cDNA synthesis and screening of libraries. A cDNA synthesis kit (Pharmacia) was used for cDNA synthesis from viral RNA primed with oligo(dT), or with a mixture of oligo(dT) and random primers. NotI/EcoRI linkers were ligated onto the cDNA, which was then purified by chromatography in Sephacryl S-400 columns (Pharmacia) and ligated into pUC19 or λZAPII (Stratagene). The λZAPII ligation and in vitro packaging reactions were performed as described by the manufacturers (Stratagene, BCL). Recombinant phage were screened with a polyclonal antiserum raised against PYFV particles (Hemida & Murant, 1989b) as described by Huynh et al. (1985). Selected clones were excised in vivo from λZAPII into pBluescript, as described by the manufacturer, for further analysis. Dot blots (Buluwela et al., 1989) were probed with restriction fragments from selected cDNA clones to obtain additional clone coverage of the PYFV genome. The majority of the sequence was determined from clones in pUC19 or pBluescript by the dideoxynucleotide method (Sanger et al., 1977; Chi et al., 1988) with T7 DNA polymerase. The remainder of the sequence was obtained, after subcloning of the cDNA into M13, by using the Klenow fragment of DNA polymerase I and substituting deaza-dGTP for dGTP. The sequence was compiled and analysed using the GCG package (Devereux et al., 1984).

Determination of the RNA 5'-terminal sequence. The sequence of the 5'-terminal region was determined by direct sequencing of RNA with the aid of two synthetic 17-mer oligonucleotides (complementary to positions 47 to 63 and 138 to 154) and reverse transcriptase as described by Geliebeter (1987). Terminal deoxynucleotidyl transferase was used to add homopolymer tails to the reverse transcriptase products (DeBorde et al., 1986).

Determination of the RNA 3'-terminal sequence. Purified PYFV RNA was 3' end-labelled with [32P]pCp and T4 RNA ligase (England et al., 1980) and applied to 1.0 ml Sephadex G50 spun columns (Maniatis et al., 1982). The 3'-terminal nucleotide was determined by digesting an aliquot of the labelled RNA with ribonuclease T2 (Donis-Keller et al., 1977) and separating the products by two-dimensional thin layer chromatography on 100 mm × 100 mm cellulose plates (Polygram Cel 300MN). The solvent systems were isobutyric acid-0.5 M-NH4OH (5:3 v/v) in the first dimension and isopropanol-HCl-water (70:15:15 v/v/v) in the second dimension (Saneyoshi et al., 1972). Additional sequence was obtained by partial digestion of the [32P]pCp-labelled RNA with base-specific ribonucleases as described by Natsuaki et al. (1991).

Results

The nucleotide sequence of PYFV

The entire PYFV genome was sequenced in both directions from cloned cDNA except for the 5'-terminal 68 nucleotides (nts), which were sequenced directly from the RNA. Over 80% of the sequence was determined from at least two independent cDNA clones and on average each position was sequenced 4.9 times. The complete genome of PYFV was found to be 9871 nts long (Fig. 1), which is in good agreement with an earlier estimate of 9900 ± 290 nts (Hemida & Murant, 1989a). Sequence heterogeneity was found at four positions and may have arisen from variation in the virus population, or from errors introduced by reverse transcriptase in the synthesis of the cDNA first strand, as reported by Lomonossoff & Shanks (1983) and Meyer et al. (1986). All the variation was located in the large open reading frame (ORF; see below). A transversion from U to A (nt 6149, in one of five clones) and a transition from U to C (nt 9350, in one of three clones) did not alter the amino acid encoded, but two C to U transitions resulted in changes from Thr to Ile (nt 3163, in one of four clones) and Leu to Phe (nt 4395, in one of three clones).

Determination of the 3'-terminal sequence of PYFV

The 22 3'-terminal nts of PYFV RNA were determined by partial ribonuclease digestion of the pCp-labelled RNA (Fig. 2). The sequence deduced from several repetitions was 5' GAAAGUAAAUAUUAAAUAAGGX 3'. The terminal nucleotide X could not be resolved in this system, but digestion of pCp-labelled RNA with ribonuclease T2 and chromatographic separation of the products revealed C to be labelled predominantly. A sequence identical to that deduced from the RNA was found at the 3' end of one of the cDNA clones.

ORFs in PYFV

The largest ORF occupies 92% of the genome and is in the plus viral (messenger RNA) sense (Fig. 1). The putative translation product is 3027 amino acids long and forms a polyprotein with a calculated Mr of 336212 (336K). The three next largest ORFs (588, 453 and 360 nts) are all encoded in the viral minus sense. The second longest ORF in the positive sense is only 303 nts long. The large ORF begins with the first AUG codon from the 5' end of the RNA (nts 279 to 281) and ends with an ochre codon (nts 9358 to 9360). The first AUG codon is located within a short sequence in which six out of nine nucleotides match the consensus (AACAAUGCC) proposed by Lütcke et al. (1987) for plant initiation codons. The only in-frame initiation codon in the first 1 kb with a better context is at nt 591, with seven out of nine matches to the consensus. The first AUG codon is probably the initiation codon for the PYFV polyprotein because (i) this AUG has an A residue in position -3, which is unlikely to be skipped by a scanning ribosomal 40S subunit (Kozak, 1987); (ii) the base composition from nts 1 to 278 (34.2% U, 24.8% C, 13.3% G and 27.7% A) has a high U + A and low G residue content consistent with leader sequences in RNA of other plant viruses (Gallie et al., 1987), and if the 5' untranslated region (UTR) is extended to nt 590 the G content rises to 20.0%; and (iii) an oligopyrimidine tract precedes the first AUG codon. Oligopyrimidine tracts also precede the initiation codons of tomato black ring virus (TBRV) RNA 1 (Greif et al., 1988) and cowpea mosaic virus (CPMV) B RNA (Lomonossoff & Shanks, 1983), and are important in translation initiation of picornavirus RNA (reviewed by Agol, 1991).
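The coordinates quoted for the large ORF can be cross-checked with simple arithmetic, and the "n out of nine" context scores are position-by-position counts against the consensus. The following Python sketch illustrates both under stated assumptions: the function names are hypothetical, the consensus is copied as printed above, and the example context `UUCAAUGGC` is invented for illustration and is not PYFV sequence.

```python
# Values reported in the text for PYFV isolate P-121.
GENOME_LENGTH = 9871        # nts
ORF_START = 279             # first nt of the initiating AUG
POLYPROTEIN_LENGTH = 3027   # amino acids
CONSENSUS = "AACAAUGCC"     # plant initiation context as printed in the text


def orf_span(start, n_codons):
    """Return (first nt, last nt, length) of an ORF, stop codon included."""
    length = 3 * (n_codons + 1)  # sense codons plus one termination codon
    return start, start + length - 1, length


def context_matches(context, consensus=CONSENSUS):
    """Count positions at which a context agrees with the consensus."""
    if len(context) != len(consensus):
        raise ValueError("context and consensus must have the same length")
    return sum(a == b for a, b in zip(context, consensus))


first, last, length = orf_span(ORF_START, POLYPROTEIN_LENGTH)
utr3_length = GENOME_LENGTH - last
orf_fraction = length / GENOME_LENGTH

print(first, last, length)        # 279 9362 9084
print(utr3_length)                # 509
print(round(100 * orf_fraction))  # 92
# Hypothetical nine-nt context, for illustration only (not PYFV sequence).
print(context_matches("UUCAAUGGC"))  # 6
```

Under these assumptions the ORF, counted with its termination codon, spans nts 279 to 9362 and leaves a 3' UTR of 509 nts, in line with the values given in the summary.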

[Fig. 1. Nucleotide sequence of PYFV RNA (9871 nts) with the deduced amino acid sequence of the polyprotein; the sequence listing is not reproduced here.]


Fig. 2. Sequencing of PYFV RNA labelled with pCp. Samples of [32P]pCp 3' end-labelled PYFV RNA were treated with alkali for 10 min (lane 1) or 5 min (lane 2), or with RNases T1 (lane 3), U2 (lane 4), Phy M (lane 5) or Bacillus cereus (lane 6). The deduced sequence is shown on the right.

The STEMLOOP or FOLD programs did not reveal any stable secondary structures in the 5' UTR of PYFV, and this is typical for plant viruses (Lommel et al., 1988). However, a short repeat, UCUCUYY, occurs six times in the 5' UTR of PYFV (starting at nts 45, 71, 127, 225, 236 and 262) and UCUCUUG occurs once (starting at nt 57).
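A repeat such as UCUCUYY, where Y denotes a pyrimidine (C or U), can be located by translating the ambiguity code into a regular expression. The Python sketch below shows one way to do this; the function name and the 20-nt example fragment are hypothetical and are not taken from the PYFV 5' UTR.

```python
import re

# Y is the IUPAC code for a pyrimidine (C or U in RNA).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "[CU]", "R": "[AG]", "N": "[ACGU]",
}


def find_motif(sequence, motif):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    pattern = "".join(IUPAC_RNA[base] for base in motif)
    # A lookahead lets matches overlap, which matters for short repeats.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]


# Hypothetical 20-nt fragment, for illustration only (not PYFV sequence).
example = "AAUCUCUCUGGUCUCUUCAA"
print(find_motif(example, "UCUCUYY"))  # [3, 12]
print(find_motif(example, "UCUCUUG"))  # []
```

Positions are reported 1-based so that they can be compared directly with nucleotide numbering of the kind used in the text.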


Comparison of the putative PYFV polyprotein with those of other viruses

The derived polypeptide sequence of PYFV was compared to those of CPMV (Lomonossoff & Shanks, 1983; van Wezenbeek et al., 1983), TBRV (Meyer et al., 1986; Greif et al., 1988), tobacco etch potyvirus (TEV; Allison et al., 1986), poliovirus (Nomoto et al., 1982), hepatitis A virus (HAV; Najarian et al., 1985), human rhinovirus 14 (HRV14; Stanway et al., 1984), foot-and-mouth disease virus (FMDV; Forss et al., 1984), bovine viral diarrhoea pestivirus (BVDV; Collett et al., 1988) and yellow fever flavivirus (YFV; Rice et al., 1985). The dot matrix patterns showed no extensive regions of similarity between the polyprotein of PYFV and those of BVDV, YFV or TEV. The most extensive regions of similarity to the PYFV polyprotein were in the 200K polyprotein of CPMV (Fig. 3a, regions II and III) and the 254K polyprotein of TBRV (Fig. 3b). These regions of similarity were less extensive in picornaviruses (Fig. 3c and d). A third small region of similarity was detected in the comparison of the PYFV polyprotein with that of HRV14 (Fig. 3c, region I). The dot matrix comparisons also showed that the PYFV polyprotein contains a region at the C terminus that does not correspond to any region in picornavirus polyproteins (Fig. 3c and d).

Fig. 3. Dot plot comparison of the putative PYFV polyprotein with (a) CPMV 200K (Lomonossoff & Shanks, 1983), (b) TBRV 254K (Greif et al., 1988), (c) HRV14 (Stanway et al., 1984) and (d) HAV (Najarian et al., 1985) polyproteins. The COMPARE (window = 30, stringency = 17) and DOTPLOT programs from the GCG package were used. I, II and III denote the three regions of polyprotein sequence similarity between PYFV and the other viruses detected in the comparisons.

Fig. 1. Nucleotide sequence of PYFV. Sequence variation was found in some cDNA clones at positions 3163 (C to U), 4395 (C to U), 6149 (U to A) and 9350 (U to C). The deduced amino acid sequence of the putative polyprotein is shown beneath the nucleotide sequence. The termination codon is indicated by an asterisk.

Structural proteins

Recombinant λZAPII clones containing random-primed PYFV cDNA were screened with an anti-PYFV antiserum. A number of positively reacting clones were selected and excised in vivo to yield pBluescript clones. The largest clone selected in this way contained an insert which started at nt 1042 and terminated at nt 3165. This produced an in-frame fusion protein linked to the β-galactosidase protein encoded by pBluescript, with the PYFV moiety beginning at amino acid 256 and ending at amino acid 963. The PYFV polyprotein had no amino acid sequence similarity with the particle proteins of CPMV, TBRV or TEV. However, similarity was found between the PYFV polyprotein and HRV14 VP3 (Fig. 3c, region I). VP3 is the most highly conserved particle protein among picornaviruses (Acharya et al., 1989) and sequence conservation has been noted between HRV14 VP3 and regions of the capsid proteins of southern bean mosaic sobemovirus, tomato bushy stunt tombusvirus, carnation mottle carmovirus and turnip crinkle carmovirus (Vingron & Argos, 1991). The sequence similarity between PYFV (or HRV14) and these plant viruses is somewhat less than that between PYFV and HRV14.
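The dot matrix comparisons used in this section rest on a sliding-window idea: a dot is drawn wherever a window of one sequence is sufficiently similar to a window of the other. The Python sketch below illustrates the principle with the window and stringency values quoted for Fig. 3 as defaults. It is not the GCG COMPARE implementation; counting identities within the window is a simplifying assumption and may not reproduce the scoring used by COMPARE, and the function name and example peptides are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=30, stringency=17):
    """Return 0-based (i, j) window starts whose identity count meets the stringency."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues at matching offsets within the two windows.
            identities = sum(
                seq_a[i + k] == seq_b[j + k] for k in range(window)
            )
            if identities >= stringency:
                dots.append((i, j))
    return dots


# Hypothetical peptides, for illustration only; a small window keeps it readable.
print(dot_plot("MKCDEFG", "LLCDEFA", window=4, stringency=4))  # [(2, 2)]
```

Runs of dots along a diagonal correspond to extended regions of similarity, such as regions I, II and III in Fig. 3. The naive double loop is adequate for a sketch but scales with the product of the two sequence lengths and the window size.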

Non-structural proteins

The conservation of domains in the non-structural proteins of RNA viruses is well documented (Argos et al., 1984; Haseloff et al., 1984; Goldbach, 1986, 1987) and two of these domains were found in PYFV by dot matrix comparison. The longest regions of similarity corresponded to the 87K protein of CPMV (Wellink et al., 1986) and the 92K protein of TBRV proposed by Greif et al. (1988) (Fig. 3a and b, region III). The similarity in this region between PYFV and the 3D (RNA polymerase) proteins of picornaviruses was weaker but still readily detectable (Fig. 3c and d, region III). The extent of the similarity between PYFV and the NTP-binding proteins of CPMV (58K), TBRV (72K) and picornaviruses (2C) was similar for all the comparisons (Fig. 3a to d, region II). The conserved domains (G/A)XXGXGK(S/T) and D(D/E) were found in the polyprotein sequence of PYFV (amino acids 1467 to 1474 and 1518 to 1519).
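
As an illustration of how the two NTP-binding motifs and their spacing might be located in a protein sequence, the sketch below expresses (G/A)XXGXGK(S/T) and D(D/E) as regular expressions. It is not the method used in this paper (the comparisons here were made by dot matrix analysis). The function name `find_ntp_motifs` and the sequence `TOY` are hypothetical; `TOY` is an invented string, not PYFV sequence, built so that the spacing equals the 44 residues implied by positions 1474 and 1518.

```python
import re

MOTIF_A = re.compile(r"[GA]..G.GK[ST]")  # (G/A)XXGXGK(S/T), 8 residues
MOTIF_B = re.compile(r"D[DE]")           # D(D/E), 2 residues


def find_ntp_motifs(protein):
    """Locate the first A motif and the first D(D/E) downstream of it.

    Coordinates are 1-based and inclusive. `spacing` is the position of
    the D minus the position of the last residue of the A motif.
    Returns None if either motif is absent.
    """
    a = MOTIF_A.search(protein)
    if a is None:
        return None
    b = MOTIF_B.search(protein, a.end())
    if b is None:
        return None
    return {
        "a_start": a.start() + 1,
        "a_end": a.end(),
        "b_start": b.start() + 1,
        "b_end": b.end(),
        "spacing": b.start() + 1 - a.end(),
    }


# Invented toy sequence (NOT from PYFV): motif A, 43 filler residues, then DE.
TOY = "MK" + "GSTGCGKS" + "L" * 43 + "DE" + "QR"
print(find_ntp_motifs(TOY))
# {'a_start': 3, 'a_end': 10, 'b_start': 54, 'b_end': 55, 'spacing': 44}
```

In a real polyprotein the dipeptide D(D/E) occurs many times by chance, so the first downstream hit is not necessarily the functional site; the spacing criterion discussed by Gorbalenya et al. would need to be applied to choose among candidates.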

Discussion

The particle properties of PYFV have led to speculation that it is more closely related to animal picornaviruses than is any other plant virus described to date. We have determined the complete sequence of PYFV RNA (isolate P-121) and found that the genome organization shows both similarities to and differences from those of picornaviruses. The genome of PYFV is a ssRNA, 9871 nts in length, and contains a large ORF encoding 3027 amino acids. The presence of a polyprotein is consistent with the array of large proteins produced during in vitro translation of PYFV RNA (M. A. Mayo & A. D. Turnbull-Ross, unpublished data). The length of the 5' UTR, the absence of stable secondary structure and the probable use of the first AUG codon from the 5' end of the RNA suggest that ribosomal initiation occurs by a scanning mechanism. This differs from initiation in picornaviral RNAs where an internal ribosomal entry site is used (reviewed by Jackson et al., 1990). In PYFV isolate P-121 there are three distinct particle proteins (31K, 26K and 22.5K) compared with the four found in most picornaviruses. Antibodies in a polyclonal antiserum to PYFV particles reacted with the translation products of transcripts of clones from the 5' half of the genome, suggesting that this region contains some of the coat protein epitopes. The largest reactive clone encoded PYFV amino acids 256 to 963. In addition, the dot matrix comparison showed similarity between the PYFV polyprotein (amino acids 653 to 798) and VP3 of HRV14. The location of the coat proteins within the polyprotein of PYFV is similar to that found in picornaviruses. However, it is not known whether any non-structural proteins are present on the amino-terminal side of the structural proteins of PYFV, as occurs in the polyproteins of comoviruses and nepoviruses (van Wezenbeek et al., 1983; Meyer et al., 1986).
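
The genome figures quoted in this paper are mutually consistent, which can be verified with a short calculation. The sketch below assumes 1-based coordinates, that the ORF boundaries (nts 279 and 9362, from the abstract) include the initiation codon and the termination codon, and that the stop codon is not counted as an amino acid. The function name `orf_summary` is hypothetical.

```python
def orf_summary(genome_length, orf_start, orf_end):
    """Derive UTR lengths and polyprotein length from ORF coordinates.

    Assumes 1-based inclusive coordinates and that `orf_end` is the last
    nucleotide of the termination codon.
    """
    orf_nts = orf_end - orf_start + 1
    if orf_nts % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_nts // 3
    return {
        "five_prime_utr": orf_start - 1,
        "codons": codons,
        "amino_acids": codons - 1,  # termination codon excluded
        "three_prime_utr": genome_length - orf_end,
    }


print(orf_summary(9871, 279, 9362))
# {'five_prime_utr': 278, 'codons': 3028, 'amino_acids': 3027, 'three_prime_utr': 509}
```

With these inputs the calculation reproduces the 3027-residue polyprotein and the 509 nt 3' UTR reported here; the 278 nt 5' UTR is simply what the ORF start implies under the same assumptions.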
Several alignments have been made between sequences of viral NTP-binding proteins (Gorbalenya & Koonin, 1989; Candresse et al., 1990). It has been suggested that the potyviruses form an outlying group
only distantly related to the picornavirus/comovirus/nepovirus cluster; this conclusion is based on the spacing of the conserved domains (G/A)XXGXGK(S/T) and D(D/E) and the presence of other conserved domains (Gorbalenya et al., 1988, 1989). The spacing between the domains in PYFV suggests that it belongs to the picornavirus/comovirus/nepovirus cluster rather than the potyvirus/flavivirus/pestivirus cluster. Sequence conservation in the RNA polymerase domain of RNA viruses has been used to produce several alignments and phylogenetic trees (Kamer & Argos, 1984; Poch et al., 1989; Candresse et al., 1990; Koonin, 1991). The strongest region of homology shown by the PYFV polyprotein was with the RNA polymerase protein of CPMV (39.7% identity over 280 amino acids; Fig. 3a, region III). The carboxy-terminal location of the putative polymerase of PYFV is analogous to the location of the polymerase in the polyproteins of picornaviruses, comoviruses and nepoviruses and differentiates PYFV from potyviruses in which the particle protein occupies the carboxy-terminal position in the polyprotein (Allison et al., 1985). In addition, the results of dot matrix comparisons showed that the putative polymerase of the PYFV polyprotein has a carboxy-terminal region with no counterpart in picornaviruses. A carboxy-terminal extension of the polymerase protein sequence is also found in CPMV and TBRV, but there is no sequence conservation among the plant viruses in this region (Fig. 3a and b). The 3' UTR (509 nts) is longer than that of comovirus, nepovirus, potyvirus or picornavirus RNAs (35 to 350 nucleotides). In addition, no evidence was found for a poly(A) tail in PYFV RNA as is normal for the RNA of picorna-like viruses. However, most of the oligo(dT)-primed clones were co-terminal and sequence analysis showed that the clones originated in an A-rich sequence of the 3' UTR (nts 9555 to 9568).
This sequence may have been responsible for the apparent binding of PYFV RNA to oligo(dT) columns reported previously (Murant, 1988). We could not detect binding of PYFV RNA to an oligo(dT) column under similar conditions, although we did detect binding of TBRV RNA. Analysis of potential secondary structures in the 3' UTR of PYFV revealed a possible stem-loop structure near the 3' terminus (Fig. 4). The free energy (ΔG°37) of the structure in PYFV, calculated by the method of Freier et al. (1986), is similar to those of the 3' UTR stem-loops of mosquito-borne flaviviruses (Grange et al., 1985; Brinton et al., 1986; Wengler & Castle, 1986; Hahn et al., 1987) and tick-borne encephalomyelitis virus strain Hypr (Mandl et al., 1991). There are differences in the predicted structures of the loops and PYFV has a short 'tail' which is not found in flaviviruses. None of the conserved sequence motifs identified in mosquito-borne flaviviruses (Hahn et al., 1987) were found in the 3' UTR of PYFV.

Fig. 4. A possible secondary structure for the extreme 3'-terminal nucleotide sequence of PYFV.

The structural protein coding regions of PYFV are positioned towards the 5' end, and those of the non-structural proteins at the 3' end, of the genome, which is reminiscent of the organization in picornaviruses. Sequences in the part of the polyprotein containing PYFV coat proteins were similar to sequences in VP3 of HRV14, but no sequence similarity was detected to the coat proteins of CPMV. In contrast, the similarities in the RNA polymerase domain were greater between PYFV and CPMV than between PYFV and picornaviruses. PYFV thus has characteristics which place it taxonomically between picornaviruses and comoviruses. However, it is distinct from both in lacking a poly(A) tail. The unusual combination of particle properties and the genome organization described in this paper justify placing PYFV in a new taxonomic group. It remains to be seen whether other possible members of the group [PYFV (Anthriscus serotype) and dandelion yellow mosaic virus (Murant, 1988)] share the same genome organization. In particular it will be of interest to determine whether PYFV isolate P-121 is typical of the group in having a long 3' UTR with the potential to form a stem-loop structure.

This work was supported by a grant from the Scottish Office Agriculture and Fisheries Department (SOAFD) under the Increased Flexibility Scheme. We are grateful to the SERC Daresbury laboratory for use of computing facilities.


References

ACHARYA, R., FRY, E., STUART, D., FOX, G., ROWLANDS, D. & BROWN, F. (1989). The three-dimensional structure of foot-and-mouth disease virus at 2.9 Å resolution. Nature, London 337, 709-716.
AGOL, V. I. (1991). The 5'-untranslated region of picornaviral genomes. Advances in Virus Research 40, 103-180.
ALLISON, R. F., SORENSON, J. C., KELLY, M. E., ARMSTRONG, F. B. & DOUGHERTY, W. G. (1985). Sequence determination of the capsid protein gene and flanking regions of tobacco etch virus: evidence for synthesis and processing of a polyprotein in potyvirus genome expression. Proceedings of the National Academy of Sciences, U.S.A. 82, 3969-3972.
ALLISON, R., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
ARGOS, P., KAMER, G., NICKLEN, M. J. H. & WIMMER, E. (1984). Similarity in gene organization and homology between proteins of animal picornaviruses and a plant comovirus suggest common ancestry of these virus families. Nucleic Acids Research 12, 7251-7267.
BIRNBOIM, H. C. & DOLY, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Research 7, 1513-1523.
BRINTON, M. A., FERNANDEZ, A. V. & DISPOTO, J. H. (1986). The 3'-nucleotides of flavivirus genomic RNA form a conserved secondary structure. Virology 153, 113-121.
BULUWELA, L., FORSTER, A., BOEHM, T. & RABBITTS, T. H. (1989). A rapid procedure for colony hybridisation using nylon filters. Nucleic Acids Research 17, 452.
CANDRESSE, T., MORCH, M. D. & DUNEZ, J. (1990). Multiple alignment and hierarchical clustering of conserved amino acid sequences in the replication-associated proteins of plant RNA viruses. Research in Virology 141, 315-329.
CHI, H.-C., HSIEH, J.-C. & TAM, M. F. (1988). Modified method for double stranded DNA sequencing and synthetic oligonucleotide purification. Nucleic Acids Research 16, 10382.
COLLETT, M. S., LARSON, R., GOLD, C., STRICK, D., ANDERSON, D. K. & PURCHIO, A. F. (1988). Molecular cloning and nucleotide sequence of the pestivirus bovine viral diarrhea virus. Virology 165, 191-199.
DEBORDE, D. C., NAEVE, C. W., HERLOCHER, M. L. & MAASSAB, H. F. (1986). Resolution of a common RNA sequencing ambiguity by terminal deoxynucleotidyl transferase. Analytical Biochemistry 157, 275-282.
DEVEREUX, J., HAEBERLI, P. & SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research 12, 387-395.
DONIS-KELLER, H., MAXAM, A. M. & GILBERT, W. (1977). Mapping adenines, guanines and pyrimidines in RNA. Nucleic Acids Research 4, 2527-2538.
ELNAGAR, S. & MURANT, A. F. (1976). Relations of the semi-persistent viruses, parsnip yellow fleck and anthriscus yellows, with their vector Cavariella aegopodii. Annals of Applied Biology 84, 153-167.
ENGLAND, T. E., BRUCE, A. G. & UHLENBECK, O. C. (1980). Specific labeling of 3' termini of RNA with T4 RNA ligase. Methods in Enzymology 65, 65-74.
FORSS, S., STREBEL, K., BECK, E. & SCHALLER, H. (1984). Nucleotide sequence and genomic organization of foot-and-mouth-disease virus. Nucleic Acids Research 12, 6587-6601.
FREIER, S. M., KIERZEK, R., JAEGER, J. A., SUGIMOTO, N., CARUTHERS, M. H., NEILSON, T. & TURNER, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proceedings of the National Academy of Sciences, U.S.A. 83, 9373-9377.
GALLIE, D. R., SLEAT, D. E., WATTS, J. W., TURNER, P. C. & WILSON, T. M. A. (1987). A comparison of eukaryotic viral 5'-leader sequences as enhancers of mRNA expression in vivo. Nucleic Acids Research 15, 8693-8711.
GELIEBTER, J. (1987). Dideoxynucleotide sequencing of RNA and uncloned cDNA. Focus 9, 5-8.
GOLDBACH, R. W. (1986). Molecular evolution of plant RNA viruses. Annual Review of Phytopathology 24, 289-310.
GOLDBACH, R. (1987). Genome similarities between plant and animal RNA viruses. Microbiological Sciences 4, 197-202.
GORBALENYA, A. E. & KOONIN, E. V. (1989). Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Research 17, 8413-8440.
GORBALENYA, A. E., KOONIN, E. V., DONCHENKO, A. P. & BLINOV, V. M. (1988). A conserved NTP-motif in putative helicases. Nature, London 333, 22.
GORBALENYA, A. E., KOONIN, E. V., DONCHENKO, A. P. & BLINOV, V. M. (1989). Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Research 17, 4713-4730.
GRANGE, T., BOULOY, M. & GIRARD, M. (1985). Stable secondary structure at the 3' end of the genome of yellow fever virus (17D vaccine strain). FEBS Letters 188, 159-163.
GREIF, C., HEMMER, O. & FRITSCH, C. (1988). Nucleotide sequence of tomato black ring virus RNA-1. Journal of General Virology 69, 1517-1529.
HAHN, C. S., HAHN, Y. S., RICE, C. M., LEE, E., DALGARNO, L., STRAUSS, E. G. & STRAUSS, J. H. (1987). Conserved elements in the 3' untranslated region of flavivirus RNAs and potential cyclization sequences. Journal of Molecular Biology 198, 33-41.
HANAHAN, D. (1983). Studies on transformation of Escherichia coli with plasmids. Journal of Molecular Biology 166, 557-580.
HASELOFF, J., GOELET, P., ZIMMERN, D., AHLQUIST, P., DASGUPTA, R. & KAESBERG, P. (1984). Striking similarities in amino acid sequence among nonstructural proteins encoded by RNA viruses that have dissimilar genomic organization. Proceedings of the National Academy of Sciences, U.S.A. 81, 4358-4362.
HEMIDA, S. K. & MURANT, A. F. (1989a). Particle properties of parsnip yellow fleck virus. Annals of Applied Biology 114, 87-100.
HEMIDA, S. K. & MURANT, A. F. (1989b). Host ranges and serological properties of eight isolates of parsnip yellow fleck virus belonging to the two major serotypes. Annals of Applied Biology 114, 101-109.
HUYNH, T. V., YOUNG, R. A. & DAVIS, R. W. (1985). Constructing and screening cDNA libraries in λgt10 and λgt11. In DNA Cloning, A Practical Approach, vol. 1, pp. 49-78. Edited by D. M. Glover. Oxford: IRL Press.
JACKSON, R. J., HOWELL, M. T. & KAMINSKI, A. (1990). The novel mechanism of initiation of picornavirus RNA translation. Trends in Biochemical Sciences 15, 477-483.
KAMER, G. & ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282.
KOONIN, E. V. (1991). The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. Journal of General Virology 72, 2197-2206.
KOZAK, M. (1987). At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells. Journal of Molecular Biology 196, 947-950.
LOMMEL, S. A., WESTON-FINA, M., XIONG, Z. & LOMONOSSOFF, G. P. (1988). The nucleotide sequence and gene organization of red clover mosaic virus RNA-2. Nucleic Acids Research 16, 8587-8602.
LOMONOSSOFF, G. P. & SHANKS, M. (1983). The nucleotide sequence of cowpea mosaic virus B RNA. EMBO Journal 2, 2253-2258.
LÜTCKE, H. A., CHOW, R. C., MICKEL, F. S., MOSS, K. A., KERN, H. F. & SCHEELE, G. A. (1987). Selection of AUG initiation codons differs in plants and animals. EMBO Journal 6, 43-48.
MANDL, C. W., KUNZ, C. & HEINZ, F. X. (1991). Presence of poly(A) in a flavivirus: significant differences between the 3' noncoding regions of the genomic RNAs of tick-borne encephalitis virus strains. Journal of Virology 65, 4070-4077.
MANIATIS, T., FRITSCH, E. F. & SAMBROOK, J. (1982). In Molecular Cloning: A Laboratory Manual, pp. 464-466. New York: Cold Spring Harbor Laboratory.
MEYER, M., HEMMER, O., MAYO, M. A. & FRITSCH, C. (1986). The nucleotide sequence of tomato black ring virus RNA-2. Journal of General Virology 67, 1257-1271.
MURANT, A. F. (1988). Parsnip yellow fleck virus, type member of a proposed new virus group, and a possible second member, dandelion yellow mosaic virus. In The Plant Viruses, vol. 3. Polyhedral Virions with Monopartite RNA Genomes, pp. 273-288. Edited by R. Koenig. New York: Plenum Press.
MURANT, A. F. (1991). Parsnip yellow fleck virus group. In Classification and Nomenclature of Viruses. Fifth Report of the International Committee on Taxonomy of Viruses, pp. 318-319. Edited by R. I. B. Francki, C. M. Fauquet, D. L. Knudson & F. Brown. Vienna: Springer-Verlag.
MURANT, A. F. & GOOLD, R. A. (1968). Purification, properties and transmission of parsnip yellow fleck, a semi-persistent, aphid-borne virus. Annals of Applied Biology 62, 123-137.
MURANT, A. F., HEMIDA, S. K. & MAYO, M. A. (1987). Plant viruses that resemble picornaviruses. In Abstracts of the 7th International Congress of Virology, Edmonton, Canada, 1987, p. 183.
NAJARIAN, R., CAPUT, D., GEE, W., POTTER, S. J., RENARD, A., MERRYWEATHER, J., VAN NEST, G. & DINA, D. (1985). Primary structure and gene organization of human hepatitis A virus. Proceedings of the National Academy of Sciences, U.S.A. 82, 2627-2631.
NATSUAKI, T., MAYO, M. A., JOLLY, C. A. & MURANT, A. F. (1991). Nucleotide sequence of raspberry bushy dwarf virus RNA-2: a bicistronic component of a bipartite genome. Journal of General Virology 72, 2183-2189.
NOMOTO, A., OMATA, T., TOYODA, H., KUGE, S., HORIE, H., KATAOKA, Y., GENBA, Y., NAKANO, Y. & IMURA, N. (1982). Complete nucleotide sequence of the attenuated poliovirus Sabin 1 strain genome. Proceedings of the National Academy of Sciences, U.S.A. 79, 5793-5797.
POCH, O., SAUVAGET, I., DELARUE, M. & TORDO, N. (1989). Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO Journal 8, 3867-3874.
RICE, C. M., LENCHES, E. M., EDDY, S. R., SHIN, S. J., SHEETS, R. L. & STRAUSS, J. H. (1985). Nucleotide sequence of yellow fever virus: implications for flavivirus gene expression and evolution. Science 229, 726-733.
SANEYOSHI, M., OHASHI, Z., HARADA, F. & NISHIMURA, S. (1972). Isolation and characterization of 2-methyladenosine from Escherichia coli tRNA2Glu, tRNA1Asp, tRNA1His and tRNAArg. Biochimica et biophysica acta 262, 1-10.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
STANWAY, G., HUGHES, P. J., MOUNTFORD, R. C., MINOR, P. D. & ALMOND, J. W. (1984). The complete nucleotide sequence of a common cold virus: human rhinovirus 14. Nucleic Acids Research 12, 7859-7875.
VAN WEZENBEEK, P., VERVER, J., HARMSEN, J., VOS, P. & VAN KAMMEN, A. (1983). Primary structure and gene organisation of the middle component RNA of cowpea mosaic virus. EMBO Journal 2, 941-946.
VINGRON, M. & ARGOS, P. (1991). Motif recognition and alignment for many sequences by comparison of dot-matrices. Journal of Molecular Biology 218, 33-43.
WELLINK, J., REZELMAN, G., GOLDBACH, R. & BEYREUTHER, K. (1986). Determination of the proteolytic processing sites in the polyprotein encoded by the bottom-component RNA of cowpea mosaic virus. Journal of Virology 59, 50-58.
WENGLER, G. & CASTLE, E. (1986). Analysis of structural properties which possibly are characteristic for the 3'-terminal sequence of the genome RNA of flaviviruses. Journal of General Virology 67, 1183-1188.

(Received 29 June 1991; Accepted 18 August 1992)