Nucleotide Sequence of Yellow Fever Virus

RESEARCH ARTICLE

Nucleotide Sequence of Yellow Fever Virus: Implications for Flavivirus Gene Expression and Evolution

Charles M. Rice, Edith M. Lenches, Sean R. Eddy, Se Jung Shin, Rebecca L. Sheets, James H. Strauss

C. M. Rice, E. M. Lenches, and J. H. Strauss are members of the Division of Biology, California Institute of Technology, Pasadena 91125. S. R. Eddy and S. J. Shin are students at the California Institute of Technology and R. L. Sheets is doing graduate work in the Department of Cellular, Viral and Molecular Biology, University of Utah, University Medical Center, Salt Lake City 84132.

Abstract. The sequence of the entire RNA genome of the type flavivirus, yellow fever virus, has been obtained. Inspection of this sequence reveals a single long open reading frame of 10,233 nucleotides, which could encode a polypeptide of 3411 amino acids. The structural proteins are found within the amino-terminal 780 residues of this polyprotein; the remainder of the open reading frame consists of nonstructural viral polypeptides. This genome organization implies that mature viral proteins are produced by posttranslational cleavage of a polyprotein precursor and has implications for flavivirus RNA replication and for the evolutionary relation of this virus family to other RNA viruses.

The Flavivirus genus, family Flaviviridae, consists of a group of some 70 closely related human or veterinary pathogens causing many serious illnesses, including dengue fever, Japanese encephalitis, St. Louis encephalitis, Murray Valley encephalitis, tick-borne encephalitis, and yellow fever (1). Most flaviviruses are transmitted to vertebrate hosts by blood-sucking arthropods, mosquitoes or ticks, although some evidently lack an arthropod vector (2). Arthropod-transmitted flaviviruses replicate in the arthropod host as well as the vertebrate host. Human flavivirus diseases have diverse and complex pathologies and different viruses exhibit marked tissue tropisms. Many are neurotropic, causing encephalitic symptoms; others, such as the dengue group, replicate preferentially in host macrophages, whereas yellow fever is usually viscerotropic.

The disease known as yellow fever has been recognized for several hundred years (3, 4). Until the early 1900's recurrent epidemics occurred in the Caribbean area which caused great human suffering and had a profound influence on human activities in the area. From its focus in the Caribbean, epidemic yellow fever was spread by ship to ports as far north as Boston and as far east as England, where mortality rates in an epidemic could exceed 20 percent of those contracting the disease. Walter Reed and colleagues in pioneering studies in Cuba in 1900 demonstrated that yellow fever is transmitted by mosquitoes, and 2 years later showed that the disease agent is filterable (5). With the recognition that the mosquito Aedes aegypti is the vector for urban yellow fever, mosquito control measures rapidly led to the elimination of urban yellow fever. Subsequently, a safe and effective attenuated vaccine strain (17D) was developed by in vitro passage of the virulent Asibi strain in chicken embryo tissue (6). However, the virus persists in a sylvan cycle in the forests of South America and Africa, transmitted by numerous mosquito species including those of the genus Haemagogus in South America and of the genus Aedes in Africa. The vertebrate hosts in this cycle appear to be almost exclusively primates, demonstrating the limited natural host range of yellow fever. From the sylvan cycle periodic outbreaks in neighboring human populations have arisen on both continents. Furthermore, since Aedes aegypti is widespread in the world, a situation exacerbated by relaxation of mosquito abatement procedures in the Caribbean and elsewhere, the potential exists for future epidemics of urban yellow fever.

Previous studies have shown that flaviviruses contain single-stranded infectious RNA (thus defining them as plus-stranded RNA viruses in which the virion RNA serves as a messenger) encapsidated in a nucleocapsid possessing icosahedral symmetry and containing a single species of capsid protein [C, apparent mass of about 14 kilodaltons (kD)]. This in turn is surrounded by a lipid bilayer containing an envelope protein (E; about 50 to 60 kD) that is usually but not invariably glycosylated (7) and a second, nonglycosylated protein (M; about 8 kD) (8, 9). How the envelope is obtained is unclear, as budding flaviviruses are seldom identified in electron microscopic studies, although maturation does appear to occur in association with intracellular membranes (9, 10). Replication of flaviviruses in tissue culture is slow, with a long latent period, and only moderate titers of virus are produced. Host cell protein and RNA synthesis are shut off only poorly (vertebrate cells) or not at all (mosquito cells), making study of flavivirus replication and structure somewhat more difficult. Virus-specific protein synthesis appears to be associated with the rough endoplasmic reticulum, and RNA replication is localized in the perinuclear region (11). No subgenomic RNA has been detected in cells infected with flaviviruses, and it is believed that the genomic length RNA, which is capped but not polyadenylated (12, 13), is the only messenger RNA (mRNA) species (9, 12, 14). This mRNA is translated into the three structural proteins and several nonstructural proteins. Translation of the flavivirus genome in vitro produces polypeptides related to the structural proteins (15) which, in the presence of appropriate membrane fractions, can be processed efficiently to yield C and E (16). Peptide mapping of in vitro translation products as well as selective incorporation of N-formylmethionine suggest that initiation in vitro occurs only with the capsid protein.
Alternatively, studies on the in vivo translation of flavivirus Kunjin have been based on the use of pactamycin or high salt inhibition of translation initiation (17) or ultraviolet inactivation of translation (18) in an attempt to map the genome order of flavivirus proteins on the assumption that there is just a single site for initiation of translation. These experiments have led Westaway and collaborators to suggest that multiple independent translation initiation sites are used within flavivirus RNA, a situation not typically found with other eukaryotic mRNA's (19). We now present the complete nucleotide sequence of the yellow fever genome determined from complementary DNA (cDNA) clones of the 17D vaccine strain. Together with recent NH2-terminal sequence analysis of both structural (20) and some nonstructural yellow fever proteins, the amino acid sequences of the encoded proteins have been deduced and a preliminary picture of flavivirus gene organization and expression has begun to emerge.

[Fig. 1 (SCIENCE, VOL. 229, 23 AUGUST 1985, pages 727 and 728): nucleotide sequence of the yellow fever virus 17D genome, nucleotides 1 to 10,862, with the deduced amino acid sequence of the polyprotein, residues 1 to 3411, printed above it. The sequence listing is not legible in this copy.]

Fig. 1. Entire sequence of the genome of yellow fever virus. Yellow fever virus, 17D vaccine strain, was obtained from the American Type Culture Collection. This sample represents in vitro passage 234 of the line originated by Theiler and colleagues who started with the virulent Asibi strain (6). After plaque purification in Vero cells and amplification in BHK cells, the virus was grown in SW13 monolayers (50) and purified by polyethylene glycol precipitation, in glycerol-tartrate gradients. The purified virus was diluted with aqueous buffer and sedimented in the ultracentrifuge; the RNA was isolated by phenol extraction (51). Briefly, single-stranded cDNA was synthesized with avian myeloblastosis virus reverse transcriptase using degraded calf thymus DNA for priming (47). Second strand synthesis was carried out essentially as previously described (52). After methylation of the Eco RI sites with Eco RI methylase, phosphorylated Eco RI linkers were added with T4 DNA ligase. Following complete digestion with Eco RI, the double-stranded cDNA was sized on an agarose gel and selected size fractions were inserted into the Eco RI site of a plasmid vector derived from pBR322. Colonies containing yellow fever-specific inserts were selected by colony hybridization and were characterized by restriction mapping to obtain clones which represented most of the yellow fever genome. Clones containing the 3' end of the genome were constructed by poly(A)-tailing (polyadenylation) the genomic RNA with Escherichia coli poly(A) polymerase followed by synthesis of double-stranded cDNA with an oligo(dT) primer. Addition of the poly(A) tract was relatively inefficient but after digestion of the double-stranded cDNA with Bgl I, 3'-terminal Bgl I fragments were selectively cloned with a plasmid vector derived from cloned yellow fever DNA (51). Clones containing the 5' end of the genome were constructed by primer extension followed by oligo(dC) tailing with terminal deoxynucleotidyl transferase and oligo(dG) primed second strand synthesis. The entire sequence was obtained by chemical sequencing of both strands of the DNA (53). In addition, sequence was obtained throughout from at least two clones. Wherever the sequence differed between two clones (due presumably to heterogeneity in the RNA population or errors introduced during cloning), a third and occasionally a fourth clone was sequenced in this area, and the preferred nucleotide is reported here. Nucleotides are numbered from the 5' terminus. Amino acids are numbered from the first methionine in the polyprotein sequence. The beginning of each protein is labeled (see Table 1 and text for nomenclature); tentative assignments are indicated by dashed arrows. Putative hydrophobic membrane-associated segments in the structural region are overlined. Potential N-linked glycosylation sites are denoted by an asterisk. The region of NS5 homologous to other RNA viruses (see text) is enclosed by brackets and the conserved Gly-Asp-Asp sequence is boxed. Repeated nucleotide sequences are underlined. Closely spaced in phase stop codons that terminate the long open reading frame are boxed. The single letter abbreviations for the amino acid residues are: A, alanine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine.

[Fig. 2 diagram: yellow fever 17D genome (10,862 nt), drawn as CAP, 5' noncoding region (118 nt), open reading frame (10,233 nt), and 3' noncoding region (511 nt). Cotranslational processing (viral proteases? signalase?) gives C, prM, E, NS1, ns2a, ns2b, NS3, ns4a, ns4b, and NS5; a further cleavage (Golgi protease?) gives M.]

Fig. 2. Organization and processing of proteins encoded by the yellow fever genome. Untranslated regions are shown as single lines and the translated region as an open box. The open triangle is the initiation codon (AUG); the solid diamond the termination codon (UGA). The protein nomenclature is described in Table 1 and (35). The single letter amino acid code is used for sequences flanking assigned cleavage sites (solid lines). Two other potential cleavage sites are shown as dotted lines. Structural proteins, identified nonstructural proteins, and hypothesized nonstructural proteins (see text) are indicated by solid, open, and hatched boxes, respectively. Other potential cleavage sites have been found and are described in Table 1, footnote asterisk.

Sequence of yellow fever RNA. The complete sequence of yellow fever RNA is shown in Fig. 1. The 5'- and 3'-terminal sequences presented were derived from several independent clones, are homologous to the 5' and 3' termini of West Nile flavivirus genomic RNA (21) (see below), and thus probably reflect the extreme ends of the yellow fever genome. Given these assumptions, the RNA genome is 10,862 nucleotides in length and has a mass of 3.75 × 10^6 daltons (expressed as the sodium form). Previous reports have shown that flavivirus genomic RNA contains a type 1 cap at the 5' terminus but lacks a polyadenylate tract at the 3' terminus (12, 13). The base composition of the RNA is 27.3 percent A, 23.0 percent U, 28.4 percent G, and 21.3 percent C.

It is striking that the RNA contains an extremely long open reading frame, which spans virtually the entire length of the genome. This open reading frame, beginning from the first AUG triplet, is 10,233 nucleotides in length, terminating with a single opal codon (UGA), and could encode a polypeptide of 380,763 daltons, leaving 5'- and 3'-noncoding regions of 118 and 511 nucleotides, respectively. Examination of the remaining five possible reading frames (two in the virion RNA and three in the complementary RNA) reveals multiple stop codons in every case, with the longest possible other open reading frame being 804 nucleotides (in the complementary strand). Thus there is no reason to expect that any protein is translated from yellow fever RNA other than the polyprotein encoded by the long open reading frame shown in Fig. 1.

The structural proteins of yellow fever virus. The start points of the three yellow fever virus structural proteins (C, M, and E) have been positioned within the translated RNA sequence from NH2-terminal amino acid sequences obtained for the structural proteins isolated from yellow fever virions (20) (Fig. 1). The capsid protein is the first protein found in the long open reading frame and begins one residue past the first methionine. Thus, in agreement with in vitro translation data from the flavivirus genomic RNA's of tick-borne encephalitis virus, West Nile virus, and Kunjin virus (15, 16), the translation of the yellow fever genome initiates with the capsid protein, and the NH2-terminal methionine is removed during maturation of the protein (20). The capsid protein may be released from the precursor polyprotein by cleavage at or just past a series of basic amino acids (Figs. 1 and 2). From this deduced amino acid sequence, the capsid protein is quite basic, containing about 25 percent lysine and arginine distributed throughout the protein. The capsid protein of tick-borne encephalitis virus contains a similar proportion of basic amino acids (22). Since the capsid protein forms complexes with the RNA, its highly basic character probably acts to neutralize some of the RNA charges in such a compact structure.
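The examination of all six reading frames reported for the yellow fever RNA (three in the virion RNA and three in the complementary RNA) can be illustrated with a short sketch. This is not the procedure the authors used, which is not described here; it is a minimal Python example under stated assumptions. The function names (`reverse_complement`, `longest_orf`, `six_frame_scan`) and the toy sequence are hypothetical, and an open reading frame is assumed to run from an AUG up to, but not including, the next in-phase stop codon. The four length constants are the values quoted in the text.

```python
"""Six-frame open-reading-frame scan of an RNA sequence (illustrative sketch)."""

STOP_CODONS = {"UAA", "UAG", "UGA"}          # ochre, amber, opal
COMPLEMENT = str.maketrans("ACGU", "UGCA")   # base pairing for RNA

# Lengths quoted in the text for the 17D genome, in nucleotides.
FIVE_PRIME_NONCODING = 118
OPEN_READING_FRAME = 10_233
THREE_PRIME_NONCODING = 511
GENOME_LENGTH = 10_862


def reverse_complement(rna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def longest_orf(rna: str, frame: int) -> tuple[int, int]:
    """Longest AUG-initiated ORF in one frame as (length_nt, start_index).

    Assumption: the length runs from the AUG up to, but not including,
    the in-phase stop codon. Returns (0, -1) if the frame has no closed ORF.
    """
    best = (0, -1)
    start = None
    for i in range(frame, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if start is None and codon == "AUG":
            start = i                              # first AUG opens the ORF
        elif start is not None and codon in STOP_CODONS:
            best = max(best, (i - start, start))   # stop codon closes it
            start = None
    return best


def six_frame_scan(rna: str) -> dict[tuple[str, int], tuple[int, int]]:
    """Run longest_orf over three frames on each of the two strands."""
    strands = {"+": rna, "-": reverse_complement(rna)}
    return {(name, frame): longest_orf(seq, frame)
            for name, seq in strands.items()
            for frame in range(3)}


# The reported lengths are mutually consistent.
assert FIVE_PRIME_NONCODING + OPEN_READING_FRAME + THREE_PRIME_NONCODING == GENOME_LENGTH
assert OPEN_READING_FRAME == 3 * 3411              # 3411 codons

# Invented toy sequence: 2 nt leader, AUG, five GCU codons, UGA, 2 nt trailer.
toy = "GG" + "AUG" + "GCU" * 5 + "UGA" + "CC"
frames = six_frame_scan(toy)
print(max(frames.items(), key=lambda item: item[1][0]))   # -> (('+', 2), (18, 2))
```

Applied to the full genomic sequence, a scan of this kind would be expected to return the 10,233-nucleotide frame on the virion strand and much shorter frames elsewhere, the longest of which the text gives as 804 nucleotides on the complementary strand. The first assertion also shows that the three quoted lengths add up to 10,862 only if the termination codon is counted with the 3' region rather than with the open reading frame.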

[Figure residue: plot panels labeled C, prM, E, ns2a, and ns2b, with axis ticks at 200, 400, 600, and 1400; the caption is not included in this excerpt.]