Human Herpesvirus 6 Is Closely Related to Human Cytomegalovirus

Vol. 64, No. 1

JOURNAL OF VIROLOGY, Jan. 1990, p. 287-299

0022-538X/90/010287-13$02.00/0 Copyright © 1990, American Society for Microbiology

G. L. LAWRENCE,1* M. CHEE,1 M. A. CRAXTON,2 U. A. GOMPELS,2 R. W. HONESS,2 AND B. G. BARRELL1

Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH,1 and Division of Virology, National Institute for Medical Research, London NW7 1AA,2 United Kingdom

Received 29 June 1989/Accepted 25 September 1989

A sequence of 21,858 base pairs from the genome of human herpesvirus 6 (HHV-6) strain U1102 is presented. The sequence has a mean composition of 41% G+C, and the observed frequency of CpG dinucleotides is close to that predicted from this mononucleotide composition. The sequence contains 17 complete open reading frames (ORFs) and part of another at the 5' end of the sequence. The predicted protein products of two of these ORFs have no recognizable homologs in the genomes of other sequenced human herpesviruses (i.e., Epstein-Barr virus [EBV], human cytomegalovirus [HCMV], herpes simplex virus [HSV], and varicella-zoster virus [VZV]). However, the products of nine other ORFs are clearly homologous to a set of genes that is conserved in all other sequenced herpesviruses, including homologs of the alkaline exonuclease, the phosphotransferase, the spliced ORF, and the major capsid protein genes. Measurements of similarity between these homologous sequences showed that HHV-6 is clearly most closely related to HCMV. The degree of relatedness between HHV-6 and HCMV was commensurate with that observed in comparisons between HSV and VZV or EBV and herpesvirus saimiri and significantly greater than its relatedness to EBV, HSV, or VZV. In addition, the gene for the major capsid protein and its 5' neighbor are reoriented with respect to the spliced ORFs in the genomes of both HHV-6 and HCMV relative to the organization observed in EBV, HSV, and VZV. Three ORFs in HHV-6 have recognizable homologs only in the genome of HCMV. Despite differences in gross composition and size, we conclude that the genomes of HHV-6 and HCMV are closely related.
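The comparison of observed and predicted CpG frequency mentioned above can be illustrated with a minimal sketch. It assumes the simplest model, in which the expected CpG frequency is the product of the C and G mononucleotide frequencies; the function name `gc_and_cpg` and the short demonstration string are hypothetical and are not taken from the paper or from the programs the authors used.

```python
from collections import Counter


def gc_and_cpg(seq: str):
    """Return (G+C fraction, observed CpG frequency, expected CpG frequency).

    Assumption: 'expected' is f(C) * f(G), i.e. the two bases of a
    dinucleotide are treated as independent draws.
    """
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    gc_fraction = (counts["G"] + counts["C"]) / n
    # Observed: share of the n - 1 overlapping dinucleotides that read "CG".
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG") / (n - 1)
    # Expected from mononucleotide composition alone.
    expected = (counts["C"] / n) * (counts["G"] / n)
    return gc_fraction, observed, expected


if __name__ == "__main__":
    # Toy sequence for illustration only; not HHV-6 data.
    gc, obs, exp = gc_and_cpg("ACGTACGT")
    print(f"G+C = {gc:.2f}, CpG observed = {obs:.3f}, expected = {exp:.3f}")
```

An observed/expected ratio near 1 corresponds to the situation described for the HHV-6 sequence; a ratio well below 1 would indicate CpG depletion.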

* Corresponding author.

The first recognized isolations of a previously undetected human herpesvirus, now called human herpesvirus 6 (HHV-6), were obtained in the course of in vitro cultivation of peripheral blood lymphocytes from patients with lymphoproliferative disorders, some of whom were also infected with human immunodeficiency virus (50). The viruses were shown to have the ultrastructural and morphogenetic properties characteristic of a herpesvirus (5, 59) but to be distinct from the five previously known human herpesviruses by their antigenic properties and by the failure to show homologous hybridization with nucleic acid sequences from each of these other five human viruses (32). Independent isolates of herpesviruses shown to be closely related to the initial isolate (HBLV/GS) were subsequently reported from human immunodeficiency virus-infected patients from Uganda (strains U1102 and U683 [20]), The Gambia (strain AJ [59]), and Zaire (strain Z29 [38]). A series of seroepidemiological investigations has since established that evidence of a prior infection with HHV-6 is widespread in populations of apparently healthy adults and that the virus is typically acquired in early infancy (7, 52). The primary infection in infants has been shown to cause the common childhood infection exanthem subitum (roseola infantum [34, 61]), and a series of virus isolations from the acute stages of this mild childhood disease has been obtained. There have also been reports of the common detection of HHV-6 in cervical lymph nodes (23) and of HHV-6 DNA sequences in a proportion of some rare B-cell tumors (31) and suggestions that infection or recurrence in adult life may be related to lymphadenopathy (9, 46).

We are interested in the relationships between the divergent biological and molecular genetic properties of the herpesviruses and their evolution. The current classification of the herpesviruses recognizes their biological diversity and divides them into three subgroups (the alpha-, beta-, and gammaherpesviruses) on the basis of some of these biological properties (28, 30, 49). Alphaherpesviruses, exemplified by herpes simplex viruses (herpes simplex virus types 1 and 2 [HSV-1 and HSV-2]; human herpesviruses 1 and 2) and varicella-zoster virus (VZV; human herpesvirus 3), are distinguished by their capacity to establish latent infections of neural tissues and to reactivate from these sites. Betaherpesviruses include the cytomegaloviruses (e.g., human cytomegalovirus [HCMV]; human herpesvirus 5); they replicate productively in cultures of fibroblasts from the host species. The sites of their persistence in vivo are uncertain but may involve reticuloendothelial cells and do not appear to involve neural tissues (42). Gammaherpesviruses are typified by the B-cell lymphotropic human herpesvirus, Epstein-Barr virus (EBV; human herpesvirus 4), and the T-cell lymphotropic virus of the squirrel monkey, herpesvirus saimiri (HVS; saimiriine herpesvirus 2). The major mode of virus persistence of these lymphotropic viruses is as latent infections of circulating lymphocytes.

The isolations of HHV-6 from peripheral blood lymphocytes have clearly shown that the virus can infect a population of lymphocytes in vivo. The major population of productively infected cells in cultures of cord blood or peripheral blood lymphocytes has the characteristics of immature CD4+ T cells (20, 39), and the virus can be propagated in cultures of lymphoblastoid cells in vitro (39, 59). Despite the lack of knowledge on the nature of the latent site of the virus or any demonstration that HHV-6 can transform lymphoid cells, it has been suggested that the virus should provisionally be classified as a gammaherpesvirus (33, 38).

We have undertaken an analysis of the structure and sequence of the genome from a Ugandan isolate of HHV-6 (U1102 [20]). In this report, we present and interpret a


J. VIROL.

LAWRENCE ET AL.

[FIG. 1 graphic: (a) restriction map on a scale of 0 to 22 kb with rows of EcoRI, HindIII, PstI, SalI, and SmaI sites; (b) termination codon map with arrows for ORFs 0R, 5R, 6R, 8R, 9R, 10R, 11R, 13R, 14R, 15R, 16R, and 17R (rightward) and 1L, 2L, 3L, 4L, 7L, and 12L (leftward).]

FIG. 1. (a) Restriction map of the region sequenced showing EcoRI, HindIII, PstI, SmaI, and SalI restriction sites. The DNA sequence was determined from the shaded portions of the plasmids, whose names appear on the restriction fragments from which they were derived. (b) Positions of termination codons in each of the three possible ORFs for each strand of the region sequenced. Arrows indicate the location and direction of the major ORFs. ORFs are named 0 to 17; R or L indicates rightward or leftward orientation.
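The kind of map described in panel (b), with termination codons plotted for each of the three reading frames on each strand, can be sketched as follows. This is only an illustration under stated assumptions: the names `reverse_complement` and `stop_positions` are hypothetical, positions are 0-based offsets along whichever strand is being read, and the code is not the DIANA or ANALYSEQ software cited in the text.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stop_positions(seq: str) -> dict:
    """Map (strand, frame) to the 0-based starts of termination codons.

    'R' is the sequence as given (rightward); 'L' is its reverse
    complement (leftward). Offsets are measured along the strand read.
    """
    result = {}
    for strand, s in (("R", seq.upper()), ("L", reverse_complement(seq))):
        for frame in range(3):
            result[(strand, frame)] = [
                i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] in STOPS
            ]
    return result


if __name__ == "__main__":
    # Toy sequence for illustration only; not HHV-6 data.
    for key, hits in stop_positions("ATGTAAC").items():
        print(key, hits)
```

Long gaps between consecutive stop positions within one frame are the candidate ORFs that a plot of this kind makes visible.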

sequence of 21.8 kilobase pairs (kbp) from the genome of HHV-6 that includes the sequence previously recognized as having significant nucleotide sequence similarities to a region of the HCMV genome (22). The sequence and arrangement of the predicted open reading frames (ORFs) in this region of the HHV-6 genome bear a much closer resemblance to corresponding regions of HCMV (a betaherpesvirus) than to corresponding regions of the genome of EBV, HSV, or VZV.

MATERIALS AND METHODS

Isolation and characterization of recombinant DNA clones of HHV-6 DNA. All recombinant DNA clones were isolated from HHV-6 (U1102) DNA prepared from cultures of infected cord blood lymphocytes. The SalI and SmaI clones were prepared by cloning purified restriction endonuclease fragments into the SalI site of a pBS (Bluescribe; Stratagene) vector or the SmaI site of pUC13. Fragments were selected for sequencing on the basis of their linkage relationships to the 5.4-kbp HindIII fragment (cloned into the HindIII site of pUC8 as pHD5), which was previously shown to be homologous to a region of the HCMV genome (22; unpublished results). The EcoRI plasmid pR9.1 was provided by M. Jones, and the PD12 clone was a 1.2-kbp PstI fragment cloned directly into the PstI site of M13mp18. The relationships between these cloned fragments over the relevant portion of the HHV-6 genome are summarized in Fig. 1. A detailed description of the mapping and cloning of the HHV-6 (U1102) genome will be presented elsewhere, but the rightmost SalI site of pSAD3.5 in the region analyzed in this report is located approximately 31 kbp from the right unique/repeat junction of the HHV-6 genome, the total size of which is about 170 kbp.

DNA sequencing and sequence analysis. The DNA sequence was determined from the regions shown in Fig. 1, using the methods described by Bankier et al. (2). Random subfragments of DNA from these plasmids were prepared by sonication (19) and subcloned into M13mp8 (43), and single-stranded templates were sequenced by the dideoxynucleotide chain termination method (51). Regions of sequence compression were resolved by replacing dGTP with deoxy-7-deazaguanosine triphosphate in the sequencing reactions (45). Sequence data were assembled by using the computer programs DBAUTO and DBUTIL (55, 56) and analyzed for the presence of ORFs and transcription signals with the programs DIANA (J. Crooke, T. S. Horsnell, and B. G. Barrell, unpublished data) and ANALYSEQ (58). Predicted protein sequences were analyzed for hydrophobicity and potential glycosylation sites with ANALYSEP (58), and searches for homologous protein sequences contained in protein libraries were performed by using the computer program FASTP (37). The AMPS suite of programs (3, 4) was used to carry out pairwise computer alignments of predicted translation products of HHV-6 ORFs and the homologous genes of the other human herpesviruses. Twenty randomizations of each alignment were performed so that a significance score for the alignment could be obtained. The program uses the Dayhoff mutation data matrix (17, 18) for protein alignments. All computer programs were run on DEC VAX and microVAX computers.

RESULTS

The DNA sequence of a 21,858-bp region of the HHV-6 (U1102) genome has been determined for both strands, each base being sequenced an average of six times by the random

FIG. 2. DNA and predicted protein sequences. The nucleotide sequence reported here will appear in the EMBL and GenBank data bases under accession number M28243. The DNA sequence is given as the rightward 5'-to-3' strand only (numbered 1 to 21858). Rightward-encoded protein sequences are shown above the corresponding DNA sequences in single-letter code; leftward-encoded protein sequences are shown below the corresponding DNA sequences. The name of each ORF is given on the left of the first line of sequence, and amino acid sequences are numbered from the N terminus to the C terminus to the right of the sequence. Protein sequences are shown from the first ATG. The sequence continues on the following pages.
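The convention in this legend, where only the rightward strand is printed and leftward-encoded proteins are read from its complement, can be made concrete with a short sketch. It assumes the standard genetic code, 1-based inclusive coordinates on the rightward strand, and translation that stops at the first termination codon; the function name `translate_orf` and the toy sequences are hypothetical and do not come from the paper.

```python
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_orf(rightward: str, start: int, end: int, strand: str) -> str:
    """Translate an ORF given 1-based inclusive coordinates on the rightward strand.

    strand is 'R' (read as printed) or 'L' (read from the reverse complement).
    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    segment = rightward[start - 1:end].upper()
    if strand == "L":
        segment = segment.translate(COMPLEMENT)[::-1]
    protein = []
    for i in range(0, len(segment) - 2, 3):
        aa = CODON_TABLE.get(segment[i:i + 3], "X")
        if aa == "*":  # stop at the first termination codon
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy sequences for illustration only; not HHV-6 data.
    print(translate_orf("CCATGAAATAGGG", 3, 11, "R"))
    print(translate_orf("GGCTATTTCATCC", 3, 11, "L"))
```

The two toy calls encode the same peptide on opposite strands, which mirrors how leftward ORFs are displayed beneath the rightward DNA sequence in the figure.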

[FIG. 2 sequence panels: rightward-strand nucleotide sequence with the predicted protein sequences of the ORFs in this region shown above (rightward) or below (leftward) the DNA; ORF labels recoverable from these panels include 2L, 3L, 4L, and 5R.]

FIG. 2-Continued.

[FIG. 2 sequence panels, continued: rightward-strand nucleotide sequence with predicted protein sequences, including the end of ORF 5R and the start of ORF 6R; the nucleotide numbering on these panels reaches approximately 11,640.]

L K

G D

*

TTTAGCGTGTCTAAAGGGTGATTAAATCTCTCGGAAAGAGGCTGAACTGTTTCCAGAGCATACATAAATCGCCATAATTATAGCGATAATTAAGTCGTCGGAACAGGCTTGTTTTTTAGC *

I

R

E

F

S A S S

N

G

S C

V

Y

I A N

I

I

A

I

L

I

D

D

S C

A Q

K

KA

GCTGTATGTTATGTAATTGTTAAC GGAGATCTGGTGAATGTTTCTGATTTGTTCTAGTGCGTATTCCACGGGATCGTAGGTAATTTTTAT TGTAAACGATATCAATTCCTGTGACGCCTT S Y T I Y N N V S I Q H I N R I Q E L A Y E V P D Y T I K I T F S I L E Q S A K

GATGTTTCCTGAATTAAAATTCGAGATAAAAAACTCAACTGCTAGTTTTTTTTCTTTACCGAGTAGGTAAAATGGCTGTGCTATCTGATTTTGGTCTGGGGTGTGGCAAGTCACCTG I N

G

S N

F

N

S I F F

E V

A

L

E

KK

K G

L

L Y

F P

Q A

I

Q N Q D

P T

H F

F

T V Q

TATAGATTTATTAGCCGTGATGTTTTCCTTTATGATACATGCAATTTTTACGGCAGAAGCTTGATTGGAATTCCCTTCTATGATGATTTTTACTTCCGTGAAGAAAGGATGTAGATCTAG I S K

N

T I N E

A

C

I I

K

A

I

V A

K

A Q

S

N

S N

G E

I

I

I

R

V

T

E

F

F

P H

L

D L

343 I

350 : 11880 344 I

L I

N H

A A

C E

A

I A

T

D

S S

T M

L

S E L F

Y

H

E

G

M

Y

V I

Y

Q

D L

Y

T

G

; 12240 224

12360 184

I

TGCTGCTATGCCTGTGCCAGAAGCACGGCGGTTGCCTGTATAGGCAGGGTCTAGGTATACGTACAGATCTTTACCTAAAAAGGGAATTAGATTTTTATTGATGGTGCTGTATCGGAAAAA A A

I

G

T

G

S

A R

R N

G

T

Y A P

L Y V Y

D

L D

K G

L F

P

I

E N

L N

I T S

12000 304

; 12120 264

GATTGACAATATCATATGCGCGGCACATTCAGCTATTGCCGTGTCGGAACTTGTCATTAAACTTTCTAGAAAGTAGTGCTCCATGCCGTAGACGATATATTGATCTAGATAGGTGCCTAT I S

11760

12480 144

Y R F F

CTC GAATTCGGTTTGGCCTTGTTCCGTAATTAAAACGTCGTTGATCACATTGCAAGTGGCTCCGCCCATGATTTCATGGATGAAAGCTCCTTCTAGGAAGAGGTTGGCCGTTTTTTTAAC : 12600 E F E T Q G Q E T I L V D N I V N C T A G G M I E H I F A G E L F L N A T K K V 104

TTCGGCGTTAATGCTGATGAACTTGGGTTTGTGGAGTCGGTAGCAAGAACAGGCTGTGGCGTTACCGCGTTCGTTTAACATATGGGCGTGATCTTCACATACGTAAGAAACTACGGAGAG ; 12720 E

A

N

I S I

F

P

K

K

H

L

R

Y

S C A T

C

A

N

G R E

N

L N

H A H D

E

C

V

S

Y

V V

S

L

64

CATTTCAAACGGAGAGTTGTTCAGCTTCATTAAAAAAGATGTTGAATGGTTTCCGGAATTGGTCGAAGATATAAATAGGATCTTGGTAGATGCTTGGGGCAGGAAACCTAAAATCGTGCT

12840 24

F

M E

P

S N

N

L

K

F

L

M

S T

S

H

N

G

T

S N

S

S

I

F

K

L I

SR

M

T N

S A Q S A L

L

P N

G

F I

G K

L I T D

D

S

F E

IN

14 12960 1

ID

54

GAACGCGTCTTTCTTTATAAAGTGGCTCTCGTCGACAATAAGCAGATTGAAGCTCTGTCCGCGTATACTCTGAGAGGGGAATGAATTCGGCGTTAAACGGGATAAAAGACGATTTTGAGA F A D K K I F H S E D V I L L N F S Q G R I S / ACCEPTOR FROM 12L C E

T

K

D

D L

F

K

I

I

D

K

I S K

N

C

N

F I V E

Q V

E

S

L

P R

R

V

D

S

A A

I

L F

ACTGTGAAACGAAAGACGACCTTTTTAAAATAATTGATAAAATAAGCAAAAATTGCAATTTTATAGTGGAACAGGTCGAGTCTTTGCCTCGGAGGGTGGATTCAGCGGCCATCCTATTTG ; 13080 N

9R

L

A V E

I F

N

D V I

Y R

Q

N

G

N E

V A A K

L P

R

I

K Y

Q

R

D

G N G Q D I D T * R V T G R I L T H R N

N Q

M C

T

T

85 25

ATAATCTCGCGGTGGAGATATTTAACGATGTAATATATCGACAAAATGGAGTTGCCGCGAAAATACGACAGGGTAACGGGCAGGATATTGACACATAAGAATAACCAGATGTGTACAACC : 13200

FIG. 2-Continued. 291

E C S Q M Y N L H N P I T F E L G L G N V F V C M R C L T V H H C D M G T D C T GAATGTT CTCAGATGTATAATTTACACAATCCTATCACGTTTGAGTTGGGACTTGGAAACGTGTTTGTCTGTATGCGGTGTTTGACGGTTCACCACTGTGATATGCAAACTGACTGTACC

I

V

T

N

H

E

G

Y

C

V

K

A

T

L

G

F

S

Y

G W M

P

A

A

Y

D

C

L

F

E

P

I

C

N

P

E

E

I

T

65 13320

V

ATTGTCAACACGCATGAGGGGTATGTCTGTGCAAAAACGGGTTTATTTTATAGCGGTTGGATGCCTGCATATGCAGACTGCTTCTTAGAACCGATCTGTGAGCCGAATATTGAAACGGTT

105 13440

N V V V V L L S Y V Y S F L M E N K E R Y A A I I D S I I K D G K F I K N V E D AATGTCGTGGTGGTCTTGTTATCTTAT GTATATAGTTTTTTGATGGAAAACAAGGAACGATATGCTGCCATTATTGATAGCATTATTAAAGATGGGAAATTTATAAAAAACGTGGAAGAC

145 13560

A V F Y T F N A V F T N S T F N K I P L T T I S R L F V Q L I I G G H A K G T I GCTGTGTTTTATACTTTTAACGCGGTTTTTACTAACTCAACTTTCAATAAGATTCCTCTGACGACGATAAGTCGTCTTTTTGTTCAGTTGATTATAGGAGGACACGCTAAAGGAACGATT

185 13680

Y

D

S

N

V

R

I

V

S

R

R

R

K

E

D

S

L

L

K

K

M

R

Y G N A L I L * M E T H L Y Y D

E

L

lOR

T

Y Q Y Q

L

G

216 16 13800

G

TATGACAGTAATGTAATTCGCGTCAGTCGTCGGAAACGAGAAGACAGTTTACTAAAAAAGATGAGATTGGAGTATGGAAACGCACTTATACTATGACACCCTGTATCAATATCAAGGCGG V

Y

A

P

I

H

C

L

P

T

D

V

L

C

P

M

R

D

V

C

I

S

E

L

Y

F

R

C

F

V

F

K

S

G

M

Y

H

T

EK

56 13920

AGTGTATCCGGCTCATATTTGCCTGCCGACAGATGTGTGTCTTCCGATGAGAGTGGATTGTATCGAGTCTTTATATTTTCGGTGTGTATTTTTTAAGAGTGGGATGCATTATACTGAATG S

K

K

L

F

T

V

I

S

R

I

E

K

K

F

D

V

L K D

A

D

S

D

E

V

F

T

L

G

V

T

M

V

I

P IP

I

V

GAGTAAATTAAAGTTTACTGTGATTTCACGTGAAATAAAGTTTAAAGATGTGTTAAAGGATGCGGACTCTGACGAAGTTTTTACCGGTTTGGTGGTAATGACTATCCCAATTCCGATAGT

I

96 14040

D F H F D I D S V I L K L V Y P R L V H R E I V L R L Y D L I C V R P P S N R P AGATTTTCATTTTGATATCGAT TCTGTAAT TTTGAAAT TGG TTTAT CCGCGGTTAGTGCACCGGGAAATAGTGCTGAGAC TCTAC GATC TTATAT GCGTCAGACCT CCGTCAAACCGGCC

136 14160

S E A S A K N I A N D F Y Q L T S R E N K Q T P D E E K R C L F F Q Q G P L E P GTCGGAAGCATCAGCTAAAAATATTGCTAATGATT TCTATCAACTAACCTCACGTGAAAATAAACAGACACCCGATGAGGAAAAACGTTGTCTATTTTTTCAGCAGGGACCTTTAGAGCC

176 14280

P

S

T

V

R

L

G

K

A

P

N

G

E

K

P

I

Q F

A

P

A

H

E K

N

M

T

E

S

F

L

S

D

S

W

G Q K

F

V

ACCCTCTACCGTCAGAGGCTTAAAGGCGCCCGGTAATGAAAAGCCAATACAATTTCCCGCCCATGCTAACGAAAAAATGACCGAATCTTTTTTAAGCGATAGTTGGTTCGGACAAAAAGT

216 14400

R C K K I L D F T Q T Y Q V V V C W Y E L S F S R E M Q I E N N L L S A S Q L K CAGAT GCAASAAAAATTTTGGATTTTAC GCAAAC GTATCAAGTCGTGGTATGTTGGTACGAGCTTTCGTTTTCCC GCGAGATGCAGATCGAGAATAATTTACTGTCC GCTTCCCAGCTAAA

256 14 520

R

V

A

N

D

A

F

D

W

T

R

N

R

Y

L

R

D

I

G

R

S

L

V

T

I

H

K

V

T

L Q

I

H

N

R

F

Q

K

Q

K

GCGGGTTAACGCTGCGGATTTTTGGGATAGAACTAATCGGTATTTGCGAGATATTGGAAGCAGGGTATTGACACACATCGTGAAAACGCTTCAGATTCATAATAGGCAATTTAAACAGAA

296 14640

F N C N F P D N F S F D R L L S F M Q L G K D F W I L N L T L D S C I IK A I I ATTTAATT GCAATTTTCCAGATAATTTCAGCTTTGATCGTCTATTATCATTTATGCAGCTCGGGAAAGATTTTTGGATTTTAAACTTAACTTTAGACAGCTGCATTATTAAGGCAATTAT

336 14760

C F L G F Q N G G K S F L A G D E V W G D L I D C S K G S V I Y G E K I QW I L CTGTTTCCTAGGTTTTCAAAACGGGGGAAAATCTTTTTTAGCCCAAGATGAAGTTTGGGGGGATTTAATAGACTGTTCTAAAGGATCGGTGATCTACGGGGAAAAGATCCAATGGATTTT

376 14880

D

T

S

N

N

L

Y

T

S

C

R

E

K

K S

N

Q

W E

L

Y

D

V

C

C

L

A

V

Y

E

S

K

E

L

D

L

F

V

L

P

406 15000

GGACTCGACTAACAATTTATATTCGACGTGTCGGGAAAAACAGAATAAGTCGTGGGAATTATATGTTGATTGCTGTGCTTTGTATGTATCTGAAAAGTTAGAGTTGGATTTTGTGCTACC 11R

G G F A I T G K F A L T D G D I D F F N W R F G L S * 432 M A I S T F S I G D L G Y L R N F L Q N E C N W F R I C 28 C GGCGGATTTGCAAT CACCGGTAAATTCGCGCTTACTGATGGCGATATCGACTTTTTCAATTGGCGATTTGGGTTATCTTAGAAATTTTCTGCAGAATGAATGTAACTGGTTCAGAATTT 15120

K

K

F

T

Y

R

R

Y

E

S

A

V

T

S

S

T

P

F

N

L

S

K

N

P

K

F

K

C

M

C

H

I

E

V

I

K

F

R

S

E

GTAAAAAACATTCTATCGCGAATATCGCAGCGTTGCGACATCGTCTCCTACATTCTCGTTGAATAATAAGCCTAAGAAATTTTGCATGCATTGCGAGATTGTAATATTCAAGCGAAGTG F

E

F

M

S

L

A

V

N

G

I

F

H

G

F

Q

T

L

G

K

K

M

F

K

N

K

A

V

P

G

E

L

Y

Y

I

Y

L

L

E

G

I

T

P

I

D

L

G

F

I

P

R

Y

S

N

D

C

T

V

N

C

R

M

T

V

P

E V

Y

I

N

E

C

S

I

V

C

P

E

108

151360

AAGAATTTATGTTCAGCCTTGCGGTAAATGGCATACATTTTGGGCAGTTTTTAACCGGAAAAATGAAATTTAATAAGAAAGCAGTTCCGGAAGGGCTCTATTACTATATATTGGAATTGG S

68

151240

E

148

GAAGCATAACCCCTATCGATTTGGGCTTTATTCCGAGATATAATTCCGACTGTGTTACAAACATGCGTTGTGTTACACCGGAGGTTATTTATGAAAATTGCTCTATTGTGTGTCCCGAAG

151480

A N R L T V K G S G D N K L T P L G G C G A W C L K N G G D L Y I Y T F A L A Y AGGCAAATCGCCTCACGGTAAAAGGGTCC GGGGACAATAAATTGACTCCCTTAGGTGGGTGTGGAGCATGGTGTCTGAAAAATGGTGGCGATCTGTATATCTATACTTTTGCACTCGCTT

151600

D L F L T C Y D K S T F P S L A K I I F D N I A C E S E D C V F C K D H N K H V AC GATCTT T TCCTAACTTGTTATGACAAAT CCACCTTTCCATCTCTGGCAAAAATTATTTTTGATATGATAGCTTGCGAATCCGAAGATTGTGTCTTTTGTAAAGATCACAACAAACATG

151720

S

G

A

Q

I

Q

G

V

C

V

N

S

Q

T

E

F

C

C

Y

T

K

C

S

K

K

M

A

I

N

N

N

P

L

E

I

S

L

L

C

D

TATCGCAAGCTGGACAGATTGTAGGGTGCGTCTCTAATCAAGAAACCTGTTTTTGCTACACATCGTGTAAGAAAAAAATGGCTAATATTAACAATCCGGAGTTAATCTCTCTGCTCTGTG

188

228 268

115840

Q E I N K I D I M Y P K I K A S L S L D I N S Y A H G Y F G D D P Y A L K C V N 308 AT CAGGAAATTAATAAGATAGATATTAT GTATCCCAAAATAAAAGCATC GTTATCACTGGACATTAATT C TTACGCT CAT GGGTACTTC GGT GACGACCCTTAT GC GTTAAAATGT GTTA 115960

I

W

12L

V

R

I

S

A

A

L

R

S

L

I

L

V

S

P

C

V

K

C

R

V

N

V

D

*

335 16 08 0

AAATAAAGTTAAATTGATAGTACT TACGTGTGTATTGTAGCAGCTGGCGAAAAGTGCTGTGCTCTTTATATTTTGATGGTCGATTGTAATTACATTATCCAGGCATGTGATTGTCTTTTC V H T N Y C S A F L A T S K I N Q H D I T I V N D L C T I T K E SPLICE DONOR 7L /

1620 0 261

T GGAAACATTCGGCGGCATTTAAACTCGACTTCTTTCATCACAAAGTGAGATACGTGTTTTTGATGTGCCACGTAACCGATGCTGATTCCTTCAATGTTCTTCAATAAAAAACTTATAAT P F M R R C K F E V E K M V F H S V H K Q H A V Y G I S I G E I N K L L F S I I

163120

AGGAACTATGAACCACGTTTTTCCATGTCTTCTTGGGACTAGGAATACGTTAGTTTTTTGTTTCAGCGTATTCAGAGTACTTTCGTTTACGAATTCAATGTCGAACACGTGGGTCAGATA

164140

P

V

I

F

W

K

T

G

H

R

R

P

V

L

F

V

T

N

K

Q K

L

T

N

L

T

S

E

N

V

F

E

I

D

F

V

H

T

L

Y

221 181

GTTGATAACAC GATTGGCTAAAGCTGGAAGTTTGGTGACGGCGATGAAGAAGATTACATGTATGAGAATGTTCTTCTGAAATGGTTCCAGTTGTATTTTGCGCGTGTCTCCAAAGTCGTT N I V R N A L A P L K T V A I F F I V H I L I N K Q F P E L Q I K R T D G F D N

161560

GAATTCTCCTTTGATCCATTTTCTGAAATCTTTGATAAAACCGTCTATTTGCAAGAACAGCGGTTCACGATATAGAATCTTAAAGGATTCAAGAAATTCCGTGTATTGTGATTCAAAGGA

161680

F

E

G

K

I

W

K

R

F

D

K

I

F

G

D

I

L

Q

F

P

L

E

R

Y

L

I

K

F

S

L

F

T

0

Y

141

S

101

TTT GTCGGAAGCGGGTAAAAACTTCATTTCTTGTAGCTTGTGCGAAAGGCTCGGTAGTATAGTAAGAGGCTTACGGTTTTTAAGATGTCGCTGTTTTTCACAAAATGTATATAGAGGTTT K D S A P L F K M E 0 L K H S L S P L I T L P K R N K L H R Q K E C F T Y L P K

161800

TACCTGTTGGTTATATGAATGGGCAAAGCCGAGTTCTGGCGTGAGCATTATAAAACGCTTCTTATAGAAGATGGCACTGTTGGGGTACTTGGTAGATATTGTCGAGCAGTCTCTTTCGCC

161920

V

13R

P

ATTGGATACCAGTCAGGATCAGCGCAGCTCTGAGCAGGCTGATTGTTTTGTCGTGTCCTGTGTGTAAGCGTGTGGTAATGGACTAATTGTGTGTTTTATGTATTAATTTTTTATTTCTGA

Q

Q N

Y

S

H

A

F

G

L

E

P

T

L

M

I

F

R

K

K

Y

F

I

A

S

N

P

Y

E

K

T

S

e

I

T

C

S

S

D

E F

R

E G

M S Q V R S M E P D L T L A A V Y 0 A A A N L T E 0 D T TTCCATATGATCGCTTCATAGTTATTTTTTAT GTGGGTTATGTCGCAGGTACGTAGCATGGAGCCCGACCTTACGTTGGCGGCGGTCTATCAGGCGGCGGCGAACCTCACAGAGCAAGA W

K

I

I A

E

Y

N

N

K I

H T I

D C

T

R

L M

K

E I F A E A V K T A F S V C S S A A P S A R L R M I E T P T 0 N F M F V T S V TAAGGAGATTTT TGCCGAAGCGGTAAAAACTGC GTT TTCAGTGTGTAGTTCGGCAGCCCCGAGCGCTAGGTTGAGGATGATCGAAACGCCTACACAGAATTTTATGTTTGTGACGAGCGT

61 21 27 17 04 0 1 67

I P S G V T S G E K K T K L N I D A A L D N L A L S F A N K K S K K M A R T Y L TATTCCTTC GGGTGTGACGTCTGGTGAAAAAAACAAAGTTAAATATCGATGCCGCTCTGGATAATT TGGCTTTGTCGTTTGCGAACAAAAAATCAAAGAAGATGGCTAGAACGTATTT

17 28 0

L Q N V L R T Q D Q Q V A I S G K Y I L Y T K K H I E T S L M I D K T K L V K K GCT GCAGAAC GTTTTGC GGACTCAAGATCAACAAGTT GCCATTTCGGGGAAGTACATTTTGTATACAAAAAAACACATTGAAACGTCTTTGATGATCGATAAGACGAAGTTAGTTAAAAA

17 40 0

I

L

E

Y

A

E

T

P

N

L

L

G

Y

T

D

V

R

D

L

E

C

L

L

N

L

V

F

C

G

P

K

S

F

C

Q

S

D

S

C

F

AATTCTCGAGTATGCCGAGACCCCTAATCTGTTAGGATATACCGATGTGCGTGATCTTGAATGTTTACTTTGGTTAGTGTTTTGTGGTCCTAAAAGTTTTTGCCAGTCAGACAGTTGTTT

FIG. 2-Continued. 292

107 147 187 17 52 0

G Y S K T G Y N A A F P N L L P P Y L Y E C G Q N N G L F F G I V Q A Y V F S CGGAACATAAGCGGATATATGCGCGTTCAAATTATGCCTCGTTCTGACGATGCGCCAAATATGACTGTTTTGGCTTGGCAACTTCGTGTTT

Y

S

D

F

D

F

S

A

L

E

I

S

E

R

A

R

R

R

I

R

L

S

L

D

Y

K

L

Q

K

F

K

A

Q

V

E

S

L

V

04

227

1764064

S

267

V

1776076

GTACCAGTTTTATTTTCGCGCTGAGTTTAGAAGCGTCGTGTCAATCGGTACTCTGTCGACTAAACAGAGTTGCGAGCAGAATTTTGTTTAT A

Q

S

04

C

I

F

C

A

L

Y

Q

K

N

K

L

S

L

Y

E

S

V

G

D

L

K

T

S

F

V

S

P

I

I

I

K

D

C

307

C

L

1788088

AGCGCACGATGGTACTTTGTGATTAATAACAAACAACTTGTCAGAAACGTTCTGTGCTTAAGATTCTTTTTAGTCAATATATAAGGATGTT A Q T T I S T T Q N L P G T K S S A I F P V Y D L R K L L G A L V I S K G S V CGCGAGAGACATTTTACACTCGATGTGCAGGTCAAGAGCCAGCATATTCGGTAATGCCTTGTAGCTGTCGTGCGTTGCATTCGGAGGTGTG F

14R

I

D

1800000 350 36

*

N4 5 L K D Y L R Q S I S K D L K V R N R D S L K I R L G K R H P L S V GTTCACAATAAGTCTTGAGGATATCGAGCAGTCATTCTAAGATTGGGGTAGACTCGGATTTTTAAGATAGTTAGGGAAGACTCCTTGATGT

N

H M4 I A A R Q I I K S C N A K Q Q N V I S S L 5 G F L D K Q K S F L R V Q Q CAGCTATATCGCGCAGGCGATATCAATCGACATGCGAACACACATGAATTCTTTTTAGTGTTTTTGGTAACAGAGAGTTTTAAGGTGCACA

K

1812012

Q

A L

347

K

Q

K

L E

K

L

K

D VD

I ID

TA

AKEVK A VS

N

D I KE

T

L IT

LKE

K

S T

76

1824024 N

15K D N G V E T P Q G Q K T Q P I N L P P V R K K L R K H K G L G K G V K K K L F TGGAAACGTGTGAGCACCCAGGTCAAAACTCACCGTAAATTACACCGTCGGAAAAGTAAGAAAATGAGGATCGGAAAGTGTAAAGAAACTT K

D

S

S

P

L

K

Q

K

I

S

C

A

S

M4 K

D

L

T

S

P

S

V

K

K

S

C

E

R

S

S

A

S

D

L

E

S

A

41

1848048

G

F

114 1

1836036

GCTTAAACAGCAGAAAATGGAGTCGCGAATAATGAACGGAGCGAAGGAAGCAGCAGAATGTATAAAGAACCTTAAACAGCATGATTAGATA

K

81

1860060

CCGAGATGCTCCCCTAAAAAAAGATTCCCCTGAGTATATGAACACTTCTCGCCGTAAGTCGAAGCGATCGGAAGGCTCTCTGATAAAGTTC

C K H K I A C D C S A I K K L L C N K S L L D 5 P M4 K L S N A H T I F S S N K 04 121 AATGAAAACGAATTCTTGGATGTTCGCGTAGAGAATGCTTGTACGATCGTTCTGACCGCCATGAACTTCGATGCCACCCATTTCGCTCAAC 1872072 K L K L K K I I A S K 0 I F L D 04 5 K N A K L A A Y G K T L C N L R I F K K I GGAACTGAGCTGAGAAATATACTTCAAGAGATTTTTAGAATGGTGAAATCTGACTTCGGCTACGCGAACTTGTGAACTGAGATTTCGAAAG S

P

F

L

F

D

Q

V

S

E

K

E

S

Y

S

V

V

Y

V

P

H

N

K

K

L

C

C

Q

F

C

C

K

P

K

N4

T

K

A

5 L

V

G

V

A

Y

G

K

V

F

D

L

D

K

V

A

I

K

A

T

K

N

K

D

S

I

V

A

S

F

I

A

G

I

V

R

K

A

S

G

A

L

L

S

K

H

C

V

I

N

L

N

I

L

S

N

S

04

C

V

5

K

H

S

V

L

K

S

T

Y

D

I

L

D

K

H

F

C

E

04

V

N

R

04

V

N

Y

S

Y

V

C

F

K

A

L

D

A

R

V

L

F

N

L

K

C

R

I

N

I

D

F

H

M4

P

S

N

F

I

L

H

K

K

K

I

I

C

F

A

V

L

A

D

Y

S

K

S

L

04

P

H

Y

N

N

G

T

C

A

A

I

K

K

Y

D

K

N

Q

L

L

P

I

S

R

N

K

F

04

D

C

F

P

N

F

G

L

P

R

V

A

N

04

A

I

V

L

N

V

C

G

A

F

D

K

G

N

P

N

L

H

C

L

N

C

L

C

A

C

A

F

V

L

V

S

C

K

L

V

04

T

K

K

D

G

K

C

K

C

A

L

Y

K

Y

K

K

L

F

A

A

K

N

A

K

C

L

N

L

P

K

P

Y

F

A

R

Y

D

A

C

C

K

L

V

K

A

V

H

L

V

L

G

L

F

L

Y

R

D

V

V K I Y K K L Y D F L D K R G K F G 5 K D L F K A T F L N N S K L T R R Q P I TGGTGAGTATAGAAAACTTACATTTCTAATGAAGAGGGATTTGGTCCGAACCTTTTAGGCACTTTTTAATATAGAAATTACAGAGTCACCA

G

L

A

C

L

S

S

K

S

Y

K

G

L

K

L

C

H

K

L

K

F

L

L

N

I

T

S

A

D

L

D

C

K

T

S

S

L

481

1980080

CCTTGCTACGAGCCGTCGTTGATCCCTCAATACCATTGCTACGGGAGCTGCTGAAATATTGCTAGCAGTATATTCTAGGTTTTGTTTACGA

K

441

1968068

TTAGCACGTAATTGATCTTGTCTTTGCCAGGTGTATATCTGTTGCTAGATGACGATAACGGGAGCCGGAACTCATTAACTAGAGAAAGTTG L

401

1956056

TTGTCCATTAGCGTACAATTCGTGAATGTTAACCTGATTCGACACTGTCCCAAGCATGATTTGTCAAGTGGCGGGCTTTGAGGTAAAAAAT K

361

1944044

TAAACATAAAAGAGTCATTTTATGCGTGTGGCGATACAGTTGCCGAATGATCCAATATAAGGCCGTGGCTTTGCAAAAGTAGACAAAACTT V

321

1932032

GGGAGTCGGAAGTATGAATACACAGGTGTTTGAAGTAGCGATCTGTAGGTTCTAATTGAATGTGAATAATATTTGATTCTCCCTTGAAATA N

281

1920020

CCGATTATATCCACAGTGGTTTTAAAACTATTATTCAAATCCTTTGATGGTCAAAATGTCTTGCACGACTATGAATTATCTCATAGTTGAA C

241

1908008

TCGGGTGGTGCTACGGAAGTGTTGACTAATAAGTGCCATAAGCGGCAATAAGAGAGGTGTATTCGGCTTCTAGCGGTTCATCGTCAAATCA C

201

1896096

GCTCCCGTTTTTTTATGTCAAGCGAGAGGTTCTATCAGTGTCACGTCCTACAAAAAAACTTGTGACATTTGTCACCTAGAAACTTGGCCGA G

161

1884084

R

521

1992092

F

H

561

04

16K

*562 04 C L C Q I S K T L S S V A K K K P L T 04 F L L D K L Y A I R K K I K 35 2016016 ATATTGAATAGAATGATCTGACAAAATCGAAAACTAGTTTGTGCCGAGAGAGCTTTACCAGTTTTACTGAAAACGTAGCAAACGGAAAGAT V P F S I V R L C H V Y C 04 L I K Y N A S N N N C I L G K K L I K K 04 Q Q F CAAGTCCTTTTAATGTTCCTTTGTCTGTTACTCATCTAAAAATATACGCTCTACAAAATTCATCTGGCCGAAACTATGAGGAATCAGCGTT

C

C

G

T

K

V

C

G

S

K

C

I

5

04

C

L

S

K

L

C

K

L

Y

C

C

Y

P

L

L

C

S

A

L

C

R

A

P

C

V

S

75

L

2028028 115

V

2040040

TGCGCACAGAGGGAGGATGGAGACATTCATGGTCTAGTGATTTGCAGCTTATGTTATGTCATTTTGTTTCGCTCGTGCGTGGCCTGTGATC N

K

L

F

K

I

V

K

R

K

T

K

G

C

S

K

N

P

L

04

H

A

L

R

K

Y

T

V

T

A

T

K

L

Y

D

I

Y

T

T

155

K

AACAGTTATTAAATCTAGGCGTAAATCGGGGCGTCGAAATCCCTCTGCACCGTTCGGAATAACCTGACGCGCTAATTGATGAATCATACACA

2052052

K VK K K I RNH L V AT F Y VK R F LKE Y K G CQ0 F F G K A V I Y G AK H4 RV TGTTTTTGAGTTAAGGACGCATTTTTGGGAAGGGTATTTTGGGCGAACAGAGCTGTATTAACACTCGAGCACCTTTAGTTAAAGGAAGTAA

195 2064064

C

L

T

G

L

L

L

D

P

5

5

G

V

F

G

L

S

A

C

A

C

F

G

I

F

S

N

D

K

C

F

L

04

V

K

K

K

A

L

I

F

235 2076076

ACGTGGGTTGCGCTGATCTTCTCTGAGTTTTGCGCTCTCAGAGCGTTTTGGAATTCTTCATGAGATGATTCTGAGGTAAGGAAAGCGCGAT K

I

K

F

K

Y

K

Y

L

K

C

K

K

C

H

F

V

S

K

L

L

K

N

P

T

K

K

S

F

S

C

F

I

L

S

N

P

V

P

V

275

I

K

F

K

K

K

C

K

I

P

5

5

K

K

Y

L

04

T

Y

C

F

Q

Y

K

P

C

K

K

L

K

T

C

P

T

P

A

I

L

A

P

315

I

K

C

L

L

C

L

N

K

T

Q

K

S

T

V

I

V

F

C

C

K

S

C

L

C

K

C

K

L

S

V

F

C

K

A

V

F

T

V

N

355

2112012

CATACAACAACGCTTGTTGAAGAAAACAAAATTACGTAACGTTTTGTTGAAGACGATTGTTGACAGAGCTTCTGGTTCAGAGGCGTGTTAC N

V

F

N

V

P

N

K

K

Y

F

C

F

S

C

L

L

C

V

Y

04

T

F

C

N

I

Y

C

N

N

N

P

Y

K

I

T

S

K

395 2124024

V

K

AATGATTGTAATCCAAACCAGTATTTTTCAGATCTTTACACATATGAATACTCGTTTATATAAGATCTAAAATCAGATATACGAAGTAGGA 5

P

I

H

V

V

A

T

K

F

F

K

K

K

T

K

K

K

S

V

L

N

L

I

C

K

T

K

I

Y

K

K

I

K

L

P

A

L

I

435

V

2136036

CCTTTGTCACATGTACGGTTTTTCAGAGAGAAAGAGAAGAAGAGTTGCATTGGGATGATGGACGAATTATGAAGAGAATACTTTGCCTGAT T

P

V

A

P

N

K

P

F

T

C

C

I

V

T

I

C

C

N

L

04

N

K

N

I

C

K

T

C

L

S

V

C

17K

A

04 04

C S A V N A K C C K P

ACTCGGTGCACGAACCGAATTACTGTTTGTTTAAAGAATATCAACTGGGGAAATATATTGCAGCGACAGTTACAGTAGGGCCAAGTGTGT C Y L AA C VKR K P K T P * V S C C M4 C K K T K N T L I

C

Y

K

C

N

P

I

L

L

A

N

K

F

T

V

L

T

C

T

K

S

K

K

K

CAGTTCTGCGGATGGTAAAAACCGAAACCCTTATTATTAAAGGAAACCTTATTCTACGAAGAATTACGTTTAACGATCCGATCGAAGAGAA A

C

L

K

K

P

L

L

K

K

V

V

A

K

C

C

T

K

A

K

K

K

L

P

C

K

S

K

K

*

C

04

475 8 2148048 488 48 2160060 77

21858

AACTAGGTAGCGTACAAG

FIG. 2-Continued. 293


TABLE 1. Summary of data: ORFs, putative translation start sites, TATA consensus sequences, and lengths and relative molecular masses of predicted translation products

Name  ORF start  ORF end  Length         Size           ATG       ATG context  TATA      TATA
                          (amino acids)  (kilodaltons)  position  sequence     position  sequence
0R    (2)a       358      (119)a
1L    1954       544      458            51.5           1921      AACATGC      1978      TATTTAA
2L    3315       2002     432            50.2           3298      AGCATGT      3330      CATAAAA
                                                                               3341      TATTTAA
3L    4412       3480     296            33.5           4367      AAAATGG      ?b
4L    8415       4372     1,345          151.9          8406      AACATGG      8450      TATTAAA
                                                                               8501      TATAAAAc
5R    8376       10733    772            88.7           8418      GTTATGC      8352      TATTTAGAC
6R    10724      11782    350            39.9           10733     AACATGG      10673     CATAAATA
7L    12984      11785    666            76.2           12909d
12L   17007      16066                                  16989     TCCATGCe     17146     CATAAAA
                                                        16108f
8R    12891      13175    85             9.6            12921     GGAATGA      12856     TATAAAG
9R    13114      13773    216            24.9           13126     AAAATGG      13099     TATTTAA
10R   13745      15079    432            51.4           13754     AGTATGG      13524     TATTAAAc
                                                                               13542     TATAAAAc
11R   14948      16043    335            37.9           15039     CTGATGG      ?         ?
13R   16943      18009    353            39.4           16952     TTTATGTe     16849     TATAAAAc
                                                        16961     GTTATGT      16849     TATAAAAc
                                                        16979     AGCATGG      16944     TATTTTT
14R   17962      18354    114            13.2           18013     ATAATGT      17861     TATAATAc
15R   18348      20045    561            63.7           18360     ATTATGG      18318     TATAATAc
16R   19996      21519    488            56.6           20056     GTAATGG      20016     GATAAAGc
17R   21432      21689    77             8.3            21459     GGTATGG      21398     TATAACA

a Incomplete ORF.
b ?, No obvious TATA consensus 5' to the first ATG of the ORF.
c An intervening ATG codon lies between the proposed TATA consensus sequence and the first ATG of the ORF.
d Splice acceptor site of ORF 7L.
e ATG context sequence does not conform to the Kozak (36) consensus sequence (RNNATG or NNNATGG).
f Splice donor site of ORF 12L.
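The Kozak criterion cited in the footnotes to Table 1 (RNNATG or NNNATGG) can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it assumes each ATG context sequence is given as seven nucleotides with the ATG at positions 4 to 6, as the Table 1 entries appear to be, and the function name `conforms_to_kozak` is purely illustrative.

```python
PURINES = {"A", "G"}


def conforms_to_kozak(context: str) -> bool:
    """Return True if a 7-nt ATG context matches RNNATG or NNNATGG.

    Assumed layout (hypothetical helper, not from the paper):
    three nucleotides 5' of the ATG, the ATG itself, and one nucleotide 3' of it.
    """
    context = context.upper()
    if len(context) != 7 or context[3:6] != "ATG":
        raise ValueError("expected a 7-nt context with ATG at positions 4-6")
    purine_at_minus_3 = context[0] in PURINES  # RNNATG: purine three bases before the A
    g_at_plus_4 = context[6] == "G"            # NNNATGG: G immediately after the ATG
    return purine_at_minus_3 or g_at_plus_4


if __name__ == "__main__":
    # Context sequences copied from Table 1; the last two carry the
    # "does not conform" footnote mark in the table.
    for ctx in ["AACATGC", "CTGATGG", "GTTATGT", "TCCATGC", "TTTATGT"]:
        status = "conforms" if conforms_to_kozak(ctx) else "does not conform"
        print(f"{ctx}: {status}")
```

Under this reading of the consensus, only the two contexts flagged in the table (TCCATGC and TTTATGT) fail both conditions.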

method. The overall G+C content was found to be 41%, considerably lower than the values for HSV-1 (68%) (40), EBV (60%) (1), and HCMV (58%) (60; M. S. Chee, A. T. Bankier, S. Beck, R. Bohni, C. M. Brown, R. Cerny, T. Horsnell, C. A. Hutchison III, T. Kouzarides, J. A. Martignetti, E. Preddie, S. C. Satchwell, P. Tomlinson, K. M. Weston, and B. G. Barrell, Curr. Top. Microbiol. Immunol., in press) and in the same range as those for VZV (46%) (15) and HVS L DNA (36%) (25). Observed frequencies of CpG dinucleotides in this portion of the HHV-6 DNA sequence did not differ significantly from those expected on the basis of random associations between mononucleotides (not shown). In contrast, eucaryotic DNA (6) and the genomes of the gammaherpesviruses EBV and HVS do show an overall CpG dinucleotide deficiency (27, 29). The region sequenced contains 1 partial and 17 complete ORFs, numbered 0R (R, rightward; L, leftward) to 17R in

TABLE 2. Summary of optimized FASTP scores observed in comparisons between HHV-6 ORF 11R and the homologous genes from the other human herpesviruses

Virus

Score

ORF 11R

HHV-6 HCMV EBV HSV-1 VZV

11R 1821 HCMVUL94 530 BGLF2 221 UL16 108 44