Vol. 64, No. 1
JOURNAL OF VIROLOGY, Jan. 1990, p. 287-299
0022-538X/90/010287-13$02.00/0 Copyright © 1990, American Society for Microbiology
Human Herpesvirus 6 Is Closely Related to Human Cytomegalovirus
G. L. LAWRENCE,1* M. CHEE,1 M. A. CRAXTON,2 U. A. GOMPELS,2 R. W. HONESS,2 AND B. G. BARRELL1
Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH,1 and Division of Virology, National Institute for Medical Research, London NW7 1AA,2 United Kingdom
Received 29 June 1989/Accepted 25 September 1989
A sequence of 21,858 base pairs from the genome of human herpesvirus 6 (HHV-6) strain U1102 is presented. The sequence has a mean composition of 41% G+C, and the observed frequency of CpG dinucleotides is close to that predicted from this mononucleotide composition. The sequence contains 17 complete open reading frames (ORFs) and part of another at the 5' end of the sequence. The predicted protein products of two of these ORFs have no recognizable homologs in the genomes of other sequenced human herpesviruses (i.e., Epstein-Barr virus [EBV], human cytomegalovirus [HCMV], herpes simplex virus [HSV], and varicella-zoster virus [VZV]). However, the products of nine other ORFs are clearly homologous to a set of genes that is conserved in all other sequenced herpesviruses, including homologs of the alkaline exonuclease, the phosphotransferase, the spliced ORF, and the major capsid protein genes. Measurements of similarity between these homologous sequences showed that HHV-6 is clearly most closely related to HCMV. The degree of relatedness between HHV-6 and HCMV was commensurate with that observed in comparisons between HSV and VZV or EBV and herpesvirus saimiri and significantly greater than its relatedness to EBV, HSV, or VZV. In addition, the gene for the major capsid protein and its 5' neighbor are reoriented with respect to the spliced ORFs in the genomes of both HHV-6 and HCMV relative to the organization observed in EBV, HSV, and VZV. Three ORFs in HHV-6 have recognizable homologs only in the genome of HCMV. Despite differences in gross composition and size, we conclude that the genomes of HHV-6 and HCMV are closely related.
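The compositional statistics quoted above (G+C fraction, and observed CpG frequency relative to that predicted from mononucleotide composition) can be illustrated with a minimal Python sketch. The function names and the toy input are hypothetical, the code assumes a sequence containing only A, C, G, and T, and it is not the program used by the authors.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a sequence of A, C, G, T."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from base composition.

    The expected count is f(C) * f(G) * (number of dinucleotide positions).
    Returns NaN when the expected count is zero.
    """
    seq = seq.upper()
    n = len(seq)
    # Count every position where C is immediately followed by G.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    expected = (seq.count("C") / n) * (seq.count("G") / n) * (n - 1)
    if expected == 0:
        return float("nan")
    return observed / expected


if __name__ == "__main__":
    toy = "ACGT" * 5  # illustrative input, not HHV-6 sequence
    print(gc_fraction(toy), cpg_obs_exp(toy))
```

A ratio near 1 corresponds to the situation described in the abstract, in which CpG occurs about as often as the mononucleotide composition predicts.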
* Corresponding author.

The first recognized isolations of a previously undetected human herpesvirus, now called human herpesvirus 6 (HHV-6), were obtained in the course of in vitro cultivation of peripheral blood lymphocytes from patients with lymphoproliferative disorders, some of whom were also infected with human immunodeficiency virus (50). The viruses were shown to have the ultrastructural and morphogenetic properties characteristic of a herpesvirus (5, 59) but to be distinct from the five previously known human herpesviruses by their antigenic properties and by the failure to show homologous hybridization with nucleic acid sequences from each of these other five human viruses (32). Independent isolates of herpesviruses shown to be closely related to the initial isolate (HBLV/GS) were subsequently reported from human immunodeficiency virus-infected patients from Uganda (strains U1102 and U683 [20]), The Gambia (strain AJ [59]), and Zaire (strain Z29 [38]). A series of seroepidemiological investigations has since established that evidence of a prior infection with HHV-6 is widespread in populations of apparently healthy adults and that the virus is typically acquired in early infancy (7, 52). The primary infection in infants has been shown to cause the common childhood infection exanthem subitum (roseola infantum [34, 61]), and a series of virus isolations from the acute stages of this mild childhood disease has been obtained. There have also been reports of the common detection of HHV-6 in cervical lymph nodes (23) and of HHV-6 DNA sequences in a proportion of some rare B-cell tumors (31) and suggestions that infection or recurrence in adult life may be related to lymphadenopathy (9, 46).

We are interested in the relationships between the divergent biological and molecular genetic properties of the herpesviruses and their evolution. The current classification of the herpesviruses recognizes their biological diversity and divides them into three subgroups (the alpha-, beta-, and gammaherpesviruses) on the basis of some of these biological properties (28, 30, 49). Alphaherpesviruses, exemplified by herpes simplex viruses (herpes simplex virus types 1 and 2 [HSV-1 and HSV-2]; human herpesviruses 1 and 2) and varicella-zoster virus (VZV; human herpesvirus 3), are distinguished by their capacity to establish latent infections of neural tissues and to reactivate from these sites. Betaherpesviruses include the cytomegaloviruses (e.g., human cytomegalovirus [HCMV]; human herpesvirus 5); they replicate productively in cultures of fibroblasts from the host species. The sites of their persistence in vivo are uncertain but may involve reticuloendothelial cells and do not appear to involve neural tissues (42). Gammaherpesviruses are typified by the B-cell lymphotropic human herpesvirus, Epstein-Barr virus (EBV; human herpesvirus 4), and the T-cell lymphotropic virus of the squirrel monkey, herpesvirus saimiri (HVS; saimiriine herpesvirus 2). The major mode of virus persistence of these lymphotropic viruses is as latent infections of circulating lymphocytes. The isolations of HHV-6 from peripheral blood lymphocytes have clearly shown that the virus can infect a population of lymphocytes in vivo. The major population of productively infected cells in cultures of cord blood or peripheral blood lymphocytes has the characteristics of immature CD4+ T cells (20, 39), and the virus can be propagated in cultures of lymphoblastoid cells in vitro (39, 59). Despite the lack of knowledge on the nature of the latent site of the virus or any demonstration that HHV-6 can transform lymphoid cells, it has been suggested that the virus should provisionally be classified as a gammaherpesvirus (33, 38). We have undertaken an analysis of the structure and sequence of the genome from a Ugandan isolate of HHV-6 (U1102 [20]). In this report, we present and interpret a
[FIG. 1 graphic: (a) scale of 0 to 22 kb with EcoRI, HindIII, PstI, SalI, and SmaI site maps; (b) stop-codon map with rightward ORFs 0R, 5R, 6R, 8R, 9R, 10R, 11R, 13R, 14R, 15R, 16R, and 17R and leftward ORFs 1L, 2L, 3L, 4L, 7L, and 12L.]
FIG. 1. (a) Restriction map of the region sequenced showing EcoRI, HindIII, PstI, SmaI, and SalI restriction sites. The DNA sequence was determined from the shaded portions of the plasmids, whose names appear on the restriction fragments from which they were derived. (b) Positions of termination codons in each of the three possible ORFs for each strand of the region sequenced. Arrows indicate the location and direction of the major ORFs. ORFs are named 0 to 17; R or L indicates rightward or leftward orientation.
sequence of 21.8 kilobase pairs (kbp) from the genome of HHV-6 that includes the sequence previously recognized as having significant nucleotide sequence similarities to a region of the HCMV genome (22). The sequence and arrangement of the predicted open reading frames (ORFs) in this region of the HHV-6 genome bear a much closer resemblance to corresponding regions of HCMV (a betaherpesvirus) than to corresponding regions of the genome of EBV, HSV, or VZV.

MATERIALS AND METHODS

Isolation and characterization of recombinant DNA clones of HHV-6 DNA. All recombinant DNA clones were isolated from HHV-6 (U1102) DNA prepared from cultures of infected cord blood lymphocytes. The SalI and SmaI clones were prepared by cloning purified restriction endonuclease fragments into the SalI site of a pBS (Bluescribe; Stratagene) vector or the SmaI site of pUC13. Fragments were selected for sequencing on the basis of their linkage relationships to the 5.4-kbp HindIII fragment (cloned into the HindIII site of pUC8 as pHD5), which was previously shown to be homologous to a region of the HCMV genome (22; unpublished results). The EcoRI plasmid pR9.1 was provided by M. Jones, and the PD12 clone was a 1.2-kbp PstI fragment cloned directly into the PstI site of M13mp18. The relationships between these cloned fragments over the relevant portion of the HHV-6 genome are summarized in Fig. 1. A detailed description of the mapping and cloning of the HHV-6 (U1102) genome will be presented elsewhere, but the rightmost SalI site of pSAD3.5 in the region analyzed in this report is located approximately 31 kbp from the right unique/repeat junction of the HHV-6 genome, the total size of which is about 170 kbp.
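The cloned fragments described above are delimited by recognition sites for five restriction endonucleases. As a hedged illustration of how such a map can be derived from a known sequence, the following Python sketch locates the standard recognition sequences of these enzymes in a DNA string. The function names and the toy sequence are hypothetical, and fragment boundaries are placed at the first base of each site for simplicity, whereas the real enzymes cut at defined positions within the site.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
    "SalI": "GTCGAC",
}


def find_sites(seq, sites=RECOGNITION_SITES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)
            # Step by one base so overlapping occurrences are not missed.
            start = seq.find(motif, start + 1)
        hits[enzyme] = positions
    return hits


def fragment_lengths(seq_length, positions):
    """Lengths of linear fragments when cutting at the first base of each site.

    This is a simplification: it ignores the exact cleavage position
    inside the recognition sequence.
    """
    bounds = [1] + sorted(positions) + [seq_length + 1]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    toy = "AAGAATTCTTTGTCGACGGAAGCTT"  # illustrative input only
    sites = find_sites(toy)
    print(sites)
    all_cuts = [p for plist in sites.values() for p in plist]
    print(fragment_lengths(len(toy), all_cuts))
```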
DNA sequencing and sequence analysis. The DNA sequence was determined from the regions shown in Fig. 1, using the methods described by Bankier et al. (2). Random subfragments of DNA from these plasmids were prepared by sonication (19) and subcloned into M13mp8 (43), and single-stranded templates were sequenced by the dideoxynucleotide-chain termination method (51). Regions of sequence compression were resolved by replacing dGTP with deoxy-7-deazaguanosine triphosphate in the sequencing reactions (45). Sequence data were assembled by using the computer programs DBAUTO and DBUTIL (55, 56) and analyzed for the presence of ORFs and transcription signals with the programs DIANA (J. Crooke, T. S. Horsnell, and B. G. Barrell, unpublished data) and ANALYSEQ (58). Predicted protein sequences were analyzed for hydrophobicity and potential glycosylation sites with ANALYSEP (58), and searches for homologous protein sequences contained in protein libraries were performed by using the computer program FASTP (37). The AMPS suite of programs (3, 4) was used to carry out pairwise computer alignments of predicted translation products of HHV-6 ORFs and the homologous genes of the other human herpesviruses. Twenty randomizations of each alignment were performed so that a significance score for the alignment could be obtained. The program uses the Dayhoff mutation data matrix (17, 18) for protein alignments. All computer programs were run on DEC VAX and microVAX computers.

RESULTS

The DNA sequence of a 21,858-bp region of the HHV-6 (U1102) genome has been determined for both strands, each base being sequenced an average of six times by the random
FIG. 2. DNA and predicted protein sequences. The nucleotide sequence reported here will appear in the EMBL and GenBank data bases under accession number M28243. The DNA sequence is given as the rightward 5'-to-3' strand only (numbered 1 to 21858). Rightward-encoded protein sequences are shown above the corresponding DNA sequences in single-letter code; leftward-encoded protein sequences are shown below the corresponding DNA sequences. The name of each ORF is given on the left of the first line of sequence, and amino acid sequences are numbered from the N terminus to the C terminus to the right of the sequence. Protein sequences are shown from the first ATG. The sequence continues on the following pages.
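The conventions in the Fig. 2 legend (rightward and leftward ORFs, coordinates on the rightward strand, proteins shown from the first ATG) can be illustrated with a minimal six-frame scan in Python. This is a sketch under stated assumptions: the function names and the `min_codons` threshold are hypothetical, the standard genetic code is assumed, only ORFs closed by a stop codon are reported, and splicing is ignored. It is not a reimplementation of DIANA or ANALYSEQ.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=100):
    """List (strand, start, end, protein) for first-ATG-to-stop ORFs.

    strand is 'R' (rightward) or 'L' (leftward); start and end are 1-based
    positions on the rightward strand, with the stop codon included.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("R", seq), ("L", reverse_complement(seq))):
        for frame in range(3):
            segments = translate(s[frame:]).split("*")
            pos = 0  # codon index of the current segment within this frame
            for segment in segments[:-1]:  # last segment lacks a stop codon
                m = segment.find("M")
                if m != -1 and len(segment) - m >= min_codons:
                    a = frame + 3 * (pos + m)                 # 0-based start
                    b = frame + 3 * (pos + len(segment) + 1)  # exclusive end
                    if strand == "R":
                        orfs.append((strand, a + 1, b, segment[m:]))
                    else:
                        orfs.append((strand, n - b + 1, n - a, segment[m:]))
                pos += len(segment) + 1
    return orfs


if __name__ == "__main__":
    toy = "CCATGGCTGCTGCTTAAGG"  # illustrative input only
    print(find_orfs(toy, min_codons=3))
    print(find_orfs(reverse_complement(toy), min_codons=3))
```

Reporting leftward ORFs in rightward-strand coordinates mirrors the figure, in which only the rightward 5'-to-3' strand is numbered.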
[FIG. 2 sequence listing: nucleotide sequence of the rightward strand from position 1 to approximately position 13,000, with the predicted translations of ORFs 0R, 1L, 2L, 3L, 4L, 5R, and 6R and the following leftward ORFs, including a splice acceptor annotated for ORF 12L. The listing is omitted here; the sequence is available under accession number M28243.]
FIG. 2-Continued.
T
K
D
D L
F
K
I
I
D
K
I S K
N
C
N
F I V E
Q V
E
S
L
P R
R
V
D
S
A A
I
L F
ACTGTGAAACGAAAGACGACCTTTTTAAAATAATTGATAAAATAAGCAAAAATTGCAATTTTATAGTGGAACAGGTCGAGTCTTTGCCTCGGAGGGTGGATTCAGCGGCCATCCTATTTG ; 13080 N
9R
L
A V E
I F
N
D V I
Y R
Q
N
G
N E
V A A K
L P
R
I
K Y
Q
R
D
G N G Q D I D T * R V T G R I L T H R N
N Q
M C
T
T
85 25
ATAATCTCGCGGTGGAGATATTTAACGATGTAATATATCGACAAAATGGAGTTGCCGCGAAAATACGACAGGGTAACGGGCAGGATATTGACACATAAGAATAACCAGATGTGTACAACC : 13200
FIG. 2-Continued. 291
FIG. 2-Continued. [Sequence listing omitted: the nucleotide sequence and predicted amino acid translations on this page, approximately positions 13,200 to 17,500, were scrambled beyond recovery in extraction. Legible annotations include the ORF labels 10R, 11R, 12L, and 13R and "SPLICE DONOR 7L" near position 16,100.]
FIG. 2-Continued. [Sequence listing omitted: the nucleotide sequence and predicted amino acid translations on this page, approximately positions 17,500 to 21,858 (the end of the sequenced region), were scrambled beyond recovery in extraction. Legible annotations include the ORF label 14R.]
TABLE 1. Summary of data: ORFs, putative translation start sites, TATA consensus sequences, and lengths and relative molecular masses of predicted translation products

Name  ORF start  ORF end  ATG position  ATG context sequence  TATA position  TATA sequence  Length (amino acids)  Size (kilodaltons)
0R    (2)a       358      -             -                     -              -              (119)a                -
1L    1954       544      1921          AACATGC               1978           TATTTAA        458                   51.5
2L    3315       2002     3298          AGCATGT               3330           CATAAAA        432                   50.2
                                                              3341           TATTTAA
3L    4412       3480     4367          AAAATGG               ?b             -              296                   33.5
4L    8415       4372     8406          AACATGG               8450           TATTAAA        1,345                 151.9
                                                              8501           TATAAAAC
5R    8376       10733    8418          GTTATGC               8352           TATTTAGAC      772                   88.7
6R    10724      11782    10733         AACATGG               10673          CATAAATA       350                   39.9
7L    12984      11785    -             -                     -              -              666                   76.2
8R    12891      13175    12921         GGAATGA               12856          TATAAAG        85                    9.6
9R    13114      13773    13126         AAAATGG               13099          TAT-TTAA       216                   24.9
10R   13745      15079    13754         AGTATGG               13524          TATTAAAC       432                   51.4
                                                              13542          TATAAAAC
11R   14948      16043    15039         CTGATGG               ?              ?              335                   37.9
12L   17007      16066    16989         TCCATGCe              17146          CATAAAA        -                     -
13R   16943      18009    16952         TTTATGTe              16849          TATAAAAC       353                   39.4
                          16961         GTTATGT               16849          TATAAAAC
                          16979         AGCATGG               16944          TATTTTT
14R   17962      18354    18013         ATAATGT               17861          TATAATAC       114                   13.2
15R   18348      20045    18360         ATTATGG               18318          TATAATAC       561                   63.7
16R   19996      21519    20056         GTAATGG               20016          GATAAAGC       488                   56.6
17R   21432      21689    21459         GGTATGG               21398          TATAACA        77                    8.3

Splice sites: 12909d (ORF 7L), 16108f (ORF 12L).

a Incomplete ORF.
b ?, No obvious TATA consensus 5' to the first ATG of the ORF.
c An intervening ATG codon lies between the proposed TATA consensus sequence and the first ATG of the ORF.
d Splice acceptor site of ORF 7L.
e ATG context sequence does not conform to the Kozak (36) consensus sequence (RNNATG or NNNATGG).
f Splice donor site of ORF 12L.
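Footnote e of Table 1 tests each ATG context sequence against the Kozak consensus (RNNATG or NNNATGG). A minimal Python sketch of that test follows. It assumes each context is given as the 7-nucleotide string NNNATGN used in the table; the function name `conforms_to_kozak` is hypothetical and is not taken from the paper.

```python
def conforms_to_kozak(context: str) -> bool:
    """Return True if a 7-nt ATG context (NNNATGN) matches RNNATG or NNNATGG.

    Assumption: the string covers positions -3 to +4 relative to the A of
    the ATG, as the ATG context sequences in Table 1 appear to do.
    """
    context = context.upper()
    if len(context) != 7 or context[3:6] != "ATG":
        raise ValueError("expected a 7-nt context of the form NNNATGN")
    purine_at_minus_3 = context[0] in "AG"   # RNNATG: purine three bases upstream
    g_at_plus_4 = context[6] == "G"          # NNNATGG: G immediately after the ATG
    return purine_at_minus_3 or g_at_plus_4


if __name__ == "__main__":
    # Context strings copied from Table 1, with footnote markers removed.
    for ctx in ("AACATGC", "CTGATGG", "TCCATGC", "TTTATGT"):
        print(ctx, conforms_to_kozak(ctx))
```

Under these assumptions, the two contexts marked with footnote e in the table (TCCATGC and TTTATGT) are the ones that fail both forms of the consensus, while a context such as CTGATGG passes on the +4 G alone.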
method. The overall G+C content was found to be 41%, considerably lower than the values for HSV-1 (68%) (40), EBV (60%) (1), and HCMV (58%) (60; M. S. Chee, A. T. Bankier, S. Beck, R. Bohni, C. M. Brown, R. Cerny, T. Horsnell, C. A. Hutchison III, T. Kouzarides, J. A. Martignetti, E. Preddie, S. C. Satchwell, P. Tomlinson, K. M. Weston, and B. G. Barrell, Curr. Top. Microbiol. Immunol., in press) and in the same range as those for VZV (46%) (15) and HVS L DNA (36%) (25). Observed frequencies of CpG dinucleotides in this portion of the HHV-6 DNA sequence did not differ significantly from those expected on the basis of random associations between mononucleotides (not shown). In contrast, eucaryotic DNA (6) and the genomes of the gammaherpesviruses EBV and HVS do show an overall CpG dinucleotide deficiency (27, 29). The region sequenced contains 1 partial and 17 complete ORFs, numbered 0R (R, rightward; L, leftward) to 17R in

TABLE 2. Summary of optimized FASTP scores observed in comparisons between HHV-6 ORF 11R and the homologous genes from the other human herpesviruses

Virus   ORF        Score (ORF 11R)
HHV-6   11R        1821
HCMV    HCMVUL94   530
EBV     BGLF2      221
HSV-1   UL16       108
VZV     44
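
The composition statistics discussed in the text before Table 2 (overall G+C content, and observed CpG frequency compared with that expected from random association of mononucleotides) can be illustrated with a short Python sketch. The paper does not state its exact formula, so this assumes a common convention: the expected CpG count is (number of C) x (number of G) / (sequence length). The function names and the toy sequence are hypothetical, and the input is assumed to contain only the letters A, C, G, and T.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (assumes only A, C, G, T)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_over_expected(seq: str) -> float:
    """Ratio of the observed CpG count to the count expected by chance.

    Assumed convention (not taken from the paper):
    expected = count(C) * count(G) / length.
    A ratio near 1 means no CpG deficiency; well below 1 means depletion.
    """
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("sequence needs at least one C and one G")
    observed = seq.count("CG")            # "CG" cannot overlap itself
    expected = n_c * n_g / len(seq)
    return observed / expected


if __name__ == "__main__":
    toy = "ACGTACGTTTAA"                  # hypothetical toy sequence, not HHV-6 data
    print(gc_fraction(toy), cpg_observed_over_expected(toy))
```

Applied to a full sequence, a `gc_fraction` value near 0.41 together with a CpG ratio near 1 would correspond to the pattern the text reports for this region of HHV-6, whereas a ratio well below 1 would indicate the kind of CpG deficiency described for EBV and HVS.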