RESEARCH ARTICLE Ecological and Evolutionary Science


Complete Genome Sequence of Elephant Endotheliotropic Herpesvirus 4, the First Example of a GC-Rich Branch Proboscivirus

Paul D. Ling,a Simon Y. Long,b Angela Fuery,a Rong-Sheng Peng,a Sarah Y. Heaggans,b Xiang Qin,c Kim C. Worley,c Shannon Dugan,c Gary S. Haywardb

Baylor College of Medicine, Houston, Texas, USAa; Viral Oncology Program, The Johns Hopkins School of Medicine, Baltimore, Maryland, USAb; The Human Genome Sequencing Center, Houston, Texas, USAc

ABSTRACT A novel group of mammalian DNA viruses called elephant endotheliotropic herpesviruses (EEHVs) belonging to the Proboscivirus genus has been associated with nearly 100 cases of highly lethal acute hemorrhagic disease in young Asian elephants worldwide. The complete 180-kb genomes of prototype strains from three AT-rich branch viruses, EEHV1A, EEHV1B, and EEHV5, have been published. However, less than 6 kb of DNA sequence each from EEHV3, EEHV4, and EEHV7 showed them to be a hugely diverged second major branch with GC-rich characteristics. Here, we determined the complete 206-kb genome of EEHV4(Baylor) directly from trunk wash DNA by next-generation sequencing and de novo assembly procedures. Among a total of 119 genes with an overall colinear organization similar to those of the AT-rich EEHVs, major features of EEHV4 include a family of 26 paralogous 7xTM and vGPCR-like genes plus 25 novel or missing genes. The genome also contains an unusual distribution of tracts of 5 to 11 successive A or T nucleotides in intergenic domains between the mostly much higher GC content protein coding regions. Furthermore, an extremely high GC-rich bias in the third wobble position of codons clearly delineates the coding regions for many but not all proteins. There are also two novel captured cellular genes, including a C-type lectin (vECTL) and an O-linked acetylglucosamine transferase (vOGT), as well as an unusually large and complex Ori-Lyt dyad symmetry domain. Finally, 30 kb from a second strain proved to include three small chimeric domains, indicating the existence of distinct EEHV4A and EEHV4B subtypes.

Received 13 April 2016 Accepted 9 May 2016 Published 15 June 2016
Citation Ling PD, Long SY, Fuery A, Peng R-S, Heaggans SY, Qin X, Worley KC, Dugan S, Hayward GS. 2016. Complete genome sequence of elephant endotheliotropic herpesvirus 4, the first example of a GC-rich branch proboscivirus. mSphere 1(3):e00081-15. doi:10.1128/mSphere.00081-15.
Editor Blossom Damania, UNC-Chapel Hill
Copyright © 2016 Ling et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Address correspondence to Gary S. Hayward, [email protected].
For a companion article on this topic, see http://doi.org/10.1128/mSphere.00091-16.

IMPORTANCE Multiple species of herpesviruses from three different lineages of the Proboscivirus genus (EEHV1/6, EEHV2/5, and EEHV3/4/7) infect both Asian and African elephants, but lethal hemorrhagic disease is largely confined to Asian elephant calves and is predominantly associated with EEHV1. Milder disease caused by EEHV5 or EEHV4 is being increasingly recognized as well, but little is known about the latter, which is estimated to have diverged at least 35 million years ago from the others within a distinctive GC-rich branch of the Proboscivirus genus. Here, we have determined the complete genomic DNA sequence of a strain of EEHV4 obtained from a trunk wash sample collected from a surviving Asian elephant calf undergoing asymptomatic shedding during convalescence after an acute hemorrhagic disease episode. This represents the first example from among the three known GC-rich branch Proboscivirus species to be assembled and fully annotated. Several distinctive features of EEHV4 compared to AT-rich branch genomes are described.

KEYWORDS: elephant endotheliotropic herpesviruses, Elephas maximus calf, G-plus-C nucleotide content bias, acute hemorrhagic disease, evolutionary divergence, trunk wash shedding

Volume 1 Issue 3 e00081-15

msphere.asm.org 1

Ling et al.

The deaths of 62 young Asian elephants with acute hemorrhagic disease in North America and Europe have been attributed to rapidly developing uncontrolled primary systemic infections by members of a novel genus of mammalian herpesviruses, designated elephant endotheliotropic herpesviruses (EEHVs) (1–5). At least 43 additional lethal cases have also been confirmed by DNA tests within Asian range countries, including 16 published examples from India, Thailand, Cambodia, and Laos (6–8). Seven major genotypes, named EEHV1 to EEHV7, which all qualify as distinct species within the Proboscivirus genus, have been identified within North American elephants (1, 2, 9–12), although EEHV1A has been associated with the majority of lethal cases. Overall, some 46 highly divergent strains of EEHV1A have been identified by selective PCR sequencing-based gene subtyping at multiple loci. Only 10 examples of EEHV1B plus a smaller number of EEHV4 and EEHV5 viruses have also all been associated with systemic disease in Asian elephants (Elephas maximus), whereas three others, EEHV2, EEHV3, and EEHV6, were detected in the few rare disease cases in zoo African elephant (Loxodonta africana) calves (9–11). The latter three viruses as well as EEHV7 have also all been detected as quiescent infections in lung nodules from healthy adult African elephants (12). Just 12 young captive Asian elephants with confirmed EEHV1 DNA-positive systemic disease that had been treated with human antiherpesvirus drugs are considered survivors (9, 13, 14).

Until recently, it seemed that EEHV1A and its less common partially chimeric variant EEHV1B (15–18) were the predominant viruses that Asian elephants might be encountering. However, routine testing between 2011 and 2015 within the most closely monitored U.S.
zoo herd (which consists of just eight individual Asian adults and calves) has now also detected episodes in which multiple herdmates underwent sequential mild primary viremic infections with subsequent trunk wash shedding by first EEHV5 (19) and then later EEHV4 (20). Because at least four different EEHV1A strains had been documented in lethal cases at this same facility within the previous 15 years and most of this herd had already been observed to undergo subclinical infections by strains of EEHV1A or EEHV1B or both either several years earlier or later (21–23), we conclude that it is likely that most Asian elephants eventually become infected with multiple EEHV species and subtypes. The relative timing and order of these primary EEHV infections are expected to have major impacts on the levels of immune protection to disease caused by the others.

Although no probosciviruses have yet been grown in cell culture, the complete genomes of four reference strains of AT-rich branch Asian elephant EEHVs have been determined previously directly from necropsy tissue, including two of EEHV1A and one each of EEHV1B and EEHV5A (24–26). Therefore, we wished to take the opportunity that this latest EEHV4 episode provided to learn more about the genetic relationships among the different EEHV lineages and species by determining the complete genomic DNA sequence of EEHV4 strain Baylor as the prototype example of a GC-rich branch proboscivirus. In the accompanying paper (27), we compare and contrast the genomes of these two major branches of the Proboscivirus genus as well as describe a number of additional characteristic novel features of the entire group.

RESULTS

Assembly of the complete 206-kb EEHV4(Baylor) genome sequence. The intact genomes of four prototype Asian strains have all been assembled by random high-throughput and de novo assembly approaches directly from high-quality necropsy tissue DNA obtained from young elephants that died of acute hemorrhagic disease (24–26).
For EEHV4(Baylor), we instead used a trunk wash pellet DNA sample with a high measured ratio of viral to host cell DNA. From a total of 420 million 100-bp-long runs of raw data, close to half resembled African elephant DNA and were filtered out before the remainder were assembled de novo by the Velvet program using a variety of different k-mer size parameters. Four independent assembled contig libraries that were searched for matches to EEHV1(Kimba) genomic DNA produced results of between one and four contigs each, with the largest being 202,155 bp. There were three
small gaps in the data overall, each of which was repaired by Sanger PCR sequencing and proved to be located in internal repetitive regions either within E34 (ORF-C) or in the predicted Ori-Lyt locus between U41 (major DNA-binding protein [MDBP]) and U42 (MTA) or close to the right terminus. Because of this being intracellular and not virion-derived DNA, the final results in each case were identical contiguous circular genomes of 205,896 bp (average coverage of 110-fold). To generate a linear version, we arbitrarily defined a G10 tract at the beginning of the original largest contig as the left end of the EEHV4(Baylor) genome. Importantly, two regions mapping very close to the right end contained distinct sets of potential packaging signal motifs, of which the first matched at 45 out of 54 nucleotides (83%) to both copies of the terminal repeats of EEHV1A(Emelia) and EEHV1A(Raman) and the second matched at 103 of 144 positions (72%) to all three copies of the terminal repeat “a” sequence of herpes simplex virus 1 KOS [HSV-1(KOS)]. The extreme left side of the genome (outside gene E1) contains several segments, including a set of 17-bp tandem repeats as well as a cluster of 8-bp palindromic cyclic AMP (cAMP) response elements (CREs) that have high-level homology to repetitive sequences present within the terminal repeats of the other EEHV genomes. Therefore, there is clear evidence that both ends of our assembled EEHV4 genome map within the predicted terminal repeat and lie close to the legitimate physical ends of the EEHV1A, EEHV1B, and EEHV5 genomes as determined by Wilkie et al. (25, 26). Furthermore, on the far right side of the EEHV4(Baylor) genome there is a 5.3-kb noncoding segment lying outside and immediately upstream of the gene encoding the U44 (ORF-L) transcription factor-like protein. 
This region is similar in size to the nearly 7 kb of noncoding DNA (which includes all of one copy of the 2.9-kb terminal repeat) at that position in the linear AT-rich branch genomes of the work of Wilkie et al. (25, 26). Based on the fact that we could assemble a single intact contig with terminal repeat sequences near both expected ends, we deduced that this must represent the entire EEHV4(Baylor) genome.

Global features and initial comparisons with other EEHV genomes. The genome of EEHV4(Baylor) contains 119 open reading frames (ORFs) arranged as shown in the map in Fig. 1. A full listing of the names, sizes, and map coordinate positions for all designated protein coding ORFs arranged from left to right across the EEHV4(Baylor) genome oriented in the same direction as in our prototype EEHV1(Kimba) is presented in Table 1. These data include comparative information relative to the intact genomes of EEHV1A(Kimba) and EEHV1A(Raman), EEHV1B(Emelia), and EEHV5A(Vijay) to indicate all of the genes present in EEHV4 that are not found in the others, as well as all genes present within the others that are missing from EEHV4. Table 1 also shows the GC content of each ORF in EEHV4 and whether the gene is assigned to a gene family and its status as a common core herpesvirus gene or is shared between subsets of the alphaherpesvirus, betaherpesvirus, and gammaherpesvirus subfamilies or is unique to the Proboscivirus genus. Because we are proposing here that the probosciviruses have numerous novel biological properties and genetic and evolutionary features that may justify their future designation as a new deltaherpesvirus subfamily of mammalian herpesviruses, to reduce possible confusion later on we have adopted an interim dual nomenclature of either p or δ when referring to unique features of the Proboscivirus gene and protein sets in a phylogenetic context.
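As a rough illustration of the per-ORF values summarized in Table 1, the following Python sketch computes overall GC content, GC content at third codon positions, and the protein size implied by a pair of map coordinates. It is not the annotation pipeline used for this study: the function names are hypothetical, and the size calculation assumes an unspliced ORF whose 1-based inclusive coordinates include the stop codon.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G or C among the unambiguous A/C/G/T bases of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


def gc3_percent(cds: str) -> float:
    """GC percentage at the third (wobble) position of each codon.

    Assumes cds is an in-frame coding sequence read 5' to 3'.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return gc_percent(cds[2::3])


def protein_size_aa(start: int, end: int) -> int:
    """Amino acids encoded by an unspliced ORF.

    Assumption (hypothetical convention): coordinates are 1-based and
    inclusive, and span the stop codon. Complementary-strand ORFs may be
    given with start > end.
    """
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return length // 3 - 1  # subtract the stop codon


if __name__ == "__main__":
    toy_cds = "ATGGCCGCGTAA"  # made-up sequence, not from EEHV4
    print(round(gc_percent(toy_cds), 1))   # 58.3
    print(gc3_percent(toy_cds))            # 75.0
    print(protein_size_aa(2061, 3608))     # 515
    print(protein_size_aa(17811, 17398))   # 137
```

Under those assumptions, the coordinate pairs 2061–3608 and 17811–17398 give 515 and 137 amino acids, respectively, which agrees with the sizes tabulated alongside those coordinates. Spliced genes would need their exons summed before the same arithmetic applies.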
The gene-ORF-protein nomenclature used here for EEHV4(Baylor) is also based on that used originally for EEHV1B(Kiba) by Ehlers et al. (17) and expanded upon for EEHV1A(Kimba) by Ling et al. (24) to include an E-series numbering system for all novel Proboscivirus-specific proteins. All proteins with identifiable orthologues in EEHV1A(Kimba) retain those same numbers, whereas all newly assigned proteins that are unique to EEHV4 have been given distinctive E number descriptors. The most obvious feature about the EEHV4 genome is that the overall orientation and gene content (especially within the central core gene segment) are essentially conserved and colinear with those of the three AT-rich branch genomes, although with considerable divergence toward both ends as revealed in a full-length genomic dot

FIG 1 Annotated physical gene map of the complete EEHV4(Baylor) genome. The intact 206-kb EEHV4B(Baylor) genome determined here (GenBank accession no. KT832477) is depicted to scale. The relative sizes and orientations of all predicted open reading frames (ORFs) are indicated by horizontal arrows. Gene nomenclature is shown below each of the ORFs. The color key indicates groups of ORFs shared between all herpesviruses or subsets of herpesvirus subfamilies or multiple paralogues of repetitive gene families. Gray arrows indicate captured cellular genes, and white arrows denote novel genes that do not have obvious orthologues outside of the probosciviruses. Thin lines connecting arrows indicate introns. The position of the putative lytic replication origin is marked by a black rectangle.

matrix comparison (Fig. 2a). In particular, the large 40-kb inversion of the conserved core blocks I, II, and III between U27 and U44 in betaherpesviruses and a second smaller inversion of a weakly conserved 24-kb gene block are both retained in common with the organization found in EEHV1A, EEHV1B, and EEHV5. Part of the core gene region involved in the first of these two genomic inversions is clearly revealed on the left side within a dot matrix comparison with the prototype betaherpesvirus genome human cytomegalovirus Merlin [HCMV(Merlin)] (Fig. 2b), although the second inverted region (further toward the left side) is not sufficiently conserved to be detectable in the diagram. The other half of the conserved core segment (blocks IV, V, VI, and VII) produces the largest visible signal, located further toward the right side, but is not inverted. No other mammalian herpesviruses have this same type of overall gene organization as found within both major branches of the EEHVs.
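The principle behind a dot matrix comparison such as those in Fig. 2 can be sketched in a few lines of Python. This is a minimal word-match illustration under stated assumptions, not the software used to generate the figure: the function names and the default word size are hypothetical. Shared words on the same strand trace a diagonal, whereas an inverted block shows up as matches to the reverse complement of the second sequence.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def word_matches(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """All 0-based (i, j) pairs where a[i:i+k] equals b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    hits = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            hits.append((i, j))
    return hits


def dot_matrix(a: str, b: str, k: int = 11):
    """Return (forward, inverted) dot coordinates for sequences a and b.

    forward:  words of a found on the same strand of b.
    inverted: words of a found on the opposite strand of b, reported as
              the 0-based start of the matching window in b.
    """
    a, b = a.upper(), b.upper()
    forward = word_matches(a, b, k)
    rc_b = reverse_complement(b)
    # A window starting at j in rc_b covers b[len(b)-j-k : len(b)-j].
    inverted = sorted((i, len(b) - j - k) for i, j in word_matches(a, rc_b, k))
    return forward, inverted


if __name__ == "__main__":
    block = "GATTACAGGC"  # made-up 10-mer, not viral sequence
    seq_a = "TTTT" + block
    seq_b = "CCCC" + reverse_complement(block)
    print(dot_matrix(seq_a, seq_b, k=10))  # ([], [(4, 4)])
```

Plotting the forward pairs in one color and the inverted pairs in another would give a picture in which a colinear region runs along one diagonal and an inverted region runs along the opposite one. Real genome comparisons usually add scoring windows or protein-level matching to tolerate divergence, which this exact-word sketch does not attempt.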

TABLE 1 Gene content and major features of the complete 205,896-bp EEHV4(Baylor) genomeh,i % amino acid identity (% length matched) to: Gene name and orientation Protein name TR TR Nil Nil Nil Nil Nil Nil

3.5× 22-bp Regulatory motifs vFUT9

Nil Nil Nil Nil Nil Nil Nil Nil Nil Nil Nil Nil Nil Nil E1, F Nil E3, F

vGPCR8

E3.1, F E3.2, F E2A, F

Type

Position coordinates

% GC content

Protein size (aa)

340–420 1070–1410 7xTM

vGPCR7

Family or status

7xTM

vIgF1

7xTM 7xTM 7xTM

E3fam E49fam IgFam E49fam IgFam IgFam IgFam IgFam Novel IgFam IgFam IgFam IgFam IgFam E3fam E3fam E3fam

2061–3608

69

515

3981–4928

68

315

vGPCR6.1 vGPCR6.2

7xTM 7xTM 7xTM

E3fam E3fam Novel

4991–6169 6747–7727 8015–9295

62 67 58

392 328 426

E3.3, F E3.4, F E4, F Nil Nil

vGPCR6.3 vGPCR6.4 vGCNT1 vGPCR5

7xTM 7xTM AcTransf 7xTM

E3fam E3fam Novel E3fam Novel

9714–10721 11637–12674 13026–14642

50 56 63

335 345 538

Nil E4A, F E4B, F E4C, F E6A, C E6B, F E6, C Nil, C E7, C E7B, C Nil E9, C

vCD48

15171–15653 15712–16578 16779–17234 17811–17398 17978–18256 19679–18855

44 56 54 57 49 60

160 288 151 137 92 274

20663–19956 21647–20871

55 26

235 258

22653–21757

27

298

Novel Novel

22993–24237 24664–24281

42 23

414 127

Novel E6fam

25233–24718 26078–25197

27 61

171

E6fam E6fam E6fam Novel E6fam E14fam E14fam E14fam E15fam

27127–26363 28314–27457 28547–28281 29780–28965 31024–30215 32191–31280 33228–32419 34435–33425

61 60 60 58 59 52 57 58

254 285 88 271 269 303 269 336

35571–34768

57

267

E9A, F E9B, C

vIgF2

vIgF2.4 vIgF2.5 vOX2-1 vIgF3 vCD48 vCD48

Cys-rich vGPCR6

E27ex1 7xTM vCXCL2? 7xTM 7xTM 7xTM 7xTM vOGT (Truncated)

E9C, C E10A, C Nil E11, C E12, C E12A, C E13, C E14.1, C E14.2, C E14, C E15, C

7xTM

Novel Novel IgFam E3fam E49fam Novel

AcTransf

7xTM 7xTM 7xTM 7xTM

vGPCR4

E16, C Nil

7xTM 7xTM 7xTM 7xTM 7xTM 7xTM

IgFam Novel Novel Novel Novel Novel E6fam E7A E6fam Novel E6fam E6fam

E14fam Novel

EEHV1A Kimba

EEHV1A Raman

EEHV1B Emelia

EEHV5 Vijay

Note or comment

Related to multimerized 17-bp repeats in TR of EEHV1A/1B/5 All have a cluster of 6–9× palindromic (8-bp) CREB-binding sites E47 EE63 EE63 EE63 Absent in EEHV4a Nil Nil Nil EE62B Unique to EEHV5 Nil Nil Nil EE62A Unique to EEHV5 E48 EE62 Nil EE62 Absent in EEHV4 E49 EE61 Nil Frag Absent in EEHV4 E50 EE60 Nil Nil Absent in EEHV4, -5 outside the probosciviruses Frag Frag EE59 Nil Absent in EEHV4 E51 EE58 EE58 Nil Absent in EEHV4 E52 EE57 Frag Nil Absent in EEHV4 Nil Nil EE56 Nil Absent in EEHV4 Nil Nil EE55 Nil Absent in EEHV4 Nil Nil EE54 Nil Absent in EEHV4 Frag EE53 EE53 Nil Absent in EEHV4 E53 EE52 EE52 EE52 Absent in EEHV4 E54 EE51 EE51 EE51 Absent in EEHV4 E55 EE50 EE50 EE50 Absent in EEHV4 Nil Nil Nil EE49D Unique to EEHV5 Nil Nil Nil EE49C Unique to EEHV5 Nil Nil Nil EE49B Unique to EEHV5 Nil Nil Nil EE49A Unique to EEHV5 25 (45) EE49 EE49 EE49 N-term S/T extended E2 EE48 EE48 EE48 Absent in EEHV4 30 (70) EE47 EE47 EE47 Match to RAIP3 or C-5-Afam 28 (60) EE45 EE45 EE45 Unique to EEHV4 30 (45) EE45 EE45 EE45 Unique to EEHV4 Nil Nil Nil Nil Unique to EEHV4; S/T dom 34 (41) EE45 EE45 EE45 Unique to EEHV4 38 (69) EE45 EE45 EE45 Unique to EEHV4 61 (68) EE46 EE46 EE46 —b E5 EE45 EE45 EE45 Absent in EEHV4 E5A EE44 EE44 Nil Short, memb, absent EEHV4 Nil Nil Nil EE44A Unique to EEHV5 Nil Nil Nil Nil Unique to EEHV4 Nil Nil Nil Nil Unique to EEHV4 Nil Nil Nil Nil Unique to EEHV4 E27 EE20 Nil EE20 35% (44%) match to E27 Nil Nil Nil Nil Unique to EEHV4 33 (89) EE43 EE43 EE43 Nil Nil Nil Absent in EEHV1B/4/5 32 (84) EE42 EE42 EE42 Nil Nil Nil Nil Unique to EEHV4 E8 EE41 EE41 EE41 Absent in EEHV4 34 (18) EE40 Frag EE40 Match to C-term EE40 only (esp EEHV5) Nil Nil Nil Nil Unique to EEHV4c Nil Nil Nil EE39/40 Match to EEHV5 N-term16-aa EE39/40 only Nil Nil Nil Nil Unique to EEHV4 Nil Nil Nil 25 (46) Matches central EE40 (EEHV5) only E10 EE39 EE39 EE39 Absent in EEHV4 36 (96) EE38 EE38 EE38 34 (77) EE37 EE37 EE37 Nil Nil Nil Nil Unique to EEHV4 52 (88) EE36 EE36 EE36 27 (91) 
Nil Nil Nil Duplication of E14 24 (77) Nil Nil Nil Duplication of E14 26 (79) EE35 EE35 EE35 34 (86) EE34 EE34 EE34 26% (49%) match to Lox C-5-C 40 (97) EE33 EE33 EE33 E16C Nil Nil Nil Conserved in EEHV1 and EEHV5


Type

Nil E16D, C E17, F

E20B, C E20A, F E21, C E22, F E22A, F E23B, C E24B, C Nil Nil Nil E26, C E27, F

vECTL E27ex1

Protein size (aa)

ORF-F2 vGPCR4A

vGPCR4B

vOX2-Bex2 vOX2-Bex1 vOX2-3 vOX2-V (E23A) vOX2-2 vGPCR3 E27ex1 E27ex2

EEHV1A Kimba

EEHV1A Raman

EEHV1B Emelia

EEHV5 Vijay

E16A/B

Nil

Nil

Nil

36141–35584 36491–36826

45 45

185 111

Nil 37 (99)

Nil EE32

Nil EE32

Nil EE32

36944–37252

63

102

7xTM

Novel Novel E18fam

37215–38018

57

267

Nil Nil 32 (70)

Nil Nil EE31

Nil Nil EE31

Nil EE32A EE31

7xTM

Novel Novel Novel U54.5fam E15fam

38433–38720 39303–41042 43188–42154

59 68 57

94 579 344

E18B E18A Nil 52 (89) 46 (84)

EE30A EE30 Nil EE29 EE28

EE30A EE30 Nil EE29 EE28

EE30A EE30 Nil EE29 EE28

7xTM

Novel Novel E15fam

45323–44904 45338–45664 46925–45810

65 61 59

139 108 371

Nil 43 (33) 35 (76)

Nil EE27 EE26

Nil EE27 EE26

Nil EE27 EE26

Novel Novel Novel Novel Novel

47646–47921 48621–48730 48993–49325 49596–49250 50001–49950

49 53 57 56

91 79 110 132

48 (96) 46 (48) Nil E54

EE25 EE24 Nil EE51

EE25 EE24 Nil EE51

EE25 EE24 Nil EE51

E24 Nil E25 42 (92) 57 (58) Nil 44 (90)

EE23 Nil EE22 EE21 EE20ex1 Nil EE19

EE23 Nil EE22 EE21 EE20ex1 Nil EE19

EE23 EE22A EE22 EE21 EE20ex1 Nil EE19

EE18 EE17 Nil Nil EE16 EE15 Nil EE14 EE13 EE12A EE12 U14 UL34 U12 U12 U11

EE18 EE17 Nil Nil EE16 EE15 Nil EE14 EE13 Frag EE12 U14 UL34 U12 U12 U11

EE18 EE17 Nil Nil EE16 EE15 Nil EE14 EE13 Nil EE12 U14 UL34 U12 U12 U11

U4 EE11 EE10 U44

U4 EE11 EE10 U44

U4 EE11 EE10 U44

7xTM E27

E3fam Novel Novel E18fam

51287–50418 52138–52606 52785–53053 53115–53861

50 55 59 54

289 245

E30Aex2 E30Aex1

Novel Novel Novel Novel

54092–54781 55453–54911 55883–55534 56146–56095

55 55 42 40

229 180 133

56923–56321 57113–56610 57646–57182 60473–57717

56

200

E31Cex U14.5

Novel Novel Novel βδ?

61 60

136 918

Novel βδ βδ βδ

50 59 52 57

81 556 432 783

Novel

60991–61236 63078–61408 64742–63444 67327–65060 67537–67454 68025–74276

59

2,083

42 (91) <15 Nil Nil E31 35 (46) Nil 65 (12) 45 (82) E33 37 (59) 37 (75) 76 (54) 50 (53) 56 (100) 42 (16)

βδ βδ Novel Core

74389–76053 76687–78444 79137–81659 82518–83729

61 63 63 55

554 585 840 403

58 59 51 77

59

1,201

64

1,401

51 (84) 65 (51) 52 (24)

U43 U42 U42

U43 U42 U42

U43 U42 U42

61 59 57 62 59 63 48 53 64 58 65 61

1,171 697 862 1,084 268 538 95 300 534 87 2,321 1,713

63 73 64 65 64 69 69 63 49 46 44 45

U41 U40 U39 U38 U37 U36 U35 U34 U33 U32 U31 U30

U41 U40 U39 U38 U37 U36 U35 U34 U33 U32 U31 U30

U41 U40 U39 U38 U37 U36 U35 U34 U33 U32 U31 U30

7xTM

E29, F E30, C E30A, C

7xTM

E34, F

U14 U13.5 vGPCR2ex2 vGPCR2ex1 ORF-C

U4, F U4.5, F E35, F U44, C

U4 ORF-B ORF-A U44

U43, F U42, F

PRI MTAex1 MTAex2

Core Core

83629–87234 87495–87627 87948–92020

MDBP TER2 gB POL DOC

Core Core Core Core Core Core Core Core βγδ Core Core Core

92346–93325 93861–97376 97498–99591 99533–102121 102273–105527 106545–105740 108154–106538 108266–108553 108717–109619 109914–111518 111409–111672 118856–111873 124420–119289

Ori-Lyt U41, F U40, F U39, F U38, F U37, C U36, C U35, F U34, F U33, F U32, F U31, C U30, C

% GC content

Novel Novel

E28, F

Nil E31A, C E31B, C E31C, C E32, C Nil, F E33A U14, C U13.5, C U12, C

Position coordinates

Novel

E17A, F Nil E18, F Nil Nil E18C, F E19, F E20, C

Family or status

CRP SCP TEG-L TEG-S

7xTM

U4 U4

248

(94) (93) (46) (23)

(99) (98) (94) (99) (97) (89) (98) (99) (91) (67) (89) (53)

Note or comment Spliced; unique to EEHV1A/B Match to E27, 30% (50%) Unique to EEHV4 Unique to EEHV5 Related to E28 by 30% (51%) Absent in EEHV4 Absent in EEHV4 Unique to EEHV4 25% (77%) to ORF-F1 23% (67%) match to Lox RAIP3 Unique to EEHV4 27% (35%) match to Lox RAIP3

Unique to EEHV4 Short first exon Absent in EEHV4 Unique to EEHV5 Absent in EEHV4 Match to ChemR C-5-C Related to E6A, E17 Unrelated to EE20ex2 Related to E18 by 30% (51%) Acidic similarity only Unique to EEHV4 Unique to EEHV4 Absent in EEHV4 Only N-term cons Unique to EEHV4 No ATG, splice to E32? Unique to EEHV1A

Short first exon Only N-term cons in EEHV1, and -5 24% (38%) HHV6 U4 24% (30%) U4 Only C-term cons in EEHV1 and -5 Primase subunit Short first exon Posttranscriptional regulator —d SS DNA binding protein Env glycoprotein B DNA polymerase Docking protein

Cys-rich protein Small capsid protein Large tegument Small tegument

U29, F U28, F U27.5, F U27, F U45.7, F U46, F U47, C U48, C U48.5, C U49, F U50, F U51, F U52, C U53, F U54.5, C

TRI1 RRA RRB (ORF-H) PPF ORF-J gN gO (ORF-D) gH TK (ORF-E)

U56, C U5, C U58, F U59, F U60, C U62, F U63, F U64, F U65, F U66, C

TRI2 MCP vTBP

U67, U68, U69, U70, U71, U72,

F F F F F C

PAC2 vGPCR1 SCA/PRO ORF-F1

TERex3

PAC1 TERex2 TERex1

CPK EXO MyrTeg gM

Type

7xTM

Family or status

Position coordinates

% GC content

Protein size (aa)

EEHV1A Kimba

EEHV1A Raman

EEHV1B Emelia

EEHV5 Vijay

Core Core αγδ Core Novel Core βδ Core αγδ Core Core βδ Core Core U54.5fam

124423–125313 125563–128073 128212–129117 129702–131023 131035–131784 131800–132105 132837–132187 135086–132816 136101–135052 136100–136801 136620–138347 138430–139647 140605–139832 140698–142485 144154–142715

60 61 53 64 58 49 52 49 56 57 57 56 53 60 61

296 836 301 437 216 101 216 756 349 233 575 405 257 595 479

60 (98) 68 (66) 75 (99) 52 (65) 44 (33) 63 (53) 34 (94) 47 (96) 50 (87) 48 (85) 64 (99) 42 (95) 65 (98) 49 (91) 38 (99)

U29 U28 EE9 U27 EE8 U46 U47 U48 EE7 U49 U50 U51 U52 U53 U54

U29 U28 EE9 U27 EE8 U46 U47 U48 EE7 U49 U50 U51 U52 U53 U54

U29 U28 EE9 U27 EE8 U46 U47 U48 EE7 U49 U50 U51 U52 U53 U54

Core Core βγδ βγδ Core βγδ βγδ Core Core Novel Core βγδ Core Core Core Core Core

145344–144445 149563–145511 150117–153134 152794–154152 155504–154377 155733–156005 155944–156546 156527–158552 158055–159071 159272–159153 160224–159490 160612–161742 161739–162104 162613–164232 164529–166094 166031–166342 167655–166537

59 63 61 62 57 54 51 64 59

299 1,350 1,005 452 660 90 200 541 338

53 58 51 59 61 56 55

376 121 539 521 103 372

68 71 63 48 92 57 67 48 48 90 89 61 66 57 53 41 66

(99) (99) (87) (79) (99) (97) (72) (63) (98) (100) (99) (98) (98) (96) (97) (67) (93)

U56 U57 U58 U59 U60 U62 U63 U64 U65 U66 U66 U67 U68 U69 U70 U71 U72

U56 U57 U58 U59 U60 U62 U63 U64 U65 U66 U66 U67 U68 U69 U70 U71 U72

U56 U57 U58 U59 U60 U62 U63 U64 U65 U66 U66 U67 U68 U69 U70 U71 U72

168094–171645 171659–173830 174625–173813 176750–174579 176701–179574 180833–183805

61 63 63 63 63 65

1,183 723 270 723 957 990

65 61 54 74 79 62

(68) (91) (84) (78) (79) (19)

U73 U74 U75 U76 U77 U79

U73 U74 U75 U76 U77 U79

U73 U74 U75 U76 U77 U79

E36A

EE6

Nil

EE6

68 (68) 37 (94) 37 (95)