http://genomebiology.com/2002/3/5/research/0024.1

Research

Evolution of gene fusions: horizontal transfer versus independent events

Itai Yanai*, Yuri I Wolf† and Eugene V Koonin†

Addresses: *Bioinformatics Graduate Program and Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA. †National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MA 20894, USA.

Correspondence: Eugene V Koonin. E-mail: [email protected]

Genome Biology 2002, 3(5):research0024.1–0024.13

Received: 12 November 2001 Revised: 7 February 2002 Accepted: 26 March 2002


Published: 26 April 2002

The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2002/3/5/research/0024

© 2002 Yanai et al., licensee BioMed Central Ltd (Print ISSN 1465-6906; Online ISSN 1465-6914)

Abstract

Background: Gene fusions can be used as tools for functional prediction and also as evolutionary markers. Fused genes often show a scattered phyletic distribution, which suggests a role for processes other than vertical inheritance in their evolution.

Results: The evolutionary history of gene fusions was studied by phylogenetic analysis of the domains in the fused proteins and the orthologous domains that form stand-alone proteins. Clustering of fusion components from phylogenetically distant species was construed as evidence of dissemination of the fused genes by horizontal transfer. Of the 51 examined gene fusions that are represented in at least two of the three primary kingdoms (Bacteria, Archaea and Eukaryota), 31 were most probably disseminated by cross-kingdom horizontal gene transfer, whereas 14 appeared to have evolved independently in different kingdoms and two were probably inherited from the common ancestor of modern life forms. On many occasions, the evolutionary scenario also involves one or more secondary fissions of the fusion gene. For approximately half of the fusions, stand-alone forms of the fusion components are encoded by juxtaposed genes, which are known or predicted to belong to the same operon in some of the prokaryotic genomes. This indicates that evolution of gene fusions often, if not always, involves an intermediate stage, during which the future fusion components exist as juxtaposed and co-regulated, but still distinct, genes within operons.

Conclusion: These findings suggest a major role for horizontal transfer of gene fusions in the evolution of protein-domain architectures, but also indicate that independent fusions of the same pair of domains in distant species are not uncommon, which suggests positive selection for the multidomain architectures.

Background

Gene fusion leading to the formation of multidomain proteins is one of the major routes of protein evolution. Gene fusions characteristically bring together proteins that function in a concerted manner, such as successive enzymes in metabolic pathways, enzymes and the domains involved in their regulation, or DNA-binding domains and ligand-binding domains in prokaryotic transcriptional regulators [1-3]. The selective advantage of domain fusion lies in the increased efficiency of coupling of the corresponding biochemical reaction or signal transduction step [1] and in the tight co-regulation of expression of the fused domains. In signal transduction systems, such as prokaryotic two-component regulators and sugar phosphotransferase (PTS) systems, or eukaryotic receptor kinases, domain fusion is the main principle of functional design [4-6]. Furthermore, accretion of multiple domains appears to be one of the important routes for increasing functional complexity in the evolution of multicellular eukaryotes [7-9].

Pairs of distinct genes that are fused in at least one genome have been termed fusion-linked [3]. A gene fusion is presumably fixed during evolution only when the partners cooperate functionally and, by inference, a functional link can be predicted to exist between fusion-linked genes. Recently, this simple concept has been used by several groups as a means of systematic prediction of the functions of uncharacterized genes [1-3,10,11].

In addition to their utility for functional prediction, analysis of gene fusions may help in addressing fundamental evolutionary issues. Gene fusions often show scattered phyletic patterns, appearing in several species from different lineages. By investigating the phylogenies of each of the two fusion-linked genes, it may be possible to determine the evolutionary scenario for the fusion itself. A recent study provided evidence that the fission of fused genes occurred during evolution at a rate comparable to that of fusion [12]. Here, we address another central aspect of the evolution of gene fusions, namely, do fusions of the same domains in different phylogenetic lineages reflect vertical descent, possibly accompanied by multiple lineage-specific fission events, or independent fusion events, or horizontal transfer of the fused gene? In other words, is a fusion of a given pair of genes extremely rare and, once formed, is it spread by horizontal gene transfer (HGT), perhaps also followed by fissions in some lineages? Alternatively, are independent fusions of the same gene pair in distinct lineages relatively common during evolution?

Among fusions that are found in at least two of the three primary kingdoms of life (Bacteria, Archaea and Eukaryota), we detected both modes of evolution, but horizontal transfer of a fused gene appeared to be more common than independent fusion events or vertical inheritance with multiple fissions.

Results and discussion

To distinguish between a single fusion event followed by HGT and/or fission of the fused gene and multiple, independent fusion events in distinct organisms, we analyzed phylogenetic trees that were constructed separately for each of the fusion-linked domains (proteins). The fusion was split into the individual component domains and phylogenetic trees were built for each of the corresponding orthologous sets from 32 complete microbial genomes (Figure 1, and see Materials and methods), including both fusion components and products of stand-alone genes. The topologies of the resulting trees were compared to each other and to the topology of a phylogenetic tree constructed on the basis of a concatenated alignment of ribosomal proteins, which was chosen as the (hypothetical) species tree of the organisms involved [13].

If the fusion events either occurred independently of each other or were vertically inherited, perhaps followed by fission in some lineages, the distribution of the fusion components in the phylogenetic trees for the orthologous clusters to which they belong is expected to mimic the distribution of the species carrying the fusion in the species tree. In contrast, if the fusion gene has been disseminated by HGT, fusion components will form odd clusters different from those in the species tree. This could be a straightforward approach to reconstructing the evolutionary history of gene fusions, if only the topology of the species trees was well resolved. However, this is not necessarily the case for bacteria or archaea, where relationships between major lineages remain uncertain [14,15], although a recent detailed analysis suggested some higher-level evolutionary affinities [13]. Because the distinction between the three primary kingdoms is widely recognized [14,16] and is clear in the trees for most protein families [17], trans-kingdom horizontal transfers of fused genes can be more reliably detected with the proposed approach. Therefore, we concentrated on the evolutionary histories of gene fusions that are shared by at least two of the three primary kingdoms.

As the framework for this analysis, we used the database of clusters of orthologous groups (COGs) of proteins [18,19], which contains sets of orthologous proteins and domains from complete microbial genomes (32 genomes at the time of this analysis; see Materials and methods). Domain fusions represented in some genomes by stand-alone versions of the fusion components are split in the COG database so that each fusion component can be assigned to a different COG. Whenever distinct domains of a fusion protein belong to separate COGs, the corresponding COGs are said to be fusion-linked [3].
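The definition of fusion-linked COGs can be illustrated with a minimal sketch. It is not the procedure used with the COG database itself; the input layout (a protein name mapped to a genome code and a list of per-domain COG assignments), the function name `fusion_linked_pairs` and the four toy entries are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Toy input (illustrative only): protein name -> (genome code, COGs of its domains).
PROTEINS = {
    "purL-Eco": ("Eco", ["COG0046", "COG0047"]),     # one protein, two COGs: a fusion
    "YGR061c-Sce": ("Sce", ["COG0046", "COG0047"]),  # one protein, two COGs: a fusion
    "purL-Bsu": ("Bsu", ["COG0046"]),                # stand-alone component
    "purQ-Bsu": ("Bsu", ["COG0047"]),                # stand-alone component
}


def fusion_linked_pairs(proteins):
    """Map each unordered COG pair found within a single protein
    to the set of genomes that carry the fusion."""
    links = defaultdict(set)
    for genome, cogs in proteins.values():
        # Any two distinct COGs assigned to domains of one protein are fusion-linked.
        for pair in combinations(sorted(set(cogs)), 2):
            links[pair].add(genome)
    return dict(links)


LINKS = fusion_linked_pairs(PROTEINS)
for (cog_a, cog_b), genomes in sorted(LINKS.items()):
    print(cog_a, cog_b, sorted(genomes))
```

In this toy input the stand-alone Bsu proteins contribute no link of their own, but their COGs are still fusion-linked because the two domains are fused in at least one other genome.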
A search of the COGs database revealed 405 pairs of fusion-linked COGs. The vast majority (87%) of fusion links include fusions present in only one primary kingdom (Table 1). Only 52 pairs of fusion-linked COGs included fusions represented in two or three kingdoms (Table 1), and for reasons discussed above, we chose these pairs of COGs for an evolutionary analysis of gene fusions.

Table 1 Phyletic patterns of gene fusions

Kingdom profile* | Number of fusion links between COGs
abe | 3
ab- | 27
-be | 20
a-e | 1
a-- | 82
-b- | 215
--e | 56
Total | 405

*a, Archaea; b, Bacteria; e, Eukaryota.
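The percentages quoted above can be checked with a short calculation. The sketch below uses only the single-kingdom counts and the total stated in Table 1; the variable and function names are illustrative assumptions.

```python
# Single-kingdom profiles and their counts, as listed in Table 1.
SINGLE_KINGDOM = {"a--": 82, "-b-": 215, "--e": 56}
TOTAL_LINKS = 405  # total number of fusion links stated in Table 1


def kingdoms_present(profile):
    """Count the kingdoms in which a fusion occurs ('-' marks absence)."""
    return sum(1 for ch in profile if ch != "-")


# Sanity check: each listed profile covers exactly one kingdom.
assert all(kingdoms_present(p) == 1 for p in SINGLE_KINGDOM)

single = sum(SINGLE_KINGDOM.values())
percent_single = round(100 * single / TOTAL_LINKS)
cross_kingdom = TOTAL_LINKS - single

print(single, percent_single, cross_kingdom)  # prints: 353 87 52
```

The result matches the text: 353 of 405 links (87%) are confined to one kingdom, leaving 52 links shared by two or three kingdoms.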

Figure 1 shows a genome-COG matrix that reveals the phyletic (phylogenetic) patterns of the presence or absence of the orthologs across the spectrum of the sequenced genomes [18] for each of the 52 pairs of fusion-linked COGs containing cross-kingdom fusions. When assessed against the topology of the tentative species tree based on the concatenated alignments of ribosomal proteins [13], fusions showed a scattered distribution in phyletic patterns (depicted by columns in Figure 1). For example, the fusion between COG1788 and COG2057 (α- and β-subunits of acyl-CoA:acetate CoA transferase) is seen in the bacteria Escherichia coli, Deinococcus radiodurans and Bacillus halodurans, and in the archaea Aeropyrum pernix, Thermoplasma acidophilum and Halobacterium sp. Similarly, the fusion between COG1683 and COG3272 (uncharacterized, conserved domains) was found in the bacteria Pseudomonas aeruginosa and Vibrio cholerae, and in the archaeon Methanobacterium thermoautotrophicum. In each of these cases, with the species tree used as a reference, the bacteria involved are phylogenetically distant from each other and more so from the archaea, and non-fused versions of the two domains exist within the same bacterial lineages and in archaea (Figure 1). These observations emphasize the central question of this work: are the fusions between the same pair of domains in different species independent or are they best explained by HGT?

Figure 2 shows the pair of phylogenetic trees for the fusion-linked COGs 1788 and 2057. In both trees, the fusion components from E. coli and B. halodurans (YdiF and BH3898,

Figure 1 Phyletic patterns of fusion-linked COGs. Each pair of COGs is represented by a double column. The dark-gray rectangles indicate fusions, the light-gray rectangles indicate that the fusion components are represented by stand-alone genes in the given genomes, and the white rectangles indicate that there is no representative of the given COG in the given genome. Where one rectangle in a double column is light gray and the other is white, the genome in question has a representative of only one of the pair of fusion-linked COGs. Species abbreviations are as listed in Materials and methods.

Figure 2 Phylogenetic trees for fusion-linked COGs: α- and β-subunits of acyl-CoA:acetate CoA transferase. Fusion components are denoted by shading and by a number after an underline (_1 for the amino-terminal domain and _2 for the carboxy-terminal domain). The three primary kingdoms are color-coded as indicated in the figure. The RELL bootstrap values are indicated for each internal branch. (a) α-subunit (domain) (COG1788); (b) β-subunit (domain) (COG2057). The proteins are designated using the corresponding systematic gene names followed (after the underline) by the abbreviated species names. Species abbreviations are as in Materials and methods and Figure 1.

respectively) confidently group with the archaeal fusion components, to the exclusion of the non-fused orthologs. This position of the E. coli and B. halodurans fusion components is unexpected and is in contrast to the placement of the orthologs from other gamma-proteobacteria and Gram-positive bacteria, as well as non-fused paralogs from the same species (AtoA/D and BH2258/2259, respectively) within the bacterial cluster. These observations strongly suggest that the gene for fused subunits of acyl-CoA:acetate CoA transferase was disseminated horizontally between E. coli, B. halodurans, and archaea. The presence of non-fused paralogs in both these bacterial species appears to be most compatible with gene transfer from archaea to bacteria. In contrast, the fusion of the pair of domains from the same COGs seen in D. radiodurans seems to be an independent event because, in both trees, the D. radiodurans branch is in the middle of the bacterial cluster (Figure 2a,b). Thus, the history of this pair of fusion-linked COGs appears to involve horizontal transfer of the fused gene between bacteria and archaea (and possibly also within kingdoms), as well as at least one additional, independent fusion event in bacteria.
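The reasoning applied to this pair of COGs can be sketched as a simple clade test: find the smallest clade that contains a chosen set of fusion components and ask whether any stand-alone ortholog falls inside it. This is only an illustration, not the analysis performed here, which relied on inspection of trees with bootstrap support. The nested-tuple tree encoding, the function names and the toy topology (loosely modeled on the text, assuming a rooted tree) are assumptions.

```python
def leaves(tree):
    """Return the set of leaf names of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def smallest_clade(tree, targets):
    """Leaf set of the smallest subtree containing every target leaf.
    Assumes targets is a non-empty subset of the leaves of tree."""
    if not isinstance(tree, str):
        for child in tree:
            if targets <= leaves(child):
                return smallest_clade(child, targets)
    return leaves(tree)


def forms_exclusive_clade(tree, group, fusion_components):
    """True if the group clusters to the exclusion of all stand-alone proteins."""
    return smallest_clade(tree, group) <= fusion_components


# Hypothetical topology for illustration; "_1" marks a fusion component.
TOY_TREE = (
    (("ydiF_1-Eco", "BH3898_1-Bha"), ("Ta0820_1-Tac", "APE2468_1-Ape")),
    (("atoD-Eco", "BH2259-Bha"), ("DRA0054_1-Dra", "yodS-Bsu")),
)
FUSIONS = {"ydiF_1-Eco", "BH3898_1-Bha", "Ta0820_1-Tac",
           "APE2468_1-Ape", "DRA0054_1-Dra"}

CROSS_KINGDOM = FUSIONS - {"DRA0054_1-Dra"}
print(forms_exclusive_clade(TOY_TREE, CROSS_KINGDOM, FUSIONS))  # pattern expected under HGT
print(forms_exclusive_clade(TOY_TREE, FUSIONS, FUSIONS))        # Dra fusion sits among stand-alone proteins
```

In the toy tree, the E. coli, B. halodurans and archaeal fusion components pass the test, whereas adding the D. radiodurans component fails it, which corresponds to the interpretation of an additional, independent fusion.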

Figure 3 shows the phylogenetic trees for the two domains of phosphoribosylformylglycinamidine (FGAM) synthase, a purine biosynthesis enzyme. The components of this fusion, which is found in proteobacteria and eukaryotes, form a tight cluster separated by a long internal branch from the non-fused bacterial and archaeal orthologs. This tree topology suggests HGT between bacteria and eukaryotes, possibly a relocation of the fused gene from the pro-mitochondrion to the eukaryotic nuclear genome or, alternatively, gene transfer from eukaryotes to proteobacteria. An additional aspect of the evolution of this gene is the apparent acceleration of evolution upon gene fusion, which is manifest in the long branch that separates the proteobacterial-eukaryotic cluster from the rest of the bacterial and archaeal species (Figure 3a,b).

Figure 3 Phylogenetic trees for fusion-linked COGs: phosphoribosylformylglycinamidine (FGAM) synthase. (a) Synthetase domain (subunit) (COG0046); (b) glutamine amidotransferase domain (subunit) (COG0047). Protein designations are as in Figure 2.

The fusion-linked COGs 1605 and 0077 (chorismate mutase and prephenate dehydratase, respectively) show a more complicated history, with distinct fusion events resulting in different domain architectures (see legend to Figure 4). The presence, in both trees, of two distinct clusters of fusion components and the isolated fusion in Campylobacter jejuni suggest at least three independent fusion events, two of which apparently were followed by horizontal dissemination of the fused gene (Figure 4a,b). The single archaeal fusion, the Archaeoglobus fulgidus protein AF0227, belongs to one of these clusters and shows a strongly supported affinity with the ortholog from the hyperthermophilic bacterium Thermotoga maritima (Figure 4a,b). Given the broad distribution of this fusion in bacteria, horizontal transfer of the bacterial fused gene to archaea is the most likely scenario.

The pair of fusion-linked COGs 0777 and 0825 (β- and α-subunits of acetyl-CoA carboxylase, respectively) shows unequivocal clustering of the fusion components from numerous archaeal and bacterial species, which indicates a prevalent role for HGT in the evolution of this fusion (Figure 5a,b). Moreover, archaea are scattered among bacteria, suggesting multiple HGT events. However, an apparent independent fusion is seen in Mycobacterium tuberculosis (Figure 5a,b). It could be argued that, in cases like those in Figure 5, where there is a sharp separation (a long, strongly supported internal branch in each of the trees) between the fusion components and stand-alone proteins, the COGs involved need to be reorganized, to form one COG consisting of fusion proteins only and two separate COGs consisting of stand-alone proteins. Formally, this would eliminate the need for HGT as an explanation of the tree topology for any of these new COGs. However, this solution (even if attractive from the point of view of classification) does not seem to be correct in light of the principle of orthology that underlies the COG system: it appears that, in both of the COGs involved, the fusion components and stand-alone proteins are bona fide orthologs, as judged by the high level of sequence conservation and by the fact that, in the majority of species involved, they are the only versions of this key enzyme.

The results of phylogenetic analyses of the 51 cross-kingdom fusion links are summarized in Tables 2 and 3 and the


Figure 4 Phylogenetic trees for fusion-linked COGs: chorismate mutase and prephenate dehydratase. (a) Chorismate mutase (COG1605); (b) prephenate dehydratase (COG0077). Protein designations are as in Figure 2. The protein AF0227 contains a prephenate dehydrogenase domain in addition to the chorismate mutase and prephenate dehydratase domains.

Figure 5 Phylogenetic trees for fusion-linked COGs: α- and β-subunits of acetyl-CoA carboxylase. (a) β-subunit (domain) (COG0777); (b) α-subunit (domain) (COG0825). Protein designations are as in Figure 2. The proteins DRA0310 and PA1400, in addition to the domains corresponding to the α- and β-subunits of acetyl-CoA carboxylase, contain a biotin carboxylase domain and a biotin carboxyl carrier protein domain. The clustering of these proteins in phylogenetic trees almost certainly reflects HGT between the respective bacterial lineages.

Table 2 Evolutionary history of trans-kingdom gene fusions

COG A | Protein function | COG B | Protein function | Kingdom pattern* | Principal mode of evolution† | Fusion | Gene juxtaposition‡ | Evolutionary scenario
COG0046 | Phosphoribosylformylglycinamidine (FGAM) synthase, synthetase domain | COG0047 | Phosphoribosylformylglycinamidine (FGAM) synthase, glutamine amidotransferase domain | -be | HGT | Ecol, Paer, Vcho, Hinf, Xfas, Nmen | Pyro, Paby, Tmar, Drad, Bsub, Bhal | One fusion event, fused gene transfer between eukaryotes and proteobacteria
COG0067 | Glutamate synthase domain 1 | COG0069 | Glutamate synthase domain 2 | -be | HGT | Most bacteria | Aful, Mjan, Tmar | One fusion event, fused gene transfer between eukaryotes and bacteria
COG0067 | Glutamate synthase domain 1 | COG0070 | Glutamate synthase domain 3 | -be | HGT | Most bacteria | - | One fusion event, fused gene transfer between eukaryotes and bacteria
COG0069 | Glutamate synthase domain 2 | COG0070 | Glutamate synthase domain 3 | -be | HGT | Most bacteria | Aful, Mjan, Mthe | One fusion event, fused gene transfer between eukaryotes and bacteria
COG0139 | Phosphoribosyl-AMP cyclohydrolase (histidine biosynthesis) | COG0140 | Phosphoribosyl-ATP pyrophosphohydrolase (histidine biosynthesis) | -be |  | Most bacteria | - | Uncertain
COG0145 | N-methylhydantoinase A | COG0146 | N-methylhydantoinase B | -be | HGT | Mtub, Syne, Scer | Mjan, Aero, Hpyl | One fusion event, fused gene transfer between eukaryotes and (the ancestor of) Cyanobacteria and Actinomycetes
COG0147 | Anthranilate/para-aminobenzoate synthase component I | COG0512 | Anthranilate/para-aminobenzoate synthase component II | -be | IFE | Nmen, Cjej, Paer, Scer | Aful, Mthe, Taci, Aero, Tmar, Drad, Bsub, Bhal, Ecol, Vcho, Xfas | Independent fusion events in eukaryotes and bacteria
COG0169 | Shikimate 5-dehydrogenase | COG0710 | 3-dehydroquinate dehydratase | -be | IFE | Ctra, Cpne, Scer | Paby¶, Ecol | Independent fusion events in eukaryotes and bacteria
COG0294 | Dihydropteroate synthase | COG0801 | 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase | -be | IFE | Ctra, Cpne, Scer | Llac¶, Tmar, Drad, Bsub, Bhal | Independent fusion events in eukaryotes and bacteria
COG0304 | 3-oxoacyl-(acyl-carrier-protein) synthase | COG0331 | (Acyl-carrier-protein) S-malonyltransferase | -be | HGT | Mtub, Scer | Drad, Ecol, Vcho | One fusion event, fused gene transfer between eukaryotes and bacteria
COG0331 | 3-oxoacyl-(acyl-carrier-protein) synthase | COG2030 | Acyl dehydratase | -be | HGT | Mtub, Bsub, Scer | - | Fused gene transfer between eukaryotes and Actinomycetes; additional, independent fusions in bacteria
COG0337 | 3-dehydroquinate synthetase | COG0703 | Shikimate kinase | -be | IFE | Tmar, Scer | Drad, Mtub, Proteobacteria, Ctra, Cpne | Independent fusion events in eukaryotes and bacteria (with different domain organizations)

COG1372

COG0569

Biotin carboxylase

Biotin carboxylase

Biotin carboxylase

Dinucleotide-utilizing enzyme involved in molybdopterin and thiamine biosynthesis

Biotin carboxyl carrier protein

cAMP-binding domain

Allophanate hydrolase subunit 2

Archaeal/vacuolartype H+-ATPase subunit A

Na+/H+ and K+/H+ antiporters

Uncharacterized, conserved protein

Glutamate synthase domain 2

COG0439

COG0439

COG0439

COG0476

COG0511

COG0664

COG1984

COG1155

COG0025

COG0062

COG0069

COG1037

COG0063

COG2049

COG1752

COG0825

COG0607

COG0825

Ferredoxin-like domain

Predicted sugar kinase

K+ transport systems, NAD-binding component

Intein

Allophanate hydrolase subunit 1

Esterase

Acetyl-CoA carboxylase ,-subunit

Rhodanese-related sulfurtransferase

Acetyl-CoA carboxylase ,-subunit

Pyruvate carboxylase, carboxy-terminal domain/subunit

Biotin carboxyl carrier protein

Glycine cleavage system protein P (pyridoxalbinding), carboxyterminal domain

Protein function

ab-

ab-

ab-

a-e

-be

-be

-be

-be

-be

-be

-be

-be

Kingdom pattern*

HGT

AF

IFE

IFE

HGT

IFE

IFE

HGT

HGT

HGT

HGT

Principal mode of evolution†

NA

-

-

Most bacteria

Aful, Mjan, Mthe, NA Tmar; (all that have COG1037)

All archaea; all bacteria that have COG0062

Hbsp, Bhal, Syne

Taci, Pyro, Scer

Bsub, Scer

-

Mtub, Ccre||, Scer

One ancestral fusion; fused gene transfer from archaea to bacteria (Thermotoga)

One ancestral fusion; fission in eukaryotes

Uncertain

Independent fusion events in eukaryotes and archaea

Independent fusion events in eukaryotes and bacteria

One fusion event, fused gene transfer between eukaryotes and actinomycetes; an additional, independent fusion event in bacteria

Independent fusion events in eukaryotes and bacteria

Pyro, Tmar, Hbsp¥

Drad, Paer, Scer

One fusion event, fused gene transfer between eukaryotes and bacteria; subsequent domain accretion in eukaryotes

One fusion event, fused gene transfer between eukaryotes and bacteria; subsequent domain accretion in eukaryotes

One fusion event, fused gene transfer between eukaryotes and bacteria; additional, independent fusions in bacteria

One fusion event, fused gene transfer between eukaryotes and proteobacteria

Independent fusion events in x sulfurtransferase

Hbsp, Rpxx

Mjan

Bhal, Ecol, Paer Vcho, Hinf, Xfas, Nmen, Hpyl, Ctra, Cpne

Hbsp, Pyro, Taci, Aero, Tmar, Bsub, Bhal

Gene juxtaposition‡ Evolutionary scenario

-

Mtub, Syne, Paer, Scer

Mtub, Scer

Bsub, Scer

Hbsp, Mtub, Rpxx, Scer

Drad, Mtub, Syne, Ecol, Paer, Xfas, Nmen

Fusion


COG1038

COG0511

COG1003

Glycine cleavage system protein P (pyridoxalbinding), aminoterminal domain

COG0403

COG B

Protein function

COG A

Table 2 (continued from the previous page)


ACT-domain-containing protein ab-

COG2716

Phosphoserine phosphatase

COG0560

HGT

ab-

Predicted sugar nucleotidyltransferase

COG1213

Phosphatidylglycerophosphate synthase

COG0558

AF

ab-

Zn-finger domain associated with topoisomerase type IA

COG0551

Topoisomerase IA

COG0550

HGT

ab-

Kef-type K+ transport systems, NAD-binding component

COG1226

Kef-type K+ transport systems, membrane component

COG0475

IFE

ab-

Intein

COG1372

RecA/RadA recombinase

COG0468

HGT

ab-

Uncharacterized conserved protein

COG1992

Hydroxymethylpyrimidine/phosphomethylpyrimidine kinase

COG0351

HGT

ab-

Biotin operon repressor

COG1654

Biotin-(acetyl-CoA carboxylase) ligase

COG0340

IFE

Rhodanese-related sulfurtransferase

COG0607

ATP pyrophosphatase (thiamine biosynthesis)

COG0301

ab-

HGT

ab-

ab-

Chorismate mutase

COG1605

Prephenate dehydrogenase

COG0287


IFE

Malic enzyme

COG0281

Phosphotransacetylase

COG0280


ab-

GTP cyclohydrolase II

COG0807

3,4-dihydroxy-2butanone 4-phosphate synthase

COG0108


HGT

Chorismate mutase

COG1605

Prephenate dehydratase

COG0077

Principal mode of evolution†

Kingdom pattern*


ab-

Protein function

COG B

One fusion event, fused gene transfer from bacteria to archaea (Halobacterium) Independent fusion events in archaea and bacteria

-

-

-

Taci, Aero, Ccre -

NA

-

NA -

Aero

-

Aful, Aqua, Tmar, Ecol, Vcho, Paer, Hinf, Xfas, Nmen, Cjej Aful, Aqua, Tmar, Drad, Mtub, Bsub, Bhal, Syne, Paer, Vcho, Xfas, Nmen, Hpyl, Cjej, Ctra, Cpne Hbsp, Ecol, Hinf, Xfas, Rpxx Aful, Ecol, Vcho, Hinf Taci, Ecol, Vcho, Paer, Hinf

Aful, Paby, Drad, Bsub, Bhal, Ecol, Paer, Vcho, Xfas; (all that have COG1654) Hbsp, Mjan, Pyro, Aero, Tmar Hbsp, Pyro, Mtub Mthe, Ecol, Paer, Hinf, Xfas, Nmen, Cjej, Rpxx Most bacteria and archaea Aful, Pyro, Aqua

Aful, Mtub, Paer

Uncertain

One fusion event, fused gene transfer from archaea to bacteria (AquIFEx)

One ancestral fusion with subsequent fission in Aper, Aqua

One fusion event, fused gene transfer from bacteria to archaea (Methanobacterium)

Independent fusion events in archaea and bacteria

One fusion event, fused gene transfer from archaea to bacteria (Thermotoga)

One fusion event, fused gene transfer from bacteria to archaea (Archaeoglobus)

Independent fusion events in archaea and bacteria

Uncertain

Fused gene transfer between bacteria and archaea (Archaeoglobus and Thermotoga lineages); additional, independent fusions in bacteria

Gene juxtaposition‡ Evolutionary scenario

Fusion

Protein function

COG A

Table 2 (continued from the previous page)

Mannose-6-phosphate isomerase

Pyruvate:ferredoxin oxidoreductase and related 2-oxoacid:ferredoxin oxidoreductases, alpha subunit

COG1014

Acetyl-CoA carboxylase beta subunit

Pyruvate:ferredoxin oxidoreductase and related 2-oxoacid:ferredoxin oxidoreductases, beta subunit

COG1014

Superfamily I DNA and RNA helicases and helicase subunits

Mg-chelatase subunit ChlI

S-layer domain

Histidinol phosphatase and related hydrolases of the PHP family

Uncharacterized conserved protein

Acyl-CoA:acetate CoA transferase alpha subunit

COG0662

COG0674

COG0777

COG1013

COG1112

COG1239

COG1361

COG1387

COG1683

COG1788

COG2057

COG3272

COG1796

COG1470

COG1240

COG2251

Acyl-CoA:acetate CoA transferase beta subunit

Uncharacterized conserved protein

DNA polymerase IV (family X)

Predicted membrane protein

Mg-chelatase subunit ChlD

Predicted metal-binding domain

Pyruvate:ferredoxin oxidoreductase and related 2-oxoacid:ferredoxin oxidoreductases, gamma subunit

Acetyl-CoA carboxylase alpha subunit

Pyruvate:ferredoxin oxidoreductase and related 2-oxoacid:ferredoxin oxidoreductases, gamma subunit

Mannose-1-phosphate guanylyltransferase

NADH:ubiquinone oxidoreductase 27 kD subunit

Protein function

ab-

ab-

ab-

ab-

ab-

ab-

ab-

ab-

ab-

ab-

ab-

Kingdom pattern*

HGT

HGT

HGT

HGT

HGT

IFE

IFE

HGT

HGT

HGT

HGT

Principal mode of evolution†

Hbsp, Taci, Aero, Drad, Bhal, Ecol

Mthe, Paer, Vcho

Mthe, Taci, Drad, Bsub, Bhal; (all prokaryotes that have COG1796)

Aful, Pyro, Bhal

Hbsp, Mthe, Taci, Mtub, Syne

Pyro, Mtub

Mthe, Syne, Ecol, Vcho, Tpal

Aful, Hbsp, Pyro, Tmar, Drad, Mtub, Bsub, Bhal, Paer, Rpxx

Aful, Hbsp, Taci, Aero, Mtub, Bhal, Syne, Ecol, Vcho, Tpal

Aful, Pyro, Aqua, Ecol, Paer, Vcho, Xfas, Hpyl, Cjej

Hbsp, Aqua, Ecol, Paer

Fusion

Mtub, Bsub, Paer, Hinf, Hpyl

-

NA

-

Mjan, Paer

-

Aful, Taci, Aero, Mtub, Bhal

-

Mjan, Mthe, Aqua, Tmar, Hpyl, Cjej

-

Most archaea and bacteria

Fused gene transfer between bacteria and archaea; a second, independent fusion event in bacteria

One fusion event, fused gene transfer between archaea and bacteria (Methanobacterium and Vibrio/Pseudomonas, respectively)

One fusion event, fused gene transfer between archaea to bacteria

One fusion event, fused gene transfer from archaea to bacteria

Fused gene transfer between bacteria and archaea, with subsequent fissions

Independent fusion events in archaea and bacteria

Independent fusion events in archaea and bacteria

Fused gene transfer from bacteria to archaea; a second, independent fusion event in bacteria

Fused gene transfer from archaea to bacteria; a second, independent fusion event in bacteria

Fused gene transfer from bacteria to archaea; a second, independent fusion event in bacteria

One fusion event, fused gene transfer from bacteria to archaea (Halobacterium)

Gene juxtaposition‡ Evolutionary scenario

COG0825

COG0836

COG0852

NADH:ubiquinone oxidoreductase subunit 7

COG0649

COG B

Protein function

COG A

Table 2 (continued from the previous page)

COG0674 Pyruvate:ferredoxin oxidoreductase and related 2-oxoacid:ferredoxin oxidoreductases, alpha subunit

COG1013

*Abbreviations: a, archaea; b, bacteria; e, eukaryotes; a dash indicates that the given kingdom is not represented in at least one of the fusion-linked COGs.
†AF, ancestral fusion; HGT, horizontal gene transfer; IFE, independent fusion events.
‡In several cases, the indicated genes are separated by one to three genes or their order is switched compared to that of the fusion components.
§Paby, Pyrococcus abyssi, an archaeal genome not included in the master set of genomes analyzed in this study.
¶Llac, Lactococcus lactis, a bacterial genome not included in the master set of genomes analyzed in this study.
||Ccre, Caulobacter crescentus, a bacterial genome not included in the master set of genomes analyzed in this study.
¥Hbsp, Halobacterium sp., an archaeal genome not included in the master set of genomes analyzed in this study.

Fused gene transfer from archaea to bacteria (,-proteobacteria) Hbsp, Mjan, Aero, Aqua, Tmar, Mtub, Hpyl

COG0519 GMP synthase Glutamine amidotransferase domain COG0518

HGT abe Pyruvate:ferredoxin oxidoreductase and related 2-oxoacid:ferredoxin oxidoreductases, beta subunit

Aful, Mthe, Taci, Pyro, Paby, Scer, Syne, Ecol, Vcho, Cjej, Tpal

Fused gene transfer among bacteria, archaea, and eukaryotes Mthe, Pyro, Paby Aero, Scer, most bacteria HGT abe GMP synthase -PP-ATPase domain

Pyro Paby, Mtub, Ecol HGT COG3262 Ni,Fe-hydrogenase III large subunit COG3261

Ni,Fe-hydrogenase III component G

ab-

Gene juxtaposition‡ Evolutionary scenario Fusion Principal mode of evolution† Kingdom pattern* Protein function COG B Protein function

COG A

Table 2 (continued from the previous page)

The apparent independent fusion of the same pair of genes (or, more precisely, members of the same two COGs) on multiple occasions during evolution might seem unlikely. However, we found that one-fourth to one-third of the gene fusions shared by at least two kingdoms might have evolved

The results of the present analysis point to HGT as a major route of cross-kingdom dissemination of fused genes. Horizontal transfer might be even more prominent in the evolution of fused genes within the bacterial and archaeal kingdoms. This notion is supported by the topologies of some of the phylogenetic trees analyzed, which show unexpected clustering of bacterial species from different lineages (note, for example, the grouping of D. radiodurans with P. aeruginosa in Figure 5). Massive HGT between archaea and bacteria, particularly hyperthermophiles, has been suggested by genome comparisons [20-24]. However, proving HGT in each individual case is difficult, and the significance of cross-kingdom HGT has been disputed [25,26]. With gene fusions, the existence of a derived shared character (fusion) supporting the clades formed by fusion components and the concordance of the independently built trees for each of the fusion components make a solid case for HGT.

Examination of the genomic context of the genes that encode stand-alone counterparts of the fusion components showed that, in 25 of the 51 cases, these genes were juxtaposed in some, and in certain cases many, prokaryotic genomes (Table 2). This suggests that the evolution of gene fusions often, if not always, passes through an intermediate stage in which the genes are juxtaposed and co-regulated, but still distinct, within known or predicted operons. In addition, some of the juxtaposed gene pairs might have evolved by fission of a fused gene.
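The juxtaposition criterion can be illustrated with a minimal sketch. It is not the procedure behind the 'Genomic context' feature of the COG database; the representation of a genome as an ordered list of COG identifiers, the function name `are_juxtaposed` and the parameter `max_gap` are assumptions made for illustration. The tolerance of up to three intervening genes and of switched gene order follows the footnote to Table 2.

```python
def are_juxtaposed(gene_order, cog_a, cog_b, max_gap=3):
    """Return True if a gene from cog_a and a gene from cog_b lie within
    max_gap intervening genes of each other, in either order.

    gene_order: list of COG identifiers in chromosomal order
    (hypothetical representation; treated as linear, not circular).
    """
    positions_a = [i for i, cog in enumerate(gene_order) if cog == cog_a]
    positions_b = [i for i, cog in enumerate(gene_order) if cog == cog_b]
    return any(
        0 < abs(i - j) <= max_gap + 1
        for i in positions_a
        for j in positions_b
    )


# Toy gene order (made up for illustration; "other*" are placeholder genes).
toy_genome = ["other1", "COG1605", "other2", "COG0287", "other3"]

# One gene separates the pair, so it counts under the relaxed criterion...
relaxed = are_juxtaposed(toy_genome, "COG1605", "COG0287")
# ...but not when strict adjacency is required.
strict = are_juxtaposed(toy_genome, "COG1605", "COG0287", max_gap=0)
print(relaxed, strict)
```

Setting `max_gap=0` restricts the test to immediately adjacent genes, which is the stricter reading of juxtaposition.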

Additional data. In 31 of the 51 links, an inter-kingdom horizontal transfer of the fused gene appeared to be the evolutionary mechanism by which the fusion entered one of the kingdoms. In contrast, only 14 fusion-linked pairs of COGs show evidence of independent fusion in two kingdoms, and in just two cases, the fusion seems to have been inherited from the last universal common ancestor. The latter two scenarios were distinguished on the basis of the parsimony principle, that is, by counting the number of evolutionary events (fusions or fissions) that were required to produce the observed distribution of fusion components and stand-alone versions of the domains involved across the tree branches. Accordingly, it needs to be emphasized that we can only infer the most likely scenario under the assumption that the probabilities of fusion and fission are comparable. It cannot be ruled out that some of the scenarios we classify as independent fusions in reality reflect the existence of an ancestral fused gene and subsequent multiple, independent fissions. The detection of ancestral domain fusions may call for the unification of the respective COG pairs in a single COG, with the species in which fission occurred represented by two distinct proteins.
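The event counting behind the parsimony argument can be sketched with Fitch's small-parsimony algorithm on a rooted binary tree. This is an illustration under stated assumptions, not the authors' procedure: the toy tree, the leaf names and the two-state coding ("fused" versus "split") are hypothetical, fusion and fission are given equal cost (the assumption the text itself flags), and horizontal transfer is not modelled.

```python
def fitch_changes(tree, leaf_state):
    """Minimum number of state changes on a rooted binary tree (Fitch parsimony).

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    leaf_state: dict mapping leaf name -> "fused" or "split".
    Returns (set of most-parsimonious root states, number of changes).
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left_states, left_changes = fitch_changes(tree[0], leaf_state)
    right_states, right_changes = fitch_changes(tree[1], leaf_state)
    shared = left_states & right_states
    if shared:
        # The two subtrees can agree on a state: no extra event is needed here.
        return shared, left_changes + right_changes
    # Disagreement: one fusion or fission must have occurred on this split.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical tree: three archaea and three bacteria, one fused gene in each kingdom.
tree = ((("Arc1", "Arc2"), "Arc3"), (("Bac1", "Bac2"), "Bac3"))
states = {
    "Arc1": "fused", "Arc2": "split", "Arc3": "split",
    "Bac1": "fused", "Bac2": "split", "Bac3": "split",
}
result = fitch_changes(tree, states)
print(result)
```

For this toy distribution the minimum is two events with an unfused root, which corresponds to two independent fusions; an ancestral fusion would require more fissions than that. If fissions were assumed to be much more likely than fusions, the equal-cost count would no longer be the appropriate criterion.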

One fusion event, fused gene transfer from bacteria to archaea

Table 3. Summary of evolutionary scenarios for cross-kingdom gene fusions

Evolutionary mode*                                   Number of fusion-linked COG pairs
Cross-kingdom horizontal transfer of a fused gene    31
Independent fusion events                            14
Ancestral fusion                                     2
Uncertain                                            4
Total                                                51

*As indicated in Table 2, the evolutionary scenarios for some of the analyzed COGs included both cross-kingdom horizontal transfer and apparent independent gene fusion within one of the kingdoms.
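As a quick arithmetic check of Table 3, the counts can be converted to proportions of the 51 fusion-linked COG pairs. The dictionary below simply restates the table; the variable names are illustrative.

```python
# Counts copied from Table 3 (number of fusion-linked COG pairs per mode).
table3 = {
    "Cross-kingdom horizontal transfer of a fused gene": 31,
    "Independent fusion events": 14,
    "Ancestral fusion": 2,
    "Uncertain": 4,
}

total = sum(table3.values())
shares = {mode: count / total for mode, count in table3.items()}

for mode, count in table3.items():
    print(f"{mode}: {count}/{total} = {shares[mode]:.1%}")
```

The categories sum to the stated total of 51, and independent fusion events account for 14/51, or about 27%, of the pairs, which falls between one-fourth and one-third.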

through such independent events, and probable additional independent fusions were noted among bacteria. This could be due to the extensive genome rearrangement characteristic of the evolution of prokaryotes [27,28], and to the selective value of these particular fusions, which tend to get fixed once they emerge.

matrix for minimum evolution (least-square) tree building [30] using the FITCH program. The PROTDIST and FITCH programs are modules of the PHYLIP software package [31]. The tree topology was then optimized by local rearrangements using PROTML, a maximum likelihood tree-building program, included in the MOLPHY package [32]. Local bootstrap probability was estimated for each internal branch by using the resampling of estimated log-likelihoods (RELL) method with 10,000 bootstrap replications [33]. The gene order in prokaryotic genomes was examined using the ‘Genomic context’ feature of the COG database.
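The RELL idea can be sketched as follows: rather than re-estimating each tree for every bootstrap sample, the per-site log-likelihoods already computed for each candidate topology are resampled and summed. The code is a minimal illustration, not the MOLPHY implementation; the function name `rell_bootstrap`, the input format and the toy log-likelihood values are assumptions, and the per-site log-likelihoods are taken as given.

```python
import random


def rell_bootstrap(site_loglik, replicates=10_000, seed=0):
    """Estimate RELL bootstrap support for competing tree topologies.

    site_loglik: dict mapping topology name -> list of per-site
    log-likelihoods (all lists must have the same length).
    Returns dict mapping topology name -> fraction of replicates in which
    that topology had the highest resampled total log-likelihood.
    """
    rng = random.Random(seed)
    names = list(site_loglik)
    n_sites = len(site_loglik[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample alignment sites with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {
            name: sum(site_loglik[name][i] for i in sample) for name in names
        }
        wins[max(totals, key=totals.get)] += 1
    return {name: wins[name] / replicates for name in names}


# Made-up per-site log-likelihoods for three local rearrangements around one branch.
toy_loglik = {
    "topology_1": [-2.1, -3.0, -1.8, -2.5, -2.2, -3.1],
    "topology_2": [-2.3, -2.9, -2.0, -2.6, -2.1, -3.3],
    "topology_3": [-2.4, -3.2, -1.9, -2.8, -2.4, -3.0],
}
support = rell_bootstrap(toy_loglik)
print(support)
```

The default of 10,000 replicates mirrors the number stated in the text; the fixed seed only makes the toy run reproducible.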

Additional data files

Phylogenetic trees for 82 individual COGs, presented as 52 pairs of trans-kingdom fusion-linked COGs, are available with the online version of this paper. Bootstrap values (percentage of 1,000 replications) are indicated for each fork. Archaeal proteins are designated by black squares, bacterial proteins by gray squares and eukaryotic proteins by empty squares. Fusion components are denoted by _1, _2, _3, etc.

Materials and methods

The version of the COG database used in this study included the following complete prokaryotic genomes. Bacteria: Aae, Aquifex aeolicus; Bap, Buchnera aphidicola; Bbu, Borrelia burgdorferi; Bsu, Bacillus subtilis; Bhal, Bacillus halodurans; Cje, Campylobacter jejuni; Cpn, Chlamydophila pneumoniae; Ctr, Chlamydia trachomatis; Dra, Deinococcus radiodurans; Eco, Escherichia coli; Hin, Haemophilus influenzae; Hpy, Helicobacter pylori; Mge, Mycoplasma genitalium; Mpn, Mycoplasma pneumoniae; Mtu, Mycobacterium tuberculosis; Nme, Neisseria meningitidis; Pae, Pseudomonas aeruginosa; Rpr, Rickettsia prowazekii; Syn, Synechocystis sp.; Tma, Thermotoga maritima; Tpa, Treponema pallidum; Vch, Vibrio cholerae; Xfa, Xylella fastidiosa. Eukaryote: Sce, Saccharomyces cerevisiae. Archaea: Ape, Aeropyrum pernix; Afu, Archaeoglobus fulgidus; Hbs, Halobacterium sp.; Mja, Methanococcus jannaschii; Mth, Methanobacterium thermoautotrophicum; Pho, Pyrococcus horikoshii; Pab, Pyrococcus abyssi; Tac, Thermoplasma acidophilum. COGs containing fusion components from at least two of the three primary kingdoms were selected for phylogenetic analysis. COGs containing 60 or more members were excluded because of potential uncertainty of orthologous relationship between members of such large groups [18]. Multiple alignments were generated for each analyzed COG using the T-Coffee program [29]. Phylogenetic trees were constructed by first generating a distance matrix using the PROTDIST program and the Dayhoff PAM model for amino-acid substitutions and employing this

Acknowledgements

We thank Charles DeLisi, Adnan Derti, I. King Jordan, Kira Makarova, Igor Rogozin, and Fyodor Kondrashov for critical reading of the manuscript and helpful discussions.

References

1. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285:751-753.
2. Huynen MJ, Snel B: Gene and context: integrative approaches to genome analysis. Adv Prot Chem 2000, 54:345-379.
3. Yanai I, Derti A, DeLisi C: Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci USA 2001, 98:7940-7945.
4. Parkinson JS, Kofoid EC: Communication modules in bacterial signaling proteins. Annu Rev Genet 1992, 26:71-112.
5. Reizer J, Saier MH, Jr.: Modular multidomain phosphoryl transfer proteins of bacteria. Curr Opin Struct Biol 1997, 7:407-415.
6. Hunter T: Signaling - 2000 and beyond. Cell 2000, 100:113-127.
7. Koonin EV, Aravind L, Kondrashov AS: The impact of comparative genomics on our understanding of evolution. Cell 2000, 101:573-576.
8. Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, et al.: Comparative genomics of the eukaryotes. Science 2000, 287:2204-2215.
9. International Human Genome Consortium: Initial sequencing and analysis of the human genome. Nature 2001, 409:860-921.
10. Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402:86-90.
11. Galperin MY, Koonin EV: Who’s your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000, 18:609-613.
12. Snel B, Bork P, Huynen M: Genome evolution: gene fusion versus gene fission. Trends Genet 2000, 16:9-11.
13. Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV: Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol 2001, 1:8.
14. Pace NR: A molecular view of microbial diversity and the biosphere. Science 1997, 276:734-740.
15. Teichmann SA, Mitchison G: Is there a phylogenetic signal in prokaryote proteins? J Mol Evol 1999, 49:98-107.

16. Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA 1990, 87:4576-4579.
17. Brown JR, Doolittle WF: Archaea and the prokaryote-to-eukaryote transition. Microbiol Mol Biol Rev 1997, 61:456-502.
18. Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997, 278:631-637.
19. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 2001, 29:22-28.
20. Koonin EV, Mushegian AR, Galperin MY, Walker DR: Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol 1997, 25:619-637.
21. Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 1998, 14:442-444.
22. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, et al.: Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 1999, 399:323-329.
23. Doolittle WF: Lateral genomics. Trends Cell Biol 1999, 9:M5-M8.
24. Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol 2001, 55:709-742.
25. Kyrpides NC, Olsen GJ: Archaeal and bacterial hyperthermophiles: horizontal gene exchange or common ancestry? Trends Genet 1999, 15:298-299.
26. Logsdon JM, Faguy DM: Thermotoga heats up lateral gene transfer. Curr Biol 1999, 9:R747-R751.
27. Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 1998, 23:324-328.
28. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization and prediction of gene function using genomic context. Genome Res 2001, 11:356-372.
29. Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000, 302:205-217.
30. Fitch WM, Margoliash E: Construction of phylogenetic trees. Science 1967, 155:279-284.
31. Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 1996, 266:418-427.
32. Adachi J, Hasegawa M: MOLPHY: Programs for Molecular Phylogenetics. Tokyo: Institute of Statistical Mathematics; 1992.
33. Kishino H, Miyata T, Hasegawa M: Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol 1990, 31:151-160.