BMC Evolutionary Biology

BioMed Central

Open Access

Research article

Evidence of recent interkingdom horizontal gene transfer between bacteria and Candida parapsilosis

David A Fitzpatrick*, Mary E Logue and Geraldine Butler

Address: School of Biomolecular and Biomedical Science, Conway Institute, University College, Dublin, Belfield, Dublin 4, Ireland

* Corresponding author

Published: 24 June 2008 BMC Evolutionary Biology 2008, 8:181

doi:10.1186/1471-2148-8-181

Received: 17 January 2008 Accepted: 24 June 2008

This article is available from: http://www.biomedcentral.com/1471-2148/8/181 © 2008 Fitzpatrick et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: To date very few instances of interdomain gene transfer into fungi have been identified. Here, we used the emerging genome sequences of Candida albicans WO-1, Candida tropicalis, Candida parapsilosis, Clavispora lusitaniae, Pichia guilliermondii, and Lodderomyces elongisporus to identify recent interdomain horizontal gene transfer (HGT) events. We refer to these as CTG species because they translate the CTG codon as serine rather than leucine, and share a recent common ancestor.

Results: Phylogenetic and syntenic information indicate that two C. parapsilosis genes originate from bacterial sources. One encodes a putative proline racemase (PR). Phylogenetic analysis also suggests that there were independent transfers of bacterial PR enzymes into members of the Pezizomycotina, and protists. The second HGT gene in C. parapsilosis belongs to the phenazine F (PhzF) superfamily. Most CTG species also contain a fungal PhzF homolog. Our phylogeny suggests that the CTG homolog originated from an ancient HGT event, from a member of the proteobacteria. An analysis of synteny suggests that C. parapsilosis has lost the endogenous fungal form of PhzF, and subsequently reacquired it from a proteobacterial source. There is evidence that Schizosaccharomyces pombe and Basidiomycotina also obtained a PhzF homolog through HGT.

Conclusion: Our search revealed two instances of well-supported HGT from bacteria into the CTG clade, both specific to C. parapsilosis. Therefore, while recent interkingdom gene transfer has taken place in the CTG lineage, its occurrence is rare. However, our analysis will not detect ancient gene transfers, and we may have underestimated the global extent of HGT into CTG species.

Background

Lateral or horizontal gene transfer (HGT) is defined as the exchange of genes between different strains or species [1]. HGT introduces new genes into a recipient genome that are either homologous to existing genes, or belong to entirely new sequence families. Large-scale genomic sequencing of prokaryotes has revealed that gene transfer is an important evolutionary mechanism for these organisms [2,3]. HGT has been linked to the acquisition of drug resistance by benign bacteria [4], and also to the gain of genes that confer the ability to catabolize certain amino acids that are important virulence factors [5]. However, there is much debate as to whether lateral gene transfer is a ubiquitous influence throughout prokaryotic genome evolution [6]. Until recently, the process of gene transfer has been assumed to be of limited significance to eukaryotes [7]. The availability of diverse eukaryotic genome sequence data is dramatically changing our views on the important role gene transfer can play in eukaryotic evolution.


The rapid increase in fungal sequence data has promoted this kingdom to the forefront of comparative genomics [8]. Whereas there is some documented evidence for HGT between fungal species [9-17] or from bacteria to fungi [18-28] [see additional file 1], overall very few instances have been identified. There are two possible explanations: either gene transfer is indeed extremely rare amongst fungi, or it has not yet been thoroughly studied. To address this question we investigated the frequency of successful recent interdomain HGT events between prokaryotes and yeast species belonging to the CTG clade. We chose this course of action as we expect recent interdomain HGT events to be more readily identified and supported than more ancient transfers. For the purposes of this study, we define CTG species as the immediate relatives of C. albicans, including C. tropicalis, C. parapsilosis, Clavispora lusitaniae, Pichia guilliermondii, and Lodderomyces elongisporus. These species have been completely sequenced, share a relatively recent common ancestor [29], and translate the codon CUG as serine rather than leucine [30]. We used syntenic, phylogenetic and sequence-based analyses to identify two cases of interdomain HGT between prokaryotes and C. parapsilosis, most likely involving the proteobacteria phylum. Our results suggest that extant CTG species do not readily take up exogenous DNA.
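The sequence-based part of this approach starts from a simple rule: a protein whose best database hit is bacterial is flagged as a candidate and then inspected manually. A minimal sketch of that rule is given here, assuming the Blast results are available in the 12-column tabular format (bit score in the last column) and that a lookup table from subject identifier to kingdom exists. The function name `bacterial_top_hits`, the `taxonomy` dictionary and all identifiers in the example are hypothetical and are not taken from the study.

```python
import csv
import io


def bacterial_top_hits(blast_tsv, taxonomy):
    """Return query IDs whose best-scoring hit is a bacterial subject.

    blast_tsv: text in 12-column BLAST tabular format
               (query, subject, ..., e-value, bit score).
    taxonomy:  hypothetical dict mapping subject ID -> kingdom label.
    """
    best = {}  # query ID -> (bit score, subject ID) of the best hit so far
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    # Keep only queries whose top hit is labelled as bacterial.
    return sorted(q for q, (_, s) in best.items()
                  if taxonomy.get(s) == "Bacteria")


# Illustrative input: one query with a bacterial top hit, one with a fungal top hit.
example_hits = (
    "geneA\tbact_1\t66.0\t330\t112\t0\t1\t330\t1\t330\t1e-120\t420\n"
    "geneA\tfung_1\t35.0\t300\t195\t4\t1\t300\t1\t300\t1e-30\t130\n"
    "geneB\tfung_2\t80.0\t250\t50\t0\t1\t250\t1\t250\t1e-140\t480\n"
)
example_taxonomy = {"bact_1": "Bacteria", "fung_1": "Fungi", "fung_2": "Fungi"}
candidates = bacterial_top_hits(example_hits, example_taxonomy)
```

A screen of this kind only nominates candidates; as in the study, each flagged gene still needs manual inspection and phylogenetic support before it is treated as a transfer.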

Results and discussion

Identification of horizontal gene transfer candidates through Blast database search

We compared all available CTG gene sets against UniProt using BlastP [31]. CTG genes with top database hits to bacterial species were identified as putative horizontally transferred genes and the resultant Blast files were inspected manually. A D. hansenii gene (protein ydhR precursor) with a top database hit to a bacterial sequence was not considered for further analyses as it has previously been described [22]. After this process two genes from C. parapsilosis were considered for further analysis; one encodes a putative proline racemase, and the second encodes a member of the phenazine F superfamily. Related family members were identified by a second round of database searching against GenBank to ensure all available genomic data were utilized.

Proline racemase phylogeny and characterization

The C. parapsilosis gene (designated CPAG_02038) is most similar to a proline racemase homolog from Burkholderia cenocepacia AU 1054 (66% pairwise identity; Figure 1A). Amino acid racemases catalyze the interconversion of L- and D-amino acids by abstraction of the α-amino proton of the enzyme bound substrate [32]. CPAG_02038 lies within a large contig and is also present in a previously published genome survey of C. parapsilosis [33], suggesting that its presence does not result from contamination. We could not locate any related genes in any other CTG genome (using BlastP or TBlastN). Family members are widely distributed throughout the prokaryotes, however, and are also located within the Pezizomycotina.

We extracted 321 putative proline racemases from 207 organisms, including members of the α, β, γ, and δ-proteobacteria, Actinobacteria, Fungi, Protozoa and Metazoa. Numerous species were found to have several family members [see additional file 2]; all were included for complete comparative purposes. A maximum likelihood (ML) phylogeny was reconstructed from an alignment of all the PR proteins (Figure 2). There are a large number of polytomies displayed in Figure 2. These probably result from duplication of PR genes followed by diversifying selection, leading to a high degree of sequence heterogeneity. For example, Agrobacterium tumefaciens str. C58 contains three PR homologs [see additional file 2], with an average amino acid pairwise percentage identity of ~31%. Burkholderia cenocepacia AU 1054 contains two proline racemase homologs [see additional file 2], which are only 28% identical.

To help resolve the evolutionary history amongst PR homologs we reconstructed an additional ML phylogeny based on a reduced dataset (Figure 3). We also reconstructed a Bayesian phylogeny using the heterogeneous CAT site model. The CAT model can account for site-specific features of sequence evolution and has been found to be more robust than other methods against phylogenetic artifacts such as long branch attraction [34]. The resultant Bayesian phylogeny is highly congruent with the ML phylogeny (not shown).

The putative C. parapsilosis PR homolog lies in a strongly supported (100% bootstrap support (BP)) clade with Burkholderia species (Figures 2 & 3, clade-A). Burkholderia are β-proteobacteria.
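The bootstrap percentages quoted for each clade have a simple operational meaning: the share of bootstrap replicate trees in which exactly that group of taxa is recovered as a clade. The sketch below illustrates the calculation under stated assumptions. Each replicate tree is represented simply as a collection of clades (sets of taxon names), which presumes rooted trees that have already been parsed; the function name `clade_support` and the abbreviated taxon labels are hypothetical, and this is not the software used in the study.

```python
def clade_support(clade, replicate_trees):
    """Percentage of bootstrap replicate trees that contain `clade`.

    clade:           iterable of taxon names forming the group of interest.
    replicate_trees: list of trees, each given as an iterable of clades,
                     where every clade is a set of taxon names.
    """
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    # Count the replicates in which the exact taxon set appears as a clade.
    hits = sum(
        1 for tree in replicate_trees
        if target in {frozenset(c) for c in tree}
    )
    return 100.0 * hits / len(replicate_trees)


# Illustrative replicates (hypothetical labels): the Cpar + Bcen grouping
# is recovered in some replicates but not in others.
example_replicates = [
    [{"Cpar", "Bcen"}, {"Cpar", "Bcen", "Bxen"}],
    [{"Cpar", "Bcen"}, {"Bxen", "Bamb"}],
    [{"Cpar", "Bxen"}, {"Cpar", "Bxen", "Bcen"}],
    [{"Bcen", "Cpar"}, {"Cpar", "Bcen", "Bamb"}],
]
support = clade_support({"Cpar", "Bcen"}, example_replicates)
```

With 100 replicates, as used for Figures 2 and 3, a value of 100% means the grouping was recovered in every resampled dataset.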
However, no other β-proteobacteria, or indeed any other bacterial genus, were found within clade-A (Figures 2 & 3).

Although no PR homologs were identified in other CTG species, or indeed in any other of the Saccharomycotina, there are homologs in members of the Pezizomycotina. A Pezizomycotina specific subclade is evident in our phylogeny containing Phaeosphaeria nodorum, Aspergillus niger and Gibberella zeae (Figures 2 & 3, clade-B, 100% BP). This subclade is found in a strongly supported clade with members of the Actinobacteria (Figure 2, 100% BP), containing Brevibacterium linens and an unclassified marine actinobacterium and excluding Rubrobacter xylanophilus (Figure 2, 87% BP). This suggests that these Pezizomycotina species obtained their PR gene from the Actinobacteridae subclass rather than the Rubrobacteridae subclass. This transfer event is another independent HGT event of a PR gene into fungi, and we hypothesize it occurred early in the Pezizomycotina lineage, as it is shared by three distantly related species. Its patchy phyletic distribution suggests it has been subsequently lost in other Pezizomycotina species.

[Figure 1: sequence alignment panels (A) CpPR against BcPR and (B) CpPhzF against PlPhzF]

Figure 1 A) An alignment of PR proteins from C. parapsilosis (CpPR, CPAG_02038) and Burkholderia cenocepacia (BcPR) was generated with MUSCLE. These proteins are 66% identical. B) An alignment of PhzF proteins from Candida parapsilosis (CpPhzF, CPAG_03462) and Photorhabdus luminescens (PlPhzF); these are 61% identical.

There are also PR homologs in the Metazoans. These are found in a eukaryote clade that also contains a number of Pezizomycotina representatives (Figures 2 & 3, clade-C, 93% BP). Several scenarios can explain this phylogenetic positioning. Firstly, the PR gene may have been present in the last universal common ancestor of all eukaryotes but has been differentially lost in all lineages except those leading to modern day Metazoa and Pezizomycotina. Alternatively, an ancient gene transfer from bacteria to the last common ancestor (LCA) of Metazoa and Fungi could have occurred, with subsequent gene loss amongst different Metazoan and Fungal lineages. A third hypothesis is that two independent gene transfers have occurred into the Metazoan and Pezizomycotina lineages from unsampled bacterial donors. Finally, a transfer from unsampled bacteria into one of the eukaryote clades (either Metazoa or Pezizomycotina) may have occurred with subsequent transfer from one eukaryotic group to the other.

A. niger, A. oryzae and G. zeae all contain multiple PR homologs [see additional file 2]. One A. niger, one G. zeae and the three A. oryzae PR homologs are nested in a strongly supported Pezizomycotina specific subclade (Figures 2 & 3, clade-D, 100% BP). This subclade is found within a larger predominantly proteobacterial clade (Figure 2, 74% BP). This implies that there was an independent gene transfer event of a bacterial PR homolog into an ancestral Pezizomycotina species.

[Figure 2: phylogeny of proline racemase homologs with clades labelled (A) to (E). Colour key: Fungi, α-proteobacteria, β-proteobacteria, γ-proteobacteria, Actinobacteria, Trypanosoma, Metazoa, Firmicutes, miscellaneous bacteria]

Figure 2 Proline racemase maximum likelihood phylogeny. The optimum model of protein substitution was found to be WAG+G. The number of gamma rate categories was 4 (alpha = 1.163). Bootstrap resampling (100 iterations) was undertaken and percentages are displayed. For display purposes branches with less than 50% support were collapsed. Letters (A-E) in parentheses are used to distinguish clades and are discussed in the text. Branches are colored according to their taxonomy. Fungal branches and species names are colored green.

The phylogenetic position of the C. parapsilosis PR homolog (Figures 2 & 3) resembles that described for the adenosine deaminase (ADA) gene in the Dekkera bruxellensis genome [21]. In that analysis, the authors suggest

that D. bruxellensis and Burkholderia species received the ADA gene from a species not yet represented in the public sequence databases. Our PR phylogeny suggests a similar event may have occurred within clade-A, which contains

only C. parapsilosis and Burkholderia species (Figures 2 &3). Burkholderia species are known to have a genomic repertoire that allows the transfer and receipt of exogenous DNA [35] and a number of studies have reported success-

Page 4 of 15 (page number not for citation purposes)

BMC Evolutionary Biology 2008, 8:181

http://www.biomedcentral.com/1471-2148/8/181

100

100

100 95

55 55 100

100 100

55

95

90

75

70 100 100

100

100

100 100

100 95 65

100 95

100

100

100 100 100 100

90

95

95 100 100

100 60 100

95

65 100 55

90 100

Mus musculus Rattus norvegicus Pan troglodytes Homo sapiens Aspergillus nidulans Aspergillus flavus Aspergillus niger Aspergillus terreus Neosartorya fischeri Aspergillus fumigatus Candida parapsilosis Burkholderia phymatum Burkholderia phytofirmans Burkholderia xenovorans Burkholderia Burkholderia cepacia Burkholderia ambifaria Burkholderia cenocepacia Burkholderia cenocepacia Burkholderia cenocepacia marine actinobacterium Brevibacterium linens Gibberella zeae Aspergillus niger Phaeosphaeria nodorum

DRSPTGSGVTARIAL DRSPTGSGVTARIAL DRSPTGSGVTARIAL DRSPTGSGVTARIAL DRSPTGSCVTARLAL DRSPTGSCVAARVAL (C) DRSPTGSCVTARMAL DRSPTGSCVTARMAL DRSPTGSCVTARMAL DRSPTGSCVIARMAL DRSPTGSGTAGRAAQ DRSPTGSGTGGRVAQ DRSPTGSGTGGRVAQ DRSPTGSGTGGRVAQ DRSPTGSGTGGRVAQ DRSPTGSGTGGRVAQ (A) DRSPTGSGTGGRVAQ DRSPTGSGTGGRVAQ DRSPTGSGTGGRVAQ DRSPTGSGTAGRVAQ DRSPCGSGTCARIAT DRSPCGTGTSARVAA DRSPCGSGTASRIAV (B) DRSPCGSGTCARIAT DRSPCGSGTSARLAI

Agrobacterium tumefaciens Paracoccus denitrificans Brucella melitensis Brucella suis Silicibacter Burkholderia xenovorans Burkholderia phytofirmans Burkholderia phymatum Burkholderia ambifaria Burkholderia cepacia Burkholderia multivorans Burkholderia cenocepacia Burkholderia cenocepacia

DRSPTGTALSARMAV DRSPCGTGTSARMAQ DRSPCGTGTSARMAQ DRSPCGTGTSARMAQ DRSPCGTGSSGFVAC DRSPCGTGTSAKLAC DRSPCGTGTSAKLAC DRSPCGTGTSAKLAC DRSPCGTGTSAKLAC DRSPCGTGTSAKLAC DRSPCGTGTSAKVAC DRSPCGTGTSAKVAC DRSPCGTGTSAKVAC

Paracoccus denitrificans Brucella suis Brucella melitensis Psychromonas ingrahamii Agrobacterium tumefaciens Silicibacter Gibberella zeae Aspergillus oryzae Aspergillus oryzae Aspergillus flavus Aspergillus niger Aspergillus flavus Aspergillus oryzae

DRSPTGTGCSARMAV DRSPTGTGCSARMAV DRSPTGTGCSARMAV DRCPTGTSVSARMAI DRSPTGTGCSARMAV DRSPTGTAVSARMAL DRSPCGTGSSSRLAI (D) DRSPCGTGSSARMAV DRSPCGTGSSARMAV DRSPCGTGSSARMAV DRCPCGTGSCARMAV DRCPCGTGSSARMAV DRCPCGTGSSARMAV

Figure 3Proline racemase maximum likelihood phylogeny with active site alignment Reduced Reduced Proline racemase maximum likelihood phylogeny with active site alignment. Bootstrap resampling (100 iterations) was undertaken and percentages are displayed. Fungal branches are shown in green. An alignment around the active site is also displayed. Clade letters in parentheses correspond to those in Figure 2. The phylogeny is rooted around the Metazoan/Pezizomycotina specific clade (clade-C), all members of this clade have a threonine at the active site. C. parapsilosis and its phylogenetic neighbors have a threonine instead of a cysteine at the active site (clade-A). A. oryzae, A. niger and G. zeae all contain cysteine at the active site (clade-D). A. flavus, A. oryzae, A. niger, A. nidulans and G. zeae also have cysteine at the active site (clade-C).

Page 5 of 15 (page number not for citation purposes)

BMC Evolutionary Biology 2008, 8:181

ful gene transfers into Burkholderia species [36,37]. It is therefore possible that there have been other successful gene transfers into this bacterial lineage.

The vast majority of amino acids found in living cells correspond to the L-stereoisomer [38]. However, D-amino acids have long been known to occur in the cell walls of Gram-positive and Gram-negative bacteria, where they are essential components of peptidoglycan [39]. Apart from low levels of D-amino acids derived from spontaneous racemization as a result of aging [40], it was assumed that only L-amino acid enantiomers were present in eukaryotes [41]. However, recent studies have reported the presence of numerous D-amino acids in an array of organisms, including mammals [42]. The first eukaryotic amino acid (proline) racemase has recently been described from the human pathogen Trypanosoma cruzi [43]. A high degree of sequence similarity was observed between the T. cruzi and bacterial homologs [43]. Our phylogeny infers that T. cruzi obtained its PR homolog through interdomain HGT from a member of the Firmicutes (subclass Clostridia), as it is grouped beside members of this subclass with a high degree of support (Figure 2 clade-E, 96% BP). We performed database searches [44] against other Protozoan genomes, including Trypanosoma brucei, Trypanosoma congolense and Trypanosoma annulata. We failed to locate a homolog in any species except T. vivax. Previous analysis has shown that T. cruzi and T. vivax are not each other's closest phylogenetic neighbors, relative to the other species sampled [45]. This suggests that an ancestral Trypanosoma gained the PR gene and that multiple losses in different Trypanosoma lineages have subsequently occurred.

Gene order around PR homologs
The C. parapsilosis PR homolog lies close to an ortholog (CPAG_02041) of orf19.1135 from C. albicans (Figure 4). The gene order to the left of this ORF is conserved in all CTG species; the order to the right is conserved in most CTG species apart from C. parapsilosis and L. elongisporus. C. parapsilosis and L. elongisporus are closely related [29], and an examination of synteny suggests that the PR gene (together with a second ORF, cpar5437) was inserted between CPAG_02041 and CPAG_02037 (Figure 4). cpar5437 encodes a neutral amino acid (AA) transporter. The presence of an AA transporter beside the PR homolog is interesting. If the putative proline racemase has a role in amino acid metabolism, then the presence of the transporter may be the result of an adaptive translocation to enhance the activity of the PR gene. Unlike the PR ORF, the AA transporter is fungal in origin. Most CTG species contain a single neutral AA transporter; however, C. parapsilosis and D. hansenii have four.
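The synteny argument above amounts to finding genes in one species that sit between two conserved neighbours but have no ortholog in the related species. A minimal sketch of that check follows. It is not the authors' procedure (synteny was examined manually); the function name `inserted_genes` is hypothetical, the ortholog pairings are read from the pillars of Figure 4, and the relative order of the two inserted genes is an assumption made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Run = Tuple[Optional[str], List[str], Optional[str]]


def inserted_genes(
    query_order: List[str],
    reference_order: List[str],
    orthologs: Dict[str, str],
) -> List[Run]:
    """Return runs of query genes lacking an ortholog in the reference,
    each with the nearest conserved gene on either side (or None)."""
    ref_set = set(reference_order)
    runs: List[Run] = []
    current: List[str] = []
    left: Optional[str] = None
    for gene in query_order:
        if orthologs.get(gene) in ref_set:
            # A conserved gene closes any open run of unmatched genes.
            if current:
                runs.append((left, current, gene))
                current = []
            left = gene
        else:
            current.append(gene)
    if current:
        runs.append((left, current, None))
    return runs


# Illustrative gene orders; pairings follow Figure 4, ordering is assumed.
cpar_order = ["CPAG_02042", "CPAG_02041", "cpar5437",
              "CPAG_02038", "CPAG_02037", "CPAG_02036"]
lelo_order = ["LELG_00344", "LELG_00343", "LELG_00342", "LELG_00341"]
orthologs = {
    "CPAG_02042": "LELG_00344",
    "CPAG_02041": "LELG_00343",
    "CPAG_02037": "LELG_00342",
    "CPAG_02036": "LELG_00341",
}

runs = inserted_genes(cpar_order, lelo_order, orthologs)
print(runs)
```

With these toy inputs the only run reported is the transporter and PR pair flanked by CPAG_02041 and CPAG_02037, matching the insertion described in the text. A real analysis would also need gene orientation and contig boundaries, which this sketch ignores.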

http://www.biomedcentral.com/1471-2148/8/181

We located tRNA genes for nearly all CTG species beside the large conserved syntenic block (Figure 4). It has been shown that tRNA genes are associated with genomic breakpoints [46]. We hypothesize that a genomic rearrangement has occurred at this site in the LCA of C. parapsilosis and L. elongisporus. We cannot determine whether the bacterial PR homolog was inserted into the LCA of L. elongisporus/C. parapsilosis and subsequently lost in L. elongisporus, or gained by C. parapsilosis after speciation.

We also investigated the gene order around the Pezizomycotina PR homologs [see additional file 3]. Gene synteny around the PR homologs found in clade-D (Figures 2 & 3) is not conserved (not shown). Interestingly, however, both A. niger and G. zeae in clade-D (Figures 2 & 3) have genes containing a FAD dependent oxidoreductase domain in close proximity to their PR homologs (not shown). According to Pfam [47], FAD dependent oxidases include D-amino acid oxidases, which catalyze the oxidation of neutral and basic D-amino acids into their corresponding keto acids. The presence of these oxidases may be another example of an adaptive translocation to enhance the activity of the PR gene in these Pezizomycotina species. A. oryzae has three PR homologs (Figures 2 & 3 clade-C). All of these have orthologs in its close relative A. flavus (Figure 3 clade-C), and synteny around these is conserved [see additional file 3 clade-D]. The remaining two species in clade-C are A. niger and G. zeae. There is no evidence of conserved gene order within these species, or with A. oryzae or A. flavus. Gene order around the A. flavus and A. terreus PR homologs found in the Metazoan/Pezizomycotina clade (Figures 2 & 3) is also conserved [see additional file 3], as is the order between A. fumigatus and N. fischeri [see additional file 3]. We could not locate amino acid transporters or FAD dependent oxidases beside any of the PR homologs found in clades B or C (Figure 2).
Proline racemase codon usage
It has been shown that recently acquired genes often display an atypical codon preference when compared to other genes in the genome [48,49]. However, the transferred PR homologs have a codon usage consistent with the rest of their genomes [see additional file 4]. We undertook an analysis of variation in synonymous codon usage on all PR genes shown in Figure 2. Homologs from related species cluster together [see additional file 5]. For example, the Actinobacteria, the Firmicutes and the Burkholderia species all inhabit unique areas in two-dimensional correspondence analysis space [see additional file 5].
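To make the idea of synonymous codon usage concrete, the sketch below computes relative synonymous codon usage (RSCU) for a single coding sequence. This is a simpler summary than the correspondence analysis of raw codon counts reported here, and it is offered only as an illustration. The function names and the `ctg_as_serine` switch are assumptions for this example; the switch reflects the CTG-as-serine decoding of the CTG species.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def genetic_code(ctg_as_serine: bool = False) -> Dict[str, str]:
    """Codon -> amino acid map; optionally decode CTG as serine."""
    code = dict(zip(CODONS, STANDARD_AA))
    if ctg_as_serine:
        code["CTG"] = "S"
    return code


def rscu(cds: str, code: Dict[str, str]) -> Dict[str, float]:
    """RSCU = observed codon count / mean count over its synonymous family.

    Stop codons, single-codon families and unobserved families are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families: Dict[str, List[str]] = {}
    for codon, aa in code.items():
        families.setdefault(aa, []).append(codon)
    values: Dict[str, float] = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) < 2:
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        mean = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / mean
    return values


toy_cds = "CTGCTGTTA"  # made-up sequence, not taken from any genome
standard = rscu(toy_cds, genetic_code())
ctg_clade = rscu(toy_cds, genetic_code(ctg_as_serine=True))
```

Which decoding is used changes the synonymous families: under the standard code CTG is counted among six leucine codons, whereas under the CTG-clade decoding it joins the serine family. Any comparison between a fungal host and a bacterial donor therefore has to fix the code per genome.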

The majority of fungal and Metazoan PRs are clustered together [see additional file 5]. The C. parapsilosis PR homolog has a codon usage distinct from the other Pezizomycotina fungal PR homologs [see additional file 5], which is unsurprising as C. parapsilosis belongs to the Saccharomycotina subphylum. The C. parapsilosis homolog is also separate from the Burkholderia (β-proteobacteria) genes with which it forms a closely related phylogenetic group (Figures 2 & 3). This suggests that the gene may have originated from a genome with no other close relatives among the species analyzed here.

Figure 4. Gene order around C. parapsilosis proline racemase gene. Species names and identifiers are shown in each box. Gene identifiers relate to annotations from the Broad Institute [66]. On the left hand side orthologous genes are stacked under one another in pillars. Relative positions of tRNA genes are shown and may indicate a breakpoint. After the breakpoint, synteny is conserved between C. albicans, C. dubliniensis, C. tropicalis, D. hansenii and Cl. lusitaniae. Synteny between C. parapsilosis and L. elongisporus is conserved but differs from the other CTG species. C. parapsilosis has a proline racemase (PR CPAG_02038) and a neutral amino acid transporter (AA cpar5437) insertion in this region. cpar5437 is absent from the Broad gene list but present in our manual gene call.

Proline racemase activity
The PR active sites from Trypanosoma cruzi, Clostridium sticklandii, Agrobacterium tumefaciens, Brucella melitensis and Pseudomonas aeruginosa all contain cysteine at amino acid position 330 [43,50]. This amino acid is essential for enzymatic function, because substitution with serine abolishes activity [41]. However, PR homologs from human, mouse, Rhizobium and Brucella contain a threonine instead of a cysteine at position 330 [41]. We observed that cysteine is found in the equivalent position in many of the bacterial proteins. The Pezizomycotina PR genes found in clade-B and clade-D contain a cysteine at the active site (Figure 3). The PR homologs found in the Metazoan/Pezizomycotina clade (clade-B) have a threonine at position 330. Similarly, the C. parapsilosis PR homolog and its relatives from Burkholderia all contain a threonine (Figure 3). However, Burkholderia species have multiple PR homologs [see additional file 2] with a cysteine at the active site (not shown). It is not clear what effect the substitution has on enzyme activity. It has been suggested that homologs containing threonine at the active site are not true PRs [41], but may instead belong to a superfamily. We cannot detect any difference in the ability of C. parapsilosis, the other CTG species or any of the Pezizomycotina species to utilize D-proline as growth media (data not shown). We therefore cannot confidently infer the function of the PR homologs in the fungi analyzed here.

Phenazine F phylogeny and characterisation
The C. parapsilosis gene (designated CPAG_03462) is most similar to a Photorhabdus luminescens phenazine F (PhzF) protein, with 61% pairwise identity (Figure 1B). Phenazines are biologically active compounds, all of which have a characteristic tricyclic ring system. They have been shown to confer a selective growth advantage on organisms that secrete them, as they possess broad-spectrum antibiotic activity towards bacteria, fungi and higher eukaryotes [51].
In Pseudomonas, the best studied phenazine producer, PhzF is part of an operon required for the conversion of chorismic acid to phenazine-1-carboxylate (PCA) [52]. PhzF homologs were identified in most of the CTG species tested as well as several other fungal species. However, we could not identify a PhzF homolog in the L. elongisporus genome, even when multiple TBlastN and BlastN searches were used.

PhzF homologs were extracted from GenBank for subsequent phylogenetic analysis. In total, 181 representative protein coding sequences distributed amongst 154 organisms were used. These taxa were distributed amongst α, β, γ and δ-proteobacteria, Actinobacteria, Fungi and Firmicutes, as well as other bacterial groups. We aligned all sequences and reconstructed a PhzF ML phylogeny (Figure 5). The C. parapsilosis PhzF homolog is found in a clade with members of the β-proteobacteria (Burkholderia multivorans, Burkholderia cepacia, Burkholderia ambifaria), α-proteobacteria (Roseovarius) and the γ-proteobacteria (Azotobacter vinelandii, Acinetobacter baumannii, Shewanella baltica and Photorhabdus luminescens) (81% BP). In contrast, all other PhzF homologs from CTG species are in a completely separate clade (Figure 5). These form a sister group (63% BP) to PhzF homologs from other Saccharomycotina species (C. glabrata, Saccharomyces cerevisiae, Kluyveromyces lactis and Vanderwaltozyma polyspora). All three clades are grouped together in a larger clade with high support (75% BP). The sister group relationship between the PhzF homologs from the Ascomycota and the proteobacteria clade is intriguing (Figure 5), as it suggests that an ancestral Saccharomycotina species gained the PhzF homolog from a proteobacterium. The bacterial PhzF gene has subsequently been retained after multiple speciation events, but lost in C. parapsilosis. We hypothesize that C. parapsilosis has recently reacquired a bacterial PhzF homolog from a proteobacterial source, as it is grouped (81% BP) within a proteobacterial subclade. To test this hypothesis we reconstructed constrained trees that placed C. parapsilosis together with the remaining Ascomycota species [see additional file 6 C-H]. The approximately unbiased (AU) test of phylogenetic tree selection [53] showed that the original unconstrained tree (which groups C. parapsilosis with proteobacteria) receives the optimal likelihood score, and that the differences in likelihood scores when compared to the constrained trees [see additional file 6] are significant (P < 0.05). This is also supported by spectral analysis [see additional file 7].

Our phylogeny shows that the Schizosaccharomyces pombe PhzF homolog is found in a clade containing all CTG PhzF homologs (Figure 5, 99% BP). Furthermore, it is grouped beside D. hansenii (66% BP). S. pombe is not a member of the Saccharomycotina; it belongs to the Taphrinomycotina subphylum. The genome sequences of Schizosaccharomyces japonicus and Schizosaccharomyces octosporus have recently been completed [54]. We could not locate a PhzF homolog in S. japonicus but did locate a homolog in S. octosporus using a TBlastN search strategy. Phylogenetic analysis has shown that S. pombe and S. octosporus are more closely related to one another than to S. japonicus [55]. Therefore we hypothesize that the LCA of S. pombe and S. octosporus gained the PhzF gene from an ancestral D. hansenii-like species after speciation from S. japonicus. We reconstructed a constrained tree that placed S. pombe outside the Saccharomycotina clade [see additional file 6B]. The AU test [53] showed that the phylogenetic inferences of the unconstrained tree are significantly better (P < 0.05) than those of the constrained tree [see additional file 6]. This implies that S. pombe has obtained a PhzF homolog from a member of the CTG clade.

Figure 5. PhzF maximum likelihood phylogeny. The optimum model of protein substitution was found to be WAG+G. The number of gamma rate categories was 4 (alpha = 0.873). Bootstrap resampling (100 iterations) was undertaken and the support values are displayed. For display purposes branches with less than 50% support were collapsed. Branches are colored according to their taxonomy (legend: Fungi, α-proteobacteria, β-proteobacteria, γ-proteobacteria, Actinobacteria, miscellaneous bacteria, Archaea). Fungal branches are shown in green. The S. pombe PhzF homolog is highlighted with a red rectangle.

A small basidiomycete clade is evident amongst prokaryote species (Figure 5). Both Ustilago maydis and Malassezia globosa belong to the Ustilaginomycotina subphylum. Therefore our phylogeny infers that an ancestral Ustilaginomycotina species gained a PhzF gene from an unknown bacterial source, and both species have retained this after speciation.

A correspondence analysis of synonymous codon usage for all PhzF homologs was also performed and is shown in additional information [see additional file 8]. The S. pombe PhzF homolog has a codon usage pattern very similar to the D. hansenii protein.

Gene order around PhzF
Analysis of the genes adjacent to the PhzF homolog in C. parapsilosis shows that there is a high conservation of gene synteny and supports our hypothesis that PhzF was recently acquired in this species (Figure 6). Homologs in the other CTG species are located in completely different regions of the genome relative to C. parapsilosis (not shown). For example, the C. albicans PhzF homolog is located between orf19.5619 and orf19.5621, whereas the C. parapsilosis homolog is found between orf19.6689 and orf19.6687 relative to C. albicans SC5314 (Figure 6). However, the L. elongisporus genome contains no PhzF homolog, either at a position equivalent to the C. parapsilosis copy or elsewhere in the genome. We propose that the LCA of L. elongisporus and C. parapsilosis lost the PhzF gene present in the other CTG species, and a second (new) copy was subsequently gained by C. parapsilosis after speciation. We have partial sequence data (unpublished) from Candida orthopsilosis, a species so closely related to C. parapsilosis that it was once designated C. parapsilosis group II [56]. We located a C. orthopsilosis PR homolog that is 83% identical (at the amino acid level) to the C. parapsilosis copy. This implies that the common ancestor of C. parapsilosis and C. orthopsilosis acquired the bacterial PhzF homolog after speciation from L. elongisporus.

Figure 6. Gene order around C. parapsilosis PhzF gene. Species names and gene identifiers are shown in each box. Orthologous genes are stacked under one another in pillars. The C. parapsilosis PhzF homolog (CPAG_03462) is highlighted with a red box. Synteny relative to the C. parapsilosis PhzF homolog is conserved in all species. Other CTG PhzF homologs are found in completely different regions of the genome relative to C. parapsilosis. L. elongisporus is the only CTG species missing a PhzF gene, and there is no evidence for a pseudogene in the genome.

Mechanisms of gene transfer into fungi are poorly understood. To date no DNA uptake mechanism has been identified in CTG species. However, interkingdom conjugation between bacteria and yeast has been observed [57-59]. Similarly, Saccharomyces cerevisiae has been shown to be competent for transformation under certain conditions [60]. CTG species are known to interact with bacteria in vivo [61], and it is therefore possible that interkingdom conjugation and transformation may facilitate DNA transfer in C. parapsilosis. These mechanisms may also be applicable to the Pezizomycotina species examined in this analysis.

Conclusion
We investigated the frequency of recent interkingdom gene transfer between CTG and bacterial species. We located two strongly supported incidences of HGT, both within the C. parapsilosis genome. We also located independent transfers into the Pezizomycotina, Basidiomycotina and Protozoan lineages.

We cannot determine the exact origin of the PR homolog (CPAG_02038) found in the C. parapsilosis genome. However, based on its phylogenetic position it either originated from a Burkholderia source or, more likely, from an organism not yet represented in the sequence databases. Our PR phylogenetic analysis also suggests there were two independent transfers into Pezizomycotina species, one from an Actinobacterial source and the second from an unknown proteobacterial source. There is also evidence that T. cruzi has obtained its PR homolog from a Firmicutes species. The transferred PR genes analyzed here belong to a superfamily of proline racemases, although we cannot determine their exact function in the fungal species examined. Their proximity to an amino acid transporter (in C. parapsilosis) and a FAD dependent oxidoreductase (in A. niger and G. zeae) suggests they do have a role in amino acid metabolism. Furthermore, evidence of multiple independent transfers into fungi suggests the protein does confer a biological advantage, although we cannot determine what it is. The bacteria-derived PR gene has the potential to be a novel antifungal drug target as there would be no undesired host protein-drug interactions.

The bacterial PhzF homolog (CPAG_03462) found in C. parapsilosis most likely originated from a proteobacterial source. Most CTG species examined contained PhzF homologs, with the exception of L. elongisporus. The crystal structure of the PhzF homolog in S. cerevisiae has been determined and, while its function remains unknown, it is not thought to be involved in phenazine production [62]. We postulate that the PhzF homolog present in other CTG species was initially lost by the ancestor of C. parapsilosis and L. elongisporus, but subsequently regained by C. parapsilosis through HGT. The loss of eukaryote genes and subsequent reacquisition of a prokaryotic copy has previously been described in yeast, and can confer specific metabolic capabilities. An analysis of the biotin biosynthesis pathway discovered that the ancestor of Candida, Debaryomyces, Kluyveromyces and Saccharomyces lost the majority of the pathway after the divergence from the ancestor of Y. lipolytica. However, Saccharomyces species have rebuilt the biotin pathway through gene duplication/neofunctionalization after horizontal gene transfer from α and γ proteobacterial sources [20]. The acquisition of the URA1 gene (encoding dihydroorotate dehydrogenase) from Lactobacillus, and replacement of the endogenous gene in S. cerevisiae, allowed growth under anaerobic conditions [19]. Similarly, acquisition of BDS1 (alkyl-aryl-sulfatase) from proteobacteria may have enabled the survival of S. cerevisiae in a harsh soil environment [19].

Our PhzF phylogeny suggests that the PhzF homolog found in most CTG species originated from an ancient HGT event, from a member of the proteobacteria. Our analysis also shows that S. pombe has obtained a PhzF homolog from a CTG species, most likely one closely related to D. hansenii. There is also phylogenetic evidence showing that an ancestral Ustilaginomycotina species gained a PhzF gene from an unknown bacterial source. We cannot, however, determine the biological advantage to the organisms.

Although it was not the major goal of this study, we did locate HGT from bacteria into fungal genomes outside the CTG clade, and also inter-fungal transfers. In a previous analysis of HGT in diplomonads, fifteen genes were found to have undergone HGT [18]. There is phylogenetic evidence that these genes have undergone independent transfers into other eukaryotic lineages including Fungi. Therefore, in eukaryotes, just as HGT has affected some species more than others [63], there may be groups of genes that are more likely to be taken up through HGT than others. We cannot test this directly, however, as we have not identified all cases of HGT from bacteria to fungi outside the CTG clade.

Our analysis indicates that recent interkingdom gene transfer into extant CTG species is negligible. This supports a previous hypothesis that genetic code alterations block horizontal gene transfer [64]. It should be noted, however, that we searched for recent bacterial gene transfers into individual CTG species, and not for more ancient transfers. We took this approach because recently gained bacterial genes in a eukaryote genome should be more readily detected than older transfers. Similarly, we have not investigated eukaryote-to-eukaryote transfers. It is therefore possible that we have underestimated the overall rate of HGT into the CTG lineage. The discovery of HGT in other fungal lineages implies that HGT plays an important role in fungal evolution and deserves further analysis. In particular, a strategy which can detect ancient gene transfers would be meaningful.


Methods
Sequence data
The complete C. albicans (SC5314) genome (Assembly 19) was obtained from the Candida genome database [65]. The Broad Institute has sequenced and annotated five CTG species (C. albicans (WO-1), C. tropicalis, L. elongisporus, P. guilliermondii, and Cl. lusitaniae). These genomes were obtained directly from the Broad Institute [66]. Gene sets for C. dubliniensis were downloaded from GeneDB [44].

The incomplete C. parapsilosis genome was downloaded from the Sanger Institute [67]. Gene annotations were performed using two separate approaches. The first involved a reciprocal best BLAST [31] search, with a cutoff E-value of 10^-7, of Candida albicans SC5314 protein coding genes against the unannotated C. parapsilosis genome. Top BLAST hits longer than 300 nucleotides were retained as putative open reading frames. The second approach involved a pipeline of analysis that combined several different gene prediction programs, including the ab initio programs SNAP [68], Genezilla [69], and AUGUSTUS [69], with gene models from Exonerate [70] and Genewise [71] based on alignments of proteins and Expressed Sequence Tags. Putative gene sets from both approaches were imported into Artemis [72] and cross-corroborated manually. The resultant gene set contained 5,823 protein-coding genes. The C. parapsilosis genome was also annotated by the Broad Institute, and where possible we have used the gene names they assigned. The UniProt database (v11.1) was downloaded [73]. Database searches against GenBank refer to release 164.0.

Blast based approach to detect potential horizontally transferred genes
Taking one CTG species at a time, we located gene families of interest by comparing individual protein coding genes against the UniProt database (v11.1) using the BlastP algorithm [31] with a cutoff expectation (E) value of 10^-20. To use all available sequence data, CTG proteins with a top database hit to a bacterial protein in UniProt were extracted for a second round of database searching against GenBank (E value of 10^-20). Proteins which also had a top database hit to a bacterial protein in GenBank were considered as possible incidences of horizontal gene transfer. All putative homologs were extracted from GenBank and searched against the relevant CTG genome to ensure a reciprocal best Blast hit. For completeness, CTG proteins not yet deposited in GenBank were added to gene families of interest where appropriate.

Accession numbers for all sequences used in this analysis can be found in additional material [see additional file 2].
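The two-round Blast screen for candidate transferred genes described in the Methods can be sketched as a filter over pre-parsed hit lists. The code below is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Hit` record, its `kingdom` label, the function names and the toy E-values are all hypothetical, and the reciprocal best hit check is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

E_CUTOFF = 1e-20  # expectation value cutoff quoted in the Methods


@dataclass(frozen=True)
class Hit:
    subject: str   # database sequence identifier (hypothetical)
    evalue: float  # BlastP expectation value
    kingdom: str   # assumed taxonomic label, e.g. "Bacteria" or "Fungi"


def top_hit(hits: List[Hit], cutoff: float = E_CUTOFF) -> Optional[Hit]:
    """Best-scoring hit at or below the cutoff, or None if nothing passes."""
    passing = [h for h in hits if h.evalue <= cutoff]
    return min(passing, key=lambda h: h.evalue) if passing else None


def hgt_candidates(
    uniprot: Dict[str, List[Hit]],
    genbank: Dict[str, List[Hit]],
    cutoff: float = E_CUTOFF,
) -> List[str]:
    """Queries whose top hit is bacterial in both rounds of searching."""
    candidates = []
    for query, hits in uniprot.items():
        first = top_hit(hits, cutoff)
        if first is None or first.kingdom != "Bacteria":
            continue  # only bacterial top hits go to the second round
        second = top_hit(genbank.get(query, []), cutoff)
        if second is not None and second.kingdom == "Bacteria":
            candidates.append(query)
    return candidates


# Made-up hit tables for three hypothetical query proteins.
uniprot_hits = {
    "geneA": [Hit("bac1", 1e-80, "Bacteria"), Hit("fun1", 1e-30, "Fungi")],
    "geneB": [Hit("fun2", 1e-90, "Fungi"), Hit("bac2", 1e-40, "Bacteria")],
    "geneC": [Hit("bac3", 1e-10, "Bacteria")],  # fails the E-value cutoff
}
genbank_hits = {
    "geneA": [Hit("bac9", 1e-85, "Bacteria")],
    "geneB": [Hit("bac8", 1e-95, "Bacteria")],
}

candidates = hgt_candidates(uniprot_hits, genbank_hits)
print(candidates)
```

Only `geneA` survives in this toy input, because `geneB` has a fungal top hit in the first round and `geneC` has no hit below the cutoff. Candidates flagged this way would still need the phylogenetic and synteny checks described in the text before being treated as transfers.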

Phylogenetic methods
Gene families were aligned using MUSCLE (v3.6) [74] with the default settings. Obvious alignment ambiguities were corrected manually.

Phylogenetic relationships were inferred using maximum likelihood methods. Appropriate protein models of substitution were selected for each gene family using ModelGenerator [75]. One hundred bootstrap replicates were then carried out with the appropriate protein model using the software program PHYML [76] and summarized using the majority-rule consensus method. We performed the approximately unbiased test of phylogenetic tree selection [53] to assess whether differences in topology between constrained and unconstrained gene trees are no greater than expected by chance.

Codon usage analysis and spectral analysis
To determine whether the putative HGT genes had a different codon usage pattern from the host genome, an analysis of variation in synonymous codon usage was undertaken using the GCUA software [77]. Individual correspondence analyses of raw codon counts for the Candida parapsilosis, Ustilago maydis, Malassezia globosa, Aspergillus flavus, Aspergillus niger, Gibberella zeae, Aspergillus oryzae, Phaeosphaeria nodorum, and Schizosaccharomyces pombe genomes were performed, with the first four principal axes being used to evaluate synonymous codon usage patterns. Similar analyses were also carried out on members of the proline racemase and phenazine F gene families displayed in Figures 2 and 5. We used spectrum [78] to perform a spectral analysis on a subset of the phenazine data.
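As a rough illustration of how bootstrap replicates are summarized by the majority-rule consensus method mentioned under Phylogenetic methods, the sketch below counts how often each clade appears across replicate trees and keeps those found in more than half of them. It assumes each replicate tree has already been reduced to a set of clades (each clade a frozenset of taxon names); the function names and the toy replicates are hypothetical, and this is not how PHYML represents trees internally.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(replicates: List[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees in which each clade occurs."""
    counts = Counter(clade for tree in replicates for clade in tree)
    n = len(replicates)
    return {clade: 100.0 * count / n for clade, count in counts.items()}


def majority_rule(
    replicates: List[Set[Clade]], threshold: float = 50.0
) -> Dict[Clade, float]:
    """Clades present in more than `threshold` percent of replicates."""
    support = clade_support(replicates)
    return {clade: s for clade, s in support.items() if s > threshold}


AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})

# Four made-up bootstrap replicates over taxa A-D.
toy_replicates = [{AB, CD}, {AB, CD}, {AB}, {AC}]

support = clade_support(toy_replicates)
consensus = majority_rule(toy_replicates)
```

In this toy input the clade {A, B} appears in three of four replicates (75%) and is retained, while {C, D} appears in exactly half and is dropped, which mirrors the collapsing of branches with low support for display in Figure 5.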


Authors' contributions

DAF, MEL and GB were involved in the design phase. MEL predicted genes in unannotated genomes. DAF sourced homologs, examined synteny and performed phylogenetic analyses. DAF and GB drafted the manuscript. All authors read and approved the final manuscript.

Additional material

Additional file 1
Examples of reported incidences of interkingdom gene transfer between prokaryotes and fungi. One Kluyveromyces lactis gene (KLLA0D19949g) previously highlighted [22] has been omitted as it is no longer recognized as an ORF. Y. lipolytica genes denoted with a * and ^ indicate possible gene duplications after HGT.
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S1.doc

Additional file 2
GenBank accession numbers for PR (A) and PhzF (B) sequences used in this analysis. Species identified with an * use the accession numbers created by the Broad Institute [66] or the Wellcome Trust Sanger Institute [67].
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S2.doc

Additional file 3
Gene order around Pezizomycotina proline racemase genes. Species names and identifiers are shown in each box. PR genes are labeled. Gene identifiers relate to annotations from the Broad Institute. Clade letters in parentheses correspond to those in Figure 2. There is evidence for conserved gene synteny between some species, such as A. oryzae and A. flavus (clade C), and A. flavus/A. terreus and N. fischeri/A. fumigatus in the Metazoan/Pezizomycotina clade (B). The A. flavus gene denoted by a * is absent from the Broad gene set, but we were able to locate it with a BlastX search.
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S3.eps

Additional file 4
Correspondence analysis of codon usage. Correspondence analysis of codon usage in the C. parapsilosis (1), U. maydis (2), M. globosa (3), A. flavus (4), A. niger (5), G. zeae (6), A. oryzae (7), P. nodorum (8), and S. pombe (9) genomes. Transferred genes are highlighted. All have a codon usage similar to the rest of their genomes, which is unsurprising as transferred genes have been shown to ameliorate their codon usage to their hosts [79].
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S4.pdf

Additional file 5
Correspondence analysis of codon usage in the proline racemase gene family analyzed in this study. Major groups are color-coded. The C. parapsilosis PR gene has a codon usage pattern distinct from other fungal species in this analysis. It is also quite distinct from the Burkholderia (β-proteobacteria) species, which were found to be its phylogenetic neighbors.
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S5.eps

Additional file 6
Trees for approximately unbiased test for PhzF homologs. Tree A is the original unconstrained topology, which groups C. parapsilosis with proteobacteria. Topology B is a constrained tree that places S. pombe outside the Saccharomycotina clade. Topologies C-H are constrained and place C. parapsilosis amongst the other Saccharomycotina species. Log likelihood scores for each tree are given. To assess whether differences in topology between the inferred trees are no more significant than expected by chance, we performed the approximately unbiased (AU) test. The AU test shows that the unconstrained tree receives the optimal likelihood score. Furthermore, the differences in likelihood scores when compared to the constrained trees are significant (P < 0.05). Therefore, based on these results, the placement of the C. parapsilosis homolog in the proteobacterial clade to the exclusion of the Saccharomycotina, and of S. pombe within the Saccharomycotina clade, is significant.
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S6.eps

Additional file 7
PhzF spectral analysis. Analysis was performed on the Saccharomycotina and selected proteobacterial clade. Bars above the x-axis represent frequency of support for each split. Bars below the x-axis represent the sum of all corresponding conflicts. Cladograms above columns represent the corresponding splits in the data. There is no support for the placement of C. parapsilosis with the other Saccharomycotina species.
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S7.eps

Additional file 8
Correspondence analysis of codon usage in the PhzF gene family analyzed in this study. Major groups are color-coded. The C. parapsilosis PhzF gene has a codon usage pattern similar to other CTG species analyzed. It is quite distinct from the proteobacterial species that were found to be its phylogenetic neighbors.
File: http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S8.eps

Acknowledgements

The authors wish to acknowledge the Wellcome Trust Sanger Institute and the Broad Institute of MIT & Harvard for releasing data ahead of publication. We would like to acknowledge the financial support of the Irish Research Council for Science, Engineering and Technology (IRCSET), the Irish Health Research Board (HRB) and Science Foundation Ireland (SFI). We wish to acknowledge the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support. We thank Mike Lorenz and Paul Dyer for fungal strains. We also thank Jason Stajich for help with gene annotations.


References

1. Doolittle WF: Lateral genomics. Trends Cell Biol 1999, 9(12):M5-8.
2. Jain R, Rivera MC, Moore JE, Lake JA: Horizontal gene transfer accelerates genome innovation and evolution. Mol Biol Evol 2003, 20(10):1598-1602.
3. Eisen JA: Assessing evolutionary relationships among microbes from whole-genome analysis. Curr Opin Microbiol 2000, 3(5):475-480.
4. Woo PC, To AP, Lau SK, Yuen KY: Facilitation of horizontal transfer of antimicrobial resistance by transformation of antibiotic-induced cell-wall-deficient bacteria. Med Hypotheses 2003, 61(4):503-508.
5. Martin K, Morlin G, Smith A, Nordyke A, Eisenstark A, Golomb M: The tryptophanase gene cluster of Haemophilus influenzae type b: evidence for horizontal gene transfer. J Bacteriol 1998, 180(1):107-118.
6. Kurland CG, Canback B, Berg OG: Horizontal gene transfer: a critical view. Proc Natl Acad Sci U S A 2003, 100(17):9658-9662.
7. Andersson JO: Lateral gene transfer in eukaryotes. Cell Mol Life Sci 2005, 62(11):1182-1197.
8. Dujon B: Hemiascomycetous yeasts at the forefront of comparative genomics. Curr Opin Genet Dev 2005, 15(6):614-620.
9. Friesen TL, Stukenbrock EH, Liu Z, Meinhardt S, Ling H, Faris JD, Rasmussen JB, Solomon PS, McDonald BA, Oliver RP: Emergence of a new disease as a result of interspecific virulence gene transfer. Nat Genet 2006, 38(8):953-956.
10. Inderbitzin P, Harkness J, Turgeon BG, Berbee ML: Lateral transfer of mating system in Stemphylium. Proc Natl Acad Sci U S A 2005, 102(32):11390-11395.
11. Kavanaugh LA, Fraser JA, Dietrich FS: Recent evolution of the human pathogen Cryptococcus neoformans by intervarietal transfer of a 14-gene fragment. Mol Biol Evol 2006, 23(10):1879-1890.
12. Khaldi N, Collemare J, Lebrun MH, Wolfe KH: Evidence for horizontal transfer of a secondary metabolite gene cluster between fungi. Genome Biol 2008, 9(1):R18.
13. Paoletti M, Buck KW, Brasier CM: Selective acquisition of novel mating type and vegetative incompatibility genes via interspecies gene transfer in the globally invading eukaryote Ophiostoma novo-ulmi. Mol Ecol 2006, 15(1):249-262.
14. Slot JC, Hallstrom KN, Matheny PB, Hibbett DS: Diversification of NRT2 and the origin of its fungal homolog. Mol Biol Evol 2007, 24(8):1731-1743.
15. Slot JC, Hibbett DS: Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE 2007, 2(10):e1097.
16. Waller RF, Slamovits CH, Keeling PJ: Lateral gene transfer of a multigene region from cyanobacteria to dinoflagellates resulting in a novel plastid-targeted fusion protein. Mol Biol Evol 2006, 23(7):1437-1443.
17. Wei W, McCusker JH, Hyman RW, Jones T, Ning Y, Cao Z, Gu Z, Bruno D, Miranda M, Nguyen M, Wilhelmy J, Komp C, Tamse R, Wang X, Jia P, Luedi P, Oefner PJ, David L, Dietrich FS, Li Y, Davis RW, Steinmetz LM: Genome sequencing and comparative analysis of Saccharomyces cerevisiae strain YJM789. Proc Natl Acad Sci U S A 2007, 104(31):12825-12830.
18. Andersson JO, Sjogren AM, Davis LA, Embley TM, Roger AJ: Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 2003, 13(2):94-104.
19. Hall C, Brachat S, Dietrich FS: Contribution of horizontal gene transfer to the evolution of Saccharomyces cerevisiae. Eukaryot Cell 2005, 4(6):1102-1115.
20. Hall C, Dietrich FS: The Reacquisition of Biotin Prototrophy in Saccharomyces cerevisiae Involved Horizontal Gene Transfer, Gene Duplication and Gene Clustering. Genetics 2007, 177(4):2293-2307.
21. Woolfit M, Rozpedowska E, Piskur J, Wolfe KH: Genome survey sequencing of the wine spoilage yeast Dekkera (Brettanomyces) bruxellensis. Eukaryot Cell 2007, 6(4):721-733.
22. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine I, De Montigny J, Marck C, Neuveglise C, Talla E, Goffard N, Frangeul L, Aigle M, Anthouard V, Babour A, Barbe V, Barnay S, Blanchin S, Beckerich JM, Beyne E, Bleykasten C, Boisrame A, Boyer J, Cattolico L, Confanioleri F, De Daruvar A, Despons L, Fabre E, Fairhead C, Ferry-Dumazet H, Groppi A, Hantraye F, Hennequin C, Jauniaux N, Joyet P, Kachouri R, Kerrest A, Koszul R, Lemaire M, Lesur I, Ma L, Muller H, Nicaud JM, Nikolski M, Oztas S, Ozier-Kalogeropoulos O, Pellenz S, Potier S, Richard GF, Straub ML, Suleau A, Swennen D, Tekaia F, Wesolowski-Louvel M, Westhof E, Wirth B, Zeniou-Meyer M, Zivanovic I, Bolotin-Fukuhara M, Thierry A, Bouchier C, Caudron B, Scarpelli C, Gaillardin C, Weissenbach J, Wincker P, Souciet JL: Genome evolution in yeasts. Nature 2004, 430(6995):35-44.
23. Gojkovic Z, Knecht W, Zameitat E, Warneboldt J, Coutelis JB, Pynyaha Y, Neuveglise C, Moller K, Loffler M, Piskur J: Horizontal gene transfer promoted evolution of the ability to propagate under anaerobic conditions in yeasts. Mol Genet Genomics 2004, 271(4):387-393.
24. Brinkman FS, Macfarlane EL, Warrener P, Hancock RE: Evolutionary relationships among virulence-associated histidine kinases. Infect Immun 2001, 69(8):5207-5211.
25. Temporini ED, VanEtten HD: An analysis of the phylogenetic distribution of the pea pathogenicity genes of Nectria haematococca MPVI supports the hypothesis of their origin by horizontal transfer and uncovers a potentially new pathogen of garden pea: Neocosmospora boniensis. Curr Genet 2004, 46(1):29-36.
26. Wenzl P, Wong L, Kwang-won K, Jefferson RA: A functional screen identifies lateral transfer of beta-glucuronidase (gus) from bacteria to fungi. Mol Biol Evol 2005, 22(2):308-316.
27. Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer of glycosyl hydrolases of the rumen fungi. Mol Biol Evol 2000, 17(3):352-361.
28. Klotz MG, Klassen GR, Loewen PC: Phylogenetic relationships among prokaryotic and eukaryotic catalases. Mol Biol Evol 1997, 14(9):951-958.
29. Fitzpatrick DA, Logue ME, Stajich JE, Butler G: A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol Biol 2006, 6:99.
30. Sugita T, Nakase T: Non-universal usage of the leucine CUG codon and the molecular phylogeny of the genus Candida. Syst Appl Microbiol 1999, 22(1):79-86.
31. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
32. Cardinale GJ, Abeles RH: Purification and mechanism of action of proline racemase. Biochemistry 1968, 7(11):3970-3978.
33. Logue ME, Wong S, Wolfe KH, Butler G: A genome sequence survey shows that the pathogenic yeast Candida parapsilosis has a defective MTLa1 allele at its mating type locus. Eukaryot Cell 2005, 4(6):1009-1017.
34. Lartillot N, Brinkmann H, Philippe H: Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 2007, 7 Suppl 1:S4.
35. Langley R, Kenna DT, Vandamme P, Ure R, Govan JR: Lysogeny and bacteriophage host range within the Burkholderia cepacia complex. J Med Microbiol 2003, 52(Pt 6):483-490.
36. Eberl L, Tummler B: Pseudomonas aeruginosa and Burkholderia cepacia in cystic fibrosis: genome evolution, interactions and adaptation. Int J Med Microbiol 2004, 294(2-3):123-131.
37. Tuanyok A, Auerbach RK, Brettin TS, Bruce DC, Munk AC, Detter JC, Pearson T, Hornstra H, Sermswan RW, Wuthiekanun V, Peacock SJ, Currie BJ, Keim P, Wagner DM: A horizontal gene transfer event defines two distinct groups within Burkholderia pseudomallei that have dissimilar geographic distributions. J Bacteriol 2007.
38. Buschiazzo A, Goytia M, Schaeffer F, Degrave W, Shepard W, Gregoire C, Chamond N, Cosson A, Berneman A, Coatnoan N, Alzari PM, Minoprio P: Crystal structure, catalytic mechanism, and mitogenic properties of Trypanosoma cruzi proline racemase. Proc Natl Acad Sci U S A 2006, 103(6):1705-1710.
39. Lamzin VS, Dauter Z, Wilson KS: How nature deals with stereoisomers. Curr Opin Struct Biol 1995, 5(6):830-836.
40. Fisher GH: Appearance of D-amino acids during aging: D-amino acids in tumor proteins. Exs 1998, 85:109-118.
41. Chamond N, Gregoire C, Coatnoan N, Rougeot C, Freitas-Junior LH, da Silveira JF, Degrave WM, Minoprio P: Biochemical characterization of proline racemases from the human protozoan parasite Trypanosoma cruzi and definition of putative protein signatures. J Biol Chem 2003, 278(18):15484-15494.


42. Wolosker H, Blackshaw S, Snyder SH: Serine racemase: a glial enzyme synthesizing D-serine to regulate glutamate-N-methyl-D-aspartate neurotransmission. Proc Natl Acad Sci U S A 1999, 96(23):13409-13414.
43. Reina-San-Martin B, Degrave W, Rougeot C, Cosson A, Chamond N, Cordeiro-Da-Silva A, Arala-Chaves M, Coutinho A, Minoprio P: A B-cell mitogen from a pathogenic trypanosome is a eukaryotic proline racemase. Nat Med 2000, 6(8):890-897.
44. GeneDB [http://www.genedb.org]
45. Stevens JR, Gibson WC: The evolution of pathogenic trypanosomes. Cad Saude Publica 1999, 15(4):673-684.
46. Fischer G, James SA, Roberts IN, Oliver SG, Louis EJ: Chromosomal evolution in Saccharomyces. Nature 2000, 405(6785):451-454.
47. Pfam [http://pfam.sanger.ac.uk]
48. Medigue C, Rouxel T, Vigier P, Henaut A, Danchin A: Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol 1991, 222(4):851-856.
49. Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature 2000, 405(6784):299-304.
50. Rudnick G, Abeles RH: Reaction mechanism and structure of the active site of proline racemase. Biochemistry 1975, 14(20):4515-4522.
51. Blankenfeldt W, Kuzin AP, Skarina T, Korniyenko Y, Tong L, Bayer P, Janning P, Thomashow LS, Mavrodi DV: Structure and function of the phenazine biosynthetic protein PhzF from Pseudomonas fluorescens. Proc Natl Acad Sci U S A 2004, 101(47):16431-16436.
52. Parsons JF, Song F, Parsons L, Calabrese K, Eisenstein E, Ladner JE: Structure and function of the phenazine biosynthesis protein PhzF from Pseudomonas fluorescens 2-79. Biochemistry 2004, 43(39):12427-12435.
53. Shimodaira H: An approximately unbiased test of phylogenetic tree selection. Syst Biol 2002, 51(3):492-508.
54. The Schizosaccharomyces group at the Broad Institute [http://www.broad.mit.edu/annotation/genome/schizosaccharomyces_group/MultiHome.html]
55. Bullerwell CE, Leigh J, Forget L, Lang BF: A comparison of three fission yeast mitochondrial genomes. Nucleic Acids Res 2003, 31(2):759-768.
56. Lin D, Wu LC, Rinaldi MG, Lehmann PF: Three distinct genotypes within Candida parapsilosis from clinical sources. J Clin Microbiol 1995, 33(7):1815-1821.
57. Heinemann JA, Sprague GF Jr.: Bacterial conjugative plasmids mobilize DNA transfer between bacteria and yeast. Nature 1989, 340(6230):205-209.
58. Inomata K, Nishikawa M, Yoshida K: The yeast Saccharomyces kluyveri as a recipient eukaryote in transkingdom conjugation: behavior of transmitted plasmids in transconjugants. J Bacteriol 1994, 176(15):4770-4773.
59. Sawasaki Y, Inomata K, Yoshida K: Trans-kingdom conjugation between Agrobacterium tumefaciens and Saccharomyces cerevisiae, a bacterium and a yeast. Plant Cell Physiol 1996, 37(1):103-106.
60. Nevoigt E, Fassbender A, Stahl U: Cells of the yeast Saccharomyces cerevisiae are transformable by DNA under non-artificial conditions. Yeast 2000, 16(12):1107-1110.
61. Hogan DA, Kolter R: Pseudomonas-Candida interactions: an ecological role for virulence factors. Science 2002, 296(5576):2229-2232.
62. Liger D, Quevillon-Cheruel S, Sorel I, Bremang M, Blondeau K, Aboulfath I, Janin J, van Tilbeurgh H, Leulliot N: Crystal structure of YHI9, the yeast member of the phenazine biosynthesis PhzF enzyme superfamily. Proteins 2005, 60(4):778-786.
63. Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW: The tree of eukaryotes. Trends Ecol Evol 2005, 20(12):670-676.
64. Silva RM, Paredes JA, Moura GR, Manadas B, Lima-Costa T, Rocha R, Miranda I, Gomes AC, Koerkamp MJ, Perrot M, Holstege FC, Boucherie H, Santos MA: Critical roles for a genetic code alteration in the evolution of the genus Candida. Embo J 2007, 26(21):4555-4565.
65. The Candida Genome Database [http://www.candidagenome.org]


66. The Candida group at the Broad Institute [http://www.broad.mit.edu/annotation/genome/candida_group/MultiHome.html]
67. The Wellcome Trust Sanger Institute [http://www.sanger.ac.uk/]
68. Korf I: Gene finding in novel genomes. BMC Bioinformatics 2004, 5:59.
69. Majoros WH, Pertea M, Salzberg SL: TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 2004, 20(16):2878-2879.
70. Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 2005, 6(1):31.
71. Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res 2004, 14(5):988-995.
72. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics 2000, 16(10):944-945.
73. The UniProt database [ftp://ftp.uniprot.org/pub/databases/]
74. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792-1797.
75. Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 2006, 6:29.
76. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696-704.
77. McInerney JO: GCUA: general codon usage analysis. Bioinformatics 1998, 14(4):372-373.
78. Charleston MA: Spectrum: spectral analysis of phylogenetic data. Bioinformatics 1998, 14(1):98-99.
79. Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 1997, 44(4):383-397.
