Evidence for horizontal gene transfer of anaerobic carbon ... - Core

8 downloads 0 Views 11MB Size Report
Apr 17, 2012 - Ruminococcus spp. to constituents of the human gut microbiome such as Campylobacter ...... epidemic.type.X.CooS-1. Ruminococcus.obeum.
ORIGINAL RESEARCH ARTICLE published: 17 April 2012 doi: 10.3389/fmicb.2012.00132

Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases Stephen M. Techtmann 1,2 , Alexander V. Lebedinsky 3 , Albert S. Colman 4 , Tatyana G. Sokolova 3 , Tanja Woyke 5 , Lynne Goodwin 5 and Frank T. Robb 1,2 * 1

Institute of Marine and Environmental Technology, University of Maryland, Baltimore, MD, USA Department of Microbiology and Immunology, University of Maryland, Baltimore, MD, USA 3 Winogradsky Institute of Microbiology, Russian Academy of Sciences, Moscow, Russia 4 Department of the Geophysical Sciences, University of Chicago, Chicago, IL, USA 5 DOE Joint Genome Institute, Walnut Creek, CA, USA 2

Edited by: Martin G. Klotz, University of North Carolina at Charlotte, USA Reviewed by: Annette Summers Engel, University of Tennessee, USA Viktoria Shcherbakova, Institute of Biochemistry and Physiology of Microorganisms, Russian Academy of Sciences, Russia *Correspondence: Frank T. Robb, Institute of Marine and Environmental Technology, 701 East Pratt Street, Baltimore, MD 21202, USA. e-mail: [email protected]

Carbon monoxide (CO) is commonly known as a toxic gas, yet both cultivation studies and emerging genome sequences of bacteria and archaea establish that CO is a widely utilized microbial growth substrate. In this study, we determined the prevalence of anaerobic carbon monoxide dehydrogenases ([Ni,Fe]-CODHs) in currently available genomic sequence databases. Currently, 185 out of 2887, or 6% of sequenced bacterial and archaeal genomes possess at least one gene encoding [Ni,Fe]-CODH, the key enzyme for anaerobic CO utilization. Many genomes encode multiple copies of [Ni,Fe]-CODH genes whose functions and regulation are correlated with their associated gene clusters. The phylogenetic analysis of this extended protein family revealed six distinct clades; many clades consisted of [Ni,Fe]CODHs that were encoded by microbes from disparate phylogenetic lineages, based on 16S rRNA sequences, and widely ranging physiology.To more clearly define if the branching patterns observed in the [Ni,Fe]-CODH trees are due to functional conservation vs. evolutionary lineage, the genomic context of the [Ni,Fe]-CODH gene clusters was examined, and superimposed on the phylogenetic trees. On the whole, there was a correlation between genomic contexts and the tree topology, but several functionally similar [Ni,Fe]-CODHs were found in different clades. In addition, some distantly related organisms have similar [Ni,Fe]-CODH genes. Thermosinus carboxydivorans was used to observe horizontal gene transfer (HGT) of [Ni,Fe]-CODH gene clusters by applying Kullback–Leibler divergence analysis methods. Divergent tetranucleotide frequency and codon usage showed that the gene cluster of T. carboxydivorans that encodes a [Ni,Fe]-CODH and an energyconverting hydrogenase is dissimilar to its whole genome but is similar to the genome of the phylogenetically distant Firmicute, Carboxydothermus hydrogenoformans. These results imply that T carboxydivorans acquired this gene cluster via HGT from a relative of C. hydrogenoformans. Keywords: carbon monoxide, thermophiles, hydrogenogens, carboxydotrophs, Thermosinus carboxydivorans, Carboxydothermus hydrogenoformans, carbon monoxide dehydrogenase

INTRODUCTION Carbon monoxide (CO) is most commonly known as a potent human toxin. Alternatively, CO can provide an energy and carbon source in both anaerobic and aerobic microbes (Uffen, 1976; King and Weber, 2007; Sokolova et al., 2009; Techtmann et al., 2009). This ability to utilize CO relies upon enzymes known as carbon monoxide dehydrogenases (CODHs, EC 1.2.99.2; Ferry, 1995; Seravalli et al., 1995, 1997; Ragsdale, 2008). CO oxidation is performed by aerobic and anaerobic species using CODHs in different classes (Ragsdale, 2004; King and Weber, 2007). While the overall reaction catalyzed by both aerobic and anaerobic species is the same (CO + H2 O → CO2 + 2H+ + 2e− ), major differences result from the different terminal electron acceptors of the pathway. For aerobic species this reaction is catalyzed by the aerobic CODH (CoxSML) complex (Hugendieck and Meyer, 1992;

www.frontiersin.org

Schubel et al., 1995). The Cox-type CODH contains a highly conserved Mo based active site (Meyer and Schlegel, 1983; Meyer and Rajagopalan, 1984; Meyer et al., 1993; Dobbek et al., 1999; Gnida et al., 2003). In the case of the aerobic Cox-type CODH, the electrons generated from oxidation of CO are transferred to oxygen or, in some cases, nitrate as the final electron acceptor (King, 2003, 2006; King and Weber, 2007). Anaerobic CODHs are distinct from Cox CODHs and have a Ni–Fe active site (Dobbek et al., 2001, 2004; Svetlitchnyi et al., 2001, 2004). The anaerobic CODH generates electrons from the oxidation of CO and transfers them to a variety of acceptors, allowing CO to be the source of reducing equivalents for various pathways including sulfate reduction, acetogenesis, methanogenesis, hydrogenogenesis, and metal reduction, and it can also reduce CO2 to CO used further for acetate synthesis by the Wood–Ljungdahl pathway (Gonzalez and

April 2012 | Volume 3 | Article 132 | 1

Techtmann et al.

Robb, 2000; Svetlitchnyi et al., 2001, 2004; Ragsdale, 2004, 2008; Wu et al., 2005; Sokolova et al., 2009). The extreme thermophile Carboxydothermus hydrogenoformans is a prototype anaerobic carboxydotroph with five distinct anaerobic CODHs. C. hydrogenoformans can grow efficiently on CO as sole carbon and energy source, confirming that CO can fuel divergent pathways (Wu et al., 2005). The presence of multiple anaerobic CODHs encoded by a single organism requires that these homologs arose either by duplication of an ancestral CODH gene in the same lineage or else have been acquired by horizontal gene transfer (HGT) between separate lineages. In this study we identified [Ni,Fe]-CODH genes in the rapidly expanding database of microbial genomes in order to understand the mechanisms of evolution and dispersion of [Ni,Fe]CODHs. A comprehensive phylogenetic analysis was designed to observe if CODHs are found in distinct clades. Examination of the genomic context of each of the [Ni,Fe]-CODHs provided insights into the functions of [Ni,Fe]-CODH gene clusters. We then used these functional assignments to determine whether differences in inferred function of [Ni,Fe]-CODHs explain the pattern of phylogenetic divergence. In addition, in-depth statistical analysis was performed on two gene clusters that catalyze similar functions. By examining operons with the same function we control for differences that have arisen based on functional diversification. The two gene clusters examined were from the prototype thermophilic hydrogenogen – C. hydrogenoformans – and a thermophilic metal reducing hydrogenogen – Thermosinus carboxydivorans. These organisms were chosen based on the following criteria. (1) They are both thermophilic, thus controlling for selection due to adaptations to differing growth temperatures. (2) C. hydrogenoformans and T. carboxydivorans exhibit different types of cell wall structure and are members of widely divergent classes of the Phylum Firmicutes: the Clostridia and Negativicutes, respectively. (3) Despite their divergent lineages, their [Ni,Fe]-CODHs and the linked energy-converting hydrogenases (ECH) are very similar. The genomes of these bacteria were analyzed to determine whether the [Ni,Fe]-CODH–ECH gene cluster may have been subject to recent HGT.

MATERIALS AND METHODS DATABASE SEARCHING FOR [Ni,Fe]-CODH SEQUENCES

The US Department of Energy’s Integrated Microbial Genome (IMG) database was searched using BLASTp to query the protein database (Altschul et al., 1990) for sequences corresponding to [Ni,Fe]-CODHs. [Ni,Fe]-CODHs are subdivided into two types: the type called CooS, which is more frequent in bacteria, and the type called Cdh, occurring almost exclusively in archaea. The ligands of the Ni,Fe-containing active site (Dobbek’s C cluster; Dobbek et al., 2001) are conserved, with few exceptions, in CooS- and Cdh-type CODHs (Lindahl, 2002). However, Cdh-type CODHs harbor two additional [Fe4 S4 ] clusters (Gencic et al., 2010), and, on the whole, there is rather low homology between CooS- and Cdh-type CODHs. Therefore, the IMG database was searched using a CooS-type sequence (CooS-I from C. hydrogenoformans) and a Cdh-type sequence (Cdh-1 from the archaeon Archeoglobus fulgidus). Overlapping results and low scoring hits (Quality scores less than 200) were removed from the

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

Anaerobic carbon monoxide dehydrogenase HGT

final [Ni,Fe]-CODH database (Table S3 in Supplemental Material). Many of the low scoring hits were often genes annotated as hydroxylamine reductases. This refined database was then used for further analysis. PHYLOGENETIC TREE CONSTRUCTION

The [Ni,Fe]-CODH sequences were aligned using the MUSCLE alignment program (Edgar, 2004a,b). This multiple sequence alignment was used to construct phylogenetic trees using Maximum Likelihood (ML) and Bayesian inference (BI). ML trees were constructed using the RAxML-HPC2 program (version 7.2.8; Stamatakis, 2006; Stamatakis et al., 2008) on the CIPRES servers (Miller et al., 2010). ML trees were constructed using the WAG + Γ + I model which was selected as the best-fit model for our database by the model selection tool implemented in Topali2 (Milne et al., 2009). Node support was assessed by 1,000 bootstrap replicates with the same model. BI trees were constructed using the BEAST program (Drummond and Rambaut, 2007) with the WAG + Γ + I model as was selected as the best-fit model by the Topali2 package. Searches were run with four chains of 7,500,000 generations, for which the first 750,000 were discarded as “burnin.” Trees were sampled every 1,000 generations. Stabilization of chain parameters was determined using the program TRACER (Rambaut and Aj, 2007). A maximum credibility clade tree was annotated using the TreeAnnotator program (part of the BEAST package). Trees were drawn using the FigTree program1 . Subtrees were drawn of each of the clades and are shown in (Figure A2 in Appendix). CONSTRUCTION OF FUNCTIONAL TREES

The analysis of the genomic contexts was done using the information available at Gene Detail pages of the IMG database and, where necessary, applying tblastn with appropriate queries at the IMG website2 . This information was used to identify whether the [Ni,Fe]-CODH gene was located (1) within an ACS gene cluster, (2) adjacent to an ECH gene cluster, (3) located at a distance less than 3 kb away from a gene encoding CooF, a ferredoxin-like FeS protein (Kerby et al., 1992) which carries out electron transfer from CooS to a variety of electron acceptors, or (4) not clustered with a cooF gene. GENOMIC SEQUENCING OF THERMOSINUS CARBOXYDIVORANS

Thermosinus carboxydivorans was grown as previously described (Sokolova et al., 2004). DNA was extracted using previously described protocols (Wu et al., 2005). The genome of T. carboxydivorans Nor1 was sequenced at the Joint Genome Institute (JGI) using a combination of 3, 8, and 40 kb (fosmid) DNA libraries. In addition to Sanger sequencing, 454 pyrosequencing was done to a depth of 20× coverage. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. Draft assemblies were based on 28,812 total reads. All three libraries provided 9.3× coverage of the genome. The 1 http://tree.bio.edu.ac.uk/software/figrtee/ 2 http://img.jgi.doe.gov/

April 2012 | Volume 3 | Article 132 | 2

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

Phred/Phrap/Consed software package3 was used for sequence assembly and quality assessment (Ewing and Green, 1998; Ewing et al., 1998; Gordon et al., 1998). After the shotgun stage, reads were assembled with parallel Phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher (Han and Chain, 2006) or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI, USA). Gaps between contigs were closed by editing in Consed, custom primer walk or PCR amplification (Roche Applied Science, Indianapolis, IN, USA). A total of 5919 additional reactions were necessary to close gaps and to elevate the quality of the finished sequence. The completed genome sequences of T. carboxydivorans Nor1 contains 36,788 reads, achieving an average of 10-fold sequence coverage per base with an error rate less than 1 in 100,000. The draft genome sequence was deposited into GenBank (Accession Numbers AAWL01000001–AAWL01000049). HORIZONTAL GENE TRANSFER ANALYSES OF THE [Ni,Fe]-CODH–ECH GENE CLUSTER

The nucleotide sequences for the [Ni,Fe]-CODH–ECH gene cluster were extracted (including genes of all hydrogenase subunits) from the C. hydrogenoformans and the T. carboxydivorans genomes. These sequences and the whole genome sequences for both organisms were used for further analysis. Additionally, the Escherichia coli K12 genome (GenBank Accession Number U00096.2) was used as a control. G + C content was determined using GC-Profile program (Gao and Zhang, 2006). Tetranucleotide frequency was determined using the TETRA program (Teeling et al., 2004). Codon usage was determined from tables in the codon usage database4 . Overall codon usage for the 10 genes of the [Ni,Fe]-CODH-ECH gene cluster was determined by averaging the codon frequency of each gene in the gene cluster. The Kullback–Leibler (K–L) divergence metric (Kullback and Leibler, 1951) was used to determine whether the differences in G + C content, tetranucleotide frequency, and codon usage were significant or not. These three metrics and the K–L divergence were chosen based on a recent paper that suggests that codon usage and tetranucleotide frequency combined with K–L divergence are the most accurate metrics for determining HGT (Becq et al., 2010). K–L divergence was determined using the following formula:    g (i) DKL g ||G = g (i) ln G (i) i

where g is the a parameter (e.g., frequency of a particular tetranucleotide or frequency of usage of a particular codon) for the gene or operon and G is that same parameter for the whole genome. This calculation is repeated for all of the tetranucleotide frequencies and the resulting values are added together to determine the K–L divergence for tetranucleotide frequency. The same basic calculation is done for the frequency of usage for each codon to determine the K–L divergence for codon usage between a gene or operon and a genome. 3 www.phrap.com 4 www.kazusa.or.jp/codon/

www.frontiersin.org

RESULTS MANY ORGANISMS ENCODE MORE THAN ONE [Ni,Fe]-CODH GENE

BLAST searches of the IMG database revealed that a surprising number of organisms encode [Ni,Fe]-CODH genes. Of the 2887 extant bacterial and archaeal genomes, 185 genomes (>6%) encoded at least one [Ni,Fe]-CODH gene. Of these 185 genomes, 43% encoded more than one [Ni,Fe]-CODH (Listed in Table S2 in Supplemental Material). The highest number of [Ni,Fe]-CODHs detected in a single genome is five, found in the genome sequences of C. hydrogenoformans, the Mono Lake isolate delta Proteobacterium MLMS-1, and the methanogenic archaeon Methanosarcina acetivorans. Many of the species that encode multiple [Ni,Fe]CODHs have been described as having the ability to utilize CO for both energy conservation and carbon acquisition, or energy conservation alone. Several of the gene clusters have genomic contexts that indicate probable primary functions such as metal reduction, oxidative stress responses, and cofactor reduction. In addition, the important human pathogens Clostridium difficile and Clostridium botulinum both possess two copies of [Ni,Fe]-CODHs, suggesting a role for [Ni,Fe]-CODHs in their physiology. The majority of [Ni,Fe]-CODHs found in our analysis occur in strains whose modes of CO utilization are not yet established, and therefore more CO-related physiological traits are likely to be discovered in the future. PHYLOGENETIC ANALYSIS OF [Ni,Fe]-CODH SEQUENCES REVEALS SEVERAL DISTINCT CLADES

A dataset of all of the [Ni,Fe]-CODHs from the IMG database was used to reconstruct the phylogeny of [Ni,Fe]-CODHs. Both ML and BI phylogenetic analyses were used. The phylogenetic trees (Figures 1A,B) constructed from both methods were congruent, therefore only the ML trees are shown in subsequent figures. These trees both detail six distinct, strongly supported clades (greater than 50% boostraps for ML trees, with most nodes supported at >86%, and Bayesian posterior support of greater than or equal to 0.99). While the branching pattern within clades differs slightly between ML and BI, the composition of all six clades in both ML and BI trees are identical (Table S3 in Supplemental Material). The distribution of sequences within these clades greatly expands our knowledge of the organisms that encode [Ni,Fe]-CODHs as well as the potential mechanisms for their evolution. FUNCTIONAL ANALYSIS OF [Ni,Fe]-CODH CLADES

Analysis of the genomic context was performed so that the [Ni,Fe]CODH sequences were binned according to four categories: (1) within an acetyl-CoA synthase gene cluster, (2) adjacent to an ECH gene cluster, (3) adjacent to a cooF electron transfer protein, and (4) not adjacent to a cooF gene. These categories were plotted on the ML phylogenetic tree displayed in a circular format which was color coded according to the genomic context of the CODHs (Figure 2). SUMMARY OF [Ni,Fe]-CODH CLADES

Clade A is composed of sequences corresponding to the Cdh-type CODHs, which occur almost exclusively in the Archaea and for the most part cluster in the genomes as part of the five-subunit acetylCoA decarbonylase/synthase (ACDS) complex (Terlesky et al.,

April 2012 | Volume 3 | Article 132 | 3

Techtmann et al.

FIGURE 1 | Unrooted phylogenetic trees of the [Ni,Fe]-CODH amino acid sequences encoded by IMG genomes were constructed from (A) Maximum Likelihood and (B) Bayesian Inference algorithms. Bootstrap values as a percentage of 1000 replicates or Bayesian posterior

1986; Grahame, 1991; Kocsis et al., 1999; Gencic et al., 2010). Only one bacterium, the deep subsurface pioneer species Candidatus Desulforudis audaxviator, has a typical archaeal Cdh-type ACDS complex (Chivian et al., 2008). Clade B contains [Ni,Fe]-CODHs from an eclectic group of Bacteria, the majority comprising either gut microbiota or putative cellulose degraders. This clade includes species found in diverse enteric environments ranging from cow rumen isolates such as Ruminococcus spp. to constituents of the human gut microbiome such as Campylobacter spp. However, this clade also includes metal reducing Geobacter species, as well as two phototrophs,

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

Anaerobic carbon monoxide dehydrogenase HGT

probabilities are shown for the deep branches defining the six [Ni,Fe]-CODH clades. The compositions of all six clades in both ML and BI trees are identical (see Figure A1 in Appendix and Table S3 in Supplemental Material).

Chlorobium phaeobacteroides and Chlorobaculum parvum. These phototrophs and the second of two CooS sequences from the homoacetogen Moorella thermoacetica form the most deeply branching members of this clade. None of the sequences in Clade B is linked to any other known CO-related genes. Clade C is composed primarily of clostridial species. Most of the strains within this clade are found as members of the human microbiome. A large proportion of the [Ni,Fe]-CODH sequences within this clade are one of two CooS homologs found in many strains of the human pathogens C. botulinum and C. difficile. A large number of these sequences are found adjacent to cooF genes,

April 2012 | Volume 3 | Article 132 | 4

Techtmann et al.

FIGURE 2 | Maximum likelihood phylogeny from Figure 1A drawn as an unrooted circular tree. IMG Gene ID numbers are shown in parentheses after sequence names. The genomic context for each of the [Ni,Fe]-CODH sequences is shown by coloring of the tip labels. Blue indicates CODH occurrence within an ACS gene

suggesting that these [Ni,Fe]-CODHs are able to use CooF to transfer electrons to other functional proteins. Thus it is possible that these [Ni,Fe]-CODHs are coupled to a broad range of functional processes including disease-related physiology. Further biochemical investigation is required to determine the function of the [Ni,Fe]-CODHs within this clade. The most deeply branching members of this clade are unlinked [Ni,Fe]-CODHs from hyperthermophilic methanogens including several Methanocaldococcus species and Methanopyrus kandleri. The organisms within this clade thus span an extraordinary phylogenetic range, from Bacteria through Euryarchaeota and likely participate in diverse pathways for CO utilization. All of the [Ni,Fe]-CODH sequences within Clade D were found to be “lone” cooS genes, that is, not found in genomic context with known CO-metabolism related genes. The most deeply branching group of Clade D is a subclade composed of a few alkaliphilic bacteria. Another deeply branching group in Clade D is from Euryarchaeal species, either methanogens or Archaeoglobaceae. One

www.frontiersin.org

Anaerobic carbon monoxide dehydrogenase HGT

cluster. Purple indicates CODH adjacent to an energy-converting hydrogenase gene cluster. Green indicates that the CODH is adjacent to a cooF gene. Black indicates that the CODH is not linked to a cooF gene. Bootstrap values are shown as a percentage derived from 1000 replicates.

of the subgroups of Clade D contains sequences from the second class of C. botulinum [Ni,Fe]-CODH sequences among other Clostridium spp. A third group of sequences within Clade D is composed of sequences similar to CooS-V from C. hydrogenoformans. The C. hydrogenoformans CooS-V was described in the genome analysis of C hydrogenoformans as being unique, since it is the only one of the five [Ni,Fe]-CODH genes in this genome without a putative function, and is not directly linked to other genes that could provide a contextual clue as to its function (Wu et al., 2005). The majority of Clade D [Ni,Fe]-CODH genes are encoded in genomes that also encode other [Ni,Fe]-CODH genes. Clade E is the largest and most eclectic clade, composed of several subclades. Representatives of each of the four functional categories used in this study are found in Clade E. One of the Clade E subclades is dominated by clostridial and delta proteobacterial [Ni,Fe]-CODHs found within ACS gene clusters. This group of sequences includes the majority of the known

April 2012 | Volume 3 | Article 132 | 5

Techtmann et al.

bacterial CODH/ACS gene clusters. Another large subclade in Clade E contains sequences from sulfate reducing bacteria as well as methanogenic archaea. A noteable feature of one more subclade within Clade E is a group of [Ni,Fe]-CODHs from Thermococcus spp. that are linked to ECH in these Euryarchaeota. These thermococci have the ability to couple the anaerobic oxidation of CO to hydrogen production, via a conserved [Ni,Fe]-CODH-ECH gene cluster (Lim et al., 2010). Clade F is numerically small, however members in all of the four functional categories are represented in this group of CODHs. The four C. hydrogenoformans [Ni,Fe]-CODH gene clusters in Clade F have been assigned different metabolic functions based on biochemical characterization as well as their genetic linkages to other functional genes (Wu et al., 2005). For example, the well-characterized [Ni,Fe]-CODH–ECH gene clusters originally described in R. rubrum and C. hydrogenoformans are assigned to this clade. In addition to these well-characterized [Ni,Fe]CODH–ECH operons there are several other ECH-clustered CODHs within Clade F. Another subgroup within Clade F contains bacterial CODH/ACS gene clusters.

FIGURE 3 | Phylogenetic position of the mini-CooS from Thermosinus carboxydivorans. A maximum likelihood tree was built from a MUSCLE alignment of representative sequences selected from all of the [Ni,Fe]-CODH

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

Anaerobic carbon monoxide dehydrogenase HGT

Clade F also contains many characterized [Ni,Fe]-CODHs with representatives from each of the four functional classes used in this study. Many of these have been characterized structurally and biochemically (Svetlitchnyi et al., 2001, 2004; Soboh et al., 2002; Volbeda and Fontecilla-Camps, 2005). These biochemical studies establish Clade F as encompassing the greatest known functional diversity of [Ni,Fe]-CODH gene clusters. THE GENOME OF T. CARBOXYDIVORANS ENCODES A MINIMAL CooS HOMOLOG

The genome of T. carboxydivorans encodes three [Ni,Fe]-CODH homologs. One of these homologs is highly similar to the hydrogenase-linked [Ni,Fe]-CODH from C. hydrogenoformans. A second [Ni,Fe]-CODH is highly similar to the CODH-V from C. hydrogenoformans. The third [Ni,Fe]-CODH homolog however is distinct from all known [Ni,Fe]-CODH sequences (Figure 3). Other known [Ni,Fe]-CODH sequences are 600–800 amino acid residues long. This divergent mini-CooS has only 482 amino acid residues and its short length does not result from frameshift truncation. Upon alignment with other [Ni,Fe]-CODH sequences, it is

clades defined by Figure 1 as well as the mini-CooS from T. carboxydivorans. The tree was bootstrapped with 1000 replicates,which are shown as percentage values at the nodes in the tree.

April 2012 | Volume 3 | Article 132 | 6

Techtmann et al.

apparent that its small size can be accounted for by many deletions of varying size relative to standard size CooS genes (Figure 4), however the mini-CooS seems to retain all of the important domains of typical CODHs, suggesting that it may be functional (Figure 4). Dobbek et al. (2004) solved the prototype crystal structure for the CooS-II from C. hydrogenoformans which revealed three metal binding clusters (Clusters B, C, and D), each with separate metal coordinating domains with conserved Cys and His residues essential for CODH activity. Interestingly, the 11 Cys residues and the His residue coordinating these clusters are all conserved in the minimal CooS from T. carboxydivorans. Hydroxylamine reductases share some sequence similarity with CooS, but they all lack some of the conserved Cys residues that locate the metal clusters in CooS. The proximity in the genome of the minimal cooS gene to a cooF gene provides further evidence for its involvement in CO-metabolism. EVIDENCE FOR HGT OF THE [NI,FE]-CODH-ECH GENE CLUSTER OF THERMOSINUS CARBOXYDIVORANS

The genome sequence of T. carboxydivorans provides insights into the ability of microbes to utilize CO as a versatile electron donor as well as potential mechanisms for acquisition of [Ni,Fe]-CODH operons and the evolution of multi-[Ni,Fe]-CODH genotypes. As mentioned previously, the draft T. carboxydivorans genome sequence revealed three CODH gene clusters. The [Ni,Fe]-CODHECH gene cluster is highly similar in sequence and gene content to the homologs found in C. hydrogenoformans and R. rubrum as shown in Figure 5. In R. rubrum, this gene cluster has been

FIGURE 4 | The mini-CooS from Thermosinus carboxydivorans contains conserved residues essential for CODH activity. An alignment of Carboxydothermus hydrogenoformans CooS-II, the Cdh-type CODH of Methanothermobacter thermautotrophicus (MTH1708), Thermosinus carboxydivorans mini-CooS, and Thermincola potens hydroxylamine reductase TherJR_2940 was constructed. Dobbek et al. (2001) defined

www.frontiersin.org

Anaerobic carbon monoxide dehydrogenase HGT

shown to comprise two operons: a CODH operon, which contains a cooS gene and the gene for a nickel–iron, electron transfer protein (cooF ), and the adjacent membrane-bound hydrogenase operon. The [Ni,Fe]-CODH-ECH gene clusters are composed of similar sets of genes in R. rubrum, C. hydrogenoformans, and T. carboxydivorans. In C. hydrogenoformans, the [Ni,Fe]-CODH–ECH gene cluster probably comprises two operons (there is an intergenic space approximately 100 nt long between the hypA gene and the cooF gene). In T. carboxydivorans, there is no intergenic space or putative promoter sequences between the ECH and CODH sections of the gene cluster, suggesting that it forms a single operon. Nevertheless, the CODH and the clustered genes encoding the seven subunits of the membrane-bound hydrogenase from T. carboxydivorans are highly similar to the corresponding genes from C. hydrogenoformans, in both gene order and in nucleotide sequence. The 11.3 kb operon that comprises the [Ni,Fe]-CODH–ECH gene cluster in T. carboxydivorans is 84% identical on the nucleotide level to the homologous gene cluster in C. hydrogenoformans, an observation that prompted us to examine these gene clusters for quantifiable evidence of HGT. Several in silico methods have been proposed as indicators of HGT. G + C content has been a well-established and simple method for predicting HGT (Lawrence and Ochman, 1998). The G + C content of the T. carboxydivorans genome is 51.6% whereas the G + C content of its 11.3-kb [Ni,Fe]-CODH–ECH operon is 43.4%, which is similar to the overall G + C content of C. hydrogenoformans genome (42.0%). The G + C% of the [Ni,Fe]CODH-ECH gene cluster from C. hydrogenoformans (44%) is very

three sites as being essential for CooS activity. Cluster C (purple), B (red), and D (green). All of these residues are present in the C. hydrogenoformans CooS-II, M. thermoautotrophicus Cdh, and T. carboxydivorans minimal CooS. Many of these residues are absent in the hydroxylamine reductase. The sequences were aligned with MAFFT v6.861b (http://mafft.cbrc.jp/alignment/server/index.html).

April 2012 | Volume 3 | Article 132 | 7

Techtmann et al.

FIGURE 5 | Gene topology for the hydrogenase – CODH gene clusters from C. hydrogenoformans, T. carboxydivorans, and R. rubrum. Homologous genes are colored similarly between operons. The C. hydrogenoformans and T. carboxydivorans gene clusters are identical except for displacement of the associated cooC gene and presence of intergenic

similar to the overall G + C% of the C. hydrogenoformans genome (Wu et al., 2005). Because G + C content alone is considered to be an inadequate criterion to detect HGT (Garcia-Vallve et al., 2000) we sought other criteria to distinguish the [Ni,Fe]-CODH–ECH gene cluster in T. carboxydivorans from the genome at large. In a recent paper, additional methods for determining HGT were evaluated (Becq et al., 2010). Tetranucleotide frequency and codon usage similarity quantified with K–L divergence were proposed as the most accurate metrics for predicting HGT. We compared the [Ni,Fe]-CODH–ECH gene clusters of T. carboxydivorans and C. hydrogenoformans with the full genome sequences of T. carboxydivorans, C. hydrogenoformans, with K–L analysis using the genome of E. coli K12 as a control to determine if these two CODH-I gene clusters are divergent from the genomes at large (Table 1). The T. carboxydivorans [Ni,Fe]-CODH–ECH gene cluster was compared to the T. carboxydivorans genome using tetranucleotide frequency, resulting in a K–L divergence of 0.217. When codon usage was used as the metric, the K–L divergence was 0.133. The threshold values of 0.05 and 0.1 for K–L divergence of codon usage and tetranucleotide frequency respectively were used to determine similarity. By both metrics the T. carboxydivorans [Ni,Fe]-CODH– ECH gene cluster is significantly different from the T. carboxydivorans genome. As a control, the tetranucleotide frequency and codon usage of the genes encoding the T. carboxydivorans [Ni,Fe]-CODH–ECH gene cluster were compared to those of the E. coli K12 genome. K–L divergences for this comparison are 0.116 and 0.212 for tetranucleotide frequency and codon usage

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

Anaerobic carbon monoxide dehydrogenase HGT

space between hypA and the cooF in C. hydrogenoformans. Additionally the T. carboxydivorans cooA gene is located 254 kb away from the rest of the gene cluster. While the R. rubrum gene cluster has many of the same genes, there are notable differences in the genes encoding the hydrogenase accessory proteins (i.e., presence of cooT and cooJ in R. rubrum instead of hypA).

respectively. These results confirmed that the threshold values are appropriate for specifying similarity. The K–L divergence of the T. carboxydivorans [Ni,Fe]-CODH–ECH gene cluster vs. the T. carboxydivorans genome is thus greater than the K–L distance of the T. carboxydivorans [Ni,Fe]-CODH–ECH gene cluster vs. the E. coli K12 genome. Based on these metrics it appears that the [Ni,Fe]CODH–ECH gene cluster from T. carboxydivorans is significantly divergent from the T. carboxydivorans genome at large. The same analysis was performed to examine whether the C. hydrogenoformans [Ni,Fe]-CODH–ECH gene cluster is similar to rest of the C. hydrogenoformans genome. The K–L divergences comparing tetranucleotide frequency and codon usage for the C. hydrogenoformans [Ni,Fe]-CODH–ECH gene cluster against the C. hydrogenoformans genome are 0.023 and 0.075 respectively. Again E. coli K12 was used as a control and the K–L divergences for tetranucleotide frequency and codon usage are 0.155 and 0.236 respectively. The K–L divergences comparing the C. hydrogenoformans [Ni,Fe]-CODH–ECH gene cluster with the rest of the genome are well below the cutoff values and therefore suggest that the C. hydrogenoformans CODH-I gene cluster is equilibrated with the C. hydrogenoformans genome. The K–L divergences for the comparison of tetranucleotide frequency and codon usage of the T. carboxydivorans [Ni,Fe]-CODH– ECH gene cluster against the C. hydrogenoformans genome are 0.048 and 0.088. These K–L divergences are below the cutoff values, indicating that the T. carboxydivorans [Ni,Fe]-CODH– ECH gene cluster is remarkably similar in composition to the C. hydrogenoformans genome.

April 2012 | Volume 3 | Article 132 | 8

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

Table 1 | Kullback–Leibler divergence comparing codon usage (bold) and tetranucleotide frequency (Italics) for various combinations of the C. hydrogenoformans, T. carboxydivorans, and E. coli K12 genomes and the C. hydrogenoformans and T. carboxydivorans CODH-ECH gene clusters. C. hydrogenoformans

T. carboxydivorans

C. hydrogenoformans

T. carboxydivorans

E. coli K12

genome

genome

CODH–ECH cluster

CODH–ECH cluster

genome

C. hydrogenoformans genome

0

0.302

0.023

0.048

0.155

T. carboxydivorans genome

0.143

0

0.202

0.217

0.098

C. hydrogenoformans CODH–ECH cluster

0.075

0.154

0

0.016

0.115

T. carboxydivorans CODH–ECH cluster

0.088

0.133

0.036

0

0.116

E. coli K12 genome

0.170

0.153

0.236

0.212

0

DISCUSSION The burgeoning release of microbial genomes has revealed that the ability to utilize CO is a surprisingly common trait shared by a diverse collection of bacteria and archaea. Phylogenetic analysis of [Ni,Fe]-CODHs revealed extensive diversity of [Ni,Fe]-CODH sequences in 6% of the released microbial genomes. Most of these species have primary physiologies not related to CO metabolism and have not been scored for CO utilization. Among [Ni,Fe]CODH encoding species, more than 40% encode more than one CODHs. This previously unexplored diversity of sequences has the potential to provide insights into both the function and evolution of this important protein family. For the most part the multiple [Ni,Fe]-CODHs encoded by the same organism are not close relatives. This suggests that these multiple [Ni,Fe]-CODHs in one genome are not the result of recent gene duplications. Our phylogenetic analysis confirms prior work which has shown that there is a clear distinction between the bacterial-type CODH/ACS and the archaeal-type CODH/ACDS. The clade composed of the archaeal-type CODH/ACDS is populated by archaea with one noteable exception. There is a sequence for an archaeallike CODH/ACDS found in the genome of the deep subsurface bacterium Candidatus D. audaxviator, which was the sole detected inhabitant of subterranean effluent waters in a South African gold mine 2.8 km below the surface (Chivian et al., 2008). So far the CODH/ACDS from D. audaxviator is the only sequence of a CODH/ACDS found in a bacterial genome, supporting the notion that D. audaxviator acquired this CODH/ACDS via a HGT event from an archaeon. Aside from D. audaxviator’s CODH/ACDS it is difficult to distinguish putative HGT events of [Ni,Fe]-CODHs from vertical descent solely from phylogenetic analysis. There are many examples of [Ni,Fe]-CODH clades that are composed of organisms that are not related to one another phylogenetically. For example, in clades C, D, and E, archaeal [Ni,Fe]-CODHs form deep branches of a clade composed primarily of [Ni,Fe]-CODHs from bacterial species, tempting us to infer that these sequences may be the result of ancient inter-domain HGTs. However, the disparity between the 16S rRNA phylogeny and the [Ni,Fe]-CODH phylogeny reconstructed here could result from functional distinctions between CODH lineages. We assume that the function of [Ni,Fe]-CODHs is determined by their associated proteins encoded within gene clusters, since this has been observed in many cases. Therefore, the CODH proteins from one clade may have evolved to associate with accessory proteins more often clustered with another clade

www.frontiersin.org

member, resulting in [Ni,Fe]-CODHs with similar functions to have convergent sequence motifs, even if originating from different evolutionary starting points. It is difficult at this stage to assess with accuracy whether the [Ni,Fe]-CODH phylogeny is due to divergence along functional lines as described above, due to the fact that only [Ni,Fe]-CODHs from clades A and F have been definitively characterized biochemically. In an effort to clearly elucidate the branching pattern seen in the global [Ni,Fe]-CODH tree shown in Figure 1, we have attempted to analyze the genomic context of each of the [Ni,Fe]-CODHs in the tree. The function of a [Ni,Fe]-CODH is often determined by the accessory proteins with which it associates. In many genomes, these accessory proteins are encoded in the same operon or adjacent to the [Ni,Fe]-CODH. The analysis of the genomic context and in turn the implied function of the [Ni,Fe]-CODH may help to distinguish vertical descent along function-determined lineages from HGT. The functional tree (Figure 2) shows that on the whole there is co-occurrence of clades and genomic context. However there are exceptions where the genomic context is not congruent with clades. The best example of genomic context being congruent with phylogenetic lineage is in the case of the archaeal CODH/ACDS (Figure 2, Clade A). Aside from a few isolated cases, namely A. fulgidus and M. acetivorans, the remaining members are found within the context of the CODH/ACDS operon. It is most possible that in the cases of the exceptions, genomic context does not reflect functional cooperation. For the most part, however, Clade A is predominated by species from the domain Archaea and with [Ni,Fe]-CODHs with a conserved genomic context (ACDS clusters). The bacterial side of the tree is not as straightforward. There are many cases in which genomic context does appear to be correlated with appearance in a common clade. Also, there are a few examples where similar functions are found in two different clades. The well-characterized physiology of hydrogenogenic carboxydotrophy is encoded by a [Ni,Fe]-CODH linked to an ECH. In this phylogenetic tree there are two places in which this gene cluster is found (Clade E and Clade F). While it may be difficult to explain this phenomenon in terms of vertical descent, it is clear that unlike the CODHs in Clade A the [Ni,Fe]-CODH–ECH gene cluster is not the result of simple vertical descent from an ancestral [Ni,Fe]-CODH–ECH. Another example of correlation of the genomic context with the tree topology is provided by the absence of CooS/ACS gene

April 2012 | Volume 3 | Article 132 | 9

Techtmann et al.

clusters beyond clades E and F. This could indicate that CooS/ACS sequences have evolved by vertical inheritance and the presence of phylogenetically diverse hosts within these clades is not due to HGT but is a result of conservation of function. However, like the CODH–ECH gene cluster, the CODH/ACS gene cluster is present in three clades. Again this points to a more complex evolutionary history, which may indicate that this gene cluster has arisen more than once during the course of evolution or potentially a more complex mechanism involving HGT and genome rearrangements. By overlaying known functional categories onto the phylogenetic tree we were able to clarify some of the phylogenetic history of [Ni,Fe]-CODHs. However much work needs to be done to further clarify the function of the many of the remaining [Ni,Fe]-CODHs. Since it appears that in some cases the clades fall along functional boundaries, the question of the role of HGT as a means of acquisition of [Ni,Fe]-CODHs could be more clearly addressed by examining unrelated species that possesses [Ni,Fe]-CODHs that are highly similar in sequence and function. One such example is found in the [Ni,Fe]-CODH–ECH of Clade F. C. hydrogenoformans and T. carboxydivorans are two thermophiles from the Phylum Firmicutes but are from two divergent classes, Clostridia and Negativicutes, respectively. While these organisms share a common physiology they represent widely separated phylogenetic lineages. They both possess [Ni,Fe]-CODH–ECH homologous gene clusters with high similarity to each other (84% nucleotide identity). For this reason we set out to investigate whether the [Ni,Fe]-CODH–ECH gene cluster could have been acquired by T. carboxydivorans via an HGT event. Analysis of G + C content, tetranucleotide frequency, and codon usage support the hypothesis that the CODH – ECH gene cluster was assimilated by T. carboxydivorans via HGT. Based on the high similarity between the hydrogenase – CODH gene clusters of T. carboxydivorans and C. hydrogenoformans and on the fitness of the T. carboxydivorans [Ni,Fe]-CODH–ECH to the C. hydrogenoformans genome in terms of tetranucleotide frequency and codon usage patterns, it is tempting to propose that T. carboxydivorans obtained this operon by HGT from C. hydrogenoformans or a closely related species. However, closer examination of the data indicate that there may have been a recent transfer from a relative of C. hydrogenoformans or else a more ancient transfer from an ancestor of C. hydrogenoformans into the T. carboxydivorans lineage. First, the nucleotide identity of the gene clusters, although high (84%), is lower than the ANI value of 95–96% shown to delimit microbial species (Goris et al., 2007; Richter and Rossello-Mora, 2009; Tindall et al., 2010). Second, whereas in C. hydrogenoformans the cluster seems to comprise two operons, T. carboxydivorans appears likely to encode a single operon (Figure 5). Third, the order of genes in the gene clusters from T. carboxydivorans and C. hydrogenoformans has been switched by the relocation of cooC (Figure 5). The rearrangements suggest further evolution in T. carboxydivorans or C. hydrogenoformans following a horizontal transfer. The location of the regulator gene cooA apart from the [Ni,Fe]-CODH–ECH gene cluster in T. carboxydivorans also suggests that there is ongoing genome reordering of the CO-related genes in this isolate, or may be interpreted as the acquisition of cooA that occurred independently from the acquisition of the [Ni,Fe]-CODH–ECH gene cluster. One of the two cooA

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

Anaerobic carbon monoxide dehydrogenase HGT

genes (cooA-1) in C. hydrogenoformans is directly upstream of its [Ni,Fe]-CODH–ECH gene cluster. The lone cooA homolog in the genome of T. carboxydivorans is a homolog of C hydrogenoformans cooA-1 and is separated by 252 kb from the hydrogenase – CODH gene cluster. Interestingly, the T. carboxydivorans cooA is flanked by transposes and phage-related genes, which are known to be mediators of genomic rearrangement and HGT (Ochman et al., 2000). The T. carboxydivorans genome also contains the smallest gene predicted to contain all of the conserved CooS features. This putative CODH is 428 amino acid residues long compared with typical [Ni,Fe]-CODHs that range from approximately 600 to 800 residues long. This “mini-CooS” has been placed in a separate multiple alignment that gave rise to the ML analysis shown in Figure 3. Taking into account (i) the acquisition of the hydrogenase – [Ni,Fe]-CODH gene cluster via HGT, which we substantiate in this paper, (ii) the separate location of the cooA gene, and (iii) the fact that the G + C content of the gene encoding the unusual small cooS is about 10 mol% higher than that of the genome. Therefore, we speculate that the assemblage of the carboxydotrophic hydrogenogenic phenotype in T. carboxydivorans is an evolutionarily recent event, suggesting that rapid genome evolution is under way to accommodate selection for CO utilization. Due to the large number of species that have been shown to encode multiple CODHs, it seems probable that multiple [Ni,Fe]CODH operons increase the fitness of an organism in CO-rich milieus by providing divergent metabolic pathways through which CO can be metabolized. The consumption of CO by independent pathways providing energy conservation, carbon fixation, and nicotinamide coenzyme reduction are now well-established. On a broader scale, in microbial consortia, CO utilization has the additional communal benefit of dissipating a potentially toxic compound (Parshina et al., 2005; Techtmann et al., 2009). Our recent work revealed that adaptive regulation of expression of multiple [Ni,Fe]-CODH operons by dual cooA genes may partition CO between multiple competing pathways in response to varying needs for energy conservation and carbon acquisition across a wide range of CO concentrations (Techtmann et al., 2011). In conclusion, we have developed a comprehensive phylogeny for [Ni,Fe]-CODHs which revealed the presence of six distinct clades. Comparison of the genomic contexts with CooS phylogeny suggests that several clades diverged based on function. These clades appear to be evolving via vertical descent. However there are a few CO-related metabolic functions that are spread throughout various clades on the tree either indicating different evolutionary events or HGT. It seems likely that both HGT and vertical transmission have driven the remarkable divergence of [Ni,Fe]-CODHs currently seen in both archaea and bacteria

ACKNOWLEDGMENTS This work was funded by the US National Science Foundation through award numbers EAR 0747394 (Frank T. Robb), EAR 0747412 (Albert S. Colman), and MCB 0605301 (Frank T. Robb and Albert S. Colman). The work of Alexander V. Lebedinsky and Tatyana G. Sokolova has been supported by the Russian

April 2012 | Volume 3 | Article 132 | 10

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

Foundation for Basic Research (project number 11-04-01723-a), federal targeted program for 2009–2013 “Scientific and Pedagogical Personnel of Innovative Russia” (GK number P646), and the program of the Presidium of the Russian Academy of Sciences “Molecular and Cell Biology.” The sequencing work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. We are indebted to our reviewers for helpful comments on phylogenetic methods. This is Contribution 11-230 from the Institute of Marine and Environmental Technology.

REFERENCES Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. Becq, J., Churlaud, C., and Deschavanne, P. (2010). A benchmark of parametric methods for horizontal transfers detection. PLoS ONE 5, e9989. doi:10.1371/journal.pone.0009989 Chivian, D., Brodie, E. L., Alm, E. J., Culley, D. E., Dehal, P. S., Desantis, T. Z., Gihring, T. M., Lapidus, A., Lin, L. H., Lowry, S. R., Moser, D. P., Richardson, P. M., Southam, G., Wanger, G., Pratt, L. M., Andersen, G. L., Hazen, T. C., Brockman, F. J., Arkin, A. P., and Onstott, T. C. (2008). Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322, 275–278. Dobbek, H., Gremer, L., Meyer, O., and Huber, R. (1999). Crystal structure and mechanism of CO dehydrogenase, a molybdo ironsulfur flavoprotein containing Sselanylcysteine. Proc. Natl. Acad. Sci. U.S.A. 96, 8884–8889. Dobbek, H., Svetlitchnyi, V., Gremer, L., Huber, R., and Meyer, O. (2001). Crystal structure of a carbon monoxide dehydrogenase reveals a [Ni-4Fe-5S] cluster. Science 293, 1281–1285. Dobbek, H., Svetlitchnyi, V., Liss, J., and Meyer, O. (2004). Carbon monoxide induced decomposition of the active site [Ni-4Fe-5S] cluster of CO dehydrogenase. J. Am. Chem. Soc. 126, 5382–5387. Drummond, A. J., and Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214. doi:10.1186/1471-2148-7-214 Edgar, R. C. (2004a). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113. doi:10.1186/1471-2105-5-113 Edgar, R. C. (2004b). MUSCLE: multiple sequence alignment with high

www.frontiersin.org

accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194. Ewing, B., Hillier, L., Wendl, M. C., and Green, P. (1998). Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185. Ferry, J. G. (1995). CO dehydrogenase. Annu. Rev. Microbiol. 49, 305–333. Gao, F., and Zhang, C. T. (2006). GCProfile: a web-based tool for visualizing and analyzing the variation of GC content in genomic sequences. Nucleic Acids Res. 34, W686–W691. Garcia-Vallve, S., Romeu, A., and Palau, J. (2000). Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 10, 1719–1725. Gencic, S., Duin, E. C., and Grahame, D. A. (2010). Tight coupling of partial reactions in the acetyl-CoA decarbonylase/synthase (ACDS) multienzyme complex from Methanosarcina thermophila: acetyl C-C bond fragmentation at the a cluster promoted by protein conformational changes. J. Biol. Chem. 285, 15450–15463. Gnida, M., Ferner, R., Gremer, L., Meyer, O., and Meyer-Klaucke, W. (2003). A novel binuclear [CuSMo] cluster at the active site of carbon monoxide dehydrogenase: characterization by X-ray absorption spectroscopy. Biochemistry 42, 222–230. Gonzalez, J. M., and Robb, F. T. (2000). Genetic analysis of Carboxydothermus hydrogenoformans carbon monoxide dehydrogenase genes cooF and cooS. FEMS Microbiol. Lett. 191, 243–247. Gordon, D., Abajian, C., and Green, P. (1998). Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202. Goris, J., Konstantinidis, K. T., Klappenbach, J. A., Coenye, T.,Vandamme, P., and Tiedje, J. M. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence

SUPPLEMENTAL MATERIAL The Supplementary Material for this article can be found online at http://www.frontiersin.org/Evolutionary_and_Genomic_Micro biology/10.3389/fmicb.2012.00132/abstract Table S1 | [Ni,Fe]-CODH sequences found in the IMG database as of October 1, 2011. Table S2 | Genomes encoding multiple [Ni,Fe]-CODH homologs. Table S3 | Composition of the six [Ni,Fe]-CODH clades of both the ML and BI phylogenetic trees.

similarities. Int. J. Syst. Evol. Microbiol. 57, 81–91. Grahame, D. A. (1991). Catalysis of acetyl-CoA cleavage and tetrahydrosarcinapterin methylation by a carbon monoxide dehydrogenasecorrinoid enzyme complex. J. Biol. Chem. 266, 22227–22233. Han, C. S., and Chain, P. (2006). “Finishing repeat regions automatically with Dupfinisher,” in Proceedings of the 2006 International Conference on Bioinformatics and Computational Biology, eds. H. R. Arabnia and H. Valafar (Las Vegas: CSREA Press), 141–146. Hugendieck, I., and Meyer, O. (1992). The structural genes encoding CO dehydrogenase subunits (cox L, M and S) in Pseudomonas carboxydovorans OM5 reside on plasmid pHCG3 and are, with the exception of Streptomyces thermoautotrophicus, conserved in carboxydotrophic bacteria. Arch. Microbiol. 157, 301–304. Kerby, R. L., Hong, S. S., Ensign, S. A., Coppoc, L. J., Ludden, P. W., and Roberts, G. P. (1992). Genetic and physiological characterization of the Rhodospirillum rubrum carbon monoxide dehydrogenase system. J. Bacteriol. 174, 5284–5294. King, G. M. (2003). Uptake of carbon monoxide and hydrogen at environmentally relevant concentrations by mycobacteria. Appl. Environ. Microbiol. 69, 7266–7272. King, G. M. (2006). Nitrate-dependent anaerobic carbon monoxide oxidation by aerobic CO-oxidizing bacteria. FEMS Microbiol. Ecol. 56, 1–7. King, G. M., and Weber, C. F. (2007). Distribution, diversity and ecology of aerobic CO-oxidizing bacteria. Nat. Rev. Microbiol. 5, 107–118. Kocsis, E., Kessel, M., Demoll, E., and Grahame, D. A. (1999). Structure of the Ni/Fe-S protein subcomponent of the acetyl-CoA decarbonylase/synthase complex from Methanosarcina thermophila at 26-A

resolution. J. Struct. Biol. 128, 165–174. Kullback, S., and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Stat. 22, 79–86. Lawrence, J. G., and Ochman, H. (1998). Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. U.S.A. 95, 9413–9417. Lim, J. K., Kang, S. G., Lebedinsky, A. V., Lee, J. H., and Lee, H. S. (2010). Identification of a novel class of membrane-bound [NiFe]-hydrogenases in Thermococcus onnurineus NA1 by in silico analysis. Appl. Environ. Microbiol. 76, 6286–6289. Lindahl, P. A. (2002). The Ni-containing carbon monoxide dehydrogenase family: light at the end of the tunnel? Biochemistry 41, 2097–2105. Meyer, O., Frunzke, K., and Mörsdorf, G. (eds). (1993). Biochemistry of the Aerobic Utilization of Carbon Monoxide. Andover, MA: Intercept, Ltd. Meyer, O., and Rajagopalan, K. V. (1984). Molybdopterin in carbon monoxide oxidase from carboxydotrophic bacteria. J. Bacteriol. 157, 643–648. Meyer, O., and Schlegel, H. G. (1983). Biology of aerobic carbon monoxide-oxidizing bacteria. Annu. Rev. Microbiol. 37, 277–310. Miller, M. A., Holder, M. T.,Vos, R., Midford, P. E., Liebowitz, T., Chan, L., Hoover, P., and Warnow, T. (2010). The CIPRES Portals. CIPRES. Available at: http://www.phylo.org/ sub_sections/portal Milne, I., Lindner, D., Bayer, M., Husmeier, D., Mcguire, G., Marshall, D. F., and Wright, F. (2009). TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops. Bioinformatics 25, 126–127. Ochman, H., Lawrence, J. G., and Groisman, E. A. (2000). Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299–304.

April 2012 | Volume 3 | Article 132 | 11

Techtmann et al.

Parshina, S. N., Kijlstra, S., Henstra, A. M., Sipma, J., Plugge, C. M., and Stams, A. J. (2005). Carbon monoxide conversion by thermophilic sulfate-reducing bacteria in pure culture and in co-culture with Carboxydothermus hydrogenoformans. Appl. Microbiol. Biotechnol. 68, 390–396. Ragsdale, S. W. (2004). Life with carbon monoxide. Crit. Rev. Biochem. Mol. Biol. 39, 165–195. Ragsdale, S. W. (2008). Enzymology of the wood-Ljungdahl pathway of acetogenesis. Ann. N. Y. Acad. Sci. 1125, 129–136. Rambaut, A., and Aj, D. (2007). Tracer v1.4. Available at: http://beast.bio.ed.ac.uk/Tracer Richter, M., and Rossello-Mora, R. (2009). Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. U.S.A. 106, 19126–19131. Schubel, U., Kraut, M., Morsdorf, G., and Meyer, O. (1995). Molecular characterization of the gene cluster coxMSL encoding the molybdenum-containing carbon monoxide dehydrogenase of Oligotropha carboxidovorans. J. Bacteriol. 177, 2197–2203. Seravalli, J., Kumar, M., Lu, W. P., and Ragsdale, S. W. (1995). Mechanism of CO oxidation by carbon monoxide dehydrogenase from Clostridium thermoaceticum and its inhibition by anions. Biochemistry 34, 7879–7888. Seravalli, J., Kumar, M., Lu, W. P., and Ragsdale, S. W. (1997). Mechanism of carbon monoxide oxidation by the carbon monoxide dehydrogenase/acetyl-CoA synthase from Clostridium thermoaceticum: kinetic characterization of the intermediates. Biochemistry 36, 11241–11251.

Anaerobic carbon monoxide dehydrogenase HGT

Soboh, B., Linder, D., and Hedderich, R. (2002). Purification and catalytic properties of a CO-oxidizing:H2evolving enzyme complex from Carboxydothermus hydrogenoformans. Eur. J. Biochem. 269, 5712–5721. Sokolova, T. G., Gonzalez, J. M., Kostrikina, N. A., Chernyh, N. A., Slepova, T. V., Bonch-Osmolovskaya, E. A., and Robb, F. T. (2004). Thermosinus carboxydivorans gen. nov., sp. nov., a new anaerobic, thermophilic, carbon-monoxide-oxidizing, hydrogenogenic bacterium from a hot pool of Yellowstone National Park. Int. J. Syst. Evol. Microbiol. 54, 2353–2359. Sokolova, T. G., Henstra, A. M., Sipma, J., Parshina, S. N., Stams, A. J., and Lebedinsky, A. V. (2009). Diversity and ecophysiological features of thermophilic carboxydotrophic anaerobes. FEMS Microbiol. Ecol. 68, 131–141. Stamatakis, A. (2006). RAxML-VIHPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. Stamatakis, A., Hoover, P., and Rougemont, J. (2008). A rapid bootstrap algorithm for the RAxML Web servers. Syst. Biol. 57, 758–771. Svetlitchnyi, V., Dobbek, H., MeyerKlaucke, W., Meins, T., Thiele, B., Romer, P., Huber, R., and Meyer, O. (2004). A functional Ni-Ni[4Fe-4S] cluster in the monomeric acetyl-CoA synthase from Carboxydothermus hydrogenoformans. Proc. Natl. Acad. Sci. U.S.A. 101, 446–451. Svetlitchnyi, V., Peschel, C., Acker, G., and Meyer, O. (2001). Two membrane-associated NiFeScarbon monoxide dehydrogenases

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

from the anaerobic carbonmonoxide-utilizing Eubacterium Carboxydothermus hydrogenoformans. J. Bacteriol. 183, 5134–5144. Techtmann, S. M., Colman, A. S., Murphy, M. B., Schackwitz, W. S., Goodwin, L. A., and Robb, F. T. (2011). Regulation of multiple carbon monoxide consumption pathways in anaerobic bacteria. Front. Microbiol. 2:147. doi:10.3389/fmicb.2011.00147 Techtmann, S. M., Colman, A. S., and Robb, F. T. (2009). ‘That which does not kill us only makes us stronger’: the role of carbon monoxide in thermophilic microbial consortia. Environ. Microbiol. 11, 1027–1037. Teeling, H., Waldmann, J., Lombardot, T., Bauer, M., and Glockner, F. O. (2004). TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5, 163. doi:10.1186/1471-2105-5-163 Terlesky, K. C., Nelson, M. J., and Ferry, J. G. (1986). Isolation of an enzyme complex with carbon monoxide dehydrogenase activity containing corrinoid and nickel from acetategrown Methanosarcina thermophila. J. Bacteriol. 168, 1053–1058. Tindall, B. J., Rossello-Mora, R., Busse, H. J., Ludwig, W., and Kampfer, P. (2010). Notes on the characterization of prokaryote strains for taxonomic purposes. Int. J. Syst. Evol. Microbiol. 60, 249–266. Uffen, R. L. (1976). Anaerobic growth of a Rhodopseudomonas species in the dark with carbon monoxide as sole carbon and energy substrate. Proc. Natl. Acad. Sci. U.S.A. 73, 3298–3302.

Volbeda, A., and Fontecilla-Camps, J. C. (2005). Structural bases for the catalytic mechanism of Ni-containing carbon monoxide dehydrogenases. Dalton Trans. 21, 3443–3450. Wu, M., Ren, Q., Durkin, A. S., Daugherty, S. C., Brinkac, L. M., Dodson, R. J., Madupu, R., Sullivan, S. A., Kolonay, J. F., Haft, D. H., Nelson, W. C., Tallon, L. J., Jones, K. M., Ulrich, L. E., Gonzalez, J. M., Zhulin, I. B., Robb, F. T., and Eisen, J. A. (2005). Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet. 1, e65. doi:10.1371/journal.pgen.0010065 Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Received: 07 December 2011; accepted: 20 March 2012; published online: 17 April 2012. Citation: Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L and Robb FT (2012) Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Front. Microbio. 3:132. doi: 10.3389/fmicb.2012.00132 This article was submitted to Frontiers in Evolutionary and Genomic Microbiology, a specialty of Frontiers in Microbiology. Copyright © 2012 Techtmann, Lebedinsky, Colman, Sokolova, Woyke, Goodwin and Robb. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits noncommercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.

April 2012 | Volume 3 | Article 132 | 12

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

APPENDIX Clade C

0.9951

0.994

1

0.9917

2 Soo .C 13 .13 M -1 dh S -2 h-1 dh .C .C 42 h-2 8.DCdh .Cd h-1 A8 06 d .LQ19. F-1 i.6 .1 4.C h-1 1.Cd ne.DSM.430 umi.AVtei.J d 72 ll o e r 4.C .3 .bo DO SM oc dle ga 30 M rmkan .hun -I egulaII12-16.D M.4 3.DS e 0 C . .thrus lum ..R nor ED.VC .PT6.DS -73 42 9 Z 2 21 iumpy piril .sp tha us.Aus hilaC-1 m.Z .6 trid no os ella.Me cid lgidmopus.VtigatuDSM SM..5dh-1 -1 .D ..C losethahannoctus s.plaus.f.tuherlg - .Cdh C dh-2 id es nii.DLP.D CM C .804 dh--2 S s.C2AA..C et a a bu b s.fum.ev urto ii.S h MMethndoidglo oglosaeta .b u n b iu s a oraa ns.C2 .DSM647.C CFaerrchaeanoogloalobcoideilus.m ro v dh--2 -3 ti ra r r eth ae oh oc ph .ace tivo .Fusa.DSM.3..C Cdh .804.Ch-1 AM n e c A ri h lo a 1 c d 2 a M o e o s.C S 7.C Arceththannohsaarcinina.a.barkzei.Grra n ro.D 64 M e tha no sarc ina a iv M e tha no arc a.m cetiivori.Fusa.DSM.3 M e tha nos arcin a.a rke o1 6 M e a os rcin a.ba zei.G dis.C 5 M eth an osa rcin .ma palu is.C M eth an osa ina ariip alud C 7 -2 m arip dis.C M eth an sarc us.m .H M eth ano cocc us.m Delta .2133.Cdh S2 uss..D ipaluludis.S M M eth ano occ sm hiiccu ph maarip a SB op r .DS M eth anoc occu .m ottrrro burg utto lii. rm m nrip ie M eth noc ccus..m n s.Marr 8 ermoa a tte he M etha noco us.va r.t.th 08 r ensiS rg .2 bu r a er M etha nococcmobac ar SM S.D Cdh-1 er.m .V24 4S M etha ther oba cttte us niu .M7. M ethano therm sac .fervid aniuss.M lca 1 --1 M ethano thermuscc ulc 86 .vu uss.v s.AG8 occu M ethano ldo .2661.Cdh co s.ferven M ethanoca oc chii.DSM occuus 1 h h-1 Cdh.jannas 2.Cd M ethanocaldoc -22 6-2 06 occccus.ja 40 S4 occo FS ..F M thanocaldo sp s .sp .IH cus wensis.IH1 Methanocaldococ -2 cus.okinanka dhdh-2 i-3.C Methanotherrmo moccocolicu 1 h-1 s.Na CdhC.Cd 04C. P104 Me hanococc s.a MP1 tor.M ccus.ae iator. axvia daxv .aud is.au rudis Met idatus.De lforud sulfo Desu E s.ME Cand nocald rrnus.M f rnu coccus.infe ococc o doc d -1 Metha coccus.a dh-1 kai-3.Cdh ankai-3.C licus.Nan eolicus.N Methano cccus.aeo 133.Cdh-1 .DSM.2133.Cdh-1 r rg r is.Marburg.DSM.2 sis.Marbu rgen r r r.marbu e at acter.marburgens ermobact Methanothermob Moorella.thermoacetica.ATCC.39073.CooS-1 ermoacetica ATCC 390 e ermoacetica.ATCC.39073.CooS 3 39073 CooS-1 1 Geobacter.metalli .m met etalliredu rreducens.GS-15 c s.GS-15 cen s.GS 15 Geobacter.bemidjie bemidjien b midjiensis i nsis.Bem sis.Bem.D sis.Bem .Bem.DSM.16 SM.1662 622.CooS Geobacter.sp 2.CooS-2 --2 sp..FR p..FRCFRC C-32.C C 32.Co ooSo oS-2 Campylob 2 acter ac ccte .concisus.1382 Cam bac 3 6 acttter.c er.cur Clo pylo urvu rvus.5 s.52 ium p 25 5.92 .92 pap Clostrid pyr yro oso os solv so idium..pa lven ven enss.D Clo stridiu .DS celluloly SM M.2 .27 782 ytticu 82..Co icum. m.the .CooS-1 Cl str m.H1 hermoc h H10 tridium 0.C .Co oce Cl ostrid oo ellum.LQ oS S-1 .th the -1 errm L m ium Cloos 8.DSM.13 oc ell .th he he errm 64 stridium.c mocell um.DSM.23 13.CooS52 el um Vei88 lu 1 lovora .ATCC. 60 V llone 35.p0 ns.743 27405 Veeillilloonella ar B.ATC la T V n lla.s vu C.352 C 6_1.A V eillo ella.dispp...6 96.Coo _27 C.17745 Veeilillonneella.sp ar.A S4 lla T lo V .. V eillo ne .pa oral.taCxC..1774 S eill ne lla.s rvula D 8 S on.158.F S ele one lla.a p..3 .D 200 S ele nomlla.a typic _1_44M.2 8 0412 ACS M ele no on typ a.A Bryitsuonommonaas.spica.A utigACS-0-134-V E R u antekllellaonass.s.artem e Ruumbinacte a.f .mu p.. idnisa.AT49-V-S-Col7a .F CC ch PRsumminoocoriumormaltacidoraal.ta axo0399 ..35186 e in c cc .ce tex a.D EO 5 llulo ige DSM n.1 ubribaucdorocococcuus.fla C n 3 h s DChloloroacteteriuamibcus.a.alb vefasolves.I-5 .205 7.F0 DDe esesulfrobbacriumm.sactelbusus.7 ciensns.62.DS44 430 M.1 .F su ulfoomiumulu .sa inu r.ala.8 FD 446 lfo vib ic .p m.p bu s.F cto -1 ha riorob ha a rre 02 ly 9.C lob .m iu eo rvu um 68 ticu ooS s.A DS iuum agm.b bac m.N .D -2 A .re ne acu tero CIB M.3 TCC tba ticu lat ide .8 98 .23 26 en s.Rum s.D 327 6 3 se S .X. SM .H -1 DS .2 R1 M 66 .40 00 .D 2 SM 8 .56 92

-1 74 2 oS 7 S- 1 o .27 oo oS- -1.C CC 1.C o O3 AT gh S- 1.C S ns. ou A r . LMMS ns rica bo ..M L uta ulfu den .spsp..Mism.des .Hil P4 -1 1 oS ium . iod s ris .D 63 n m a --11 ter riu .th ica lg aris ki.i.FF .Co .10 ooS ac te pira ur .vu lg za T2 SM 2.C ob bac os sulf aris .vuMiyya .AH 2.D 064 oteteo ron .de ulg arisiiss.M usG20 spo- 638 M.1 ooS-1 .pr ro at rio .v ulg ar hil .G .A .2 S lta .p on vib rioio.v ulg alip nsis sis SM O.D OB.C1 v n lk D e e .D 2 a k dedelteasulfulfolfovib MPooS--1S--11 9958 ibr rio..vio io.aalas poe ens DII1 ns.M .1 .C o DDes suulfoovvibivibriio 71 e s lflf r r .a.aeslexigs.AErooxiidahS2 -1.Co.DSM .115 oS-1 D p a u DDeesusulfffuulflfovvibibriorio.sla Na tei.JF 1-9ceii.6A8 7.DSM71.Co cidfuuma .N oS-1 s.p teerr. rium ga is.Eboon R.484 SM.7 3 Deessuulfolfovib D e su lobuobacbacttee .hunalustrula.b 5 2.Co D e og ph eo illum la.p reg us.SEB.5575.D .126 .338 D err ttro ro tte o ri ns SM SM F yn a.ppronosppirhaeerutth han olea xida M.D RM2.D S t 1 H M ettrro ceto m.J .H --1 u .p 1 S -1 delltethaanoasttu o S usla..M o o nusullum.a.pacifiocphicum 1 M th S--1 2A.C 7.Co c m tr t opm Meandid iu ns.C M.364 04.Coo o ara han tto C etth achium.auto tivora 1.DS SM.8 M esulfo xydaibcteriiin -1 a.aceazei.Gousaro.D D arbolfo CooS C esu onbosarc T..C ooS-1 a.mba eri.F la.P 3 D etha nosarcinina hi P 1699.C EJJ3 a.berrk mop M etha nosarc a errans. .DSM.1 olle th m atto mat M ha S--2 etta..th e um.BSA oo sa ph g ga s. no Met M4.Cm olithotro occcu et etha occo Mh s.sp..A mo .ther oo errm occucte he S-1 Th T rium-32 oS mocro ba ..F Thersu C 3 .C A RC CA lfu ter .PC nss.P sp..FR en Deeo uccce cte .sp edu e acccte f4 ba furrred ob lfllfur Rf4 Rf 2 ns. .su Ge S-2 e er oSter t uce ct Coo .Co ac a ba b 80. ob 380 eo .23 M.2 G Ge SM .CooS-1 .DS cter.uraniiredlicuss.D b bino rb Geoba 288.DSM.5501 car . .Z-7 ter.car ccter Peloba m arabaticum43B.ATCC.35296.CooS-2 b m.a l bi lobiu Acetoha m..cellulovorans.7 Clostridium 1 2821 64885282 8 F8 offilum.OPF o pttto p ryyp aeum.cry rchae rarc Kora atus Ko Candidat S-1 oS o oo Co AM AM4.Co

Clade E

Clostridium.botulinum.A A2 2.K Ky yyo ottto o o-F.CooS-1 Clostridiu .botulinum .NCT T TC C.2916.Co oo oS-1 Clostridium m.botulinum.FC .L La angeland.CooS-1 Clostridiu m.botulinu inum m.F. Clostridiu .F.23 2306 0613 13.C m.botulinum .Coo ooSS-2 Clostrid .BoNT/B1.Okr ium.klu Clostridiu yve a.CooS-1 eri.N ri.N NB BR RC m. C.1 Clostr .12 klu 201 2 6 eri. ri.D DS SM 64 idium. yve M..55 555 Cl 801546 ljungdahlii.P lii PE ET TC C.D DSM.13 C ostridium3 rb 528.Coo C lostridium.ca idivora .botulox S-1 C lostridium ns.P7.DS inum .nov Cloloststridiu .type.C yi.NT M.1 ridiumm.bot Clo .-. E ul st klund5243.CooS-2 C rid .a to inum .D Clolosstridiuium.bce bu .1 C tr m.d artletti tyylicum 873 ifficile i.D .ATC Clolostrididium.d D S C s iu .C P M.167 C.824 C los tridiu m.d ifficile IP C los trid m iffic .BI9 .107 95.C .CooS-1 C los trid ium .diffi ile.R 9.Co 932.C ooS C lo trid ium .diffi cile 202 oS-1 ooS-1-1 C lo strid ium .dif cile.Q.6C30.--..e91.Co C lo strid ium .d ficile.Q D 6 epide oS-1 Clolosstrtridiuium.d.diffiifficile.Q C 3q m D 9-6 c C Clolosstrtrididiumm.dififficilile.QCCD-3-9 7b342.Coic.type CCaaldstrididiuium.d.difficficilee.QC D-762g584.NA oS--1 .X.Coo S--1 CCaaldldicicelluiumm.dififficil ile.Q.CD1D-66cw55.N.CooP1b/00 1 T e e 9 .d fi C ic 2 D he ldic e llu los ifficcile .QC D- 6.C 6.C AP1S--11 6.Co oS--1 ile .N D 37 ooS ooS .C C MPeloesulfrmoellullulolosiruirupto 1 -2 -1 ooS-1 eth b o se los sir p r.l.NAAP0-23mx79.N anactetom dim irupupt tor.k actoP087.Co 63.CNAP op r.c ac inib to or.h ris ac .C oS Coo 1a/0 yru ar ulu a r.sa yd tjan eti ooS -1 S-1 01 .Co s.kbinom.acter cc rot ssocus.6 -1 oS an lic ce .oc har her nii.1 A -1 D dle us tox ean oly ma 7 .D ri.A.DS ida i.J ticu lis. 7R1SM.9 W 1 s n M B 5 V1 .2 s. /IW .D 08 .D 4 9.C 38 557 -1 SM SM 5 .12 dh 0.C 5.D 228 .89 13 -1 oo SM P.D 03 7 S- .7 DS 1 71 M .C .16 oo 64 S- 6.C 2 oo S1

Clo str C idiu lo Clo m strid .ca iu str idiu rb m ox .b Cloom.ce idiv eije sttrr llulo or rin idiu v an ck Clo Clo m or s.P ii.N a str sttrr .ssp ns 7.D CI idiu id or .7 Clo SMMB m.bium oge 43B Clo s b .b n tr .15.80 strid Clo idd C o bo e .AT 25 ium strid C ium lostrttuulinuttuulins.A AT CC 6 432.C Clo u .b iu loo .b id sttrriidd C otuli m.b strid otuli iumm.F.Lm.BCC.11.352 64 456.Cooo ium lostr num otuli ium num .bo Lan a4 55 96 88 11 oSS-2 .C Co54804 -4 Clo .bottuu idium .BoNnum .bo .A2 ttuulingela.665779.C Carb C tu s li .B .K u .b tr n .C n T C oooS 292 oxy The idium um otu /A1 oN linuKyo m.Fd.C doth rm .s .Bo linu .A T/B m to ..22.CooooSS-2-3 Meth erm osin acch NT//A m.NTCCB1.O.typ-F.C 3 ano us.h us.cc aro 3.L CT .19 krae.ACoo061S-22 halo lytic oc C.2 39 .C .-.HS- 3 ydro arb bium r b o h 7 u gen xyd m.W.M Ma 916 .CoooS allll2 Meth .eves oformiv ivo M ree .Co oS --22 anos tig g ans 1.D .Co oS --22 .C ansrra arcinatum.Z Meth ..N Meth Meth N S -2 o .Z 2 a o M a -7 anosa ano .maz 303 noco -290 r1 1 C .25S-2 o 442 rcina sarcin ei.Go1.DSM ccus 1.C.C .bark a.ace .DS .37 .voltaoooSS--2 Metha --V2 e M 2 tivora A3V nosari.Fusaro a .36 1.C e.A eta. .DSnMs.C24A7.CooooSS--1 1 .Coo --2 ermDop .804.C 2 Natrana DesulPersephthon S hi -2 fotom 2 erob ella.m la.PT.CooS--2 aculum 2 arina. dh--1 Clostridius.thermop .redu EX-H 1 ium.teta hiluss.J cens Dethiob 1 JW W/NM ni.Mas.JW .M MIacter.alka II-1 Alkalip 1 ph p ch et-WN-L hilu i s.m liphilussa me eta ts.E8L8F t llire .AHTus ire ed Clo tridi dig ig 1.C 1 Co ge enss.Q en diu oo um .Q oS m.c S.ce YM Clostridiumsstri ellu 2 llulo ulo F.C F .Co Co lyytticu oo icu .papyrosolveloly oS um S-3 S-3 m.H m .H1 H1 10. 0.Co 0.C Co ooS A ottoba Azo oS--2 SM.278 obacccter. ob o 2 2782 ter vinelandns.D 2.C Coo ooS iiiii.DJ S-2 DJ.ATC C.BAA-13 Thermincola 03 .potens.JR.CooS Carboxydothermus mus.hydrogenoforma -4 01.CooS-IV Desul Desulfitobacterium.hafnins.Z-29 ense.Y51.CooS-3 Desulfito Desulfitobacterium.hafniense.DCB-2.CooS-4 Desulfurispirillum.indicum.S5 74.CooS-2 fotom culum.nigrificans.DSM.5 Desulfo Desulfotoma S-II ns.Z-2901.Coo C1 mus.hydrogenoformaacillus .sp..Y4.1MS93 Carboxydothermu Geob us.C56-Y 4T oglucosidasi sis.MB170 congenCC bacillus.therm eng Geoba .11 18 ter.t bac .AT um.S1 lustris.BisB rmoanaero Therm -3 m.rubron oS rillu .pa spi Co as Rhodo opseudom .potens.JR. S-1I 1.Coo Soo Rhod Thermincolaorans.Nor01 .C S div oo -3 Z-29 arboxy rmans..D B-2.C ooSS-4 -1 osinus.c nofoni se seC.Y51.C oo en .C Thermus.hydroge S-1 af ium.hm.hafnien .16622F.Coo S-1 erm riu xydoth ulfitobacter .DSM QYM51.CooooS-1 -2 Carbo Desu obacte .Bem ens.nse.YB -2.C oooSS-3 esulfit idjienseistalliredig De fniese.DC9073.C o oS-4 a em .h .C .b m cter hilus. riumafnien CC.3M.7711.CoooS-1-1 T S .77 .C oS -2 .h Geoba Alkaliplfitobacte riumcetica.A 5.D Mns.JR1.CoooS -2 u a ns.55775.DS Deslfitobacte T o oS -2 5 .pote H501.C erm a u .CoooS sis Desorella.thcetoxididans.5incolahilus.A M.5 nsis .CartienoS-3-2 Mo ulum.a cetoxThermalkalip8.DSgartieO3-1 o S 2 r. -728 tutt .AS.stuttg82.C.CoooSS- -25 ac lum.a te m c s 3 u to ac ba .Z ia.s n ia .3 01 .Cooo 7 1 ulfo thio um en uta en M K- 2 .C .20 S- 1 Des ulfotom De batic uen dism.Kuen2.DSns.AaphSxd3SM oooSS- -22 ra s.K hio s M ra .N s.H .D 3.C o o S- 2 Des .a 4 d .C o m tu .t tu R o biu ida ira ida .H niv ium ran t1 x 1 .C oo S -5 alo nd osp nd um e ter vo .2s s.H -0 2 .C oo oS ttoh Caatron Ca phic m.alkbac leoarsiiorans.AK.338OB2.C.Co o o .o P Ace on totr cillu ote us .ba ov ran SM AHTS-1 sulf .au atiba a.pr cocculus s.olenivo 2.Dans.M s. LM De m lt f o c lf u u u M r e ri su de sulf lfa cc lkk R xid hil ..M o o acte De De esu lfoc m.am.Hmar allip .sp ob D su cillu icu .fu .alk m sulf De titibba oph cter rioo teriu De a tr a ib c ulff autoo hob uriv oba s De m.atrop sulf rote riu n e p cte Sy Deltta. ba d lfo su De

Th erm Ca os nd ed d ida im tus de eltta. inib .D ltta. pro ac es pr te ter ulf ote ob Clo .oc or ob a ea str ud a ct Clo idiu n is.a cte eriu str m.l i.JW idiu ud riu m /I Clo jun a m . m strid .ca gd EuW-1 A xvia .spsp.. M a r b b iu h a 22 mm to ..M L Clo ox m idiv lii.PEcter8P.D on r.M LMMS strid C .car ium S ife P1 S -1 ium los boxid oran TC.D Clo M x 0 - . .lim tr s .dif idiu ivo strid m .1 .d 4 1. C ficil m ran .P7.D SM.1os 664egeC.CCooooS ium e.6 .ba s.P .d S 3 um 6.Cns oo S -3 Clo Clo C ifficile Clo30.-.ertlettii 7.D M.15 6528 .KISCooii.KCS-1-4 strid strid loos .QC str p .D SM 24 480.Co T6 S-24 id S ium ium tridiu D-3 iddiu 5 6483.Co115oS-212 .diffi .dif m.d 7x i m emiccM.1.1 0 .ddif .ttyy 679243.C fi fificcil pe 5.C o129oS-434 Clocile.QCcile.Qifficile79.NA strid D-9 CD .Q e B .X X.C oooS 50 -7 QC P1 ..B iu I9.C Clos Clos m.diffi7b34.N6w5D-663a//0001I9 C oo S- -1 N 5 q .C tridiu tridiu cile C oo S- 2 QCAP1b.NAP412.CoooSS-22 m m.de.Q D-6 /006 .Co oS --22 ifficile ClosClostrid .difficile tridiu ium .C .C 6c26 .Co oS -2 --22 P.1 .Co oS-2 1 D19 .C Clostm.diffic.difficileIP -22 oS--2 ridium ile.Q .R R 0793 6.C 2 C ooS--1 .diffic CD-32202912.C ile.N g58 CooooSS-2 Alkalip Clostrid -1 AP0 .Coo -2 hilus.m ium.d -22 8 S .C iff -2 et ic 2 allired ile.NA 6428 ooS-2 Blautia.ha Clostrid ig ss. P07.C4093-2 nsenii.V 0 m.sen YMF. ooS-2 PI 7-iu24 ticklaQnd CooS -2 Rumino.C ii.DSM SM M.2058 coccus.D .51.obeu 3.C 3 Co Ruminoco um oo m. m ATCC oS Sccus.o .29 2917 be um Ru .A2-16 minococc Ruminoccoc Bryantella.form sp.SR occcus SR1/ us ccus.sp f atex atexig igen gens ens. s.I-5 I-52. 2.DS DSM Blautia.hyd M y rogenotroph p ica.DS SM.10507.CooSBlautia.hyyd drro ogeno ottrro ophicca a.DS Th T herrm mo oc cco occcc o cu T Th herrm mo oc cco oc o ccc cus.b ba arro ophilu

Clade F

ch.Maree.CooS-1 oc o Lo ottulinum.BoNT/A3.L Clossttridium.bo ooS-1 otulinum.Ba4.657.C Clossttridium.bot num.Bf Clostridium.botuli/A1.H all inum.BoNT inu -1 Clostridium.botulTC C .19397.CooS 2 TCC oSll.Co m.BoNT/A1.A tyype.A-Ha oS-1 m.typ Clostridium.botulinu 579.Co o ulinum .bot .15 S-10 CC C T AT A 2.Coo50 Clostridiumpor es..A nes gen oge IIMB.805 .sporo m.s ium 1 648851 ridiu sstrid i kii.NC Clost ooSm.beijerinc AA 296.C _47F Clostridiu CC.35 ATC 5981 ..1_7 s.743B. les.spe. DSMC.1.27756A lovoran Clostridia agiforms.ATC 7FA 5 _5 7 ium.cellu 2 _1 sparag 3 orque Clostrid ss..ttto ium.aoc .sp..8 M.1 S-3 oc cu rrium 31.DS7.Coo S-1 Clostrid Rumin .bacte TO-9 050 .Coo 78173 ae onis.T SM.1 0583SM.1 irace AA-6 nosp .hiran ica.D SMe.2xile.D oS-2 Lach stridium notroph7-24.Dm .n CC.B.Co oS-1 Clo ydroge VPI.C stridiu ae.AT.335533.CoooSS-1-1 ia.h enii. Clo m.b boltei.DSMM.33 19.C.CoooS-1-1 tia Blau tia.hans diu .hallillii.DSleri.AV6612.CoooSS-2 ttrriid 8 o .C 9 Blau o Closcteriuumm.ha.kandDSM4.206-2 .M7 .C.156oS-1-2 a s o oS -2 Eubbacteriopyruaschii.p..FSanius1699SM.C Eu ethans.j. ann cus.ss.vulcSM.1-1.D3042.CooooSS-1-3 M occu ococ ccu A.D .HBM.4064 R.C.CoooSS-31 ld co S ns S .1 s.J 4 c o 70 aldo oca ldo .B ica 6.D M n .57 6.C o .8287 -2 noc han oca um nif -1 .DS ote M 64 8.CM .1 S -2 tthha Metthethan trophmmos.VC2DO ola.p.DS .16352.DSSMCoooSS-2 Me M thho .a u 1 c ns SM .1 lyR .D -2. .Co oo id in II li io a mo vibr s.ffulg ED erm rific P.D SMlG D2CB 51 4.C rm o .tthhe rm bu us.A Th .nig1228TC.Dus.Fus.Ce.D e.Y.82 m - E llic um Theeoglo lacidd s s C teri ulu W P o lyyticniennienTC ha s.p l f af .A ac W/I ahlii..glyculo bac Arcglobu uro ttoom i.J d s ell ha .h m o lffo an ng ulu .c m. m u sulf err su ce .lju ottu rio riu riu ylic De F De ter.o ium hob ivib cte cte but ac trid op cet ba ba to inib los ntr A ulfito lfitto .ace im C Sy s su m ed De De ridiu os t erm os Th Clo

Clade D

Clade A

Clade B

0.3

FIGURE A1 | Circular Bayesian Inference tree with sequence names shown (Same tree as Figure 1B). Posterior probabilities for deeply branching nodes are displayed.

www.frontiersin.org

April 2012 | Volume 3 | Article 132 | 13

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

FIGURE A2 | Continued

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

April 2012 | Volume 3 | Article 132 | 14

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

FIGURE A2 | Continued

www.frontiersin.org

April 2012 | Volume 3 | Article 132 | 15

Techtmann et al.

Anaerobic carbon monoxide dehydrogenase HGT

FIGURE A2 | Subtrees of maximum likelihood tree of [Ni,Fe]-CODHs (Figure 2) divided based on clades.

Frontiers in Microbiology | Evolutionary and Genomic Microbiology

April 2012 | Volume 3 | Article 132 | 16