FEMS Microbiology Letters 243 (2005) 9–14 www.fems-microbiology.org
Eukaryotic signature proteins of Prosthecobacter dejongeii and Gemmata sp. Wa-1 as revealed by in silico analysis
James T. Staley *, Heather Bouzek 1, Cheryl Jenkins
Department of Microbiology, University of Washington, P.O. Box 98195, Seattle, WA 98195, USA
Received 19 March 2004; received in revised form 5 August 2004; accepted 16 November 2004
First published online 28 November 2004
Edited by R. Aminov
Abstract
The genomes of representatives of three bacterial phyla have been compared with the list of 347 eukaryotic signature proteins (ESPs) derived by Hartman and Fedorov [Proc. Natl. Acad. Sci. USA 99 (2002) 1420]. The species included Prosthecobacter dejongeii of the Verrucomicrobia phylum, Gemmata sp. Wa-1 of the Planctomycetes phylum and Caulobacter crescentus of the Proteobacteria. The protist Trypanosoma brucei was used as a eukaryotic control. P. dejongeii had unique ERGO BLAST matches to α-, β-, and γ-tubulin, to Set2, a transcriptional factor associated with eukaryotic DNA, and to LAMMER protein kinase, for a total of 10 high-scoring ESP matches. Gemmata sp. Wa-1 shared four of its 17 high-scoring ESP matches with P. dejongeii, and that information, coupled with other genomic data, strongly supports the view that these two phyla are related to one another. If the ESP list is an accurate listing of unique eukaryotic proteins, then the low number of high-scoring matches between the proteins of these two bacteria and the list raises doubts about these phyla being direct ancestors of the Eucarya. However, this does not rule out the possibility that ancestral members of either the Verrucomicrobia or Planctomycetes may have played an important role in the evolution of a proto-eukaryotic organism.
© 2004 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
Keywords: Prosthecobacter; Gemmata; Verrucomicrobia; Planctomycetes; Eukaryotic evolution
1. Introduction
The origin of the Eucarya remains one of the most enduring biological enigmas. Among contemporary hypotheses, some suggest that the Eucarya originated by an engulfment or symbiotic event between a member of the Archaea and a member of the Bacteria [2–5] or by a fusion between members of these two lineages. Others have conjectured that the eukaryotic nucleus evolved directly from an archaeal lineage [7,8]. Margulis has proposed that specific bacterial groups have contributed genes, such as those involved in motility, through symbiotic associations with the evolving eukaryotic organism. Some hypotheses involve an RNA-based proto-eukaryote, termed the Chronocyte, which engulfed an archaeon to give rise to the Eucarya [1,10]. Sogin envisaged an RNA-based proto-eukaryote lineage with a cytoskeleton that engulfed an archaeon that became the nucleus. If, as several of these hypotheses suggest, a particular bacterial or archaeal lineage or lineages were the progenitors of the Eucarya, then members of these lineages would be expected to contain a higher proportion of eukaryote-like genes than other prokaryotic lineages.

* Corresponding author. Tel.: +1 206 543 0461; fax: +1 206 543 8297. E-mail address: [email protected] (J.T. Staley).
1 Current address: Fred Hutchinson Cancer Research Institute, Seattle, WA, USA.
2 Current address: Department of Agriculture, Sydney, Australia.
0378-1097/$22.00 © 2004 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.femsle.2004.11.034
J.T. Staley et al. / FEMS Microbiology Letters 243 (2005) 9–14
Based on this, genomes of representatives of two bacterial phyla, the Verrucomicrobia and the Planctomycetes, have been analyzed. These two phyla are of particular relevance in this regard as they possess a number of phenotypic and molecular features typical of eukaryotes. Thus, four species of the Verrucomicrobia genus Prosthecobacter were shown to contain genes for tubulin, a cytoskeletal element previously found only in eukaryotes. This discovery is of particular significance given the widely held idea that the acquisition of a cytoskeleton was the most significant step in the evolution of eukaryotic cell complexity [11,13–16]. Members of the division Planctomycetes have also been shown to contain eukaryote-like features including membrane-bounded nucleoids [17,18], cell walls lacking in peptidoglycan, and fatty acids containing palmitic and stearic acid, in addition to other features that are more typical of microbial eukaryotes. More recently, sterols have been reported in Gemmata sp. Previously, sterols had been reported in only two groups of the Bacteria, the methanotrophs and the myxobacteria, both representatives of the Proteobacteria, as well as in the eukaryotes. In addition, an analysis of the genome of a member of the Planctomycetes, Rhodopirellula baltica, indicates that 8% of its predicted genes share a greater relatedness to the Eucarya than to the Bacteria. Also, R. baltica lacks the gene central to bacterial cell division, ftsZ, which is found with few exceptions (Chlamydia sp. and Ureaplasma urealyticum) in all other known bacterial species as well as in members of the Euryarchaeota. Furthermore, the Planctomycetes and the Verrucomicrobia are regarded as possible sister phyla based upon phylogenetic analyses of 16S rDNA sequences [25,26]. Based on the above findings, we hypothesized that an ancestor of the Planctomycetes or the Verrucomicrobia is a potential candidate as a progenitor of the eukaryotic lineage.
To test this we have compared the deduced proteins from the genomes of a representative of each of these phyla, Gemmata sp. Wa-1 and P. dejongeii, respectively, with a list of eukaryotic signature proteins (ESPs). The ESP list was compiled by Hartman and Fedorov in the following manner. Five different eukaryotic genomes were used as a source for the proteins: Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana and Giardia lamblia. The S. cerevisiae genome was initially used to identify potential ESPs. Candidate ESPs that did not have homologs in all of the other eukaryotic genomes except for G. lamblia were removed from the ESP list. Prokaryotic proteins were obtained from 72,990 entries from GenBank release 121, which contained protein sequences from 44 completely sequenced genomes of Bacteria and Archaea. Proteins for which bacterial homologs could be identified were removed from the ESP list. Finally, of the 914 proteins remaining, only those that were also found in the genome of the deeply branching eukaryote G. lamblia were used to compile the list of 347.
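The three filtering steps described above can be pictured as successive set filters. The following is a minimal sketch only, not the procedure Hartman and Fedorov actually ran (their list was derived from BLAST searches). The input format, the genome labels, the `prokaryote` flag and the protein IDs `P1` to `P4` are all hypothetical and chosen purely for illustration.

```python
# Illustrative sketch of the ESP filtering logic; all names are hypothetical.
CORE_EUKARYOTES = {"D. melanogaster", "C. elegans", "A. thaliana"}
DEEP_BRANCH = "G. lamblia"
PROKARYOTE = "prokaryote"  # assumed label for "a bacterial or archaeal homolog exists"


def compile_esp_list(homologs):
    """Return the sorted candidate IDs that pass all three filters.

    homologs maps an S. cerevisiae protein ID to the set of genome labels
    in which a homolog was detected (an assumed input format).
    """
    esps = []
    for protein_id, genomes in homologs.items():
        # Step 1: require homologs in every other eukaryote except G. lamblia.
        if not CORE_EUKARYOTES <= genomes:
            continue
        # Step 2: discard candidates that have a prokaryotic homolog.
        if PROKARYOTE in genomes:
            continue
        # Step 3: keep only candidates also found in G. lamblia.
        if DEEP_BRANCH not in genomes:
            continue
        esps.append(protein_id)
    return sorted(esps)


# Toy data: only P1 satisfies all three conditions.
toy_homologs = {
    "P1": {"D. melanogaster", "C. elegans", "A. thaliana", "G. lamblia"},
    "P2": {"D. melanogaster", "C. elegans", "A. thaliana", "G. lamblia", "prokaryote"},
    "P3": {"D. melanogaster", "C. elegans", "A. thaliana"},
    "P4": {"D. melanogaster", "G. lamblia"},
}
print(compile_esp_list(toy_homologs))
```

In this toy example, `P2` is dropped at step 2, `P3` at step 3 and `P4` at step 1, which mirrors how the candidate list shrinks at each stage of the published procedure.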
2. Materials and methods
Using the ESP list of 347 proteins that were identified as unique to members of the Eucarya, we conducted a comparative analysis with the partial genome sequences from P. dejongeii FC1, which has three-fold coverage (95% complete), and Gemmata sp. Wa-1, which has eight-fold coverage (>99% complete). The sequences for the ESPs (1) were downloaded from the website (www.mcb.harvard.edu/gilbert/ESP/set_Sc347). These proteins are described as an “ESP set of fasta-formatted 347 protein sequences of S. cerevisiae with homology (55 bits BLAST score) to D. melanogaster, C. elegans, A. thaliana, and Giardia lamblia and do not have a homology to any bacterial proteins”. These sequences were then cut and pasted into the ERGO bioinformatics suite by Integrated Genomics (www.ergo.integratedgenomics.com/ERGO/). A description of the methods used to generate the ERGO database is available at the Integrated Genomics, Inc. website (www.ergo.integratedgenomics.com/ERGO_supplement/tutorial.html). The identification of open reading frames (ORFs) and the functional annotation were performed by software developed for ERGO. In this study, protein–protein comparisons were performed on the sequences provided by Hartman and Fedorov against the bacterial genomes available in the ERGO sequence database. This method is similar to BLAST, and the results were evaluated with respect to bit score and significance support (E-value). All bacterial sequence matches with E-values greater than 10e-6 (bit score less than 55) were excluded from Tables 2 and 3, as this is the threshold used by Hartman and Fedorov for deriving the ESP list of uniquely eukaryotic proteins. Protein names and categories were retained from the S. cerevisiae nomenclature used. The database of sequences compared was not limited to the sequences stored in the ERGO site, but included sequences in Swiss-Prot and the Protein Information Resource.
As a control organism for the analysis, the genome sequence of a representative species of the Proteobacteria, Caulobacter crescentus, was used. Also, a member of the Eucarya, Trypanosoma brucei, was included as a positive control.
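The E-value threshold used in the methods above to separate high-scoring matches from weaker ones can be illustrated with a short filter over search hits. This is a sketch under stated assumptions, not the ERGO implementation: the `Hit` record, its field names and the example hits are hypothetical, and the cutoff is taken to mean 10^-6 (written here as `1e-6`). Hits whose E-value is greater than the cutoff are dropped, and the best remaining hit per ESP is kept.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One protein-protein match (hypothetical record layout)."""
    esp_id: str       # query: an ESP from the list of 347
    subject_id: str   # matched ORF in the bacterial genome
    bit_score: float
    e_value: float


# Assumption: the threshold in the text is interpreted as 10^-6.
E_VALUE_CUTOFF = 1e-6


def best_hits_per_esp(hits, cutoff=E_VALUE_CUTOFF):
    """Keep the lowest E-value hit for each ESP, excluding hits above the cutoff."""
    best = {}
    for hit in hits:
        # Matches with E-values greater than the cutoff are excluded.
        if hit.e_value > cutoff:
            continue
        current = best.get(hit.esp_id)
        if current is None or hit.e_value < current.e_value:
            best[hit.esp_id] = hit
    return best


# Made-up hits for illustration only.
example_hits = [
    Hit("ESP_A", "orf1", 120.0, 1e-30),
    Hit("ESP_A", "orf2", 60.0, 1e-8),
    Hit("ESP_B", "orf3", 40.0, 1e-3),
    Hit("ESP_C", "orf4", 55.0, 1e-6),
]
high_scoring = best_hits_per_esp(example_hits)
print(len(high_scoring), sorted(high_scoring))
```

Counting the keys of the returned dictionary gives the number of ESPs with at least one high-scoring match in a genome, which is the kind of per-genome tally reported in Table 1.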
3. Results and discussion

Table 1
Number of eukaryotic signature proteins detected in selected microbial genomes

Microbial species               E-values greater than 10⁻⁶   E-values less than 10⁻⁶
Trypanosoma brucei              163                          76
Gemmata Wa-1                    104                          17
Prosthecobacter dejongeii (a)   94                           10
Caulobacter crescentus          107                          0

(a) Since the genome sequence is estimated to be about 95% complete, the reported values likely under-represent the true values.

Table 1 provides a listing of the number of uniquely eukaryotic proteins to which representatives of the various microbial taxa studied here show some relatedness [E values >10⁻⁶ and