J Mol Evol (2002) 55:138–152 DOI: 10.1007/s00239-002-2311-7

© Springer-Verlag New York Inc. 2002

Ribonucleotide Reductases: Divergent Evolution of an Ancient Enzyme

Eduard Torrents,1 Patrick Aloy,2 Isidre Gibert,1 Francisco Rodríguez-Trelles3

1 Institut de Biotecnologia i de Biomedicina and Departament de Genética i de Microbiologia, Bacterial Molecular Genetics group, Universitat Autònoma de Barcelona, 08193-Bellaterra, Barcelona, Spain
2 EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
3 Instituto de Investigaciones Agrobiológicas de Galicia, Consejo Superior de Investigaciones Científicas, Avenida de Vigo s/n, Apartado 122, 15780-Santiago de Compostela, Spain

Received: 5 October 2001 / Accepted: 25 January 2002

Abstract. Ribonucleotide reductases (RNRs) are uniquely responsible for converting nucleotides to deoxynucleotides in all dividing cells. The three known classes of RNRs operate through a free radical mechanism but differ in the way in which the protein radical is generated. Class I enzymes depend on oxygen for radical generation, class II uses adenosylcobalamin, and the anaerobic class III requires S-adenosylmethionine and an iron–sulfur cluster. Despite their metabolic prominence, the evolutionary origin of these enzymes and the relationships among them remain elusive. This gap in RNR knowledge can, to a major extent, be attributed to the fact that the different RNR classes have greatly diverged polypeptide chains, rendering homology assessments inconclusive. Evolutionary studies of RNRs conducted until now have focused on comparison of the amino acid sequences of the proteins, without considering how they fold in space. The present study is an attempt to understand the evolutionary history of RNRs taking into account their three-dimensional structure. We first infer the structural alignment by superposing the equivalent stretches of the three-dimensional structures of representatives of each family. We then use the structural alignment to guide the alignment of all publicly available RNR sequences. Our results support the hypothesis that the three RNR classes diverged from a common ancestor currently represented by the anaerobic class III. Also, lateral transfer appears to have played a significant role in the evolution of this protein family.

Key words: nrd; Ribonucleotide reductase (RNR); RNR evolution; Pyruvate formate lyase; Structural alignment; Lateral transfer; Phylogeny

Correspondence to: F. Rodríguez-Trelles; email: [email protected]

Introduction

Ribonucleotide reductases (RNRs) are a family of structurally complex enzymes that play an essential role in all living organisms: they catalyze the conversion of the four common nucleotides to the deoxynucleotides required for DNA replication and repair. The three known classes of RNRs use free radical chemistry for catalysis but rely on different metallocofactors to initiate the radical reduction process, each exhibiting a different behavior toward oxygen (Sjöberg 1997).

Class I RNRs require oxygen to produce a tyrosyl radical at a diferric iron center and can therefore function only under aerobic conditions. They consist of two homodimeric proteins, in Escherichia coli called NrdA (α2) and NrdB (β2), arranged as a heterotetramer (α2β2). The tyrosyl radical is located in the β2 polypeptide. Based on sequence identity and allosteric properties, class I RNRs are subdivided into classes Ia and Ib, encoded, respectively, by the nrdAB and the nrdEF genes. Class II RNRs use adenosylcobalamin (AdoCbl) in a radical generation process that is not affected by oxygen, so they can work in aerobic or anaerobic environments. Class II RNRs are mostly α2 homodimers encoded by the nrdJ genes. Class III RNRs use S-adenosylmethionine (SAM) and a small activating protein, NrdG, to generate a stable glycyl radical. Structurally they are (α2–β2), with subunits encoded by the nrdDG genes. Recent studies have established a connection between class III RNR and the pyruvate formate-lyase (PFL) system: as in the PFL system, the NrdG protein of class III RNR acts as an activase in the generation of the glycyl radical of NrdD (Tamarit et al. 1999; Torrents 2001). The NrdG activase harbors one (4Fe–4S) cluster per polypeptide chain. The glycyl radical, located at the C-terminal end of NrdD, is sensitive to oxygen; hence class III RNRs can function only under strictly anaerobic conditions.

RNRs share a complex pattern of allosteric regulation (Jordan and Reichard 1998). The binding sites for substrates and allosteric effectors are located on the large α-polypeptide. The substrate specificity of the catalytic site for a given ribonucleotide is determined by the binding of specific deoxyribonucleoside triphosphates (dATP, dTTP, dGTP) or ATP to an allosteric site termed the specificity site. Class Ia and III and some class II RNRs contain an extra allosteric site (referred to as the activity site), which activates or inhibits the overall activity of the enzyme, with ATP and dATP acting as enhancer and inhibitor, respectively (reviewed by Jordan and Reichard 1998; Stubbe and van der Donk 1998).

The diversity of RNR metallocofactors may seem to indicate that the three RNR classes arose independently. However, the facts that the ribonucleotide reduction pathway in which they are involved occurs in all modern organisms studied so far and that the different RNR classes exhibit very similar catalytic mechanisms (Reichard 1997; Stubbe et al. 2001) suggest that RNRs originated from a single ancestral form, prior to the divergence of archaebacteria, eubacteria, and eukaryotes.
Indeed, as they are essential for the production of the building blocks for DNA synthesis, RNRs are hypothesized to have been involved in the transition from the RNA to the "DNA world" (Reichard 1993, 1997; Freeland et al. 1999). If the RNRs had originated from each other by duplication in the last universal common ancestor (LUCA), it would follow that class II and III RNRs (and also class Ib) were lost in eukaryotes, because they are not present in this life domain (but see below). Yet despite the metabolic prominence of RNRs, this and other fundamental questions remain to be answered. These include which modern class of RNR represents the ancestral enzyme (see Stubbe et al. 2001) and how RNR tree topologies relate to accepted organismal phylogenies. Moreover, RNRs have been largely neglected in efforts to determine the functional content of the LUCA (see Kyrpides et al. 1999). To a great extent, this gap in our understanding of the evolution of RNRs can be attributed to the fact that these enzymes have very different polypeptide chains, such that homology assessments based on conventional alignment strategies have been inconclusive.

Generally, protein structure is better conserved than primary sequence (Chothia and Lesk 1986; Murzin et al. 1995; Patthy 1999). Proteins are broadly classified as belonging to a given structural superfamily if evidence for homology becomes apparent after structural alignment. In turn, structural information is invaluable for identifying key active residues (Russell 1998; Aloy et al. 2001a), binding sites, and surfaces (Russell et al. 1998), which can guide primary sequence alignment, ultimately allowing more detailed evolutionary analyses.

The present study is an attempt to cast light on the evolutionary history of RNRs from the three-dimensional (3D) structure of the protein. We focus on the large α-polypeptide, which comprises the active and allosteric regulation sites and is common to the three RNR classes. First, we identified a consensus RNR structure by superposition of individual RNR structures. Then we used this consensus for alignment of the primary sequences.

Materials and Methods

Sequence Retrieval and Alignment

Amino acid sequences of RNRs were retrieved from the GenBank, EMBL, and PIR databases (Release 65, December 2000) with the BLAST (version 2.1) sequence similarity search tool. BLAST probing of DNA and protein databases was performed with the Blastp and tBlastn programs (Altschul et al. 1997). Accession numbers for the sequences are listed in Table 1.

For alignment of the amino acid primary sequences we adopted the following strategy. First, we aligned the sequences using the ClustalX version 1.81 program (Thompson et al. 1997) with the default gap opening and extension penalties. Then we adjusted the alignments by eye using GeneDoc version 2.6.001 (Nicholas and Nicholas 1997). Visual fine-tuning of the alignment took into account (i) structure-based alignments (Logan et al. 1999; Uhlin and Eklund 1994), (ii) identified conserved putative allosteric binding regions (Eriksson et al. 1997), and (iii) the alignment of the RNR family hosted in the Pfam database (Bateman et al. 2000). Sequences belonging to each RNR class were first aligned separately, and the resulting three multiple alignments were then aligned with each other. After removal of all gaps and ambiguities, the length of the alignment was 236 residues.

The crystallographic structures of PFL (2pfl), the NrdA subunit of RNR class Ia (1r1r), and the NrdD subunit of anaerobic RNR class III (1b8b) were obtained from the Brookhaven Protein Data Bank (Berman et al. 2000; http://www.rcsb.org/pdb/) and converted to protein sequences using the STAMP package (Russell and Barton 1992; http://barton.ebi.ac.uk/manuals/stamp.html). The domain definitions and evolutionary families adopted for the analysis of the protein sequences were those of the Structural Classification of Proteins (SCOP release 1.53) database (Murzin et al. 1995; http://scop.mrclmb.cam.ac.uk/) and the Protein Families Database of Alignments and HMMs (Pfam release 6) database (Bateman et al. 2000; http://www.sanger.ac.uk/Software/Pfam/).
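The gap-removal step mentioned above (dropping every alignment column that contains a gap or an ambiguous residue) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the gap symbols "-" and ".", and the use of "X" as the ambiguity code are hypothetical choices, not details reported for ClustalX or GeneDoc in this study.

```python
def strip_gapped_columns(alignment, gap_chars="-.", ambiguous="X"):
    """Return a copy of the alignment keeping only fully resolved columns.

    `alignment` maps a sequence name to its aligned sequence (all rows must
    have the same length). A column is kept only if no row holds a gap
    symbol or an ambiguity code in it. Gap symbols and ambiguity codes are
    assumptions made for this sketch.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    keep = [
        i
        for i in range(len(rows[0]))
        if all(row[i] not in gap_chars and row[i].upper() not in ambiguous
               for row in rows)
    ]
    return {name: "".join(row[i] for i in keep)
            for name, row in zip(names, rows)}


# Toy example with made-up sequences: the third column is dropped because
# one row has a gap and another has an ambiguous residue there.
toy = {"seqA": "MK-LV", "seqB": "MKXLV", "seqC": "MRALI"}
ungapped = strip_gapped_columns(toy)
```

Applied to a real alignment, the length of any returned sequence gives the number of ungapped columns available for distance calculation.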

Table 1. Occurrence of ribonucleotide reductases in species in this study^a

Species

Archaebacteria
  Euryarchaeota
    1. Aeropyrum pernix
    2. Archaeoglobus fulgidus
    3. Halobacterium sp.
    4. Methanobacterium thermoautotrophicum
    5. Methanococcus jannaschii
    6. Pyrococcus abyssi
    7. Pyrococcus furiosus
    8. Pyrococcus horikoshii
    9. Thermoplasma acidophilum
Eubacteria
  Aquificales
    10. Aquifex aeolicus
  Cyanobacteria
    11. Synechocystis sp.
  Cytophaga/Flexibacter/Bacteroides group
    12. Porphyromonas gingivalis
  Green nonsulfur bacteria
    13. Chloroflexus
  Green sulfur bacteria
    14. Chlorobium tepidum
  High-G+C Gram-positive
    15. Corynebacterium ammoniagenes
    16. Corynebacterium diphtheriae
    17. Corynebacterium nephridii
    18. Mycobacterium avium
    19. Mycobacterium bovis
    20. Mycobacterium leprae
    21. Mycobacterium tuberculosis
    22. Streptomyces clavuligerus
    23. Streptomyces coelicolor
  Low-G+C Gram-positive
    24. Bacillus anthracis
    25. Bacillus halodurans
    26. Bacillus megaterium
    27. Bacillus stearothermophilus
    28. Bacillus subtilis
    29. Clostridium acetobutylicum
    30. Clostridium difficile
    31. Enterococcus faecalis
    32. Lactobacillus leichmannii
    33. Lactococcus lactis
    34. Mycoplasma gallisepticum
    35. Mycoplasma genitalium
    36. Mycoplasma pneumoniae
    37. Staphylococcus aureus
    38. Streptococcus mutans
    39. Streptococcus pneumoniae
    40. Streptococcus pyogenes
  Planctomyces/Chlamydia group
    41. Chlamydia pneumoniae
    42. Chlamydia trachomatis
  Proteobacteria (α subdivision)
    43. Caulobacter crescentus
    44. Mesorhizobium loti
    45. Rhodobacter capsulatus
    46. Rhodobacter sphaeroides
  Proteobacteria (β subdivision)
    47. Neisseria gonorrhoeae
    48. Neisseria meningitidis
    49. Ralstonia eutropha

Metabolism

Sequencing status

RNR class

Accession no.

AE AN AE AN AN AN AN AN AE

G G G G G G g G G

II II Ia, II II, III III II, III II, III II, III II

AP000063 AE000782 AE005120, AE005073 AE000666 U67527 NC_000868 U78098, TIGR UG NC_000961 AL445067

AN

G

Ia

AE000657

AE

G

Ia

P74240

g

II, III

TIGR UG

II

Personal communication

AN AE

G

II

TIGR UG

AE AE AE AE AE AE AE AE AE

S g S g g G G S g

Ib Ib (2E) Ib, II Ib, II Ib, II Ib (2F), II II II

Y09572 TIGR UG Personal communication TIGR UG TIGR UG AL583923, AL583924 P50640, A70933 AJ224870 T35125 TIGR UG BAB04220.1, BAB06529.1 Personal communication TIGR UG P50620 NC 003030 TIGR UG TIGR UG L20047 AE006332, U73336 AF152114 P47473 U00089 AP003138, AP003131 TIGR UG TIGR UG AE004092

AE AN AN FA FA FA FAN FAN FAN FA FA FA FA

g G g g g S S g G G G G G g

Ib, II Ia, II II Ia, II Ib (2×) Ia, II, III Ia, II, III Ib, III II Ib, III Ib Ib Ib Ib, III Ib, III Ib, III Ib (2×), III

AE AE

G G

Ia Ia

NC_000922 NC_000117

FA FA FA

g G g g

Ia II II, III II

AE005862 NC_002678 R. capsulatus genome project TIGR UG

AE AE AE

g G S

Ia Ia III

B81101 AL162756 AJ012479

AE AE

g G

Table 1. Continued

Species

  Proteobacteria (δ and ε subdivisions)
    50. Campylobacter jejuni
    51. Geobacter sulfurreducens
    52. Helicobacter pylori
    53. Rickettsia prowazekii
  Proteobacteria (γ subdivision)
    54. Actinobacillus actinomycetemcom
    55. Buchnera sp.
    56. Escherichia coli
    57. Haemophilus influenzae
    58. Pasteurella multocida
    59. Pseudomonas aeruginosa
    60. Pseudomonas stutzeri
    61. Salmonella typhimurium
    62. Shewanella putrefaciens
    63. Thiobacillus ferrooxidans
    64. Vibrio cholerae
    65. Xylella fastidiosa
    66. Yersinia pestis
  Spirochaetes
    67. Borrelia burgdorferi
    68. Treponema pallidum
  Thermotogales
    69. Thermotoga maritima
  Thermus/Deinococcus group
    70. Deinococcus radiodurans
  Eubacteria viruses
    71. Mycobacteriophage L5
    72. Phage T4
    73. Roseophage SIO1
Eukaryotes
  Apicomplexa
    74. Plasmodium falciparum
    75. Cryptosporidium parvum
  Euglenozoa
    76. Trypanosoma brucei
  Animals
    77. Caenorhabditis elegans
    78. Danio rerio
    79. Homo sapiens
    80. Mus musculus
  Fungi
    81. Schizosaccharomyces pombe
    82. Neurospora crassa
    83. Saccharomyces cerevisiae
    84. Candida albicans
  Plants
    85. Arabidopsis thaliana
    86. Nicotiana tabacum
  Eukaryote viruses
    87. African swine fever virus
    88. Bovine herpes virus
    89. Epstein Barr virus A
    90. Herpes virus
    91. Orgyia pseudotsugata virus
    92. Pseudorabies virus
    93. Spodoptera exigua nucleopolyhedrovirus
    94. Vaccinia virus
    95. Varicella virus
    96. Varicella zoster

Sequencing status

RNR class

Accession no.

G g G G

Ia II Ia Ia

AL139074 TIGR UG P55982 C71655

g G G G G G

FA

g g g G G G

Ia, Ia Ia, Ia, Ia, Ia, Ia, Ia, III Ia, Ia, Ia Ia,

AE AE

G G

Ia Ia

AE00783 AE000520

AN

G

II

Y12877

AE

G

Ib, II

AE001826, D75281

G G G

II Ia, III II

S30995 AF158101 NC002519

AE AE

G g

Ia Ia

AF205580 AF043243

AE

g

Ia

015909

AE AE AE AE

G g G g

Ia Ia Ia Ia

Q03604 U57964 P23921 P07742

AE AE AE AE

g g G g

Ia Ia Ia (2×) Ia

P36602 AF171697 P21524, P21672 AJ390500

AE AE

G g

Ia Ia

T51813 Y10862

AE AE AE AE AE AE AE AE AE AE

S S S S S S S S S S

Ia Ia Ia Ia Ia Ia Ia Ia Ia Ia

NP_042739 P50646 P03190 P08543 U75930 P50643 NC_002169.1 P20503 P32984 P09248

Metabolism AE AE AE FA FA FA FA FA FA FA FA AE FA

III Ib, III III III II, III II, III II, III III III Ib, III

TIGR UG AP00118 X06999, P39452, P28903 P43754, A64047 Nc_002663 AE004545, AE004962, AE004618 Personal communication X7948, X73226, AF242390 TIGR UG TIGR UG AE82223, D82452 C82710 TIGR UG

^a AE, aerobe; AN, anaerobe; FA, facultative aerobe; FAN, facultative anaerobe; G/g, gene sequence available (complete/partial); S, gene sequence known; TIGR UG, TIGR unfinished genomes.


Structural Alignments and P3d-Value Calculation

Alignment of the 3D structures was performed with the STAMP package for protein structure alignment and superimposition. All the alignments were double-checked and, where necessary, manually edited to prevent erroneous results. Superposition and calculation of the structurally equivalent regions across the three proteins were also performed with the STAMP package. Following structural alignment, one has to measure the probability that the observed sequence identity occurs by chance. For this purpose Murzin (1993) derived a P3d value based on the tendency of buried residues to be hydrophobic and exposed residues to be hydrophilic. It was originally applied to the cystatin–monellin similarity, where an evolutionary relationship was inferred from a P3d value of ∼10⁻³. More recently, Aloy et al. (2001b) applied it to the search for links between sequence and structure spaces.
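As a reminder of what the root mean square deviation (RMSD) reported for structurally equivalent regions measures, here is a minimal sketch. It assumes the two coordinate sets have already been superposed and paired residue by residue (for example, Cα atoms of equivalent positions); it does not reproduce STAMP's own superposition or equivalence-finding procedure, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that the i-th entry
    of each list refers to structurally equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (in Å): every atom of the second set is displaced by
# exactly 1 Å along z, so the RMSD is 1 Å.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(set_a, set_b)
```

An RMSD is only meaningful together with the number of equivalent residues over which it was computed, which is why both quantities are reported side by side.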

Phylogenetic Analyses

We adopted a model-based maximum likelihood approach to phylogenetic inference (e.g., Yang et al. 1995; Huelsenbeck and Crandall 1997; Rodríguez-Trelles et al. 1999, 2000). We first built a protein distance matrix using a simple Poisson model and used this matrix to recover a tree with the neighbor-joining (NJ) algorithm. The tree topology obtained in this manner was then used as a working hypothesis for model fitting using the likelihood ratio test. Amino acid substitution models used in this study are all special cases of the model of Yang et al. (1998); this model is based on the matrix of Jones et al. (1992) with amino acid frequencies set as free parameters (referred to as JTT-F). Variation of substitution rates across sites is accommodated in the substitution models using the discrete-gamma approximation of Yang (1996a) with shape parameter α (eight equally probable rate categories are used to approximate the continuous gamma distribution; these are referred to as dG models). The value of α is inversely related to the extent of rate variation among sites (Yang 1996a). The transition probability matrices of the models and details on parameter estimation are given by Yang (2000).

Likelihood ratio tests are applied for contrasting several hypotheses of interest. For a given tree topology (i.e., Fig. 3), a model (H1) containing p free parameters and with likelihood $L_1$ fits the data significantly better than a nested submodel (H0) with $q = p - n$ free parameters (i.e., n restrictions) and likelihood $L_0$ if the deviance $D = -2\log\Lambda = 2(\log L_1 - \log L_0)$, where $\Lambda = L_0/L_1$, falls in the rejection region of a $\chi^2$ distribution with n degrees of freedom (Yang 1996b). We use several starting values in the iterations to guard against the possible existence of multiple local optima. These analyses are conducted with the program DAMBE (Xia 2000) and the CODEML program from the PAML version 3.0b package (Yang 2000).
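The likelihood ratio test described above can be illustrated with a short, standard-library-only sketch. It computes the deviance D = 2(log L1 − log L0) and its p-value from the upper tail of a χ² distribution with integer degrees of freedom. The function names are hypothetical, the log-likelihoods in the example are invented so that D equals one of the statistics in Table 3, and none of this is meant to reproduce CODEML's internals.

```python
import math


def deviance(log_l0: float, log_l1: float) -> float:
    """Likelihood ratio statistic D = 2 * (log L1 - log L0)."""
    return 2.0 * (log_l1 - log_l0)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df.

    Uses the closed forms for df = 1 (erfc) and df = 2 (exp) and the
    recurrence Q(k + 2) = Q(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lrt_p_value(log_l0: float, log_l1: float, df: int) -> float:
    """P-value of the likelihood ratio test of H0 (nested) against H1."""
    return chi2_sf(deviance(log_l0, log_l1), df)


# Invented log-likelihoods chosen so that D = 1339.9 with 1 degree of
# freedom, the value listed in Table 3 for Poisson vs. Poisson + dG (class I).
d_example = deviance(-1000.0, -330.05)
p_example = lrt_p_value(-1000.0, -330.05, df=1)
```

With statistics of the magnitude listed in Table 3 the p-value underflows toward zero, which is consistent with the reported significance level of p < 10⁻⁶.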
The model found to describe the amino acid replacement process in the RNR gene region satisfactorily is used as a hypothesis for phylogenetic reconstruction by distance methods. The estimate of α that we use in distance computation is the one obtained by the joint likelihood comparison of all sequences in the first stage, which can be considered the most reliable (Yang 1996a). Statistical support for nodes of the NJ trees is assessed using 50% majority-rule consensus trees compiled from 1000 bootstrap replications (Felsenstein 1985).
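To make the role of α in distance computation concrete, the sketch below uses the textbook Poisson-corrected distance, d = −ln(1 − p), and its continuous-gamma counterpart, d = α[(1 − p)^(−1/α) − 1], where p is the proportion of differing sites. This is a swapped-in approximation, not the procedure of this study, which uses a discrete-gamma model with eight rate categories as implemented in the programs cited above. The function names and toy sequences are hypothetical; α = 2.22 is the joint estimate quoted in the Results.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two ungapped aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty sequences of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, assuming equal rates at all sites."""
    return -math.log(1.0 - p)


def gamma_distance(p: float, alpha: float) -> float:
    """Continuous-gamma distance; smaller alpha means stronger rate variation."""
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


# Toy example with made-up sequences differing at 1 of 4 sites (p = 0.25).
p = p_distance("MKLV", "MRLV")
d_poisson = poisson_distance(p)
d_gamma = gamma_distance(p, alpha=2.22)
```

For the same p, the gamma distance exceeds the Poisson distance, and the two converge as α grows, which is the sense in which a larger α implies less rate variation among sites.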

Results

Distribution of the nrd Classes Across the Tree of Life

Table 1 compiles available information on the distribution of the RNR classes across the life domains archaebacteria, eubacteria, and eukaryotes, as defined by Woese et al. (1990). Data on the presence/absence of RNR classes are continuously growing with genome sequencing projects. The three main conclusions are as follows.

(i) All three RNR classes are represented in archaebacteria and eubacteria. All archaebacteria have RNR classes II and III, except Archaeoglobus and Thermoplasma, which have only class II, and Methanococcus, which has only class III. In contrast, class I RNR is represented in only a single archaebacterial species, Halobacterium sp., which also has class II (see Table 1). Eubacteria exhibit the greatest diversity of RNR classes within a single genome. In this life domain the three RNR classes occur simultaneously in the genomes of Pseudomonas aeruginosa, Clostridium acetobutylicum, and Clostridium difficile. In the remaining species it is possible to observe every conceivable pairwise combination of RNR classes (I/II, I/III, and II/III). In Gram-positive eubacteria, the majority of low-G+C content species have a class Ib–III combination; high-G+C content species, however, have almost exclusively class II, sometimes in conjunction with class Ib. In Gram-negative eubacteria the Proteobacteria group exhibits the most diverse spectrum of RNR class combinations: for example, E. coli has the two types of class I RNR (i.e., Ia and Ib) and class III RNR (see Table 1).

(ii) Known nrd sequences from eukaryotes all belong to the oxygen-dependent class Ia. Class II (i.e., AdoCbl-dependent) activity has, however, been detected in crude extracts of the alga Euglena gracilis and the primitive fungus Pithomyces chartarum (Gleason 1970; Stutzenberger 1974). The corresponding sequences for the nrd genes of these organisms have not yet been obtained, but the reported activities represent the first evidence suggesting that eukaryotes can have RNR classes other than Ia.

(iii) Viruses have RNR classes similar to those of their hosts. However, viral sequences show distinctive features in regions of the polypeptide chain involved in the activity site of the enzyme (Berglund 1972; Nikas et al. 1986).

The RNR classes present in a given genome reflect the environment inhabited by the organism with regard to the presence or absence of oxygen. Aerobes have RNR class I (which requires oxygen to function; e.g., the eubacterium Treponema pallidum and all eukaryotes, which have class Ia), class II (which is oxygen independent; e.g., the archaebacterium Aeropyrum pernix), or both (e.g., the eubacterium Mycobacterium tuberculosis and the archaebacterium Halobacterium sp.). Anaerobes, on the other hand, have RNR class III (which is inactivated by oxygen; e.g., the archaebacterium Methanococcus jannaschii), class II (e.g., the eubacterium Thermotoga maritima), or both (e.g., the eubacterium Porphyromonas gingivalis and the archaebacterium Methanobacterium thermoautotrophicum). However, the specific RNR class combination present in a given aerobe (i.e., RNR I, RNR II, or both) or anaerobe (i.e., RNR II, RNR III, or both) does not follow a discernible pattern. This absence of a pattern might be best illustrated by the genera Corynebacterium and Bacillus. Each genus comprises a group of closely related species, which nevertheless differ in their RNR class(es). Additional examples are the γ-proteobacterium P. aeruginosa, seemingly an aerobe, which has RNR III (in addition to classes I and II), and the Gram-positive low-G+C C. acetobutylicum, which has RNR I (in addition to classes II and III) despite being a strict anaerobe.

Table 2. RMSDs and protein length^a

        1b8b             2pfl             1rlr
1b8b    589 res
2pfl    3.1 Å/451 res    759 res
1rlr    3.6 Å/331 res    3.7 Å/345 res    748 res

^a The number of residues for each protein is given on the diagonal. Below the diagonal are given the RMSDs and the length of the structurally equivalent regions.

Fig. 1. Molscript diagram (Kraulis 1991) showing the structural superposition of the NrdA (1rlr), NrdD (1b8b), and PFL (2pfl) subunits. The structurally equivalent regions are shown in helix'n'strand representation. The nonequivalent segments are shown as a Cα trace. The catalytic cysteines are represented as spacefill residues.

Primary Sequence Alignment and Comparison of Three-Dimensional Structures Indicate that RNR Classes are Distantly Homologous

In principle, homology of structure is better conserved than homology of sequence (Chothia and Lesk 1986; Patthy 1999). Therefore remote homology of the RNR classes might be better reflected by comparison of their 3D structures. The 3D structures have been determined for two of the three RNR classes and are available from the Protein Data Bank: the E. coli NrdA subunit of RNR class Ia [denoted 1r1r (Uhlin and Eklund 1994)] and the T4-phage NrdD subunit of RNR class III [denoted 1b8b (Logan et al. 1999)]. In addition, the crystal structure of the E. coli pyruvate formate-lyase (Becker et al. 1999) has been solved. This enzyme exhibits the same glycyl radical chemistry as the class III RNR, as well as most of its structural features, and is therefore considered here. A class II structure is not publicly available, but Stubbe et al. (2001) have reported that it exhibits the same 3D configuration. Figure 1 represents the structural superposition of the RNR classes I (i.e., 1r1r) and III (1b8b) and PFL (2pfl).

The three model structures have the same fold: a scaffold consisting of a 10-stranded α/β-barrel that accommodates a hairpin loop inside. The degree of structural equivalence (stretches in helix'n'strand representation) is highest in the hydrophobic core of the proteins (corresponding to the active center of the enzymes). Most of the differences are confined to the external loops. The details of the structural superposition are listed in Table 2. Similarity is greater between 1b8b and 2pfl [3.1-Å root mean square deviation (RMSD) over 451 equivalent residues] than between 1r1r (RNR I) and either 1b8b (3.6 Å/331) or 2pfl (3.7 Å/345). According to Murzin's (1993) criterion the degree of structural similarity between 1b8b and 2pfl is significant (P3d value, 10⁻³). However, the functionally important residues align nicely, always occupying equivalent regions. This kind of pattern has proven to be sufficient to assign two proteins to the same superfamily in cases like ours, where other features are also consistent (i.e., the RMSD value; see Table 2). As in the case of many other remote homologues (see Holm and Sander 1997), a comparison of the aligned primary amino acid sequences of the three RNR classes reflects a low overall pairwise similarity. Average within-class similarity values are 38, 35, and 36% for RNR classes I, II, and III, respectively. Interclass average global similarity is highest between class I and class II (25%) and substantially lower (≈10%) between RNR class I and RNR class III or between class II and class III. Figures 2A–C show that conservation of primary sequence is concentrated along short stretches encompassing the allosteric specificity (Fig. 2A) and activity (Fig. 2B) sites and the active site (Fig. 2C), which are critical to maintain the fold and function of the enzymes.
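The text does not spell out how the pairwise similarity percentages were computed, so the following is only one plausible reading, offered as a sketch. It counts two aligned residues as similar if they are identical or fall in the same group, using the similarity groups given in the Fig. 2 caption (L, I, A, V, M, F, W; R, K; D, E), and it skips positions where either sequence has a gap. The function names and toy sequences are hypothetical.

```python
# Similarity groups taken from the Fig. 2 caption; applying them to the
# global similarity values is an assumption made for this sketch.
SIMILARITY_GROUPS = (frozenset("LIAVMFW"), frozenset("RK"), frozenset("DE"))


def residues_similar(a: str, b: str) -> bool:
    """True if two residues are identical or share a similarity group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def percent_similarity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of similar residue pairs over positions without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    similar = sum(residues_similar(a, b) for a, b in pairs)
    return 100.0 * similar / len(pairs)


# Toy example: L/I, K/R and D/E count as similar, G/S does not.
toy_similarity = percent_similarity("LKDG", "IRES")
```

Averaging such values over all sequence pairs within a class, or between two classes, would give within-class and interclass figures of the kind quoted above.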
Fig. 2. Alignment of eight representatives of RNR classes I, II, and III for the regions involved in binding the effectors at the (A) specificity site, (B) activity site, and (C) active site. Amino acid similarity groups are as follows: hydrophobic residues L, I, A, V, M, F, and W; R and K; and D and E. Black and gray shading denotes sites showing 87 and 75% of the residues similar, respectively. Downward arrowheads indicate residues involved in (or close to) effector and substrate binding (allosteric and active site) in the 3D structure of class Ia of E. coli. Upward arrowheads in C indicate residues corresponding to the active site in the 3D structure of class III of phage T4. The asterisk indicates the cysteine responsible for initiating the reduction in all three RNR classes. ESSCOL, Escherichia coli; CAEELE, Caenorhabditis elegans; LACLAC, Lactococcus lactis; PYRFUR, Pyrococcus furiosus; MYCTUB, Mycobacterium tuberculosis; METJAN, Methanococcus jannaschii; EB, eubacteria; EK, eukaryotes; AR, archaebacteria; 1A, class Ia RNR; 1B, class Ib RNR; 2, class II RNR; 3, class III RNR.

Table 3. Results of the likelihood ratio test carried out on the RNR amino acid sequences used in this study^a

                                  RNR class
H0:H1                       df    I (56,406)   Ia (44,419)   Ib (12,676)   II (22,412)   III (21,473)   All combined (99,237)
Poisson:Poisson + dG         1    1339.9       1077.0        474.1         673.1         318.9          1127.9
  α                               1.99         1.91          1.26          1.75          2.31           2.78
Poisson + dG:JTT-F + dG     19    5148.9       4096.1        2210.2        1909.5        1912.7         4539.7
  α                               1.54         1.50          1.01          1.13          1.73           2.22

^a In each row, the null hypothesis (H0) is compared with another hypothesis (H1). Log-likelihood values were obtained assuming the NJ topology based on the Poisson distance using the complete RNR sequence data set. All values of the likelihood ratio test statistic (−2 log Λ) are significant (p < 10⁻⁶). In parentheses are given the number of sequences and the length of the alignment for each RNR class. Poisson + dG and JTT-F + dG are the Poisson and Yang et al. (1998) models (see Materials and Methods) assuming discrete gamma (with shape parameter α)-distributed rates at sites.

In all nonviral ribonucleotide reductases investigated so far, specificity toward the substrate is achieved in basically the same fashion (Reichard 1997). Therefore, it is not surprising that the binding residues for the allosteric effector in the specificity site (Fig. 2A) are almost identical for RNR classes I and II. In class III this region also exhibits an increased proportion of conserved residues (especially hydrophobic amino acids; Fig. 2A); although they do not match those given above, which were inferred from the X-ray structure of class Ia (Eriksson 1997), the fact that they are conserved indicates that they likely control substrate specificity in NrdD as well.

Figure 2B shows the alignment of the N-terminal region for the three RNR classes. There is a conserved motif (V–x–KRDG–x(9)–KI–x(3)–I) which comprises the residues involved in binding the nucleotides at the overall activity site. From the alignment and data on the enzymatic properties of the RNRs, two situations can be distinguished: (i) RNR classes Ia and III and some class II proteins, which exhibit the conserved motif of the activity site; and (ii) the remaining class II proteins and RNR class Ib, which lack the first 50 residues of the N-terminal end of the protein and, thus, do not have the activity site (Eliasson 1999).

In the active site (Fig. 2C) only one residue, the transient radical cysteine (Cys439 in E. coli class Ia, Cys408 in L. leichmannii class II, and Cys290 in T4-phage class III), is conserved throughout the three RNR classes. This cysteine is involved in substrate activation by hydrogen abstraction from the C3′ of the substrate. Two additional cysteine residues are also highly conserved, at least across classes I and II: Cys225 in E. coli class I, which corresponds to Cys119 in L. leichmannii class II and Cys79 in T4-phage class III; and Cys462 in E. coli class I, which corresponds to Cys419 in L. leichmannii class II but becomes Asn311 in T4-phage class III. Although these two residues are not identical across the three classes, they occupy the same position in the 3D structures, with both being required for the reduction of the ribose moiety. The area surrounding the transient radical cysteine exhibits the highest degree of similarity in the alignment (Fig. 2C). The rest of the alignment (data not shown) reflects the peculiarities of each RNR class. Classes I and II are more similar to each other than to class III in the C-terminal moiety.
In this region the first two classes contain two cysteines that putatively function as a hydrogen acceptor for the reductive system (Stubbe and van der Donk 1998). In addition, class II exhibits an extra domain in the C-terminal part of the protein with a conserved motif [D–x–H–x–G–x14–V–x–x–G–x35–S–x–V–x36–G (Tollinger 1998)], which binds the adenosylcobalamin required to generate the radical. In class III, the domain containing the glycyl radical is located near the active center in the crystal structure. Specifically, the glycyl radical occupies the same position as the tyrosine side chains (Tyr730 and Tyr731) of E. coli class I, which are involved in the radical transfer pathway.
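Sequence motifs such as the activity-site pattern quoted above, V–x–KRDG–x(9)–KI–x(3)–I, can be searched for with an ordinary regular expression. The sketch below converts that notation (with plain hyphens) into a regex. It assumes that "x" means any single residue, that "x(n)" means n arbitrary residues, and that a run of letters such as KRDG denotes consecutive literal residues; these readings, the function name, and the synthetic test sequence are assumptions made for illustration, not data from this study.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a hyphen-separated motif such as 'V-x-KRDG-x(9)-KI' to a regex.

    'x' becomes '.', 'x(n)' becomes '.{n}', and any other token is taken
    as a literal stretch of residues (an assumption of this sketch).
    """
    parts = []
    for token in motif.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", token)
        if wildcard:
            count = int(wildcard.group(1) or 1)
            parts.append("." if count == 1 else ".{%d}" % count)
        else:
            parts.append(re.escape(token))
    return "".join(parts)


ACTIVITY_SITE_MOTIF = "V-x-KRDG-x(9)-KI-x(3)-I"
activity_site = re.compile(motif_to_regex(ACTIVITY_SITE_MOTIF))

# Synthetic sequence built to contain the motif starting at index 2.
synthetic = "MA" + "VSKRDG" + "A" * 9 + "KI" + "AAA" + "I" + "LL"
hit = activity_site.search(synthetic)
```

A sequence lacking the N-terminal region that carries this motif would simply return no match, which mirrors the distinction drawn in the text between proteins with and without the activity site.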

Phylogenetic Inferences

Table 3 lists the log-likelihood ratio statistic values for the models. Log-likelihood values were derived assuming the topology obtained with the NJ algorithm based on the Poisson distances between pairs of amino acid sequences, separately for each RNR class (I, Ia, Ib, II, and III) and for all three RNR classes pooled. dG models (i.e., Poisson + dG and JTT-F + dG) always yield higher likelihood scores than their uniform counterparts (i.e., Poisson and JTT-F; not shown), meaning that variation among sites in the rate of amino acid substitution is a significant feature of the nrd data (see Table 3). Also, using an empirical matrix of substitution rates with the amino acid frequencies set as free parameters, as in the JTT-F + dG model of Yang et al. (1998), yields greater likelihood scores than assuming an equal rate of change between any two amino acids, as in the Poisson + dG model (see Table 3). Table 3 also lists the estimates of the parameter α of the discrete-gamma distribution obtained from each model. The Poisson + dG model produces systematically higher values of α than the JTT-F + dG model (i.e., 1.99 vs 1.54, 1.91 vs 1.50, 1.26 vs 1.01, 1.75 vs 1.13, 2.31 vs 1.73, and 2.78 vs 2.22 for nrd I, Ia, Ib, II, III, and all classes combined, respectively; see Table 3), thus underestimating the extent of substitution rate variation from site to site. As inferred from the JTT-F + dG model, nrd classes Ib and III show the lowest (1.01) and the highest (1.73) values of α, respectively. When the sequences of all three RNR classes are pooled, the value of α increases (2.22; Table 3), because of a reduction in the proportion of invariable sites in the alignment. It should be noted, however, that the α values for the RNR classes listed in Table 3 were obtained from different data sets, involving different species and numbers of sequences, so they cannot be compared directly.

Figure 3 shows the unrooted NJ tree based on the Poisson + dG pairwise amino acid distances, using the ungapped alignment of all the RNR sequences in this study. The value of α used for distance calculation is that obtained by joint likelihood comparison of all sequences using the JTT-F + dG model (i.e., α = 2.22; see Table 3). The following general patterns are observed. (i) The three RNR classes form three separate clusters, highly supported statistically [bootstrap values are 73 for RNR class I and 100 for RNR class III, both above the 70% criterion of Hillis and Bull (1993)]. Under the premise that they share a common ancestor, this observation suggests that the three RNR classes evolved independently from each other before the diversification of the tree of life. Within RNR class I, class Ib forms a cluster clearly separated from class Ia (class Ib splits from within class Ia, but this branching is not significant). Class Ia groups eukaryotes apart from eubacteria and divides the eubacteria into two separate clusters. Overall, species relationships are less well defined for class Ia than for class Ib, suggesting that class Ia has diverged more rapidly. (ii) RNR classes I and II are more closely related to each other than to class III. (iii) Sequences from eukaryotic viruses form a cluster that splits before the diversification of the RNR class Ia of their eukaryote hosts. In addition, RNR class II of Lactobacillus leichmannii forms a strongly supported clade with the bacteriophages mycobacteriophage L5 and roseophage SIO1 (bootstrap value 94; Fig. 3). Apparently, this clade split before the diversification of the remaining RNR class II sequences.
(iv) Overall, the phylogenetic relationships among the sequences reconstructed from each nrd class do not match the accepted phylogenetic relationships among the species. Thus, for example, RNR class Ia places eukaryotes (and eukaryote viruses) together but cannot resolve the base of the eukaryote tree, and it clusters different bacterial lineages intertwined with each other. Particularly noticeable is the positioning of the archaebacterium Halobacterium, which forms a strongly supported cluster with the eubacterial lineages of P. aeruginosa and Chlamydia (bootstrap value 98; Fig. 3). In addition, the phylogenetic trees yielded by the different RNR classes are not congruent with each other. Topological inconsistency between trees is not caused by differences in the set of outgroups used for the rooting of each RNR class (i.e., RNR classes I, II, and III are rooted using RNR classes II and III, I and III, and I and II, respectively), because the NJ trees obtained for each RNR class separately are basically identical to their counterparts in the global tree shown in Fig. 3 (results not shown). Apparently, the three RNRs evolve in a fashion beyond the constraints imposed by the history of the cellular lineages that contain their coding genes.

Fig. 3. Neighbor-joining (NJ) tree based on the Poisson + dG (α = 2.22; see Table 3) distance for the RNR amino acid data. Gray squares represent archaea; gray circles, eubacteria; gray diamonds, eukaryotes; inverted black triangles, eubacterial viruses; upright black triangles, eukaryotic viruses. Prt-β, proteobacteria beta subdivision; Prt-γ, proteobacteria gamma subdivision; G+, Gram-positive bacteria; EA, euryarchaeota; Prt-α, proteobacteria alpha subdivision; Chl, Chlamydia group; Prt-δ, proteobacteria delta and epsilon subdivision; Aqu, aquificales; Spi, spirochaetes; TD, Thermus/Deinococcus group; CFB, cytophaga/flexibacter/bacteroides group; The, thermotogales; GS, green sulfur bacteria. Branch lengths are proportional to the scale, given as substitutions per nucleotide. Percentage bootstrap values (based on 1000 pseudo-replications) are given on the nodes for trees A–C.

Discussion

RNRs represent highly plastic proteins at the level of their primary sequence (similarity between classes ranges from 10 to 25%) yet, as we show here, their structures are strikingly conserved (see Fig. 1). In addition, all maintain the same protein radical chemistry. Our study suggests that all modern RNRs (classes I, II, and III) share a common ancestor that lived before the diversification of the tree of life. The different RNRs arose by duplication, followed by divergence with acquisition of new adaptations (i.e., regarding their function and how they were affected by oxygen). Taking into account what is currently known about the function and distribution of RNRs across the tree of life, together with the results of the present study, we next try to trace back some prominent episodes in the evolutionary history of this highly heterogeneous protein family.

Our first question concerns which modern RNR represents the ancestral enzyme. Figure 4 depicts present-day relationships among RNR classes that are expected from nine duplication scenarios (a trivial scenario, that the three classes derived from each other at the same time, i.e., the star tree, is not depicted). To discriminate among these hypothetical scenarios we need to know the direction of character evolution along the tree shown in Fig. 3. Yet this tree is an unrooted tree (i.e., a network). Deciding the polarity of this network is complicated because there are no outgroups (i.e., external sequences that can be used as a reference) available. In addition, different RNR classes are represented by different sets of taxa, which can affect the location of the root by alternative methods such as the midpoint-rooting method. The positioning of the root by this technique would be more reliable if the different RNR classes were represented in
the same species. Therefore we limit our analysis to C. acetobutylicum and P. aeruginosa, the only species carrying RNRs from each of the three classes. Figure 5 shows the NJ topology obtained for the C. acetobutylicum and P. aeruginosa RNR data subset using the Poisson + dG model (α = 2.5; obtained with the JTT + dG model assuming the topology in Fig. 3), with the position of the root inferred by the midpoint rooting method. This method assumes that the most divergent taxa in the phylogeny evolve at equal rates. To check this premise we tested the somewhat more restrictive assumption that the rate of evolution is the same for all taxa (i.e., the global clock assumption). The likelihood ratio test was performed using the JTT-F + dG model assuming the same topology as in Fig. 5 (i.e., rooted along branch 2-5). Relaxing the global clock assumption does not lead to a significant improvement of the likelihood score at the 1% level (−2logλ = 12.9; 5 df, p > 0.01). Moreover, placing the root along branch 2-5 yields a likelihood score which is significantly higher than the likelihood scores that result from positioning the root along branch 2-3 or 2-4 [−2logλ = 123.2; 5 df, p < 10⁻⁶ in both cases, by the RELL test of Kishino et al. (1990)], which reinforces the hypothesis that branch 2-5 is the true location of the root. Consequently, we can rule out scenarios b, c, e, f, h, and i in Fig. 4 and are left with three alternative hypotheses for the origin of the RNRs: scenario a assumes that RNR class I is the most primitive class, from which RNR class III originated first, followed by class II; scenario d assumes that RNR class II is ancestral, from which RNR class III originated first, followed by class I; and scenario g assumes that RNR class III is ancestral, from which either class I or class II originated.
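The likelihood ratio statistic reported above for the global clock test can be converted into a tail probability as sketched below. This is only an illustration under the usual assumption that −2logλ is asymptotically chi-square distributed with the stated degrees of freedom; the function name `chi2_sf` is hypothetical, and the root-placement comparison, which the text attributes to the RELL test, is not covered by this approximation.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    # Closed forms for the two base cases: df = 2 (even) and df = 1 (odd).
    if df % 2 == 0:
        tail, start = math.exp(-half), 2
    else:
        tail, start = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    for nu in range(start, df, 2):
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
    return tail


# Global clock test from the text: -2 log(lambda) = 12.9 with 5 degrees of freedom.
p_clock = chi2_sf(12.9, 5)
print(f"global clock test: p = {p_clock:.3f}")
```

Under this approximation the clock test gives p of about 0.024, which agrees with the reported p > 0.01 but would count as a rejection of the clock at the 5% level, so the conclusion depends on the threshold chosen.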
The relative likelihood of these three hypotheses is discussed below in connection with available data on structural and functional properties of the enzymes and what is known about the conditions of the primitive earth. Because nodes 3, 4, and 5 in Fig. 5 are simultaneous in time, under a constant evolutionary rate the branches leading from them to the terminal nodes should be of the same length. Yet this is clearly not the case (e.g., branch 2-3 is nearly three times longer than branch 2-4; see Fig. 5). We have conducted a likelihood ratio test of the null hypothesis that branches 2-3 and 2-4 are equal using the JTT-F + dG model and the topology shown in Fig. 5 without the global clock assumption [test carried out with the HYPHY program (Kosakovsky Pond and Muse 2000)]. The unconstrained model is significantly better than the model constraining the two branches to be equal (−2logλ = 3.38; 1 df, p ≈ 0.06), meaning that the evolutionary distance between Clostridium and Pseudomonas is significantly shorter when measured from RNR class I than when estimated from RNR class II or III. This result is consistent with our above inference that the tree shown in Fig. 5 fits the global clock assumption, because increases and decreases in the rate of evolution of each RNR protein could have canceled each other out in the long term.

There are at least two hypotheses to explain why class I appears to be evolving more slowly. (i) A larger proportion of the amino acid sites of this protein might be subject to purifying selection. Yet there is no obvious reason why RNR class I should behave differently from RNR classes II and III. (ii) The gene encoding this protein could have arrived in the genome of Clostridium by lateral transfer from Pseudomonas, or the other way around. There is consensus that, for lateral transfer to occur, first, the gene to be transferred should confer a selective advantage on the recipient species; and/or, second, a strong selective environment favoring the growth and survival of the species containing the transferred gene should exist (see Gupta 1998). Considering the relevance of their metabolic role, together with the different susceptibilities of RNR classes to oxygen, it seems reasonable to assume that at least the second condition should have been a factor contributing to lateral RNR transfer during early evolution.

Besides the unexpected closeness between class I of Clostridium and class I of Pseudomonas discussed above, lateral gene transfer (plus loss of the phylogenetic signal) could account for the tangle-like configuration of the global tree shown in Fig. 3. Specifically, horizontal transmission might be most apparent from several statistically highly supported, yet unexpected relationships in this tree, for example, the position of the archaebacterium Halobacterium sp. close to the eubacteria Pseudomonas and Chlamydia in class I RNR (see Fig. 3). In fact, lateral transfer from eubacteria has been hypothesized to explain the occurrence of ribulose-1,5-bisphosphate carboxylase (rbcL) in the halobacterium Haloferax mediterranei [(Rawal et al. 1988); note that rbcL is a very complex enzyme which is absent in all primitive archaebacteria]. As we have pointed out, so far Halobacterium sp.
is the only archaebacterium known to contain RNR class I (note that other aerobic archaebacteria such as Thermoplasma acidophilum and Pyrodictium carry solely class II). If we accept that Halobacterium sp. obtained its class I RNR by lateral transfer from a eubacterium [instead of vertically from the LUCA; note that Halobacterium clusters with Pseudomonas in RNR class II as well, although this grouping is not significant (see Fig. 3)], it might be the case that archaebacteria never had class I RNR.

Fig. 4. Nine hypotheses for the sequence of duplications (a trivial hypothesis, that the three classes derived from each other simultaneously, i.e., the star phylogeny, is not represented).

Put together, the considerations above lead to a hypothesis for the origin and subsequent evolution of the different RNR classes. Based on the Earth’s geological history, the conditions under which the earliest organisms evolved were anaerobic. Because of its anaerobic function, RNR class III most likely is the primitive form from which all others derived. However, class II RNR can function either with or without oxygen; hence class II might alternatively be the ancestral state rather than a feature that arose later, e.g., as an adaptation to oxygenic environments. This scenario is, however, quite unlikely, for the following reasons. (i) AdoCbl, the molecule that activates class II RNR, is conspicuously more complex structurally than S-adenosylmethionine (SAM), the activator (together with an extra activator protein) of RNR class III; in fact, SAM has been dubbed the poor man’s AdoCbl (Frey 1993). Moreover, AdoCbl biosynthesis involves many more steps and enzymes (Roth et al. 1996) than that of SAM, whose production is comparatively straightforward. (ii) RNR III contains an Fe–S cluster, in protein NrdG, for the generation of its glycyl radical. It is widely held that iron–sulfur clusters are among the most ancient, ubiquitous, and functionally diverse classes of biological prosthetic groups (Beinert et al. 1997). It was even suggested that all chemical conversions of the primordial metabolism might have occurred on the surface of iron–sulfur minerals (Huber and Wächtershäuser 1997). (iii) RNR III uses formate as the external reductant (Torrents et al. 2001). This is a much simpler compound than the thioredoxin
and glutaredoxin systems used by class I and II RNRs. (iv) As we have shown (see Fig. 1 and Table 2), class III RNR greatly resembles pyruvate formate-lyase (PFL). PFL catalyzes a key step in anaerobic energy metabolism and should have appeared early during the evolution of life (Reichard 1997). An enzyme similar to present-day class III reductase could then have usurped the radical mechanism of a primitive form of PFL to evolve into a ribonucleotide reductase. Indeed, given the apparent structural similarity between RNR III and PFL (see also Logan et al. 1999), the possibility that class III RNR evolved directly from a duplication of pyruvate formate-lyase seems quite likely. (v) Table 1 shows that an anaerobic microorganism can live with either class II or class III RNR (e.g., Chlorobium tepidum and Methanococcus jannaschii; note that for these two species the whole genome has been sequenced, such that the possibility that they could have additional, unknown nrd genes can be safely ruled out). Keeping this in mind, it is unclear what advantage the origin of an anaerobic enzyme (i.e., class III) could have provided if another enzyme performing equally well already existed under those conditions (i.e., class II). In principle, it makes more sense that class II originated from class III, allowing the exploitation of increasingly aerobic environments. It seems, therefore, likely that RNR class III represents the primitive RNR class.

Fig. 5. The molecular clock maximum likelihood tree of the relationships among the three nrd classes from Clostridium acetobutylicum and Pseudomonas aeruginosa. Internal node numbers are shown in gray boxes.

Considering the long branches preceding the diversification of each RNR in Fig. 3, the duplication of the ancestral RNR III should have occurred very early in evolution. Free of functional constraints, the new copy could have evolved rapidly into class II. This process could have occurred by a few changes in the C-terminal end of the protein, involving loss of the glycyl radical, with concomitant acquisition of affinity for AdoCbl (i.e., an oxygen-independent radical generator). Released from the limitations imposed by oxygen, this new RNR class would have allowed colonization of existing oxygenic environments. There is good evidence that in Earth’s early atmosphere traces of oxygen existed as a consequence of the photolysis of water, which could result in localized oxygen oases (Kasting 1993). Both RNR classes coexisted until the LUCA split into archaebacteria and eubacteria. The second duplication took place later in the eubacterial lineage and resulted in the evolution of class I from class II RNR. This would explain why current archaebacteria seem to be devoid of class I
RNR (except Halobacterium, which appears to have obtained its class I RNR by lateral transfer; see above). It seems very likely that the emergence of RNR class I met a selectively favorable environment, because it freed the synthesis of deoxyribonucleotides from the requirement for AdoCbl. Note that AdoCbl synthesis is restricted to some eubacteria and archaebacteria; animals and some protists require AdoCbl but cannot produce it, and plants and fungi neither synthesize nor use AdoCbl (Roth et al. 1996). In a later step, fusion of an RNR class I-carrying eubacterium with an archaebacterium (see Martin et al. 2001) would have resulted in a eukaryotic cell containing the three RNR classes. Classes II and III degenerated in most eukaryotes, with the concomitant development of aerobic metabolism. Also, since all eukaryote class I RNRs belong to type Ia, it seems most plausible that class Ib RNR originated from class Ia in eubacteria after the origin of the eukaryotic cell. This sequence is also supported by the observation that class Ib is the only RNR lacking one of the two allosteric sites.

Acknowledgments. F.R.-T. has received support from the Ministerio de Educación y Cultura (Spain; Contrato Ramón y Cajal). E.T. and I.G. were supported by a grant from the Spanish Dirección General de Enseñanza Superior e Investigación Científica (PB97-0196). We would like to express our gratitude to Dr. Albert Jordan for revising the manuscript and to Prof. Peter Reichard for helpful, stimulating discussions and revising the manuscript.


References

Aloy P, Querol E, Aviles FX, Sternberg MJE (2001a) Automated structure-based prediction of functional sites in proteins: Application to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311:395–408
Aloy P, Oliva B, Querol E, Aviles FX, Russell RB (2001b) Structural similarity to link sequence space. New potential superfamilies and implications for structural genomics. Protein Sci 11:1101–1116
Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller M, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL (2000) The Pfam protein families database. Nucleic Acids Res 28:263–266
Becker A, Fritz-Wolf K, Kabsch K, Knappe J, Schultz S, Volker Wagner AF (1999) Structure and mechanism of the glycyl radical enzyme pyruvate formate-lyase. Nature Struct Biol 6:969–975
Beinert H, Holm RH, Münck E (1997) Iron-sulfur clusters: Nature’s modular, multipurpose structures. Science 277:653–659
Berglund O (1972) Ribonucleoside diphosphate reductase induced by bacteriophage T4: Allosteric regulation of substrate specificity and catalytic activity. J Biol Chem 247:7276–7281
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
Chothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5:823–826
Eliasson R, Pontis E, Jordan A, Reichard P (1999) Allosteric control of three B12-dependent (class II) ribonucleotide reductases. Implications for the evolution of ribonucleotide reduction. J Biol Chem 274:7182–7189
Eriksson M, Uhlin U, Ramaswamy S, Ekberg M, Regnstrom K, Sjöberg B-M, Eklund H (1997) Binding of allosteric effectors to ribonucleotide reductase protein R1: Reduction of active-site cysteines promotes substrate binding. Structure 5:1077–1092
Felsenstein J (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791
Freeland SJ, Knight RD, Landweber L (1999) Do proteins predate DNA? Science 286:690–692
Frey PA (1993) Lysine 2,3-aminomutase: Is adenosylmethionine a poor man’s adenosylcobalamin? FASEB J 7:662–670
Gleason FK, Hogenkamp HP (1970) Ribonucleotide reductase from Euglena gracilis, a deoxyadenosylcobalamin-dependent enzyme. J Biol Chem 245:4894–4899
Gupta RS (1998) Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol Mol Biol Rev 62:1435–1491
Hillis DM, Bull JJ (1993) An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol 42:182–192
Holm L, Sander C (1997) Decision support system for the evolutionary classification of protein structures. ISMB 5:240–246
Huber C, Wächtershäuser G (1997) Activated acetic acid by carbon fixation on (Fe, Ni)S under primordial conditions. Science 276:245–247
Huelsenbeck JP, Crandall KA (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu Rev Ecol Syst 28:437–466
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. CABIOS 8:275–282
Jordan A, Reichard P (1998) Ribonucleotide reductases. Annu Rev Biochem 67:71–98
Kasting JF (1993) Earth’s early atmosphere. Science 259:920–926
Kishino H, Miyata T, Hasegawa M (1990) Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol 31:151–160
Kosakovsky Pond SL, Muse SV (2000) HYPHY: Hypothesis testing using phylogenies (kernel beta 0.71). University of Arizona (distributed by the authors)
Kraulis PJ (1991) MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J Appl Cryst 24:946–950
Kyrpides N, Overbeek R, Ouzounis C (1999) Universal protein families and the functional content of the last universal common ancestor. J Mol Evol 49:413–423
Logan DT, Andersson J, Sjöberg B-M, Nordlund P (1999) A glycyl radical site in the crystal structure of a class III ribonucleotide reductase. Science 283:1499–1504
Martin W, Hoffmeister M, Rotte C, Henze K (2001) An overview of endosymbiotic models for the origins of eukaryotes, their ATP-producing organelles (mitochondria and hydrogenosomes), and their heterotrophic lifestyle. Biol Chem 382:1521–1539
Murzin AG (1993) Sweet-tasting protein monellin is related to the cystatin family of thiol proteinase inhibitors. J Mol Biol 230:689–694
Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540
Nicholas KB Jr, Nicholas HB (1997) GeneDoc: A tool for editing and annotating multiple sequence alignments, version 2.6.2001 (distributed by the authors)
Nikas I, McLauchlan J, Davison AJ, Taylor WR, Clements JB (1986) Structural features of ribonucleotide reductase. Proteins 1:376–384
Patthy L (1999) Protein evolution. Blackwell Science, Malden, MA
Rawal N, Kelkar SM, Altekar W (1988) Ribulose 1,5-bisphosphate dependent CO2 fixation in the halophilic archaebacterium, Halobacterium mediterranei. Biochem Biophys Res Commun 156:451–456
Reichard P (1993) From RNA to DNA, why so many ribonucleotide reductases? Science 260:1773–1777
Reichard P (1997) The evolution of ribonucleotide reduction. TIBS 22:81–85
Rodríguez-Trelles F, Tarrío R, Ayala FJ (1999) Molecular evolution and phylogeny of the Drosophila saltans species group inferred from the Xdh gene. Mol Phylogenet Evol 13:110–121
Rodríguez-Trelles F, Alarcón L, Fontdevila A (2000) Molecular evolution and phylogeny of the buzzatii complex (Drosophila repleta group): A maximum-likelihood approach. Mol Biol Evol 17:1112–1122
Roth JR, Lawrence JG, Bobik TA (1996) Cobalamin (coenzyme B12): Synthesis and biological significance. Annu Rev Microbiol 50:137–181
Russell RB (1998) Detection of protein three-dimensional side-chain patterns: New examples of convergent evolution. J Mol Biol 279:1211–1227
Russell RB, Barton GJ (1992) Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins 14:309–323
Russell RB, Sasieni PD, Sternberg MJE (1998) Supersites within superfolds. Binding site similarity in the absence of homology. J Mol Biol 282:903–918
Sjöberg B-M (1997) Ribonucleotide reductases: A group of enzymes with different metallosites and a similar reaction mechanism. Struct Bond 88:139–173
Stubbe J, van der Donk WA (1998) Protein radicals in enzyme catalysis. Chem Rev 98:705–762
Stubbe J, Ge J, Yee CS (2001) The evolution of ribonucleotide reduction revisited. TIBS 26:93–99
Stutzenberger F (1974) Ribonucleotide reductase of Pithomyces chartarum: Requirement for B12 coenzyme. J Gen Microbiol 81:501–503
Tamarit J, Mulliez E, Meier C, Trautwein A, Fontecave M (1999) The anaerobic ribonucleotide reductase from Escherichia coli. The small protein is an activating enzyme containing a [4Fe-4S](2+) center. J Biol Chem 274:31291–31296
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL-X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876–4882
Tollinger M, Konrat R, Hilbert BH, Marsh ENG, Kräutler B (1998) How a protein prepares for B12-binding subunit of glutamate mutase from Clostridium tetanomorphum. Structure 6:1021–1033
Torrents E, Eliasson R, Wolpher H, Gräslund A, Reichard P (2001) The ribonucleotide reductase from Lactococcus lactis. Interactions between the two proteins NrdD and NrdG. J Biol Chem 276:33488–33494
Uhlin U, Eklund H (1994) Structure of ribonucleotide reductase protein R1. Nature 370:533–539
Woese CR, Kandler O, Wheelis ML (1990) Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria and Eucarya. Proc Natl Acad Sci USA 87:4576–4579
Xia X (2000) Data analysis in molecular biology and evolution. Department of Ecology and Biodiversity, University of Hong Kong, Hong Kong
Yang Z (1996a) The among-site rate variation and its impact on phylogenetic analyses. TREE 11:367–372
Yang Z (1996b) Maximum likelihood models for combined analyses of multiple sequence data. J Mol Evol 42:587–596
Yang Z, Lauder IJ, Lin HJ (1995) Molecular evolution of the hepatitis B virus genome. J Mol Evol 41:587–596
Yang Z, Nielsen R, Hasegawa M (1998) Models of amino acid substitution and applications to mitochondrial DNA evolution. Mol Biol Evol 15:1600–1611
Yang Z (2000) Phylogenetic analysis by maximum likelihood (PAML), version 3.0a. University College London, London