Vol. 171, No. 12

JOURNAL OF BACTERIOLOGY, Dec. 1989, p. 6414-6422

0021-9193/89/126414-09$02.00/0 Copyright © 1989, American Society for Microbiology

Proteus mirabilis Urease: Nucleotide Sequence Determination and Comparison with Jack Bean Urease

BRADLEY D. JONES AND HARRY L. T. MOBLEY*

Division of Infectious Diseases, Department of Medicine, University of Maryland School of Medicine, 10 South Pine Street, Baltimore, Maryland 21201

Received 3 July 1989/Accepted 5 September 1989

Proteus mirabilis, a common cause of urinary tract infection, produces a potent urease that hydrolyzes urea to NH3 and CO2, initiating kidney stone formation. Urease genes, which were localized to a 7.6-kilobase-pair region of DNA, were sequenced by the dideoxy method. Six open reading frames were found within a region of 4,952 base pairs; they were predicted to encode polypeptides of 31.0 (ureD), 11.0 (ureA), 12.2 (ureB), 61.0 (ureC), 17.9 (ureE), and 23.0 (ureF) kilodaltons (kDa). Each open reading frame except ureE was preceded by a ribosome-binding site. Putative promoterlike sequences were identified upstream of ureD, ureA, and ureF. Possible termination sites were found downstream of ureD, ureC, and ureF. The structural subunits of the enzyme were encoded by ureA, ureB, and ureC and were translated from a single transcript in the order 11.0, 12.2, and 61.0 kDa. When the deduced amino acid sequences of the P. mirabilis urease subunits were compared with the amino acid sequence of jack bean urease, significant amino acid similarity was observed (58% exact matches; 73% exact plus conservative replacements). The 11.0-kDa polypeptide aligned with the N-terminal residues of the plant enzyme, the 12.2-kDa polypeptide with internal residues, and the 61.0-kDa polypeptide with the C-terminal residues, suggesting an evolutionary relationship between the urease genes of jack bean and P. mirabilis.

Urinary tract infection with Proteus mirabilis can lead to serious complications, including cystitis, prostatitis, urolithiasis, pyelonephritis, bacteremia, and death (30, 38). The enzyme urease is recognized as an important virulence factor for this uropathogenic bacterial species and, indeed, as the causative agent of infection-induced kidney and bladder stones, which are estimated to represent 20 to 40% of all urinary stones (14). Alkalinization of the urine by hydrolysis of urea to carbon dioxide and ammonia facilitates precipitation of struvite (MgNH4PO4·6H2O) and carbonate apatite (Ca10(PO4CO3OH)6(OH)2). Furthermore, in catheterized patients, precipitation of urinary stones results in encrustation and blockage of indwelling urinary catheters. This complication has been uniquely correlated with the presence of P. mirabilis and not with other ureolytic organisms (23). Further evidence suggests that the ammonia generated by ureolysis may itself be toxic to the kidney epithelium (5). Recent work has begun to yield an understanding of the biochemistry and genetics of ureases produced by members of the tribe Proteeae (21). Urease genes from Providencia stuartii (22), P. mirabilis (15, 37), and Morganella morganii (L. Hu, B. Jones, M. Fox, E. Nicholson, and H. Mobley, Abstr. Annu. Meet. Am. Soc. Microbiol. 1989, B64, p. 41) have been identified by cloning and expression in Escherichia coli. Genetic analyses of the cloned ureases of Providencia stuartii and P. mirabilis have identified the coding regions for the structural subunits of the enzyme as well as the accessory polypeptides that are required for expression of enzyme activity in vivo (15, 25, 37). Mulrooney and co-workers (25) have purified the cloned urease of Providencia stuartii and determined its biochemical properties. In addition, they have demonstrated that the native enzyme possesses a heteromeric subunit structure of one large and two small polypeptides and contains four nickel ions per active enzyme molecule.

Bacterial and jack bean ureases ostensibly differ in subunit structure, subunit stoichiometry, and native molecular weight. Several purified bacterial ureases have been shown to have similar heteromeric subunit structures (6, 25, 33, 34; Hu et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 1989). In P. mirabilis and Providencia stuartii, the three subunit polypeptides are transcribed on a single mRNA molecule, from the smallest to the largest subunit (15, 25). In contrast to the heteromeric bacterial ureases, jack bean urease is composed of six identical subunits. Despite this difference, we present evidence of a striking similarity between the deduced amino acid sequences of the three subunits of the P. mirabilis urease and the known amino acid sequence of the jack bean urease subunit. In this report we present the complete nucleotide sequence of the region of the recombinant plasmid pMID1003 that encodes active urease. The operon encodes open reading frames (ORFs) for six polypeptides with molecular sizes, in the order in which they appear in the operon, of 31.0, 11.0, 12.2, 61.0, 17.9, and 23.0 kilodaltons (kDa). The 11.0-, 12.2-, and 61.0-kDa polypeptides represent the subunits of the P. mirabilis urease and show a high degree of homology with the jack bean urease subunit at the amino acid level. We also present evidence suggesting that the three bacterial urease subunits merged to form the single plant urease subunit.

MATERIALS AND METHODS

Bacterial strains and plasmids. E. coli HB101 (F- hsdR hsdM proA2 leuB6 rpsL20 recA13) was used as the host for recombinant plasmids. E. coli DH5αF' [F' hsdR φ80dlacZΔM15 Δ(lacZYA-argF)U169 recA1] was used as the host for M13 derivatives (Bethesda Research Laboratories, Inc., Gaithersburg, Md.). Plasmid pMID1003 encodes the urease of P. mirabilis H14320 and has been described previously (15).

* Corresponding author.


DNA isolation. Replicative forms of the M13 vectors mp18 and mp19 were obtained from J. B. Kaper (University of Maryland School of Medicine). M13 DNA was isolated from cultures of E. coli DH5αF' grown for 6 h in 2× tryptone-yeast medium (3) by alkaline sodium dodecyl sulfate extraction (4) and purified by centrifugation to equilibrium in cesium chloride-ethidium bromide density gradients (18).

Molecular cloning and production of sequential deletions. Specific DNA fragments (1.95-kilobase [kb] HindIII, 1.5-kb HindIII, 0.9-kb PstI-XhoI, and 2.5-kb HindIII-BamHI fragments) derived from pMID1003 were subcloned into either M13mp18 or M13mp19 so that both strands of the entire urease gene sequence were represented. Deletions were created by cutting the M13 derivatives with appropriate restriction enzymes (see Fig. 1), religating, and transforming the new derivatives into DH5αF'. This method allowed approximately 80% of the 4,952-base-pair (bp) region to be sequenced. In regions where no useful restriction enzyme sites existed to create deletions, oligonucleotides were synthesized for use as sequencing primers (see Fig. 1).

Labeling and electrophoresis of templates. Dideoxy sequencing was carried out with Sequenase as specified by the supplier of the kit (U.S. Biochemicals, Cleveland, Ohio). [α-35S]dATP (ca. 800 to 1,000 Ci/mmol) was purchased from Du Pont, NEN Research Products (Boston, Mass.). The labeled DNA reaction mixtures were separated by electrophoresis in one of two types of gels: (i) gels of 50% urea-7.2% acrylamide (bisacrylamide-acrylamide, 1:20 [1]) were poured with wedge spacers (width, 0.4 to 1.2 mm), and samples were electrophoresed for 2.5 h to resolve up to 250 bases from the primer; or (ii) gels of 50% urea-6.0% acrylamide were poured with straight spacers (width, 0.4 mm), and samples were electrophoresed for 5.5 h to resolve up to 400 bases from the primer.
The gel was dialyzed with 10% acetic acid-12% methanol for 1 h, dried under vacuum, and exposed to film (XAR-2; Eastman Kodak Co., Rochester, N.Y.) for 18 h before the sequence was read directly from the autoradiograph.

DNA and amino acid sequence analysis. The DNA-protein sequence analysis software programs (version 2.02) of International Biotechnologies Inc. and Pustell and Kafatos (28) were used to analyze the DNA sequence for restriction enzyme sites, ORFs, ribosome-binding sites, promoterlike sequences, catabolite repressor protein-binding sites, and nitrogen regulation sequences. The deduced amino acid sequences of the ORFs were analyzed for signal sequences, ATP-binding sites, divalent cation-binding sites, amino acid composition, isoelectric points, and hydropathy. The Genetics Computer Group sequence analysis software package, version 5 (University of Wisconsin, Madison, Wis.), was used to screen the National Biomedical Research Foundation protein sequence bank for sequences similar to UreA, UreB, UreC, UreD, UreE, and UreF and to construct hydropathy plots.

RESULTS

DNA sequence of the P. mirabilis urease. Fig. 1 shows the series of overlapping M13 subclones created by restriction enzyme deletions of the DNA spanning the urease-encoding region of plasmid pMID1003. In addition, 10 oligonucleotide primers were synthesized to generate sequence where no suitable restriction enzyme sites existed. The nucleotide sequence determined from these

pBR322

EV

4

3

2

0

B

HH

H

6415 EV

5

BomHI 4-4

Bcl Bst Ell

-4--5

s-. 4-.

'

-+----~

EcoRV Hind III

4--"-----@ 4-4----4>

---------3.

~ _NO-00-

@

Nrul Nsi

Pst Synthetic Primers

FIG. 1. Sequencing scheme of the P. mirabilis urease operon. The urease gene boundaries were previously determined by Tn5 transposon insertions. The 5.0-kb region that was sequenced is expanded in the lower portion of the figure (numbered vertical lines represent 1 kb of DNA). The direction of sequencing on the DNA is shown by the arrows; restriction endonucleases on the right refer to the enzyme used to create the deletion for sequencing that particular region. Restriction endonuclease abbreviations: H, HindIII; B, BamHI; EV, EcoRV.

subclones covered a 4,952-bp region (Fig. 2). Both strands of DNA were completely sequenced, except for the last 60 bp of the noncoding strand. The sequence on the coding strand was confirmed in this area by two overlapping subclones. ORFs associated with multiple polypeptides in the urease operon. The DNA sequence (Fig. 2) encoded six ORFs of greater than 95 codons, each beginning with the characteristic ATG start codon. No ORF of any significant length (>50 codons) was found on the reverse complement of the sequence shown in Fig. 2. We identified sites similar to the E. coli consensus ribosome-binding sequence (Shine-Dalgarno sequence) (31) that were present immediately upstream of the methionine start codon for five of the six ORFs (ureD, bp 431 to 436; ureA, bp 1277 to 1282; ureB, bp 1585 to 1590; ureC, bp 1908 to 1913; ureF, bp 4159 to 4164) (Fig. 2). The ORF (ureE) encoding a 17.9-kDa polypeptide lacked a detectable Shine-Dalgarno sequence. The predicted molecular masses of the polypeptides encoded by the six ORFs, in sequential order (5' to 3') as found on the coding strand of DNA, were 31.0, 11.0, 12.2, 61.0, 17.9, and 23.0 kDa. DNA sequence features. A search of the urease operon for putative transcriptional initiation sequences was carried out by using E. coli consensus promoter sequences (29), -35 (TT*G*ACA) and -10 (T*A*ATAAT*), with a -35 to -10 spacing of 17 + 2 bp. Conditions of the search for a possible promoter were that eight or more matches of the possible 12 nucleotides were required, with exact matches at the nucleotides marked with asterisks. We were unable to locate any putative promoter sequences until we reduced the required number of nucleotide matches to seven, relaxed the requirement for exact matches at the nucleotides marked with asterisks, and allowed the gap between the -35 and -10 sequences to extend to 23 bp. Using these modified search conditions, we located five promoterlike sequences (Fig. 3). 
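The consensus-matching search described above can be sketched in a few lines of Python. This is a minimal illustration, not the software the authors used: the function name `scan_promoters`, the convention that a reported position is the first base of each hexamer, and the definition of the spacing as the number of bases separating the two hexamers are all assumptions. Because the asterisked exact-match positions are a search parameter, they are passed in as optional index lists rather than hard-coded; the defaults correspond to the strict 8-of-12, 17 ± 2 bp conditions.

```python
from typing import Iterator, Sequence, Tuple

MINUS35 = "TTGACA"  # E. coli -35 consensus hexamer
MINUS10 = "TATAAT"  # E. coli -10 consensus hexamer


def count_matches(window: str, consensus: str) -> int:
    """Number of positions at which `window` agrees with `consensus`."""
    return sum(1 for a, b in zip(window, consensus) if a == b)


def scan_promoters(
    seq: str,
    min_matches: int = 8,
    min_gap: int = 15,
    max_gap: int = 19,
    required35: Sequence[int] = (),
    required10: Sequence[int] = (),
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start35, start10, score) for candidate -35/-10 pairs.

    start35/start10 are 1-based first bases of each hexamer, `score` counts
    matches out of 12, and the gap is the number of bases separating the two
    hexamers. required35/required10 hold 0-based indices within each hexamer
    that must match the consensus exactly (the "asterisked" positions).
    """
    seq = seq.upper()
    n = len(seq)
    for i in range(n - 5):
        box35 = seq[i:i + 6]
        if any(box35[k] != MINUS35[k] for k in required35):
            continue
        score35 = count_matches(box35, MINUS35)
        for gap in range(min_gap, max_gap + 1):
            j = i + 6 + gap  # 0-based start of the -10 hexamer
            if j + 6 > n:
                break  # larger gaps would run past the end of the sequence
            box10 = seq[j:j + 6]
            if any(box10[k] != MINUS10[k] for k in required10):
                continue
            score = score35 + count_matches(box10, MINUS10)
            if score >= min_matches:
                yield i + 1, j + 1, score
```

Calling `scan_promoters(seq, min_matches=7, max_gap=23)` mirrors the relaxed conditions. Under the first-base convention assumed here, the pair reported upstream of ureA (-35 at bp 1232, -10 at bp 1261) implies a gap of 1261 - 1232 - 6 = 23 bp, which the strict 17 ± 2 bp window rejects and the relaxed limit accepts.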
Two sequences were found upstream of ureD (-35, bp 307; -10, bp 335; and -35, bp 354; -10, bp 381), one upstream of ureA (-35, bp 1232; -10, bp 1261), and two upstream of ureF (-35, bp 4019; -10, bp 4044; and -35, bp 4107; -10, bp 4136). No promoterlike sequences were found upstream of the ureE ORF under these conditions; we neither expected nor found promoterlike regions upstream of ureB and ureC, since they are transcribed with ureA on a single mRNA (15). Urease is known to have a role in the nitrogen regulation pathway of some microorganisms, such as the bacterium Klebsiella aerogenes (13) and the fungus Aspergillus nidulans (19). A search for nitrogen regulation sites with the sequence TGGYARN4TTGCA (2), where Y is a pyrimidine and R is a purine, was carried out in the regions upstream of each ORF in the P. mirabilis gene complex. A site upstream of the ureA locus at bp 1221 matched 13 of 16 bases in this sequence. Preliminary physiological experiments indicated, however, that the operon was not under nitrogen regulation control (E. Nicholson, G. Chippendale, and H. Mobley, Abstr. Annu. Meet. Am. Soc. Microbiol. 1989, H126, p. 190). Another possible mechanism of operon regulation is the CRP-cyclic AMP cascade. However, we were unable to find sequences similar to the catabolite repressor protein-binding site (AANTGTGAN2TN4CA) (10) in the putative promoter regions of any of the cistrons.

FIG. 2. Nucleotide sequence of the P. mirabilis urease genes. Numbers above the sequence indicate the nucleotide position. Predicted amino acid sequences, in sequential order, for UreD (bp 441 to 1262), UreA (bp 1287 to 1586), UreB (bp 1598 to 1924), UreC (bp 1924 to 3624), UreE (bp 3655 to 4137), and UreF (bp 4168 to 4782) are shown below the DNA sequence. Putative ribosome-binding sequences (Shine-Dalgarno [S.D.] sites) are shown above the DNA sequence preceding each gene.

Sequences downstream of each cistron were analyzed for transcription termination signals similar to those established
for E. coli genes (29). Such rho-independent signals characteristically form a stem-loop secondary structure in the mRNA followed by a string of uridylates. Regions downstream of ureD (bp 1306 to 1328), ureC (bp 3791 to 3809), and ureF (bp 4804 to 4826) could form small stem-loop structures followed by 4 to 6 uridylates. No such sites were found for the ureA, ureB, or ureE ORFs.

Previous work had not delineated the ends of the urease gene complex. By deleting upstream sequences, we demonstrated that DNA 5' to the ureD ORF is not required for active urease: pMID1003 was partially digested with ClaI to form a linear plasmid, then digested with AccI and religated. When assayed for the ability to synthesize urease, the resulting plasmid produced active enzyme at the same level as the parent plasmid. We also confirmed that DNA sequences downstream of the 23.0-kDa ORF were not required for urease activity. A
6418

J. BACTERIOL.

JONES AND MOBLEY

. _ r~zwum *{o Cigg~~~~~~~~ Ili,Z={IY ZXxzW111 ~

are

8I1

D77Al

I

31.0

kDa

P so

UUU~-. WZ0

I

I

11.0 12.2

p

c 61.0

n

17.9

m 11

I ILF 23.0

kb

p

n lSD so

SD

FIG. 3. Physical map of the urease gene complex. The rectangular boxes labeled D, A, B, C, E, and F indicate the physical positions in the operon of each of the ure ORFs. Numbers beneath each rectangle correspond to the predicted molecular size for each polypeptide. The lines with arrows beneath the map indicate the direction and predicted length of each transcript. Two putative promoter sites were found upstream for both ureD and ureF (see text for positions). Restriction endonuclease sites are indicated above the line. P, Promoter; SD, Shine-Dalgarno sites.

BalI deletion of pMID1003, which removed all sequences downstream of the urease operon, including a portion of the vector and the last 12 codons of the UreE protein, was constructed. This deletion plasmid also conferred an active urease phenotype to E. coli HB101, although with enzyme activity that was two- to threefold lower than that of the parent plasmid, presumably because of a truncated UreF protein (Fig. 3). The G+C content of the P. mirabilis urease gene sequences (43%) was not significantly different from the previously determined G+C content of the genomic DNA (39%) (12). Predicted amino acid sequence features. With the use of the predicted amino acid content, pIs were determined for each of the polypeptides. The ORFs for ureA-, ureC-, ureD-, ureE-, and ureF-encoded acidic proteins (pl values of 5.8, 5.4, 6.3, 6.0, and 4.9, respectively), whereas the polypeptide produced from ureB was basic (pI 9.0). We also noted that the UreB polypeptide contained no cysteine residues in its sequence and the UreE protein had a string of eight histidine residues at its COOH terminus. Other than these two fea-

Amino acid

Ala Val Leu

Ile Pro Met Phe

Trp Gly

Ser Thr

Cys Tyr

Asn

Gln Asp Glu

Lys Arg His

tures, nothing was remarkable about the amino acid compositions of the proteins. The amino acid composition of each ORF is shown in Table 1. Shown in Fig. 4 are the hydropathy plots of each polypeptide based on the prediction of Kyte and Doolittle (16). The plots for the three structural subunits were consistent with plots for cytoplasmic polypeptides. The plots for UreD and UreE contained both hydrophilic and hydrophobic regions, while the plot for UreF contained two large hydrophobic regions, residues 1 to 25 and residues 168 to 190, which indicated possible membrane-spanning domains. An examination of the predicted N-terminal regions of the polypeptides revealed a possible signal sequence in the UreE protein (20, 27, 32, 35). This region possessed the general properties of a leader sequence with charged residues near the N terminus followed by eight nonpolar and hydrophobic residues. These residues were followed by a short-side-chain amino acid (alanine) which was five residues from a serine at residue 18, the putative cleavage site. In addition, we searched the predicted amino acid sequences for metal-binding sites (C-X2-C-X3-F-X5-L-X2-H-X3-H) (11) and ATP-binding sites (GKGGVGKT) (36). No matches were found for these sequences in any of the polypeptides. Sequence homology. The NBRF-PIR protein data base was searched for similarities with the deduced amino acid sequences of each ORF. The deduced amino acid sequences of the ORFs for ureA, ureB, and ureC had'a high similarity to the amino acid sequence of jack bean urease (Fig. 5). No striking sequence similarity was found for UreD, UreE, or UreF with protein sequences in the gene bank. Closer examination of the similarity between the jack bean urease subunit and the three subunits of the P. mirabilis urease revealed that the P. mirabilis subunits aligned with the jack bean subunit in a nonoverlapping fashion in the order that the P. mirabilis subunits were transcribed, UreA, UreB, and UreC. 
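The hydropathy profiles described above (Fig. 4) average a per-residue hydropathy index over a nine-residue window that advances one residue at a time toward the C terminus. The sketch below shows that calculation in Python, assuming the published Kyte-Doolittle index values. The function names, the toy peptide, and the 1.6 cutoff used to flag candidate membrane-spanning stretches are illustrative assumptions, not parameters from this study. The Kyte-Doolittle index is positive for hydrophobic residues, whereas Fig. 4 plots hydrophilicity upward, so the figure uses the opposite sign convention.

```python
"""Sliding-window hydropathy profile (sketch; Kyte-Doolittle index assumed)."""

# Kyte-Doolittle hydropathy index; positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each window as it slides one residue toward the C terminus.

    profile[i] averages residues i+1 .. i+window (1-based numbering).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # add the entering residue, drop the leaving one
        profile.append(total / window)
    return profile


def hydrophobic_segments(profile: list[float], window: int = 9,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge windows whose mean exceeds `threshold` (hypothetical cutoff) into 1-based spans."""
    spans: list[tuple[int, int]] = []
    for i, score in enumerate(profile):
        if score <= threshold:
            continue
        first, last = i + 1, i + window
        if spans and first <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], last)  # overlapping or adjacent window: extend the span
        else:
            spans.append((first, last))
    return spans


if __name__ == "__main__":
    toy = "KKKKK" + "L" * 20 + "KKKKK"  # hypothetical peptide, not a urease sequence
    profile = hydropathy_profile(toy)
    print(len(profile), round(max(profile), 2))  # 22 3.8
    print(hydrophobic_segments(profile))         # [(4, 27)]
```

The running sum keeps the scan linear in sequence length. Because every window is averaged, a flagged span extends a few residues beyond the hydrophobic stretch itself, which is why the toy peptide's span (residues 4 to 27) is wider than its leucine run (residues 6 to 25).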
UreA (100 amino acids) aligned with the first 100 amino acids of the jack bean subunit (840 amino acids) (17). Following a gap of 28 amino acids, UreB (109 amino acids) aligned with the next 109 amino acids, which were followed by a gap of 33 amino acid residues. Lastly, UreC (567 amino acids) matched the last 567 amino acids of the jack bean urease subunit, with no unmatched amino acids at the carboxy terminus of either polypeptide. In total there were 446 exact amino acid matches, a similarity of 57.5%. When conservative replacements were also counted, 567 matches were found, giving 73.2% similarity. On the hypothesis that the three bacterial urease subunits evolved into the single jack bean urease subunit, we searched the regions between the structural subunit ORFs for the consensus splice-site sequences required for intron splicing of eucaryotic genes. This search revealed a sequence between the ureA and ureB ORFs that closely matched (13 of 16 nucleotides) the splice-acceptor consensus sequence YYYYYYYYYYYNYAGG, where Y is a pyrimidine and N is any nucleotide (24). Inspection of the junction between the ureB and ureC ORFs revealed that the last codon of the ureB ORF and the start codon of the ureC ORF share a single nucleotide (Fig. 6).

TABLE 1. Predicted amino acid compositions of UreA, UreB, UreC, UreD, UreE, and UreF
Mol% (no. of amino acid residues) of:
Amino acid  UreA (100)  UreB (109)  UreC (567)  UreD (274)  UreE (161)  UreF (205)
Ala         8.00 (8)    9.17 (10)   9.17 (52)   7.30 (20)   6.21 (10)   9.27 (19)
Val         10.00 (10)  8.26 (9)    8.11 (46)   6.57 (18)   7.45 (12)   5.85 (12)
Leu         12.00 (12)  6.42 (7)    5.82 (33)   10.95 (30)  12.42 (20)  15.12 (31)
Ile         5.00 (5)    5.50 (6)    9.52 (54)   4.74 (13)   3.11 (5)    3.90 (8)
Pro         5.00 (5)    3.67 (4)    5.47 (31)   6.57 (18)   3.73 (6)    4.39 (9)
Met         5.00 (5)    2.75 (3)    3.70 (21)   2.92 (8)    1.86 (3)    3.41 (7)
Phe         2.00 (2)    5.50 (6)    2.65 (15)   4.74 (13)   1.86 (3)    2.44 (5)
Trp         0.00 (0)    0.00 (0)    0.88 (5)    2.55 (7)    0.62 (1)    3.90 (8)
Gly         7.00 (7)    11.01 (12)  10.76 (61)  8.03 (22)   9.32 (15)   5.85 (12)
Ser         4.00 (4)    2.75 (3)    3.53 (20)   2.92 (8)    4.97 (8)    6.83 (14)
Thr         7.00 (7)    3.67 (4)    6.35 (36)   7.30 (20)   6.21 (10)   5.37 (11)
Cys         2.00 (2)    0.00 (0)    1.59 (9)    1.82 (5)    1.86 (3)    1.95 (4)
Tyr         1.00 (1)    2.75 (3)    2.65 (15)   3.65 (10)   3.11 (5)    1.95 (4)
Asn         1.00 (1)    4.59 (5)    3.35 (19)   2.19 (6)    0.62 (1)    0.49 (1)
Gln         3.00 (3)    2.75 (3)    3.00 (17)   5.84 (16)   4.35 (7)    8.29 (17)
Asp         3.00 (3)    2.75 (3)    5.82 (33)   2.92 (8)    6.83 (11)   4.39 (9)
Glu         11.00 (11)  10.09 (11)  5.82 (33)   6.93 (19)   7.45 (12)   6.83 (14)
Lys         7.00 (7)    6.42 (7)    4.76 (27)   4.01 (11)   6.21 (10)   4.39 (9)
Arg         6.00 (6)    8.26 (9)    3.70 (21)   4.74 (13)   4.35 (7)    3.90 (8)
His         1.00 (1)    3.67 (4)    3.35 (19)   3.28 (9)    7.45 (12)   1.46 (3)

[Plots: six panels (UreD, UreA, UreB, UreC, UreE, UreF); y axis, HW hydrophilicity; x axis, amino acid number]
FIG. 4. Predicted hydropathy profiles for each of the six urease polypeptides. The numbered horizontal axis under each panel is the amino acid number. The left vertical axis indicates relative hydrophilicity (positive ordinate) or hydrophobicity (negative ordinate). Each point is the hydropathy value calculated for a nine-residue window, which advances one amino acid at a time toward the C terminus (16). HW, Hopp and Woods analysis.

[Alignment diagram: concatenated P. mirabilis UreA, UreB, and UreC (top) aligned with jack bean urease (bottom)]
FIG. 5. Amino acid sequence similarity between the P. mirabilis urease subunits and the jack bean urease subunit. The letters A, B, and C above the lines refer to the structural subunits of P. mirabilis urease encoded by ureA, ureB, and ureC, respectively. Numbers above and below the horizontal lines are amino acid positions; the sequences of the three P. mirabilis subunits were concatenated and numbered sequentially to facilitate analysis. Black vertical lines between the sequences represent an exact amino acid match or a conservative replacement.

[Sequences: (A) ureA ...TCA CCT ATT GTG TAG, spacer GTAATAAC, ureB ATG ATC...; (B) ureB ...GAG AAA AAA TGA (E K K, stop); ureC ATG AAA ACT... (M K T)]
FIG. 6. DNA sequence characteristics at the junctions of the structural genes. (A) Junction between ureA and ureB. A region similar to a eucaryotic mRNA splice-acceptor site (13 of 16 nucleotides) (24), beginning near the end of the ureA ORF and extending into the untranslated nucleotides, is underlined. (B) Junction between ureB and ureC. The last nucleotide of the last codon of ureB is the first nucleotide of the ureC start codon. Letters in rectangles refer to the polypeptides; letters above and below the DNA sequence are standard single-letter amino acid codes.

DISCUSSION

We have presented the nucleotide sequence of the chromosomally encoded urease operon cloned from P. mirabilis HI4320, an isolate cultured from the urine of a patient with bacteriuria. The operon encodes an inducible, high-molecular-weight Ni2+ metalloenzyme with a complex subunit structure. We sequenced a 4,952-bp region of DNA that was sufficient for expression of an active urease protein. The sequence revealed six ORFs, which were named ureA, ureB, ureC, ureD, ureE, and ureF. The polypeptides UreA, UreB, UreC, UreD, and UreF were required for enzyme activity in E. coli HB101. The molecular sizes calculated from the deduced amino acid sequences were similar to those of polypeptides previously identified as products of the urease operon (15, 37). Each of these polypeptides had previously been mapped by Tn5 transposon insertions to a position within the operon that matched the position of the corresponding ORF revealed by sequencing (15, 37). The similarity between the P. mirabilis urease genes that we sequenced and those cloned by Walz et al. (37) was not surprising, since Southern hybridization has recently shown that the urease genes of nearly 100 different P. mirabilis isolates are highly conserved with respect to specific urease gene restriction fragments recognized by DNA probes (H. L. T. Mobley and G. Chippendale, submitted for publication).

We have demonstrated (15) that the gamma, beta, and alpha subunits (11.0, 12.2, and 61.0 kDa, respectively) are the three structural subunits of the urease enzyme, are transcribed on a single mRNA, and are translated in order from the smallest to the largest subunit. This explains why a single transposon insertion in ureA or ureB had a polar effect on translation of the downstream ORFs. It is unclear whether ureD is encoded on the same transcript as the structural subunits or is translated from a separate mRNA; we were unable to determine whether transposon insertions in ureD exert a downstream polar effect on translation of the structural subunits. However, the DNA just downstream of the stop codon of the 31.0-kDa polypeptide (bp 1306 to 1328) resembled the rho-independent transcription terminators of many E. coli genes. If ureD transcription terminated at this point, the enzyme subunits would necessarily be transcribed on a separate message. In contrast, the 23.0-kDa polypeptide is produced from its own promoter. A transposon insertion in the ureE ORF (37), which would have a downstream polar effect on transcription of ureF if ureE and ureF shared the same transcript, did not affect urease expression. Therefore, transcription of the ureF cistron, which is required for urease activity, begins downstream of the 17.9-kDa ORF. No promoter could be found for the ureE ORF, and the 17.9-kDa gene product is not required for urease activity in the recombinant host E. coli HB101, as shown by Walz et al. (37). However, involvement of this polypeptide in some aspect of ureolysis cannot be ruled out, since its loss in the recombinant host may be complemented in trans by a homologous E. coli protein.
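The terminator assignments discussed above rest on finding an inverted repeat, which can fold into a stem-loop in the mRNA, followed closely by a run of T residues (U in the transcript). The Python sketch below scans one DNA strand for that pattern. It assumes perfectly complementary stems, with no G-U pairs, mismatches, or free-energy scoring, and the stem and loop length ranges, the function names, and the toy sequence are illustrative assumptions rather than the criteria used in this study; only the minimum of four T residues echoes the 4 to 6 uridylates reported in Results.

```python
"""Naive scan for rho-independent terminator-like motifs (sketch)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_terminator_like(dna: str, stem_range=(6, 10), loop_range=(3, 8),
                         min_t: int = 4) -> list[tuple[int, int, int]]:
    """Return (start, stem_length, loop_length) for each perfect hairpin followed by >= min_t T's.

    start is the 0-based position of the 5' stem arm on the given (coding) strand.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna)):
        for stem in range(stem_range[0], stem_range[1] + 1):
            for loop in range(loop_range[0], loop_range[1] + 1):
                right = start + stem + loop  # first base of the 3' stem arm
                end = right + stem           # first base after the hairpin
                if end + min_t > len(dna):
                    continue
                left_arm = dna[start:start + stem]
                if (dna[right:end] == reverse_complement(left_arm)
                        and dna[end:end + min_t] == "T" * min_t):
                    hits.append((start, stem, loop))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: 6-bp GC-rich stem, GAAA loop, then a T run.
    toy = "AAC" + "GCCCGC" + "GAAA" + "GCGGGC" + "TTTTT" + "AAA"
    print(find_terminator_like(toy))  # [(3, 6, 4)]
```

Requiring exact complementarity keeps the sketch short but would miss real terminators whose stems contain G-U pairs or bulges; a fuller treatment would score hairpin stability instead.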
Current studies are aimed at the expression and regulation of UreD, UreE, and UreF and at identifying the functions they perform. Possible roles being investigated for these proteins include urea transport, nickel transport, nickel insertion, and enzyme assembly. The enzyme comprises three different subunits, previously designated gamma, beta, and alpha, which are encoded by ureA, ureB, and ureC, respectively, and have predicted molecular sizes of 11.0, 12.2, and 61.0 kDa. To avoid future confusion when referring to the operon and its translation products, we propose renaming the subunits to match the genetic designations of the individual cistrons: the gamma subunit becomes UreA, the beta subunit UreB, and the alpha subunit UreC (Fig. 3). With few exceptions, other bacterial ureases share this general structure of one large and two smaller subunits. The enzymes of Selenomonas ruminantium, Klebsiella aerogenes, Sporosarcina ureae, P. mirabilis (6, 34), Ureaplasma urealyticum (33), Providencia stuartii (25), and M. morganii (Hu et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 1989) have all been shown to have this subunit structure. In contrast, ureases with only a single large subunit have been reported for Bacillus pasteurii (8), Brevibacterium ammoniagenes (26), and Spirulina maxima (7). One possible explanation for this difference is that small subunits were overlooked on low-percentage polyacrylamide gels. A true exception among bacteria appears to be the urease of Campylobacter pylori, which has been reported to have only two subunits, of 65 and 31 kDa (9; L. Hu and H. L. T. Mobley, unpublished data). Assuming that the P. mirabilis urease resembles the ureases of K. aerogenes and Providencia stuartii, the probable stoichiometry of the native enzyme is two large subunits and four of each smaller subunit (2 × 61.0 + 4 × 11.0 + 4 × 12.2 kDa), giving a native molecular mass of approximately 215 kDa. Perhaps the most surprising and interesting result was the high similarity between the three subunits of the P. mirabilis urease and the single subunit of jack bean urease. This similarity suggests an evolutionary relationship between the eucaryotic jack bean urease and the procaryotic P. mirabilis urease. Interestingly, a sequence very similar to the intron splice-acceptor consensus sequence was found in the DNA between the ureA and ureB ORFs of P. mirabilis.
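The comparison with the splice-acceptor consensus can be made explicit by counting, position by position, how many nucleotides of a 16-nt window are allowed by YYYYYYYYYYYNYAGG (Y = C or T, N = any nucleotide). The Python sketch below does this with IUPAC ambiguity codes. The function names are hypothetical, and the test fragment is read from the Fig. 6A sequence as reproduced in this copy of the article (end of ureA, untranslated spacer, start of ureB), so it should be treated as an assumption; with it, the best-scoring window starts at the first base and matches 13 of 16 positions, the figure quoted in the Fig. 6 legend.

```python
"""Score DNA windows against the splice-acceptor consensus (sketch)."""

# IUPAC symbols used by the consensus; Y = pyrimidine, N = any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

ACCEPTOR_CONSENSUS = "Y" * 11 + "NYAGG"  # YYYYYYYYYYYNYAGG, 16 positions


def consensus_matches(window: str, consensus: str = ACCEPTOR_CONSENSUS) -> int:
    """Number of positions where the window base is allowed by the consensus symbol."""
    if len(window) != len(consensus):
        raise ValueError("window and consensus must have the same length")
    return sum(base in IUPAC[symbol] for base, symbol in zip(window.upper(), consensus))


def best_window(dna: str, consensus: str = ACCEPTOR_CONSENSUS) -> tuple[int, int]:
    """Return (0-based start, matches) of the highest-scoring window; earliest start wins ties."""
    k = len(consensus)
    if len(dna) < k:
        raise ValueError("sequence is shorter than the consensus")
    scores = [(consensus_matches(dna[i:i + k], consensus), -i) for i in range(len(dna) - k + 1)]
    matches, neg_start = max(scores)
    return -neg_start, matches


if __name__ == "__main__":
    # Fig. 6A fragment as extracted: ureA ...TCA CCT ATT GTG TAG, spacer GTAATAAC, ureB ATG ATC...
    fragment = "TCACCTATTGTGTAG" + "GTAATAAC" + "ATGATC"
    print(best_window(fragment))  # (0, 13)
```

The three mismatches in the best window fall in the pyrimidine tract (an A, an A, and a G where Y is expected), while the conserved YAGG end is matched exactly by the ureA stop codon TAG and the following G.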
One could speculate that this region is a remnant of sequences that allowed ancestral versions of these two cistrons to be spliced, forming a fused UreA-UreB subunit. Examination of the junction between the ureB and ureC cistrons showed that the two ORFs share a single nucleotide: the third residue (adenosine) of the codon for the last amino acid of UreB is the first residue of the UreC start codon. Further evolution from the hypothetical fused ureA-ureB ORF and the remaining ureC gene to a single urease ORF could occur most simply by insertion of an adenosine residue after bp 1924. The resulting frameshift would bring the ureC reading frame into register with that of ureB and allow translation of a single large urease subunit of 83.5 kDa.

Sequence analysis of the P. mirabilis urease gene complex has provided information that will be valuable for future study of the operon. The coordinates of each ORF were determined, making it possible to isolate and study specific gene-polypeptide relationships and to identify the function of each gene product. The location and regulation of the promoters can now be investigated to determine the transcriptional organization of the operon and to provide insight into how the operon is controlled during the pathogenic process.

ACKNOWLEDGMENTS

This work was supported in part by Public Health Service grants AI23328 and AG04393 from the National Institutes of Health. We thank Merrill Snyder and Robert Hausinger for editorial review and Jim Kaper for assistance with data analysis.

LITERATURE CITED

1. Ansorge, W., and S. Labeit. 1984. Field gradients improve resolution on DNA sequencing gels. J. Biochem. Biophys. Methods 10:237-243.
2. Ausubel, F. M. 1984. Regulation of nitrogen fixation genes. Cell 37:5-6.
3. Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore, J. A. Smith, J. G. Seidman, and K. Struhl (ed.). 1987. Current protocols in molecular biology, p. 1.1.3. Greene Publishing Associates and John Wiley & Sons, Inc., New York.
4. Birnboim, H. C., and J. Doly. 1979. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7:1513-1523.
5. Braude, A. I., and J. Siemienski. 1960. Role of bacterial urease in experimental pyelonephritis. J. Bacteriol. 80:171-179.
6. Breitenbach, J. M., and R. P. Hausinger. 1988. Proteus mirabilis urease: partial purification and inhibition by boric and boronic acids. Biochem. J. 250:917-920.
7. Carvajal, N., M. Fernandez, J. P. Rodriguez, and M. Donoso. 1982. Urease of Spirulina maxima. Phytochemistry 21:2821-2823.
8. Christians, S., and H. Kaltwasser. 1986. Nickel-content of urease from Bacillus pasteurii. Arch. Microbiol. 145:51-55.
9. Clayton, C. L., B. W. Wren, P. Mullany, A. Topping, and S. Tabaqchali. 1989. Molecular cloning and expression of Campylobacter pylori species-specific antigens in Escherichia coli K-12. Infect. Immun. 57:623-629.
10. Ebright, R. H. 1982. Sequence homologies in the DNA of six sites known to bind to the catabolite activator protein of Escherichia coli, p. 91-99. In J. P. Griffin and W. L. Duax (ed.), Molecular structure and biological activity. Elsevier Science Publishing, Inc., New York.
11. Evans, R. M., and S. M. Hollenberg. 1988. Zinc fingers: gilt by association. Cell 52:1-3.
12. Fasman, G. (ed.). 1976. CRC handbook of biochemistry and molecular biology, nucleic acids, vol. 2, p. 104-114. CRC Press, Inc., Cleveland, Ohio.
13. Friedrich, B., and B. Magasanik. 1977. Urease of Klebsiella aerogenes: control of its synthesis by glutamine synthetase. J. Bacteriol. 131:446-452.
14. Griffith, D. P., D. M. Musher, and C. Itin. 1976. Urease. The primary cause of infection-induced urinary stones. Invest. Urol. 13:346-350.
15. Jones, B. D., and H. L. T. Mobley. 1988. Proteus mirabilis urease: genetic organization, regulation, and expression of structural genes. J. Bacteriol. 170:3342-3349.
16. Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105-132.
17. Mamiya, G., K. Takishima, M. Masakuni, T. Kayumi, K. Ogawa, and T. Sekita. 1985. Complete amino acid sequence of jack bean urease. Proc. Jpn. Acad. 61:395-398.
18. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
19. Marzluf, G. A. 1981. Regulation of nitrogen metabolism and gene expression in fungi. Microbiol. Rev. 45:437-461.
20. Michaelis, S., and J. Beckwith. 1982. Mechanism of incorporation of cell envelope proteins in Escherichia coli. Annu. Rev. Microbiol. 36:435-465.
21. Mobley, H. L. T., and R. P. Hausinger. 1988. Microbial ureases: significance, regulation, and molecular characterization. Microbiol. Rev. 53:85-108.
22. Mobley, H. L. T., B. D. Jones, and A. E. Jerse. 1986. Cloning of urease gene sequences from Providencia stuartii. Infect. Immun. 54:161-169.
23. Mobley, H. L. T., and J. W. Warren. 1987. Urease-positive bacteriuria and obstruction of long-term urinary catheters. J. Clin. Microbiol. 25:2216-2217.
24. Mount, S. M. 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10:459-472.
25. Mulrooney, S. B., M. J. Lynch, H. L. T. Mobley, and R. P. Hausinger. 1988. Purification, characterization, and genetic organization of recombinant Providencia stuartii urease expressed by Escherichia coli. J. Bacteriol. 170:2202-2207.
26. Nakano, H., S. Takenishi, and Y. Watanabe. 1984. Purification and properties of urease from Brevibacterium ammoniagenes. Agric. Biol. Chem. 48:1495-1502.
27. Perlman, D., and H. O. Halvorson. 1983. A putative signal peptidase recognition site and sequence in eucaryotic and procaryotic signal peptides. J. Mol. Biol. 167:391-409.
28. Pustell, J., and F. C. Kafatos. 1984. A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis, and homology determination. Nucleic Acids Res. 12:643-655.
29. Rosenberg, M., and D. Court. 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Annu. Rev. Genet. 13:319-353.
30. Rubin, R. H., N. E. Tolkoff-Rubin, and R. S. Cotran. 1986. Urinary tract infection, pyelonephritis, and reflux nephropathy, p. 1085-1141. In B. M. Brenner and F. C. Rector (ed.), The kidney. The W. B. Saunders Co., Philadelphia.
31. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71:1342-1346.
32. Silhavy, T., S. Benson, and S. Emr. 1983. Mechanisms of protein localization. Microbiol. Rev. 47:313-344.
33. Thirkell, D., A. D. Myles, B. L. Precious, J. S. Frost, J. C. Woodall, M. G. Burdon, and W. C. Russell. 1989. The urease of Ureaplasma urealyticum. J. Gen. Microbiol. 135:315-323.
34. Todd, M. J., and R. P. Hausinger. 1987. Purification and characterization of the nickel-containing multicomponent urease from Klebsiella aerogenes. J. Biol. Chem. 262:5963-5967.
35. Von Heijne, G. 1983. Patterns of amino acids near signal-sequence cleavage sites. Eur. J. Biochem. 133:17-21.
36. Walker, J. E., M. Saraste, M. J. Runswick, and N. J. Gay. 1982. The ATP operon-nucleotide-sequence of the genes for the gamma-subunit, beta-subunit, epsilon-subunit of Escherichia coli ATP synthase. EMBO J. 1:945-951.
37. Walz, S. E., S. K. Wray, S. I. Hull, and R. A. Hull. 1988. Multiple proteins encoded within the urease gene complex of Proteus mirabilis. J. Bacteriol. 170:1027-1033.
38. Warren, J. W., D. Damron, J. H. Tenney, J. M. Hoopes, B. Deforge, and H. L. Muncie, Jr. 1987. Fever, bacteremia, and death as complications of bacteriuria in women with long-term urethral catheters. J. Infect. Dis. 155:1151-1158.