INFECTION AND IMMUNITY, Sept. 2001, p. 5905–5907 0019-9567/01/$04.00+0 DOI: 10.1128/IAI.69.9.5905–5907.2001 Copyright © 2001, American Society for Microbiology. All Rights Reserved.

Vol. 69, No. 9

Proteomics Reveals Open Reading Frames in Mycobacterium tuberculosis H37Rv Not Predicted by Genomics

PETER R. JUNGBLUT,1* EVA-CHRISTINA MÜLLER,2 JENS MATTOW,3 AND STEFAN H. E. KAUFMANN3

Core Facility for Protein Analysis1 and Department of Immunology,3 Max Planck Institute for Infection Biology, and Protein Chemistry, Max Delbrück Center,2 Berlin, Germany

Received 23 February 2001/Returned for modification 27 March 2001/Accepted 25 May 2001

Genomics revealed the sequence of 3,924 genes of the H37Rv strain of Mycobacterium tuberculosis. Proteomics complements genomics by showing which genes are actually expressed. Here we show the expression of six genes not predicted by genomics, as demonstrated by two-dimensional electrophoresis combined with matrix-assisted laser desorption ionization and nano-electrospray mass spectrometry.

Each year, eight million new cases and two million deaths are caused by tuberculosis (5). The World Health Organization (WHO) has therefore declared tuberculosis a global emergency, and new strategies for prevention and therapy are urgently required. Six years after the first publication of a complete bacterial genome (3), the complete genomes of 38 microorganisms have been sequenced (http://www-fp.mcs.anl.gov/~gaasterland/genomes.html and http://www.tigr.org/tdb/mdb/mdbcomplete.html), including Mycobacterium tuberculosis strain H37Rv (1). The sequencing of the genome of a clinical isolate of M. tuberculosis, CDC1551, is also nearly complete (http://www.tigr.org/tdb/CMR/gmt/htmls/SplashPage.html).

The proteome reflects the functional status of a cell in response to environmental stimuli and thus serves as a valuable complement to genomics. In searching for novel strategies for immune intervention, we have initiated a systematic proteome investigation by comparing the protein compositions of virulent M. tuberculosis strains with those of attenuated vaccine strains (4). Approximately 1,800 protein spots were separated by two-dimensional electrophoresis (2-DE) and, despite the similarity of the overall patterns, distinct and reproducible differences were detected between the strains. Only +/− variants were accepted, which occurred in all gels of independent prepara-

FIG. 1. Sector 5 of M. tuberculosis H37Rv 2-DE pattern. Proteins were stained with silver nitrate. The Mr range between 6 and 15 kDa and the pI range between 4 and 6 are shown. The numbered spots were sequenced de novo by nanospray MS/MS and revealed ORFs not predicted previously.

* Corresponding author. Mailing address: Max Planck Institute for Infection Biology, Core Facility for Protein Analysis, Schumannstr. 21-22, D-10117 Berlin, Germany. Phone: 49-30-28460133. Fax: 49-302846-0507. E-mail: [email protected].

FIG. 2. MS analysis of spot 5_98. (a) Spectrum of the trypsinized protein. Labeled peptides were fragmented to obtain sequence information. (b) Fragmentation pattern of the peptide with an m/z of 708.36, identified as VEIEVDDDLIQK.
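The m/z value quoted for the peptide in Fig. 2b can be checked against its sequence. The following is a minimal sketch, assuming standard monoisotopic residue masses and a doubly protonated ion ([M+2H]2+); the charge state is an assumption that the caption does not state, and the function name is illustrative only.

```python
# Monoisotopic residue masses (Da) for the amino acids that occur in the peptide.
MONO = {
    "V": 99.06841, "E": 129.04259, "I": 113.08406, "D": 115.02694,
    "L": 113.08406, "Q": 128.05858, "K": 128.09496,
}
WATER = 18.01056   # one water is added for the free N and C termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mz(sequence, charge):
    """Return the m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(MONO[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Assumed charge state: 2 (not given in the figure caption).
mz = peptide_mz("VEIEVDDDLIQK", charge=2)
print(f"{mz:.2f}")
```

Under these assumptions the computed value is about 708.37, which agrees with the reported 708.36 to within 0.01.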


TABLE 1. Protein identification by n-ESI-MS/MS (boldface residues) and MALDI MS (underlined residues) of previously unpredicted ORFs of M. tuberculosis H37Rv

| Spot | H37Rv sequence | Sanger EMBL accession no. | ORF detected in CDC1551 | Comparison of H37Rv and CDC1551 | Mr | pI |
|---|---|---|---|---|---|---|
| 5_37 | GGAPVARVVV HVMPKAEILD POGQAIVGAL GRLGHLGISD VRQGKRFELE VDDTVDDTTL AEIAESLLAN TVIEDWTISR DPQ | Z80226 (32260–32508) | | 100% identity | 8,872 | 4.5 |
| 5_53 | MPMEGATVEV KIGITDSPRE LVFSSAQTPS EVEELVSNAL RDDSGLLTLT DERGRRFLIH TARIAYVEIG VADARRVGFG VGVDAAAGSA GKVATSG | Z95120 (5517–5311) | 03128 | 100% identity | 10,118 | 4.9 |
| 5_98 | LGSDCGCGGY LWSMLKRVEI EVDDDLIQKV IRRYRVKGAR EAVNLALRTL LGEADTAEHG HDDEYDEFSD PNAWVPRRSR DTG | Z92772 (17111–17359) | D0043 | Leu-1→Met-1 in CDC1551 | 9,403 | 4.9 |
| 5_115 | PVTVYRRGMA VLTDEQVDAA LHDLNGWQRA GGVLRRSIKF PTFMAGIDAV RRVAERAEEV NHHPDIDIRW RTVTFALVTH AVGGITENDI AMAHDIDAMF GA | Z95584 (24791–24486) | 06120 | Pro-1-Val-2→Met-1 in CDC1551 | 11,309 | 5.9 |
| 5_123 | VQEGGPQETM SARSTQHDAA DALFRAIIET LDKHRNERTL TEDVLDTLAR AYASISTNVP EQGRLG | AL021646 (44673–44494) | 03103 | Val-1→Met-1 in CDC1551 | 7,253 | 4.9 |
| 5_139 | MSNHTYRVIE IVGTSPDGVD AAIQGGLARA AQTMRALDWF EVQSIRGHLV DGAVAHFQVT MKVGFRLEDS | Z79701 (17944–17735) | 00401 | 100% identity | 7,629 | 5.8 |
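The Mr values in Table 1 can be cross-checked against the listed sequences. As a minimal sketch, assuming average (not monoisotopic) residue masses and an unmodified chain with free termini, the theoretical Mr of the spot 5_139 sequence is computed below. The function name is illustrative, and the text does not specify which tool the authors used for this calculation.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.0153  # one water for the free N and C termini


def theoretical_mr(sequence):
    """Average molecular mass of an unmodified protein; spaces are ignored."""
    residues = sequence.replace(" ", "")
    return sum(AVG[aa] for aa in residues) + WATER_AVG


# Sequence of spot 5_139 as listed in Table 1 (70 residues).
SPOT_5_139 = (
    "MSNHTYRVIE IVGTSPDGVD AAIQGGLARA AQTMRALDWF "
    "EVQSIRGHLV DGAVAHFQVT MKVGFRLEDS"
)

mr = theoretical_mr(SPOT_5_139)
print(round(mr))
```

Under these assumptions the result is about 7,629, in agreement with the Mr listed for spot 5_139.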

tions of six virulent and six attenuated strains. A total of 263 proteins were identified by matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS), and a bioinformatics platform was constructed to store our data and connect them by hyperlinks with the genomics data (10) (http://www.mpiib-berlin.mpg.de/2D-PAGE/). Using this proteome approach, namely, a combination of 2-DE (6) and MS, we detected six genes previously not predicted in the genome of M. tuberculosis H37Rv. Our data demonstrate the value of proteomics in identifying gene products undetected by the genomics approach.

M. tuberculosis H37Rv was grown in Middlebrook medium for 6 to 8 days to a cell density of 1 × 10^8 to 2 × 10^8 cells/ml. The cells were washed and sonicated in the presence of proteinase inhibitors, and the proteins were treated with urea, dithiothreitol, and Triton X-100 at final concentrations of 9 M, 70 mM, and 2%, respectively (4). Up to 900 μg of protein was separated in preparative 2-DE gels (23 by 30 cm) and stained with Coomassie brilliant blue (CBB) G-250 (2). Spot positions were assigned to the standard 2-DE pattern, in which proteins are detected by silver staining. Provided that proteins are detectable by CBB, sequence coverage is superior when CBB-stained spots rather than silver-stained spots are the starting material (11). Therefore, we started identification with CBB-stained spots. Peptide mass fingerprints were obtained by tryptic in-gel digestion and MALDI MS (Voyager Elite; Perseptive Biosystems, Framingham, Mass.) (7). Sequence information was obtained by nanoelectrospray-tandem MS (nano-ESI-MS/MS) (Q-TOF; Micromass, Manchester, United Kingdom). The sequence tag method (8) was used to search for the proteins in a translated protein sequence database (http://195.41.108.38/PA_PeptidePatternForm.html). If no protein matched, de novo sequencing was performed. Then the tBLASTN program of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov:80/blast.cgi?Jform=1) and the sequence search program of The Institute for Genomic Research (TIGR) (http://www.tigr.org/tdb/CMR/gmt/htmls/SeqSearch.html) were applied to search within the entire genomes of M. tuberculosis H37Rv and the clinical isolate CDC1551.

Detailed investigations were focused on 190 spots in the pI range from 4 to 6 and the Mr range from 6 to 15 kDa, representing about one-sixth of the whole 2-DE gel and one-tenth of all spots of the complete gel (9). Sixty-two 2-DE spots were identified by their peptide mass fingerprints, and 10 further spots required sequence information from n-ESI-MS/MS for their identification. Eleven spots contained more than one protein. Ten genes gave rise to more than one protein species. Within this sector of the gel (Fig. 1), the sequences of six proteins could not be assigned to genes of M. tuberculosis H37Rv. As an example of the MS analysis, the identification of spot 5_98 is shown in Fig. 2a, with the MS spectrum of the peptide mixture after digestion with trypsin, and in Fig. 2b, with the MS/MS spectrum obtained by fragmentation of one peptide.

Open reading frames (ORFs) were found in the genome of strain CDC1551 for five spots, and no ORF was found for one spot (Table 1). A search in the genome of M. tuberculosis H37Rv revealed the presence of these DNA sequences, suggesting that the ORFs were not recognized by the search algorithms used by Cole et al. (1). The Mr values predicted from the theoretical gene sequences are in the same range as those estimated by 2-DE. Three of the gene sequences are completely identical between H37Rv and CDC1551 (5_53, 5_139, and 5_37). The reasons for the failure to detect these ORFs in H37Rv remain elusive. In contrast, the replacement of methionine in position 1 in 5_98, 5_123, and 5_115 by leucine, valine, and proline-valine, respectively, may have prevented the detection of the start codon. Spot 5_53 contains two further proteins: 14-kDa antigen (SwissProt: 14KD_MYCTU) and hypothetical protein Rv2626c (PIR: A70573). The protein of spot 5_37 has so far been predicted in neither the H37Rv nor the CDC1551 genome. A hypothetical M. leprae protein (SwissProt: Y525_MYCLE) shows 83.5% similarity to the new ORF. Recently, a sequence was published as part of a U.S. patent (EMBLNEW: AX023830) that is identical to the sequence of spot 5_53 without residues 1 to 7 and with methionine instead of valine as residue 8.

MALDI MS proved highly effective in the rapid identification of the main components of a 2-DE gel, provided that the proteins are known in a sequence database. A more detailed analysis of spots in 2-DE gels by nano-ESI-MS/MS revealed additional proteins per spot and additional genes not predicted from genome investigations. Our findings illustrate the value of proteomics in complementing genomics in both functional and genomic analyses. Proteomics is a further building block for unraveling the molecular network in bacterium-host interactions, a prerequisite for the development of new vaccines against infectious diseases like tuberculosis.

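The genome search described in the text relied on tBLASTN, which compares a protein query against nucleotide sequences translated in all six reading frames. The toy sketch below is not tBLASTN: it replaces similarity scoring with exact substring matching, and it uses a short, made-up DNA string with arbitrarily chosen codons rather than M. tuberculosis sequence. It is meant only to illustrate why a peptide sequenced de novo can be located in a genome without any prior gene annotation. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_hits(dna, peptide):
    """List (strand, frame, residue offset) of the first exact match per frame."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            position = translate(seq[frame:]).find(peptide)
            if position != -1:
                hits.append((strand, frame, position))
    return hits


# Hypothetical coding sequence for the Fig. 2b peptide; the codons are chosen
# arbitrarily for illustration and are NOT taken from the H37Rv genome.
coding = "GTGGAAATCGAAGTGGACGACGACCTGATCCAGAAA"
# Toy "genome" that carries the coding sequence on its reverse strand.
toy_genome = "CA" + reverse_complement(coding) + "T"

print(six_frame_hits(toy_genome, "VEIEVDDDLIQK"))
```

In this toy case the peptide is recovered from the reverse strand even though no gene was annotated there. A real search would use a similarity-scoring program such as tBLASTN, because de novo sequences can contain errors and ambiguities (for example, leucine and isoleucine have the same mass).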
This work was supported by Chiron Behring, Marburg, Germany, and the WHO (Global Programme for Vaccines and Immunization–Vaccine Research and Development).

Editor: R. N. Moore


REFERENCES

1. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, and B. G. Barrell. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544.
2. Doherty, N. S., B. H. Littman, K. Reilly, A. C. Swindell, J. M. Buss, and N. L. Anderson. 1998. Analysis of changes in acute-phase plasma proteins in an acute inflammatory response and in rheumatoid arthritis using two-dimensional gel electrophoresis. Electrophoresis 19:355–363.
3. Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, K. Mckenney, G. Sutton, W. Fitzhugh, C. Fields, J. D. Gocayne, J. Scott, R. Shirley, L. I. Liu, A. Glodek, J. M. Kelley, J. F. Weidman, C. A. Phillips, T. Spriggs, E. Hedblom, M. D. Cotton, J. C. Venter, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae RD. Science 269:496–511.
4. Jungblut, P. R., U. E. Schaible, H.-J. Mollenkopf, U. Zimny-Arndt, B. Raupach, J. Mattow, P. Halada, S. Lamer, K. Hagens, and S. H. E. Kaufmann. 1999. Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: towards functional genomics of microbial pathogens. Mol. Microbiol. 33:1103–1117.
5. Kaufmann, S. H. E. 2000. Is the development of a new tuberculosis vaccine possible? Nat. Med. 6:955–960.
6. Klose, J., and U. Kobalz. 1995. Two-dimensional electrophoresis of proteins: an updated protocol and implications for a functional analysis of the genome. Electrophoresis 16:1034–1059.
7. Lamer, S., and P. R. Jungblut. 2001. Matrix-assisted laser desorption-ionization mass spectrometry peptide mass fingerprinting for proteome analysis: identification efficiency after on-blot or in-gel digestion with and without desalting procedures. J. Chromatogr. B 752:311–322.
8. Mann, M., and M. Wilm. 1994. Error tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66:4390–4399.
9. Mattow, J., P. R. Jungblut, E.-C. Müller, and S. H. E. Kaufmann. 2001. Identification of acidic, low molecular mass proteins of Mycobacterium tuberculosis strain H37Rv by MALDI- and ESI-mass spectrometry. Proteomics 1:494–507.
10. Mollenkopf, H.-J., P. R. Jungblut, B. Raupach, J. Mattow, S. Lamer, U. Zimny-Arndt, U. E. Schaible, and S. H. E. Kaufmann. 1999. A dynamic two-dimensional polyacrylamide gel electrophoresis database: the mycobacterial proteome via the internet. Electrophoresis 20:2172–2180.
11. Scheler, C., S. Lamer, Z. Pan, X.-P. Li, J. Salnikow, and P. Jungblut. 1998. Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis 19:918–927.