Genomewide transcriptional changes associated with genetic alterations and nutritional supplementation affecting tryptophan metabolism in Bacillus subtilis

Randy M. Berka*, Xianju Cui*, and Charles Yanofsky†‡

*Novozymes Biotech, Inc., Davis, CA 95616; and †Department of Biological Sciences, Stanford University, Stanford, CA 94305

Contributed by Charles Yanofsky, March 19, 2003

DNA microarrays comprising ≈95% of the Bacillus subtilis annotated protein coding ORFs were deployed to generate a series of snapshots of genomewide transcriptional changes that occur when cells are grown under various conditions that are expected to increase or decrease transcription of the trp operon segment of the aromatic supraoperon. Comparisons of global expression patterns were made between cells grown in the presence of indole acrylic acid, a specific inhibitor of tRNA(Trp) charging; cells deficient in expression of the mtrB gene, which encodes the tryptophan-activated negative regulatory protein, TRAP; WT cells grown in the presence or absence of two or three of the aromatic amino acids; and cells harboring a tryptophanyl-tRNA synthetase mutation conferring temperature-sensitive tryptophan-dependent growth. Our findings validate expected responses of the tryptophan biosynthetic genes and presumed regulatory interrelationships between genes in the different aromatic amino acid pathways and the histidine biosynthetic pathway. Using a combination of supervised and unsupervised statistical methods we identified ≈100 genes whose expression profiles were closely correlated with those of the genes in the trp operon. This finding suggests that expression of these genes is influenced directly or indirectly by regulatory events that affect or are a consequence of altered tryptophan metabolism.

5682–5687 | PNAS | May 13, 2003 | vol. 100 | no. 10

Homologous protein domains are used by Bacillus subtilis and Escherichia coli to catalyze the same reactions in the biosynthesis of the aromatic amino acids (1). Despite this similarity, very different regulatory proteins and mechanisms are used by these bacteria to regulate aromatic amino acid synthesis. These differences must be partly caused by the different evolutionary histories and experiences of these microorganisms. Operon organization also is somewhat different in the two species, reflecting regulatory interrelationships, described as cross-pathway control, that exist between genes for different pathways in B. subtilis but are not evident in E. coli. Thus, the six-gene trp operon of B. subtilis resides within an aromatic supraoperon that contains six additional genes, three upstream and three downstream, concerned with the common aromatic pathway and with phenylalanine, tyrosine, and histidine biosynthesis (Fig. 1). The seventh trp gene, trpG (pabA), is in the folate operon. This gene specifies a protein that functions in both tryptophan and folate biosynthesis; presumably because of this, it is subject to regulation by both metabolites. In E. coli the five-gene trp operon encodes all seven protein domains needed for tryptophan biosynthesis; two of these genes encode fused protein domains that engender bifunctional polypeptides (2).

The B. subtilis aromatic supraoperon has three promoters, as shown in Fig. 1 (1, 3). One promoter is located before the three genes upstream of the six-gene trp operon. A second promoter immediately precedes the trp operon segment. These two promoters provide trp operon transcripts. The third supraoperon promoter is within trpA, the last trp gene; it provides transcripts derived from the last three genes of the supraoperon.

A major regulatory difference between E. coli and B. subtilis is that E. coli uses a DNA-binding repressor protein to control transcription initiation at the trp operon promoter/operator. Transcription initiation is not known to be regulated at the trp operon promoter of B. subtilis. Transcription of the structural genes in the trp operons of both organisms is regulated by transcription attenuation, but by different mechanisms. In B. subtilis a tryptophan-activated RNA-binding protein (TRAP), the product of the mtrB gene, regulates attenuation. The mtrB coding sequence resides in a two-gene operon with mtrA, which specifies GTP cyclohydrolase I, the enzyme catalyzing the first step in pterin formation in folic acid biosynthesis. In addition to these aromatic-folate cross-pathway features, at least one additional operon, rtpA-ycbK, appears to play an important regulatory role in trp operon expression in B. subtilis (4, 5). Transcription of rtpA-ycbK is regulated by the T box antitermination mechanism in response to a deficiency of charged tRNA(Trp) (4, 6). Expression of the rtpA-ycbK operon leads to the synthesis of the anti-TRAP regulatory protein AT, the rtpA gene product (5). AT can inactivate TRAP; when it does, trp operon transcription and trpG translation are allowed to proceed. AT production is also regulated translationally, in response to the accumulation of uncharged tRNA(Trp) (G. Chen and C.Y., unpublished data). Many of the known B. subtilis genes that are presumed to play a role in aromatic amino acid metabolism and folic acid synthesis and regulation are presented in Fig. 1.

B. subtilis also lacks a structural homolog of TyrR, the major regulatory protein of E. coli that controls expression of the common aromatic pathway genes and the genes required for phenylalanine and tyrosine synthesis. However, some of these B. subtilis genes are regulated in response to tyrosine or phenylalanine accumulation. In addition, in B. subtilis, a single gene, aroA, specifies 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP) synthase, the enzyme that catalyzes the first step in the common aromatic pathway.
Synthesis of this enzyme is regulated only indirectly by aromatic amino acids (1, 3). E. coli produces three nearly identical DAHP synthases that catalyze this reaction, and each is subject to transcriptional regulation principally by a different aromatic amino acid.

The present study parallels a similar investigation of E. coli genes that respond transcriptionally to culture conditions and genetic alterations that influence tryptophan metabolism (7). Herein we describe the application of B. subtilis DNA microarrays as an initial step toward our goal of determining the global effects on gene expression of varying tryptophan and charged tRNA(Trp) availability in B. subtilis.

Abbreviations: TRAP, tryptophan-activated RNA-binding protein; PC, principal component; PCA, PC analysis.

‡To whom correspondence should be addressed. E-mail: [email protected].

www.pnas.org/cgi/doi/10.1073/pnas.1031606100

Fig. 1. Known relationships among the genes of the aromatic supraoperon. Colored lines connect the genes that are directly responsible for synthesis of tryptophan (red), folate (blue), histidine (green), phenylalanine (gray), chorismate (violet), and tyrosine (brown). Promoters are denoted by P′. Orange rectangles represent leader regulatory regions controlled by tRNA-mediated antitermination, and gray boxes define regions at which TRAP (mtrB gene product) binds and regulates translation. The promoter of the trp operon itself (trpEDCFBA) is denoted as a violet box because TRAP regulates transcription at this site. Anti-TRAP (rtpA gene product) forms a complex with TRAP and inhibits its activity and is noted as AT.

Materials and Methods

B. subtilis Strains. The following B. subtilis strains were used in these studies: CYBS400 (WT) (4), CYBS222 (mtrB−) [a frameshift mutation located near the end of the mtrB coding region yields a TRAP protein with reduced activity (3)], and BS1A353 (trpS1), which carries a mutation in the tryptophanyl-tRNA synthetase structural gene resulting in temperature-sensitive, tryptophan-dependent growth (8).

Culture Conditions and Isolation of RNA. Cultures (50 ml) were grown to midlog phase with shaking in minimal medium (9) plus trace elements, plus and minus various supplements, at 37°C. Where indicated, the following supplements were included: 30 µg/ml indole acrylic acid, 50 µg/ml phenylalanine, 50 µg/ml tryptophan, 50 µg/ml tyrosine, or 0.2% acid hydrolyzed casein. Harvested cultures were chilled rapidly on ice, 1 ml of 2 M sodium azide was added to each culture, and the cells were collected by centrifugation at 4°C. The pelleted cells were resuspended in minimal medium plus azide, recentrifuged, and frozen. For RNA isolation the pellets were resuspended in 2.5 ml of ice-cold sterile water. RNA was prepared by using the Bio101 FastRNA Blue Kit (Qbiogene, Carlsbad, CA), according to the manufacturer's instructions, with minor modifications (see Supporting Text, which is published as supporting information on the PNAS web site, www.pnas.org).

Table 1. B. subtilis cells and growth conditions used as sources of RNA for microarray studies

Experiment  Replicates*  B. subtilis strains and culture conditions
1           8            WT in minimal medium versus WT in minimal medium + 50 µg/ml tryptophan
2           6            WT grown in minimal medium + 30 µg/ml indole acrylic acid versus WT grown in minimal medium
3           8            mtrB-deficient mutant grown in minimal medium + 50 µg/ml tryptophan versus WT grown in minimal medium + 50 µg/ml tryptophan
4           5            WT grown in minimal medium with 50 µg/ml each phenylalanine and tyrosine versus WT grown in minimal medium
5           7            WT grown in minimal medium versus WT grown in minimal medium with 50 µg/ml each tryptophan, phenylalanine, and tyrosine
6           7            WT grown in minimal medium with 50 µg/ml each phenylalanine and tyrosine versus WT grown in minimal medium with 50 µg/ml each tryptophan, phenylalanine, and tyrosine
7           8            trpS1(ts) mutant grown at 38°C in minimal medium + 0.2% acid hydrolyzed casein + 50 µg/ml tryptophan versus WT grown in the same medium

To minimize intensity biases that are sometimes observed with the use of Cy3/Cy5 dyes we used a dye-swapping strategy for some of the experiments listed. Consequently, the fluorescence intensity ratios were calculated as the intensity derived from one cDNA probe (growth condition or strain) divided by the other. The inducing condition, indicated in boldface type, corresponds to the cells indicated by * in Table 2.
*The term "replicates" as used here refers to the number of times the B. subtilis genome was queried with fluorescently labeled cDNA probes derived from the culture conditions listed.

DNA Microarrays. For the first six experiments listed in Table 1, we constructed DNA microarrays consisting of PCR-amplified B. subtilis 168 ORFs by using a complete set of ORF-specific PCR primers as described (10). The primers were designed to amplify each of the ≈4,100 protein-coding ORFs listed in the SubtiList database (http://genolist.pasteur.fr/SubtiList). For the seventh experiment we used microarrays that were prepared by spotting ORF-specific oligonucleotides (60-mers, selected by using protein-coding ORFs listed in the SubtiList database) purchased from Compugen (Jamesburg, NJ).

Synthesis of cDNA Probes and Hybridization Conditions. Fluorescent probes were prepared by reverse transcription of 25 µg of total RNA from B. subtilis to incorporate aminoallyl-dUTP into first-strand cDNA. The amino-cDNA products were subsequently labeled by direct coupling to either Cy3 or Cy5 monofunctional reactive dyes (Amersham Pharmacia). The details of this protocol can be found at http://cmgm.stanford.edu/pbrown/protocols. Cy3- and Cy5-labeled probes were combined, denatured, and applied to a microarray slide under a cover glass, placed in a humidified chamber, and incubated overnight (15–16 h) in a water bath at 63°C (11). Before scanning, the arrays were washed consecutively in 1× SSC with 0.03% SDS, 0.2× SSC, and 0.05× SSC and centrifuged for 2 min at 500 rpm to remove excess liquid. The hybridization and washing conditions were the same for both PCR-amplified ORFs and oligonucleotide-based microarrays. Lastly, the slides were imaged by using an Axon 4000B scanner (Axon Instruments, Union City, CA).

Treatment of Microarray Data. The fluorescence intensity values for microarray spots were quantified (including background subtraction), and the resulting figures were normalized by using the Lowess function provided in GENESPRING software (Silicon Genetics, Redwood City, CA). Genes whose expression was significantly up- or down-regulated in each experiment were identified by using SAM (significance analysis for microarrays) software (12). Those genes for which the fluorescence intensity ratios were not significantly altered (up or down) in any of the experiments were excluded from further analysis. Principal component analysis (PCA), hierarchical clustering, and K-means cluster analysis of the data were done by using the algorithms included in GENESPRING. The PATHWAY TOOLS suite of software (13, 14) was used to superimpose the gene expression data onto a predicted metabolic network for B. subtilis that was generated based on the complete genome annotations extracted from GenBank (www.ncbi.nlm.nih.gov).
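As a rough illustration of why a SAM-style score can rank genes differently from raw fold-change, the sketch below computes a one-class relative-difference score (mean log ratio divided by its standard error plus a small constant). It is a simplified stand-in, not the SAM software itself: the constant `s0` is fixed by hand here, no permutation-based FDR or q value is estimated, and the gene names and replicate ratios are hypothetical values invented for the example.

```python
import math
import statistics


def relative_difference(log_ratios, s0=0.1):
    """One-class SAM-style score: mean log ratio / (standard error + s0).

    s0 is a small positive constant that keeps genes with tiny variance
    from dominating; its value here is an arbitrary assumption.
    """
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    std_err = statistics.stdev(log_ratios) / math.sqrt(n)
    return mean / (std_err + s0)


# Hypothetical replicate intensity ratios (not data from the study).
replicates = {
    "geneA": [5.0, 4.6, 5.4, 5.1],    # large, reproducible change
    "geneB": [2.0, 0.5, 1.9, 0.6],    # sometimes 2-fold, but inconsistent
    "geneC": [1.3, 1.25, 1.35, 1.3],  # small but very reproducible change
}

scores = {
    gene: relative_difference([math.log2(r) for r in ratios])
    for gene, ratios in replicates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

for gene in ranked:
    print(f"{gene}: d = {scores[gene]:.2f}")
```

In this toy input, geneC (about 1.3-fold in every replicate) scores well above geneB (2-fold in only half of the replicates), which is the behavior that a fixed fold-change cutoff would miss.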

Results and Discussion

Experimental Design, Data Collection, and Significance Testing. Microarray-based gene expression comparisons were performed with mutant and/or WT B. subtilis cultures grown in parallel to log phase under the conditions listed in Table 1. Although the cultures were harvested at a single time point, we analyzed mRNA profiles from seven different culture conditions and/or mutant strains to provide informative comparisons of global transcription profiles. Background-subtracted intensity data collected from these experiments were normalized by using the Lowess method (GENESPRING software). Genes for which transcript levels were significantly up- or down-regulated in each experiment were identified by using SAM software (12). SAM does not rely on the magnitude of the fold-change in intensity ratios, but rather assigns a score to each gene based on the change in gene expression relative to the standard deviation of replicate microarray measurements. The "q value" derived for each gene corresponds to the lowest false discovery rate (FDR) at which the gene is called significant. It is analogous to the well-known statistical P value, but modified for multiple testing circumstances. Several published reports have noted that the use of an arbitrary fold-change value as the criterion for determining up- or down-regulation produces inappropriately high FDR values (12, 15, 16). Although the number of genes that SAM deemed significant varied among the seven data sets, we selected fairly stringent statistical limits so that the median FDR was acceptably low (1.04–2.52%). In addition to genes known to be concerned with aromatic amino acid metabolism, many other genes were observed to be up-regulated or down-regulated under each of the seven experimental comparisons. There were 544 genes for which transcript levels were not significantly altered in any of the seven experiments; these were excluded from subsequent analyses.

Changes in Relative mRNA Levels for Genes of the trp, dhb, and his Operons. Table 2 lists the mean fluorescence intensity ratios (i.e., fold-change) for 45 genes whose transcript levels were previously reported to be altered by perturbations of aromatic amino acid metabolism. For those transcript levels that were significantly altered according to SAM, the q value is also given. Nearly all of the genes that SAM classified as significant yielded q values substantially lower than the false discovery rate of 1% that we selected to filter the data. In fact, the mean q value for the up- and down-regulated genes listed in Table 2 suggests that on average these genes would be recognized as statistically significant at a false discovery rate of 0.3%. In general, and as we expected, the genes of the trp operon itself and the distal genes of the supraoperon (hisC-tyrA-aroE) responded somewhat similarly (increased expression) to conditions that reduced tryptophan or charged tRNA(Trp) availability. Experiment 1 detected those genes that were up-regulated or down-regulated in cultures grown without tryptophan versus with excess tryptophan. In the second experiment, the addition of indole acrylic acid increased not only expression of the trp operon genes, which respond to tryptophan deficiency, but also expression of other genes such as rtpA-ycbK and trpS that are known to respond to charged tRNA(Trp) deficiency (4, 5). In contrast, as expected, transcription of these three genes was not increased in the mtrB mutant (experiment 3), which overproduces tryptophan and presumably charged tRNA(Trp). However, expression of the trp operon, the distal genes of the supraoperon, and the genes involved in folate biosynthesis was increased in this comparison. Overexpression of the trp operon presumably also leads to a partial depletion of phenylalanine and tyrosine and their charged tRNAs, because transcription of pheS, pheT, and tyrS was activated by trp operon overexpression (experiment 3). Interestingly, increased transcription of the genes in the his operon and hisS was also observed in the mtrB mutant (experiment 3) as well as in experiments 4, 5, and 6. Apparently growth without the mtrB product (the TRAP regulatory protein) and/or tryptophan overproduction deprives the cell of the proteins or intermediates needed for histidine biosynthesis, promoting increased expression of the histidine biosynthetic genes and hisS. An apparent scheme of cross-pathway regulation involving the his operon and genes of aromatic amino acid biosynthesis has been recognized (17, 18). Mutants that were derepressed for both aromatic amino acid biosynthesis and histidine biosynthesis were isolated and mapped to a single locus (19, 20); however, the biochemical basis for this cross-pathway control is unknown. Additionally, by overlaying our expression data onto the predicted biochemical pathways of B. subtilis (13, 14) we observed that genes encoding histidine utilization activities (hut operon) were down-regulated in the mtrB mutant and in comparison 4 (not shown). This finding suggests that expression of these genes may be induced by excess histidine. The results of experiments 4, 5, and 6 suggest that the presence of excess phenylalanine and tyrosine leads to partial tryptophan starvation, resulting in increased expression of genes required for tryptophan biosynthesis. It seems reasonable that histidine utilization activities would be down-regulated under conditions in which histidine biosynthesis is up-regulated. As anticipated, the results of experiments 4, 5, and 6 suggest that expression of the genes involved in the biosynthesis of phenylalanine and tyrosine is decreased by addition of the corresponding amino acids.
However, the data from experiment 4 correctly indicate that the presence of phenylalanine and tyrosine in minimal medium partially starves the cells of tryptophan, leading to elevated expression of the trp operon and hisC, tyrA, and aroE, the three downstream genes of the aromatic supraoperon, over the level observed in minimal medium alone. Furthermore, it appears that the genes of the his operon are up-regulated to a greater extent in the presence of added phenylalanine, tyrosine, and tryptophan compared with minimal medium (experiments 4 and 5). Likewise the data derived from experiment 6 suggest that the apparent cross-activation of the his operon is more efficient in medium supplemented with all three aromatic amino acids compared with minimal medium supplemented with phenylalanine and tyrosine only. Nester and coworkers (18, 21) previously observed that tyrosine (or phenylalanine plus tyrosine) represses the synthesis of the bifunctional deoxy-D-arabinoheptulosonate-7-phosphate synthase-chorismate mutase (aroA gene product). The microarray data from experiments 4 and 5 support these earlier studies; however, our data do not confirm the moderate repression of aroH (monofunctional chorismate mutase) and aroI (shikimate kinase) under the same conditions. In experiment 7, which compares gene expression in a trpS mutant and WT B. subtilis cells grown at 38°C in the presence of all of the amino acids, we observed that transcript levels for the trp operon as well as for the trpS and rtpA-ycbK operons were elevated in the trpS1 mutant. This finding implies that under these conditions the cell is deficient in charged tRNA(Trp), and thus that overall protein synthesis is somewhat adversely affected. In contrast, we observed that the genes responsible for synthesis of 2,3-dihydroxybenzoate (dhb operon) were decidedly down-regulated in the trpS mutant growing at 38°C (not shown).
This operon was also significantly down-regulated by tryptophan starvation, indole acrylic acid induction, and loss of mtrB gene function (experiments 1, 2, and 3, respectively). These observations are of interest because 2,3-dihydroxybenzoate is produced from chorismate, the central precursor for aromatic amino acid biosynthesis (22). Thus, it appears there may be specific conditions that not only incite expression of the trp operon, but also reduce transcription of the dhb operon, thereby directing chorismate into tryptophan biosynthesis.
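To make the reading of Table 2 concrete, the following sketch takes a few (fold-change, q value) pairs copied from the Exp. 2 and Exp. 3 columns and labels each as up, down, or not called. It is only an illustration: the function name, the tuple layout, and the strict 1% cutoff are assumptions made for this example, and a strict cutoff leaves out a few genes that Table 2 itself marks as significant with slightly higher q values (for example trpC in experiment 2).

```python
# (gene, experiment, fold_change, q_value); q_value is None where
# Table 2 reports no q value for that gene in that experiment.
TABLE2_EXCERPT = [
    ("trpE", 2, 5.11, 0.0033),
    ("trpD", 2, 3.56, 0.0033),
    ("rtpA", 2, 3.46, 0.0033),
    ("trpC", 2, 1.38, 0.0124),
    ("aroA", 2, 1.18, None),
    ("ycbK", 3, 0.61, 0.0003),
]


def call_genes(rows, max_q=0.01):
    """Label each (gene, experiment) as 'up', 'down', or 'not called'.

    Significance comes from the q value alone; the fold-change is used
    only to decide the direction of a significant change.
    """
    calls = {}
    for gene, experiment, fold_change, q_value in rows:
        if q_value is None or q_value > max_q:
            calls[(gene, experiment)] = "not called"
        elif fold_change > 1.0:
            calls[(gene, experiment)] = "up"
        else:
            calls[(gene, experiment)] = "down"
    return calls


calls = call_genes(TABLE2_EXCERPT)
for (gene, experiment), label in calls.items():
    print(f"Exp. {experiment} {gene}: {label}")
```

Raising `max_q` to a looser value such as 0.03 would also call trpC in experiment 2, which shows how the chosen FDR limit, rather than a fold-change threshold, determines the gene list.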

Table 2. Mean fluorescence intensity ratios (i.e., fold-change) for known genes involved in aromatic amino acid metabolism

Gene  Exp. 1          Exp. 2          Exp. 3          Exp. 4          Exp. 5          Exp. 6          Exp. 7
aroA  0.99            1.18            1.24            0.40 (0.0007)†  2.67 (0.0020)*  0.98            1.02
aroB  0.92            1.06            1.37 (0.0003)*  0.95 (0.0012)†  1.31 (0.0166)*  0.96            0.99
aroC  1.04            1.25            0.99            1.02            0.84            1.11 (0.0093)*  1.01
aroD  0.92            1.09            1.83 (0.0003)*  0.99            1.07            0.73 (0.0001)†  0.96
aroE  1.24 (0.0066)*  1.27 (0.0124)*  2.43 (0.0003)*  0.62 (0.0007)†  1.98 (0.0020)*  1.61 (0.0012)*  0.98
aroF  1.16 (0.0106)*  1.11            1.11            0.77            1.25 (0.0068)*  1.23 (0.0012)*  1.01
aroH  1.00            1.05            1.09            0.96            1.18            0.93            0.97
aroK  1.02            0.94            0.99            0.73            1.21 (0.0068)*  1.59 (0.0012)*  0.88 (0.0252)†
folB  1.15            1.03            1.49 (0.0003)*  1.29            0.93            1.03            0.96
folC  1.31 (0.0046)*  1.09            2.08 (0.0003)*  0.85 (0.0007)†  1.79 (0.0020)*  0.67 (0.0001)†  0.95
folD  1.06            0.95            0.89            1.17            0.82            0.98            0.67 (0.0022)†
folK  1.34 (0.0031)*  1.02            1.37            1.02            0.93            1.03            0.98
hisA  1.01            0.85            2.54 (0.0003)*  2.25 (0.0007)*  0.50 (0.0020)†  0.83 (0.0001)†  0.99
hisB  1.19            0.79            2.33 (0.0003)*  1.38 (0.0007)*  0.65 (0.0027)†  0.94            1.07
hisC  1.17            1.36 (0.0033)*  3.70 (0.0003)*  0.83 (0.0007)*  1.73 (0.0020)*  1.56 (0.0001)*  1.09
hisD  1.00            0.83            2.56 (0.0003)*  1.49 (0.0007)*  0.57 (0.0020)†  0.95            0.92
hisF  1.09            0.91            3.06 (0.0003)*  1.91 (0.0007)*  0.56 (0.0020)†  0.71 (0.0001)†  1.02
hisG  1.11            0.83            2.49 (0.0003)*  2.17 (0.0007)*  0.47 (0.0020)†  1.11 (0.0020)*  0.80 (0.0022)†
hisH  0.97            0.89            1.87 (0.0003)*  1.62 (0.0007)*  0.57 (0.0020)†  0.91 (0.0093)†  1.00
hisI  1.14            0.89            3.01 (0.0003)*  2.15 (0.0007)*  0.60 (0.0020)†  0.75 (0.0001)†  1.11
hisJ  0.82 (0.0046)†  1.30 (0.0108)*  1.18 (0.0003)*  0.93            0.92            0.91            1.00
hisS  1.21            1.18            1.97 (0.0003)*  0.79 (0.0012)†  1.15            0.58 (0.0001)†  0.89
hisZ  1.01            1.00            2.04 (0.0003)*  2.26 (0.0007)*  0.44 (0.0020)†  1.01            0.72 (0.0038)†
mtrA  0.99            1.43            1.76 (0.0003)*  0.77 (0.0007)†  1.65 (0.0020)*  0.56 (0.0001)†  1.09
mtrB  0.99            1.36            1.15            0.90 (0.0028)†  1.06            0.54 (0.0001)†  1.16 (0.0127)*
pabA  1.10            1.10            2.64 (0.0003)*  1.02            1.15            0.98            0.95
pabB  1.02            1.09            1.05            1.39 (0.0081)*  0.59 (0.0020)†  ND              0.99
pabC  1.29 (0.0031)*  1.25            3.66 (0.0003)*  1.03            1.15            0.80 (0.0001)†  0.97
pheA  0.91            1.14            1.31 (0.0003)*  0.98 (0.0025)†  1.14            0.80 (0.0004)†  1.12
pheB  1.00            0.98            1.49 (0.0003)*  1.17 (0.0104)*  1.00            1.11 (0.0001)*  1.17
pheS  0.92            2.00 (0.0033)*  2.05 (0.0003)*  0.79 (0.0007)†  1.27 (0.0106)*  0.99            1.29 (0.0022)*
pheT  0.93            1.17            1.66 (0.0003)*  ND              1.00            1.19            0.91
sul   1.12            1.12            1.91 (0.0003)*  1.30 (0.0007)*  0.98            1.17 (0.0019)*  1.01
trpA  1.49 (0.0031)*  1.42            2.13 (0.0003)*  0.95            1.49 (0.0020)*  1.54 (0.0001)*  1.06
trpB  2.34 (0.0031)*  1.91 (0.0033)*  2.24 (0.0003)*  1.39 (0.0012)*  4.40 (0.0020)*  7.94 (0.0001)*  1.50 (0.0022)*
trpC  1.16            1.38 (0.0124)*  1.86 (0.0003)*  1.32 (0.0035)*  1.56 (0.0020)*  2.56 (0.0001)*  1.17 (0.0252)*
trpD  1.29 (0.0031)*  3.56 (0.0033)*  8.71 (0.0003)*  1.38 (0.0104)*  2.45 (0.0020)*  4.68 (0.0001)*  1.38 (0.0022)*
trpE  0.92            5.11 (0.0033)*  6.63 (0.0003)*  1.81 (0.0007)*  1.75 (0.0020)*  3.15 (0.0001)*  1.75 (0.0022)*
trpF  1.38 (0.0031)*  2.42 (0.0033)*  5.13 (0.0003)*  1.20            4.20 (0.0020)*  5.60 (0.0001)*  1.10 (0.0070)*
trpS  1.36 (0.0031)*  2.61 (0.0033)*  0.91            1.33 (0.0007)*  1.13            1.80 (0.0001)*  3.03 (0.0022)*
tyrA  1.09            1.27            2.00 (0.0003)*  0.96 (0.0081)†  1.24            1.13 (0.0008)*  1.56 (0.0022)*
tyrS  0.88            1.42 (0.0124)*  1.31 (0.0021)*  0.94 (0.0035)†  1.08            0.75 (0.0001)†  1.19 (0.0158)*
tyrZ  1.00            0.88            0.80 (0.0007)†  0.89            0.79            1.42 (0.0001)*  1.14
ycbK  0.95            2.87 (0.0033)*  0.61 (0.0003)†  0.9             0.83            1.45 (0.0001)*  3.05 (0.0022)*
rtpA  1.12            3.46 (0.0033)*  0.87            1.00            0.91            ND              1.89 (0.0022)*

The ratios were calculated based on the culture conditions or strain comparisons shown in Table 1. Numbers in parentheses represent the q values determined by SAM (similar to P values; refer to text). ND, not determined.
*Transcripts that are significantly up-regulated on the basis of SAM.
†Transcripts that are significantly down-regulated.

Clustering of Genes Based on Expression Patterns. To further examine the relationships among the gene expression profiles derived from microarray data in this study, we used several statistical tools that enable clustering of genes with similar expression patterns. K-means cluster analysis is one such method often applied to the sorting of microarray data (23). This technique partitions the data into a predetermined number of clusters based on the similarity of their expression patterns across a series of experiments. This is accomplished by iterative reallocation of the cluster members to minimize intracluster scattering. First, we selected the number of clusters to be 15 by using a "best K-means" script within the GENESPRING software package. With this algorithm ≈97% of the 3,552 genes that SAM deemed significant in at least one experiment were placed into K-means clusters. Interestingly, one of these clusters contained most of the genes that were previously known to be involved in aromatic amino acid metabolism (see Table 3, which is published as supporting information on the PNAS web site). In total, this cluster contained 164 genes that shared similar expression profiles to those of the genes from the aromatic supraoperon, suggesting that they are coordinately up-regulated in response to depletion of tryptophan, phenylalanine, or tyrosine, or to reduced availability of charged tRNA(Trp).

As an alternative method for identifying genes that shared similar transcription profiles with those of the trp operon, we applied a form of hierarchical clustering known as average-linkage cluster analysis (24), a method that is familiar to most biologists through its application in DNA sequencing and phylogenetic analysis. With this method, relationships among gene expression patterns are represented by a tree whose branch lengths reflect the degree of similarity between the gene profiles as assessed by a pairwise similarity function. Hierarchical clustering of the data from the seven experiments in this study showed that the genes of the trp operon were contained mostly in one branch of the gene tree (see Fig. 3, which is published as supporting information on the PNAS web site) comprising 174 genes. Ninety-five of these were in common with the 164 up-regulated genes from the K-means cluster that harbored the genes of the aromatic supraoperon (58% overlap). The genes of the trp operon itself (trpEDCFBA), along with hisC and aroE, were contained in one node of this branch in addition to proS, the gene encoding prolyl-tRNA synthetase. In the B. subtilis genome proS resides in a cluster with dxr and yluC, and the direction of transcription is the same for all three of these genes. Interestingly, dxr and yluC are clustered in the node that includes folB, pheB, and sul (genes known to be regulated by aromatic amino acid metabolism) immediately adjacent to the node containing the trp operon. The proS, dxr, and yluC genes are strongly up-regulated in experiments 3 and 6. Previous studies have shown a strong tendency for genes involved in common cellular functions, pathways, and processes to cluster together in this type of analysis (24, 25).§ Consequently, it is tempting to speculate that proS, dxr, and yluC might reflect a peripheral segment of the cross-pathway control system; however, further experimentation will be required to support this hypothesis. It is particularly noteworthy that nearly half (81/174) of the genes that clustered with those of the trp operon were so-called "y genes" with unknown or putatively assigned functions.
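The average-linkage procedure described above can be sketched in a few lines. The example below is a minimal illustration, not the GENESPRING implementation: it assumes one minus the Pearson correlation as the pairwise distance (the text does not name the similarity function used), works on untransformed ratios, and stops at a chosen number of clusters instead of building the full tree. The four profiles are the Exp. 1 to Exp. 7 ratios for trpE, trpD, hisA, and hisG from Table 2; the function names are hypothetical.

```python
import math
from itertools import combinations

# Fold-change profiles across experiments 1-7, copied from Table 2.
PROFILES = {
    "trpE": [0.92, 5.11, 6.63, 1.81, 1.75, 3.15, 1.75],
    "trpD": [1.29, 3.56, 8.71, 1.38, 2.45, 4.68, 1.38],
    "hisA": [1.01, 0.85, 2.54, 2.25, 0.50, 0.83, 0.99],
    "hisG": [1.11, 0.83, 2.49, 2.17, 0.47, 1.11, 0.80],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the mean of all pairwise
    gene-to-gene distances (1 - Pearson r) between their members.
    """
    def gene_distance(g, h):
        return 1.0 - pearson(profiles[g], profiles[h])

    def cluster_distance(pair):
        a, b = pair
        total = sum(gene_distance(g, h) for g in a for h in b)
        return total / (len(a) * len(b))

    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(cluster) for cluster in clusters)


groups = average_linkage(PROFILES, n_clusters=2)
print(groups)
```

With these four profiles the two his genes merge first and the two trp genes merge next, so the two-cluster cut separates the his pair from the trp pair. This small subset is not meant to reproduce the 174-gene branch reported in the text.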
However, several of these unknowns (e.g., yqeK, ytpP, ytpQ) were previously classified as genes involved in aromatic amino acid metabolism (3), including the yhaG (trpP) gene that is believed to encode a transmembrane protein involved in tryptophan transport (26). The hierarchical cluster of genes coordinately induced with those of the trp operon includes several genes that are associated with competence development, DNA uptake, or recombination (e.g., comP, comQ, mreB, mreC, mreD, mutSB, and radC) (10). It is possible that the apparent induction of these transcripts is a nonspecific consequence of limiting nitrogen conditions that occurs when cells are grown in minimal medium without added amino acids (27). Hierarchical cluster analysis also yielded possible relationships among the genes that are down-regulated by tryptophan deficiency. For example, genes in the dhb operon that were observed to be down-regulated in experiments 1, 2, 3, and 7 form a single node that includes the following: dbhACEF, ydbL, yuiI, yoeB, yplQ, yukLM, and yvbA. Although ydbL, yuiI, yoeB, yplQ, and yvbA have unknown functions, yukL and yukM are actually part of the dhbF ORF that was recently extended because of sequencing errors (28). Several additional operons also show expression profiles that are inversely correlated to that of the trp operon. Most obvious is a constellation of operons and gene clusters that encode numerous components implicated in energy metabolism and electron transport functions. For example, atpABDEFGH, cydABCD, hemABCDE, hemX, hemL, narGH, and qoxBCD, genes兾 operons lie in the same down-regulated gene K-means cluster as the dhb operon described above (not shown). In addition, genes involved in purine (purDFHMN) and pyrimidine (pyrABCDFK) biosynthesis, glycolysis (eno, pgk, pgm, and tpi), and several ribosomal proteins lie in this cluster. However, in contrast to the dhb operon, which has a clear connection to aromatic amino acid §Talaat,

A. M., Lyons, R. & Johnston, S. A. (2001) Abstr. Ann. Meeting Am. Soc. Microbiol. 101, 711.

5686 兩 www.pnas.org兾cgi兾doi兾10.1073兾pnas.1031606100

Fig. 2. PCA of the gene expression data from all seven experiments listed in Table 1 (3,552 genes filtered for significance by using SAM) showing the rotated and dimensionally reduced gene expression data. The first PC is plotted on the x axis, PC2 is plotted on the y axis, and PC3 is plotted on the z axis. Data points are color-coded by expression (color bar at right denotes normalized intensity ratios), but those points in red correspond to the 95 genes in common between the K-means and hierarchical clusters including the genes of the trp operon.

biosynthesis, we believe that these transcripitional variations are likely to be indirect effects representing an overall slowing of cellular metabolism precipitated by depletion of an essential amino acid such as tryptophan. Lastly, we performed a PCA on the microarray data (Fig. 2) as a third method to cluster the gene expression patterns and compare the results to those derived by K-means and hierarchical clustering algorithms. PCA is a statistical technique that can be used to simplify the analysis and visualization of multidimensional data sets, and it is particularly well suited for microarray data in which the expression levels for thousands of genes are measured across multiple conditions (29). This tool allows the key variables in a data set to be identified, and each resulting component defines a linear combination of experimental parameters that can be used to distinguish the genes parsimoniously. Fig. 2 shows a scatter plot generated from PCA of the gene expression data from 3,552 genes filtered for significance with SAM across the seven experiments listed in Table 1. The 100 genes in common between the K-means and hierarchical clusters are indicated as red points in Fig. 2. Clearly, their proximity to each other on the PCA plot suggests that their expression patterns are not random. Fig. 4, which is published as supporting information on the PNAS web site, illustrates that most of the variance in the microarray data (⬇73%) can be summarized in just three components. Thus, even though there were seven individual experiments in this study, there were only three major independent features for each gene. Half of the variance is captured in the first PC, a weighted average that distinguishes genes on the basis of their expression. Genes with highly positive values along this component, such as those marked in blue (Fig. 2), are up-regulated under conditions that simulate tryptophan deficiency. 
The second component represents the change in gene expression across each of the different experimental conditions, and the third component measures concavity of the data. Collectively, these results demonstrate that three distinct clustering methods identify a subset of ≈100 genes that respond in a coordinated fashion with the genes of the trp operon.

Conclusions

By analysis of DNA microarray experiments in which we queried the known protein coding ORFs of B. subtilis, we identified ≈100 genes whose transcription patterns closely parallel those of the aromatic supraoperon under conditions that simulate mild starvation for aromatic amino acids and/or depletion of tRNATrp. By deploying established statistical tools (K-means, hierarchical clustering, and PCA) to cluster the resulting gene expression profiles, we found that the genes of the trp operon itself as well as a number of additional ORFs appeared to represent a coherent subset of coordinately regulated transcription units. The microarray data presented herein are confirmatory and consistent with previous observations regarding transcriptional and translational regulation of the supraoperon, and they extend previous interpretations by uncovering associations to other genes and operons that are either up- or down-regulated in response to experimental perturbations of aromatic amino acid metabolism (e.g., his operon, dhb operon, and dxr-yluC-proS cluster/operon). The underlying molecular mechanisms that control these manifestations of cross-pathway gene regulation are unknown at present, although future studies and comparisons to other model organisms such as E. coli may provide valuable clues. In addition, such comparisons could yield insights into the evolution of diverse mechanisms that govern aromatic amino acid synthesis and metabolism among various bacterial genera.

We gratefully acknowledge Michael Rey for assistance with establishing a computer database for our microarray data. These studies were supported in part by funds provided to C.Y. from the National Science Foundation (MCB-0093023).

1. Henner, D. & Yanofsky, C. (1993) in Bacillus subtilis and Other Gram Positive Bacteria: Biochemistry, Physiology, and Molecular Genetics, ed. Losick, R. (Am. Soc. Microbiol., Washington, DC), pp. 269–280.
2. Yanofsky, C., Miles, E. W., Kirschner, K. & Bauerle, R. (1999) Encyclopedia Mol. Biol. 4, 2676–2689.
3. Gollnick, P., Babitzke, P., Merino, E. & Yanofsky, C. (2002) in Bacillus subtilis and Its Closest Relatives: From Genes to Cells, eds. Sonenshein, A. L., Hoch, J. A. & Losick, R. (Am. Soc. Microbiol., Washington, DC), pp. 233–244.
4. Sarsero, J. P., Merino, E. & Yanofsky, C. (2000) Proc. Natl. Acad. Sci. USA 97, 2656–2661.
5. Valbuzzi, A. & Yanofsky, C. (2001) Science 293, 2057–2059.
6. Henkin, T. M. (2000) Curr. Opin. Microbiol. 3, 149–153.
7. Khodursky, A. B., Peter, B. J., Cozzarelli, N. R., Botstein, D., Brown, P. O. & Yanofsky, C. (2000) Proc. Natl. Acad. Sci. USA 97, 12170–12175.
8. Steinberg, W. (1974) J. Bacteriol. 117, 1023–1034.
9. Vogel, H. J. & Bonner, D. M. (1956) J. Biol. Chem. 218, 97–106.
10. Berka, R. M., Hahn, J., Albano, M., Draskovic, I., Persuh, M., Cui, X., Sloma, A., Widner, W. & Dubnau, W. (2002) Mol. Microbiol. 43, 1331–1345.
11. Eisen, M. B. & Brown, P. O. (1999) Methods Enzymol. 303, 179–205.
12. Tusher, V. G., Tibshirani, R. & Chu, G. (2001) Proc. Natl. Acad. Sci. USA 98, 5116–5121.
13. Karp, P., Krummenacker, M., Paley, S. & Wagg, J. (1999) Trends Biotechnol. 17, 275–281.
14. Karp, P., Paley, S. & Romero, P. (2002) Bioinformatics 18, S1–S8.
15. Mills, J. C. & Gordon, J. I. (2001) Nucleic Acids Res. 29, e72.
16. Bilban, M., Buehler, L. K., Head, S., Desoye, G. & Quarnata, V. (2002) BMC Genomics 3, 19.
17. Nester, E. W., Schafer, M. & Lederberg, J. (1963) Genetics 48, 529–551.
18. Nester, E. W. (1968) J. Bacteriol. 96, 1649–1657.
19. Chapman, L. F. & Nester, E. W. (1968) J. Bacteriol. 96, 1658–1663.
20. Nester, E. W., Dale, B., Montoya, A. & Vold, B. (1974) Biochim. Biophys. Acta 361, 59–72.
21. Nester, E. W., Jensen, R. A. & Nasser, D. S. (1969) J. Bacteriol. 97, 83–90.
22. Stachelhaus, T., Mootz, H. D. & Marahiel, M. A. (2002) in Bacillus subtilis and Its Closest Relatives: From Genes to Cells, eds. Sonenshein, A. L., Hoch, J. A. & Losick, R. (Am. Soc. Microbiol., Washington, DC), pp. 415–435.
23. Knudsen, S. (2002) A Biologist's Guide to Analysis of DNA Microarray Data (Wiley, New York), pp. 43–44.
24. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863–14868.
25. Sepulveda, A. R., Tao, H., Carloni, E., Sepulveda, J., Graham, D. Y. & Peterson, L. E. (2002) Aliment. Pharmacol. Ther. 16, 145–157.
26. Sarsero, J. P., Merino, E. & Yanofsky, C. (2000) J. Bacteriol. 182, 2329–2331.
27. Jarmer, H., Berka, R., Knudsen, S. & Saxild, H. H. (2002) FEMS Microbiol. Lett. 206, 197–200.
28. May, J. J., Wendrich, T. M. & Marahiel, M. A. (2001) J. Biol. Chem. 276, 7209–7217.
29. Raychaudhuri, S., Stuart, J. M. & Altman, R. B. (2000) Pacific Symp. Biocomput. 5, 455–466.


PNAS | May 13, 2003 | vol. 100 | no. 10 | 5687