Improving Metabolic Pathway Efficiency by ... - ACS Publications

26 downloads 0 Views 4MB Size Report
Aug 4, 2016 - Department of Chemistry, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States. •S Supporting Information.
Research Article pubs.acs.org/synthbio

Improving Metabolic Pathway Efficiency by Statistical Model-Based Multivariate Regulatory Metabolic Engineering Peng Xu,†,§ Elizabeth Anne Rizzoni,‡ Se-Yeong Sul,† and Gregory Stephanopoulos*,† †

Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States ‡ Department of Chemistry, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States S Supporting Information *

ABSTRACT: Metabolic engineering entails target modification of cell metabolism to maximize the production of a specific compound. For empowering combinatorial optimization in strain engineering, tools and algorithms are needed to efficiently sample the multidimensional gene expression space and locate the desirable overproduction phenotype. We addressed this challenge by employing design of experiment (DoE) models to quantitatively correlate gene expression with strain performance. By fractionally sampling the gene expression landscape, we statistically screened the dominant enzyme targets that determine metabolic pathway efficiency. An empirical quadratic regression model was subsequently used to identify the optimal gene expression patterns of the investigated pathway. As a proof of concept, our approach yielded the natural product violacein at 525.4 mg/L in shake flasks, a 3.2-fold increase from the baseline strain. Violacein production was further increased to 1.31 g/L in a controlled benchtop bioreactor. We found that formulating discretized gene expression levels into logarithmic variables (Linlog transformation) was essential for implementing this DoE-based optimization procedure. The reported methodology can aid multivariate combinatorial pathway engineering and may be generalized as a standard procedure for accelerating strain engineering and improving metabolic pathway efficiency. KEYWORDS: metabolic engineering, promoter library, combinatorial optimization, synthetic biology, statistical models and response surface methodology

R

pathways; intermediate accumulation or depletion may compromise cell viability and pathway productivity,14 and overexpressed heterologous proteins may penalize the cell with additional energy cost and elicit cellular stress response.15 To address these issues, renewed interest has emerged in exploring combinatorial approaches to control and optimize biosynthetic pathways. For example, recent metabolic engineering effort has shifted toward coordinating metabolic flux redistribution including modification of plasmid copy number,16 promoter strength,17 gene codon usage,18 and RBS strength.15,19 In addition, combinatorial transcriptional engineering coupled with efficient gene assembly and molecular evolution tools has attracted considerable interest as a potential method for optimizing multigene pathways in different host cells. Examples include the heterologous production of anticancer taxol precursors and flavonoids in Escherichia coli,20,21 combinatorial optimization of the fatty acids pathway to produce advanced biofuels,22,23 multiplexed regulatory RNA,24 and global transcriptional machinery engineering25 to produce L-tyrosine. Existing pathway engineering strategies follow a rather intuitive framework reminiscent of reverse engineering princi-

ecent advances in gene synthesis, facile cloning, and gene assembly techniques allow metabolic engineers to rapidly reconfigure a cell’s genetic blueprint at a speed and scale never seen before.1 As enabling technology, metabolic engineering has become a major driver to construct efficient microbial cell factories for next-generation bioeconomy.2 Mirroring nature’s exquisite chemistry to synthesize chemicals and natural products, metabolic engineers have now been able to produce a large portfolio of commodity chemicals,3,4 novel materials,5 sustainable fuels,6,7 and pharmaceuticals8,9 from renewable feedstocks. This is often achieved through standard metabolic engineering strategies including overexpression of rate-limiting steps,10 deletion of competing pathways,11 managing ATP,12,13 and recycling NADPH and other cofactors.13 Although these approaches have been shown to effectively improve cellular productivity and yield, they should be further complemented with algorithm and tools that accelerate the genotype search process across a larger gene expression landscape. Apart from chemistry, the metabolic complexity underlying biological systems further compounds the task to engineer microbial overproduction phenotypes, as introduced heterologous pathways typically compete for limited cellular resources. For example, precursor flux improvement by overexpression of upstream pathways may not be accommodated by downstream © 2016 American Chemical Society

Received: July 6, 2016 Published: August 4, 2016 148

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

Figure 1. Construction of T7 promoter library that covers a broad range of transcriptional dynamics. (a) Synthetic degenerated oligonucleotides used to construct the T7 promoter and lacI repressor binding region (lacO). Nx denotes x consecutive degenerated nucleotides. (b) Transcriptional activity of original mutant T7 promoter libraries screened by green fluorescence signal. (c) Representative sequence of mutant promoters.

and determine the optimal conditions of the investigated variables.35 Combined with Plackett−Burman and Box− Behnken designs, optimal physical parameters (gene expression profile herein) could be rapidly located to maximize/minimize an objective function when a large subset of variable space is interrogated. In this report, we applied a statistical model-based DoE procedure to guide the implementation of combinatorial pathway engineering. We first constructed an efficient T7 promoter library that covers a broad range of gene expression dynamics spanning across 1,000-fold transcriptional activity. Combined with DoE principles, this promoter library was used to systematically sample the multidimensional gene expression space defined by a five gene pathway and optimize the production titer with limited experimental inputs. Without too much a priori biochemical knowledge, the Plackett−Burman design allowed us to quickly screen the main pathway components that determine the overall pathway efficiency while avoiding reconstructing all possible pathway candidates. Finally, a quadratic response surface design (Box−Behnken) enabled us to capture the main features of the genotype− phenotype landscape and identify the optimal expression patterns of the investigated pathway. The statistical DoE methodology is applied to the synthesis of the chemotherapeutic drug violacein and allowed marked production improvement in an engineered E. coli strain. This statistical DoE methodology provided a powerful and efficient solution to probe the production titer−gene expression relationship with minimal experiment effort. We envision that this statistical model-based

ples. It comprises an iterative cycle of pathway debottlenecking and evolutionary improvement, which involves extensive trialand-error work and requires examining large libraries of strain candidates.26,27 Furthermore, the cost of constructing and testing a large number of pathways limits our ability to search the gene expression space to locate the desirable overproduction phenotype. As such, effective Design of Experiments (DoE) procedures are essential to probe the gene expression landscape and locate the high-producing expression patterns. Indeed, statistical model-based DoE procedures have been widely used to optimize biological processes including media design,28 bioreactor operation, therapeutic protein purification, and virus transfection.29 In particular, principal component analysis30 and linear regression models31−33 have recently been used to identify pathway bottlenecks and improve metabolite production. For instance, Zhou et al. reported the combinatorial optimization of genetic constructs and cultivation conditions to improve nylon precursor 6-aminocaproic acid production in E. coli.33 To pursue other statistical models that improve pathway efficiency, we explored the classical Plackett−Bruman and Box− Behnken designs to probe the gene expression space. Invented by statisticians Plackett and Burman in 1946, Plackett−Burman design is an efficient screening method to identify the main factors with as few experimental runs as possible.34 The Plackett−Burman design has been widely used to determine the most important factors early in the experiment phase when complete interactions of multiple variables are not available. When considering interactions of multiple variables, the Box− Behnken design is a powerful tool to explore the quadratic effects 149

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

region (−35/−10) and the repressor binding (lacO) region, we were able to construct artificial T7 promoters with transcriptional activity spanning across 3 orders of magnitude while maintaining leaky expression at relatively low levels, a feature that would not be easily achieved by conventional promoter engineering approaches. The constructed promoter library enabled three important functions in the quest for optimal gene expression patterns: (a) identification of promoters with extremely high and low strength that are useful for defining the boundary of gene expression space; (b) implementation of DoE procedures to reduce the gene expression space and simplify strain engineering effort; (c) the combinatorial search of optimal gene expression patterns determined from Box−Behnken design. These results are further elaborated in the following sections. 2.2. Plackett−Burman Design to Screen the Main Effects That Determine Violacein Pathway Efficiency. To facilitate statistical model-based pathway optimization, we specifically chose the promoters that confer biologically meaningful phenotype changes. To this end, we ranked the constructed promoters using a linear-logarithmic (Linlog) transformation that geometrically rescales the discretized linear gene expression levels into logarithmic dimensionless variables (we will discuss why we adopt the Linlog transformation in section 4). This Linlog transformation follows a general mathematical formula described in Figure 2a and b for the

multiplexed combinatorial engineering approach could be generalized to optimize multiple gene pathways, accelerate strain engineering, and improve natural product production as a standard practice.

2. RESULTS AND DISCUSSION 2.1. Construction of T7 Promoter Library That Covers a Broad Range of Transcriptional Dynamics. We first constructed promoter libraries to facilitate the combinatorial search of multidimensional gene expression space. Traditional library construction approaches, including error-prone PCR36 or site-directed mutagenesis,37 suffer from low mutation rates, intense library screening efforts, and biased gene expression levels. Here, we adopted a novel library construction approach that combines partially overlapping synthetic oligonucleotides with sophisticated gene recombination techniques (Gibson assembly). Random mutations in the form of degenerated oligonucleotides were specifically introduced to the core region of regulatory genetic elements, and the mutant library could be further screened from a fluorescence protein readout. We targeted the bacteriophage T7 promoter as it is widely used in various metabolic engineering applications. The strength and efficiency of T7 RNA polymerase (RNAP) are mainly determined by two factors: the recruitment of T7 RNAP and sigma factors on the promoter core region (−35 and −10) and procession of the RNAP across the repressor binding region (lacO).38 As such, we used randomly synthesized oligonucleotides to mutate both the sigma factor binding region (−35 and −10) and the lacI repressor binding region (lacO) (Figure 1a). IPTG induces the dissociation of lacI repressor from the lacO site and thus allows T7 RNAP to read across the DNA template and give rise to fluorescence signals. With green fluorescence protein as the reporter, we quantified the T7 promoter transcriptional activity of the constructed promoter candidates. eGFP results indicate that the constructed library covers a broad range of transcriptional dynamics spanning across 3 orders of magnitude (Figure 1b). For example, the highest transcriptional activity is obtained at 21,340 FU/OD with promoter no. 2, a 7-fold increase compared to that of the original T7 promoter (3,942 FU/OD with promoter no. 48). In contrast, the lowest transcriptional activity is achieved at 22.3 FU/OD with promoter no. 46, relatively 1000-times lower than that of the highest promoter (no. 2). From the promoter strength distribution profile, we sequenced 20 promoters, and the sequencing results indicated that all investigated promoters contained the desired mutations at the designed genetic loci (−35/−10 and lacI repressor binding regions) (Figure 1c and Figure S1). It is well-known that T7 promoter activity is stringently controlled by the amount of lacI repressor protein within the cell. Mutation in the lacI repressor binding region (lacO) would alter the lacI repressor and lacO binding affinity and possibly lead to leaky protein expression. Interestingly, promoter libraries with a mutant lacI repressor binding region exhibit relatively small leaky expression discrepancy (eGFP expression in the absence of IPTG induction), ranging roughly from 107 to 311 FU/OD (Figure S2), indicating that our designed lacO region may effectively block the proceedings of the T7 RNAP. The 1,000-fold dynamic range of transcription obtained in the investigated promoters is primarily ascribed to the synergistic interaction between the promoter core region (−35 and −10) and the repressor binding region (lacO) that may cooperatively reshape the transcriptional response. By simultaneously mutating both the core promoter

Figure 2. Linlog transformation of discretized promoter library into logarithmically scaled variables. (a) Forward Linlog transformation. P is the absolute promoter activity (FU/OD), and X is the rescaled dimensionless promoter activity. (b) Reverse Linlog transformation. (c) Linlog rescaled promoter activity that is evenly distributed spanning across 3 orders of magnitude. This geometric rescaling allows us to pick the appropriate promoters for combinatorial pathway engineering.

forward and reverse transformations, respectively. When the promoter activity is at a predesignated high level Pmax (i.e., promoter no. 2), the recoded promoter has a rescaled value +1; on the other hand, when the promoter activity is at a low level Pmin (i.e., promoter no. 36), the recoded promoter has a rescaled value −1 (Figure 2c). The geometric average of Pmax and Pmin (the square root of the product of Pmax and Pmin) corresponds to the middle level (0 point) of the rescaled promoters. 150

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

Figure 3. Schematic representation of violacein biosynthetic pathway (a) and pictorial illustration of constructed violacein strains on agar plates that correspond to pathway nos. 1−9 in the Box−Behnken design (b).

Next, we applied these rescaled promoters to sample the gene expression space of a model natural product pathway, a five gene violacein pathway. Violacein biosynthesis starts with the oxidation of the endogenous amino acid L-tryptophan, and the resulting intermediate further undergoes dimerization, decarboxylation, and multiple reductions to form the final purple compound violacein (Figure 3a). The major byproduct of the pathway, deoxyviolacein, is generated through the reduction of the overflowed intermediate (protodeoxyviolaceinic acid) catalyzed by VioC (Figure 3a). The search for an optimal gene expression profile that maximizes violacein production would typically require the assembly of a complete set of pathway combinations corresponding to different expression levels of the five pathway genes and assessment of product formation by each strain. Instead of constructing all possible pathway combinations to exhaustively search the gene expression space, we employed the Plackett−Burman design (a factorial design) to fractionally sample the gene expression space with the aim to minimize the experimental effort and quickly identify the main pathway components that determine violacein pathway efficiency. Specifically, the number of pathway constructs in the Plackett− Burman design will be reduced to 12 (Table 1) for a five gene pathway (VioA, VioB, VioC, VioD, and VioE), given the fact that either the strong promoter (+1) or the weak promoter (−1) is used to drive the expression of each gene. As a comparison, a full factorial design for a five gene pathway requires 32 constructs (25) to cover all combinations of the gene expression space. For an 11-gene pathway, it requires just 12 genetic constructs with PB design. The total number of genetic constructs will be 121 if a full factorial design is pursued (112) for the 11-gene pathway. Obviously, PB design is very powerful to screen the main effects from multiple input variables with minimal experimental runs. Following the arrangements indicated in Table 1, we assembled 12 violacein pathways with the individual gene components (VioA, VioB, VioC, VioD and VioE) driven by either a strong promoter (+1, promoter no. 2) or a weak promoter (−1, promoter no. 36). The detailed genetic configuration of the 12 violacein pathways is illustrated in Figure 4. The output of the pathway, violacein production in shake flasks, was statistically analyzed by one-way analysis of variance (ANOVA) to screen the major enzymatic steps that determine

Table 1. Plackett−Burman Design to Screen Main Pathway Components That Determine Violacein Pathway Efficiency with −1, Weak Promoter; +1, Strong Promoter; and N.D., Not Detectablea VioA VioB −1 1 1 −1 1 1 1 −1 1 −1 −1 −1

1 1 −1 1 −1 1 −1 −1 1 −1 1 −1

VioC

VioD

VioE

violacein (mg/L)

deoxyviolacein (mg/L)

−1 1 −1 −1 −1 −1 1 −1 1 1 1 1

1 −1 −1 1 1 −1 1 −1 1 −1 −1 1

−1 −1 1 1 −1 −1 1 1 1 −1 1 −1

264.3 ± 5.9 176.5 ± 6.4 298.6 ± 15.8 309.4 ± 12.4 373.2 ± 18.1 231.5 ± 7.5 247.2 ± 8.9 329.1 ± 27.7 164.6 ± 5.9 216.3 ± 19.5 141.9 ± 10.3 244.3 ± 19.2

N.D. 5.4 ± 2.1 N.D. 1.4 ± 0.2 N.D. N.D. 41.7 ± 8.9 N.D. 25.8 ± 3.6 5.8 ± 1.4 26.8 ± 1.7 9.3 ± 2.5

All of the reported value indicate mean ± SD; SD is the standard deviation of three replicates.

a

pathway efficiency. On the basis of the mean value distribution of violacein production, a p value can be calculated for each of the pathway components to statistically quantify how significant each of the enzymatic steps contributes to the violacein production (Figure 5). ANOVA analysis indicates that the effects of VioA and VioE on violacein production are insignificant (p > 0.05, Figure 5), as violacein production is hardly impacted whether a strong or weak promoter is used, indicating that a broad range of VioA or VioE expression levels would accommodate a high violacein production phenotype. However, the effect of VioB, VioC, and VioD on violacein production is statistically significant (p < 0.05, Figure 5): the production of violacein is significantly affected when the promoter is switched from a weak to a strong version. With this guidance, the five-step biosynthetic pathways were grouped into three pivotal components defined by the three important genes (VioB, VioC, and VioD). The less important genes VioA and VioE were lumped with the important genes in synthetic operons, thus reducing the five dimensional gene expression design space to a 151

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

Figure 4. Genetic configuration of the constructed pathways that conforms to the Plackett−Burman design (Table 1). All of the genes are organized in monocistronic form to ensure context-independent gene expression. VioA and VioB are carried by the pETM6 vector; VioC, VioD, and VioE are carried by the pCDMx vector. +1, strong promoter (P2); −1, weak promoter (P36); T, terminator; A, VioA; B, VioB; C, VioC; D, VioD; and E, VioE.

space, where we can reorganize gene clusters in synthetic operons to reduce experimental effort. 2.3. Box−Behnken Design to Determine the Optimal Expression Level of the Main Pathway Components. After screening gene expression combinations and identifying the key genetic determinants of the violacein pathway, we applied a quadratic polynomial regression model to determine the optimal gene expression level of the investigated pathways. To reduce the gene expression design space, we constructed three synthetic operons comprising VioAB, VioD, and VioEC to reconfigure the violacein pathway. Multivariate analysis was performed to identify the quadratic correlation between violacein production and the expression level of the three reorganized gene clusters using terms of linear and quadratic interactions. Details of the mathematical formulation of the quadratic statistical model are given in Figure S5. We next applied the Box−Behnken design to probe the optimal expression level of the reorganized gene cluster (Table 2). A total of 13 violacein pathways (the central point is repeated three times to estimate the prediction variance over the design space) were constructed, and the detailed genetic configuration of the pathways are illustrated in Figure 6. Violacein production of the engineered strains were tested in shake flasks. Representative strains carrying these pathways are also illustrated in Figure 3b. All of the strains were found to be able to produce the purple pigment violacein in shake flasks. Quantitative relationships between gene expression and violacein production are plotted as a 3D surface with contour plots projected on the xy plane (Figure 7a and b). This genotype−phenotype landscape allows us to quickly identify the optimal gene expression levels with a limited number of pathway constructs and experimental inputs. The quadratic response surface predicted that a maximal

Figure 5. Analysis of variance to determine how significantly each of the pathway components contributes to violacein production. This data point was retrieved from the Plackett−Burman design. Blue box indicates 25 and 75% of the violacein production distribution; × represents 1 and 99% of the violacein production distribution. Pink line indicates the mean value of the data, and diamond point indicates the average of the data.

three-dimensional design space. In this way, Plackett-Burman design allowed us to quickly screen the relative impacts of the pathway components and reduce the gene expression design 152

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

constructed libraries (Section 2.1) that can drive the optimal gene expression levels shown above. Because of the discretized nature of the promoter library, we chose promoters from the Linlog transformed libraries (Figure 2c) with transcriptional activity that is closest to the predicted values. For example, we used promoter nos. 26, 10, and 31, which approximate the optimal expression of −0.59, 0.77, and −0.17, respectively, to construct pathways that drive VioAB, VioD, and VioEC expression. The resulting strain produced violacein at 525.4 mg/L with minimal byproduct accumulation (around 2.7 mg/L deoxyviolacein), which is in good agreement with the model prediction (500.2 mg/L) and represents an additional 21.5% increase compared to that of the best strain obtained in the PB screening experiments. When this strain was cultivated in a 2 L bioreactor, violacein production was further increased to 1.31 g/ L with the glucose conversion yield at 0.021 g/g (Figure S7). Violacein was found to be primarily intracellular (Figure 8c) and accounted for 3.3% of the total dry cell weight (or 32.7 mg/ gDW). Regression coefficients for each of the terms in the quadratic model are shown in Figure 8a. The proposed model was found to fit well the experimental data with an adjusted coefficient of determination (R2) of 0.885 (Figure 8b and Figure S6). ANOVA analysis indicated that the linear terms (VioAB, VioD, and VioEC) and their associated quadratic terms (VioAB*VioAB, VioD*VioD, and VioEC*VioEC) play the most significant role on violacein production (Figure S6), whereas the crossinteractions of pathway modules (VioAB*VioD, VioAB*VioEC, and VioD*VioEC) are less significant (p > 0.05). In summary,

Table 2. Box−Behnken Design to Determine the Quadratic Correlations between Violacein Production and the Relative Expression of VioAB, VioD, and VioEC with −1, Weak Promoter; 0, Medium Strength Promoter; +1, Strong Promoter; and N.D., Not Detectablea VioAB

VioD

VioEC

violacein (mg/L)

deoxyviolacein (mg/L)

0 1 1 0 0 −1 0 −1 0 −1 0 0 1 −1 1

1 −1 1 0 1 −1 −1 0 0 0 −1 0 0 1 0

−1 0 0 0 1 0 1 −1 0 1 −1 0 1 0 −1

427.2 ± 12.3 334.5 ± 10.8 384.2 ± 3.1 474.5 ± 15.4 386.1 ± 7.6 319.1 ± 15.4 295.5 ± 15.6 434.2 ± 19.1 462.1 ± 20.4 383.6 ± 4.6 336.8 ± 19.4 468.6 ± 16.9 257.6 ± 2.1 478.1 ± 21.1 348.1 ± 4.2

N.D. N.D. 11.7 ± 2.6 N.D. 9.6 ± 3.2 2.9 ± 0.5 30.4 ± 4.3 N.D. 5.4 ± 0.9 14.7 ± 1.6 N.D. 8.4 ± 2.5 29.3 ± 3.1 N.D. N.D.

All of the reported values indicate mean ± SD; SD is the standard deviation of three replicates.

a

violacein production of 500.2 mg/L would be achieved when the expression levels of VioAB, VioD, and VioEC are tuned to −0.59, 0.77, and −0.17, respectively (Figure 7a and b). Validation of the model predictions was performed by constructing pathways that are driven by promoters from the

Figure 6. Genetic configuration of the constructed pathways that conforms to the Box−Behnken design (Table 2). VioA and VioB are organized in operon and carried by the pETM6 vector; VioE and VioC are organized in operon and carried by the pCDMx vector. +1, strong promoter (P2); 0, medium strength promoter (P44); −1, weak promoter (P36); T, terminator; A, VioA; B, VioB; C, VioC; D, VioD; and E, VioE. 153

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

Figure 7. Violacein production−gene expression level phenotype−genotype landscape. (a) Violacein production in response to the expression of VioAB and VioEC on a logarithmic scale. (b) Violacein production in response to the expression of VioD and VioEC on a logarithmic scale. (c) Violacein production in response to the expression of VioAB and VioEC on a linear scale. (d) Violacein production in response to the expression of VioD and VioEC on a linear scale.

Figure 8. Summary of Box−Behnken fitting results and validation of model predictions. (a) Regression coefficients of Box−Behnken design a1−a9 are indicated in column bars. (b) Model predicted results are in good agreement with the experimental value with an adjusted coefficient of determination (R2) of 0.885. (c) Representative pictures of violacein culture from the benchtop bioreactor. The upper panel shows the E. coli suspension culture, and the lower panel shows settling of the violacein-producing E. coli.

Statistical Thermodynamic Theory. The reason we used the Linlog transformation to rescale the screened promoter libraries is rooted in two basic biophysical laws. The first is Weber’s law that describes how physiological and cellular sensory systems would change upon an input of cellular stimuli. In a

the quadratic regression model allowed us to identify the optimal gene expression level of the investigated pathway components and further increase the metabolite production by 21.5%. 2.4. Correlation between Gene Expression and Violacein Production May Follow Weber’s Law and 154

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

applied statistical model-based DoE to guide the implementation of combinatorial pathway engineering. An efficient promoter library was constructed and coupled with Plackett−Burman design to effectively screen enzymatic steps that determine pathway efficiency and reduce the gene expression design space. Then, a quadratic regression model (Box−Behnken design) was used to determine the optimal expression level of the investigated pathway modules. The final strain produced 1.31 g/L of violacein and represents a promising starter strain for further industrial application. The T7 promoter library shows transcriptional activity covering 3 orders of magnitude and can be widely used in other synthetic biology and metabolic engineering applications. This statistical DoE methodology addresses one of the major challenges in implementing multivariate combinatorial metabolic engineering, for example to reduce gene expression space, combinatorial search of the optimal gene expression patterns, and rapid identification of the desired overproduction phenotype. We envision that this optimization framework can generally aid efforts to accelerate strain engineering and improve metabolite production.

strengthened version, Weber’s law states that cellular response dynamics is proportional to the relative change in input signal but not its absolute value.39,40 Translated into mathematical language, it implies that the appreciable phenotypic response dP is proportional to the relative change of the stimuli dS/S. By integration, the cellular response P follows a logarithmic relationship with the input S (i.e., P = kln S + C, where k and C are constants). When Weber’s law is used to delineate the gene expression−metabolite production relationship, it would naturally follow that changes in production titer should be logarithmically correlated with promoter strength. As such, our statistical models used the logarithmic value of promoter strength as input variable to probe the production titer−gene expression relationship. A second consideration that encouraged us to use Linlog transformation is based on observations that promoter transcriptional activity could be described well with statistical thermodynamic theory.41−43 This theory accounts for the binding states of RNA polymerase (RNAP) and the molecular interactions among RNAP, promoter, and transcriptional factors (TFs). The basic assumption is that gene expression level is proportional to the equilibrium probability that RNAP is bound to the promoter of interest.44 This promoter occupancy probability could be further formulated with the ratio of favorable binding states over the sum of all possible states that RNAP could have with each of the binding states following the Boltzmann partition function. The probability of the formation of favorable RNAP−promoter complex is simply a function of the binding energy (P = k*e−ΔG/(kT), where P is the binding probability that is proportional to the promoter activity, ΔG is the free energy change upon binding a specific promoter, k is the Boltzmann constant, and T is the absolute temperature). Because of the exponential relationship between promoter activity and the free energy change (P = ke−ΔG/(kT)), the Linlog transformation actually allows the production titer to be directly related to the free energy change of the RNAP−promoter interaction. A practical consideration of using the Linlog transformation is to normalize the discretized gene expression data into evenly distributed logarithmic variables. For example, Linlog transformation has converted the highly skewed gene expression data (Figure 1b) into evenly distributed logarithmic variables (Figure 2c). This geometric scaling allowed us to choose the appropriate promoters to perform combinatorial pathway engineering. As the effect of gene expression on metabolite production involves complex biological events (including transcription, translation, protein folding, enzyme catalysis, and metabolic flux redistribution), correlating violacein production with these logarithmically scaled promoters covers these multiple unknown steps. Consequently, the response of violacein production to the relative expression of VioAB, VioD, and VioEC is defined by a smooth surface, which enabled us to easily extrapolate and determine the optimal expression level of the investigated gene clusters (Figure 7a and b). In contrast, the genotype−phenotype landscape displays a rather broad and skewed pattern when the actual values of promoter activity are used (Figure 7c and d), making it rather difficult to identify the optimal expression value of the investigated gene cluster.

4. MATERIALS AND METHODS 4.1. T7 Promoter Library Construction and Screening. A T7 promoter library was constructed by Gibson assembly of synthetic overlapped oligonucleotides to replace the original T7 promoter in pETM6-eGFP. Single-stranded degenerate DNA oligonucleotides T7lacO_F and T7lacO_R (Table S1) were annealed in Tris-HCl and EDTA buffer following three cycles of heating (95 °C for 2 min) and cooling (25 °C for 1.5 min) on a Biorad PCR block. Then, 5 μL of 10 μM of the annealed oligonucleotides were mixed with 2.5 μL of the AvrII- and XbaIdigested and gel-purified pETM6-eGFP vector (∼25 ng plasmid DNA) and 7.5 μL of the 2× Gibson assembly. The mixture was kept at 50 °C for 1 h. Then, 3 μL of the Gibson reaction was chemically transformed into 12 μL of NEB5α high efficiency competent cells. Colonies grown overnight were scraped with a razor blade, and the library plasmid DNA was prepared with Zyppy miniprep kits. Then, 1.5 μL of the library plasmid was transformed into 20 μL of BL21(DE3) star cells with electroporation. Positive clones containing T7 promoter libraries should grow on LB-Ampicillin plates. For promoter activity screening, BL21 transformants were individually inoculated into 2.5 mL of LB broth supplemented with 100 μg/mL of ampicillin and grown at 37 °C with shaking at 250 rpm overnight. The next morning, 20 μL of the overnight culture was inoculated with 220 μL of fresh LB (with 100 μg/mL of ampicillin) in a Greiner Bio-One black fluorescence plate. Ten microliters of 5 mM IPTG or sterile water was added to each well with a multichannel pipet. The entire plate was incubated on a 30 °C plate shaker with shaking at 400 rpm. Then, green fluorescence was detected with a SpectraMax microplate reader (Molecular device) with excitation at 495 nm and emission at 512 nm every hour. Promoter activity was calculated with the rate of green fluorescence accumulation divided by the rate of cell density increase. Almost all the samples have a linear response curve with R2 above 0.95. All experiments were performed in triplicate to ensure reproducibility. 4.2. ePathBrick-Directed Modular Assembly of the Violacein Pathway. Chromobacterium violaceum 12472 type strain was purchased from ATCC, and the genomic DNA was isolated and purified with the Invitrogen genomic DNA purification kit. VioA, VioB, VioC, VioD, and VioE were PCR amplified using the primers listed in Table S1 with the 2XQ5

3. CONCLUSIONS Combinatorial pathway engineering requires iterative cycles of pathway debottlenecking to improve strain performance. To minimize experimental effort and accelerate this process, we 155

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology

at 1 vvm, and temperature was maintained at 20 °C by cooling water. 4.4. Metabolite Extraction and Analysis. Residual glucose was analyzed with an Agilent 1260 HPLC equipped with a BioRad Aminex HPX-87h column and refractive index detector eluted with 14 mM sulfuric acid. Violacein was primarily found intracellularly. The extraction and assay of violacein was carried out following the protocol reported by Jones et al.37 Violacein and deoxyviolacein standards were purchased from SigmaAldrich. A spectrophotometer-based assay was also developed according to the maximal absorbance of violacein at 575 nm. 4.5. Data Analysis and Model Fitting. Statistical data analysis and model fitting were performed with the JMP 12.0 software suite. All quantitative plots were made in OriginPro 9.0 software.

PCR master mix (NEB). The amplified VioA, VioC, and VioE PCR products were cleaned up using Zymoclean PCR kits and further double digested with NdeI and XhoI and ligated with the NdeI- and XhoI-digested pETM6 to give pETM6-VioA, pETM6VioC, and pETM6-VioE. The amplified and PCR-cleaned VioB and VioD gene fragments were Gibson assembled into the NdeIand XhoI-digested and gel-purified pETM6 to give pETM6-VioB and pETM6-VioD. Then, T7 weak (promoter #36), medium (promoter #44), and strong (promoter #2) promoters were subcloned to the previously constructed violacein pathways to replace the original T7 promoter in pETM6-VioA, pETM6VioB, pETM6-VioC, pETM6-VioD, and pETM6-VioE. To facilitate the expression of multiple genes and reduce the length of the operon structure, a new vector pCDMx was constructed by mutating the additional XbaI site on pCDM4 using primers Xba_F and Xba_R. Site-directed mutagenesis was performed using Pfu Ultra DNA polymerase (Agilent genomic science) and BW27784 as host strain. Operon and monocistronic gene configurations were assembled following a combinatorial modular approach reported by Xu et al. 45,46 For Plackett−Burman screening, the monocistronic gene construct was used to ensure contextindependent gene expression. For example, pCDMx-P36-VioCP2-VioD (monocistronic) was constructed by ligating the AvrIIand SalI-digested pETM6-P2-VioD with the NheI-and SalIdigested pCDMx-P36-VioC. pETM6-P26-VioAB (operon) was constructed by ligating the ApaI- and SpeI-digested pETM6-P26VioA with the ApaI- and XbaI-digested pETM6-VioB. pCDMxP31-VioEC-P10-VioD (VioEC in operon; VioD is placed downstream of VioEC insulated by a single terminator and driven by a P10 promoter) was constructed by ligating the AvrIIand SalI-digested pETM6-P10-VioD with the NheI- and SalIdigested pCDMx-P31-VioEC. Following this facile gene assembly procedure, all violacein pathways were constructed in this modular and combinatorial fashion. Normally, one can finish three rounds of cloning and assemble around 40 genetic constructs in 1 week with a normal workload schedule (40 h per week). The detailed genetic configuration of the constructed pathways are listed in Figures 4 and 6. All constructed pathways are listed in Table S2. 4.3. Media and Growth Conditions. LB broth was routinely used to cultivate E. coli NEB5α, BW27784, and BL21(DE3) star to maintain, propagate, and prepare plasmid and screen promoter activity. Modified AAM media37 was used to cultivate violacein-producing strains. The modified media contains 3.5 g/L of KH2PO4, 5.0 g/L of K2HPO4, 3.5 g/L of (NH4)2HPO4, 2 g/L of casamino acids, 8.4 g/L of MOPS, 0.72 g/L of Tricine, 2.8 mg/L of FeSO4·7H2O, 2.92 g/L of NaCl, 0.51 g/L of NH4Cl, 1 mM MgSO4, 0.1 mM CaCl2, and 20 g/L of glucose supplemented with ampicillin (70 μg/mL) and streptomycin (40 μg/mL). Strains constructed with Plackett− Burman and Box−Behnken design were cultivated in a 250 mL flask containing 30 mL of modified AAM media and grown at 25 °C for 3 days. Then, 0.2 mM IPTG was used to induce the expression of protein at an OD of around 0.6−0.8. A scaled-up experiment was carried out using a 2 L benchtop bioreactor using the same media but with 20 g/L of glucose pulsing at 20 and 44 h; 0.2 mM IPTG, 70 μg/mL of ampicillin, and 40 μg/mL of streptomycin were also pulsed at 20 and 44 h to the fermentation broth to maintain protein expression and strain genetic stability. The dissolved oxygen was maintained at 20% saturation by coupling agitation with DO control. The aeration was controlled



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acssynbio.6b00187. Primers and synthetic oligonucleotides (Table S1), strains and plasmids (Table S2), sequence of T7 promoter library (Figure S1), T7 promoter library leaky expression pattern (Figure S2), main effect plot of PB design (Figure S3), statistical analysis of PB design (Figure S4), model of Box−Behnken design (Figure S5), statistical analysis of Box−Behnken design (Figure S6), and fermenter scale-up of violacein-producing strain (Figure S7) (PDF)



AUTHOR INFORMATION

Corresponding Author

*Phone: (617)-253-4583. Fax: (617)-258-6876. E-mail: [email protected]. Present Address §

P.X.: Department of Chemical, Biochemical and Environmental Engineering, University of Maryland, 1000 Hilltop Circle, Baltimore, MD 21250 Author Contributions

P.X. and G.S. designed the study. P.X., E.A.R., and S.Y.S. performed the study with input from Adlai R. Grayson. P.X. analyzed the data and wrote the manuscript with advice from G.S. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors would like to thank the G.S. lab members (particularly KJ Qiao) for helpful discussions. P.X. would like to thank the postdoctoral funding support from DOE and Stephanopoulos lab.



REFERENCES

(1) Xu, P., Bhan, N., and Koffas, M. A. G. (2013) Engineering plant metabolism into microbes: from systems biology to synthetic biology. Curr. Opin. Biotechnol. 24 (2), 291−299. (2) Biggs, B. W., De Paepe, B., Santos, C. N. S., De Mey, M., and Kumaran Ajikumar, P. (2014) Multivariate modular metabolic engineering for pathway and strain optimization. Curr. Opin. Biotechnol. 29, 156−162. (3) Lee, J. W., Kim, T. Y., Jang, Y. S., Choi, S., and Lee, S. Y. (2011) Systems metabolic engineering for chemicals and materials. Trends Biotechnol. 29 (8), 370−8.

156

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology (4) Tai, Y. S., Xiong, M., and Zhang, K. (2015) Engineered biosynthesis of medium-chain esters in Escherichia coli. Metab. Eng. 27, 20−8. (5) Xiong, M., Schneiderman, D. K., Bates, F. S., Hillmyer, M. A., and Zhang, K. (2014) Scalable production of mechanically tunable block polymers from sugar. Proc. Natl. Acad. Sci. U. S. A. 111 (23), 8357−8362. (6) Qiao, K., Imam Abidi, S. H., Liu, H., Zhang, H., Chakraborty, S., Watson, N., Kumaran Ajikumar, P., and Stephanopoulos, G. (2015) Engineering lipid overproduction in the oleaginous yeast Yarrowia lipolytica. Metab. Eng. 29, 56−65. (7) Xu, P., Li, L., Zhang, F., Stephanopoulos, G., and Koffas, M. (2014) Improving fatty acids production by engineering dynamic pathway regulation and metabolic control. Proc. Natl. Acad. Sci. U. S. A. 111 (31), 11299−11304. (8) Lin, Y., Shen, X., Yuan, Q., and Yan, Y. (2013) Microbial biosynthesis of the anticoagulant precursor 4-hydroxycoumarin. Nat. Commun. 4, 2603. (9) Thodey, K., Galanie, S., and Smolke, C. D. (2014) A microbial biomanufacturing platform for natural and semisynthetic opioids. Nat. Chem. Biol. 10 (10), 837−844. (10) Tai, M., and Stephanopoulos, G. (2013) Engineering the push and pull of lipid biosynthesis in oleaginous yeast Yarrowia lipolytica for biofuel production. Metab. Eng. 15 (1), 1−9. (11) Stephanopoulos, G. (2012) Synthetic Biology and Metabolic Engineering. ACS Synth. Biol. 1 (11), 514−525. (12) Lan, E. I., and Liao, J. C. (2012) ATP drives direct photosynthetic production of 1-butanol in cyanobacteria. Proc. Natl. Acad. Sci. U. S. A. 109 (16), 6018−6023. (13) Singh, A., Soh, K., Hatzimanikatis, V., and Gill, R. (2011) Manipulating redox and ATP balancing for improved production of succinate in E. coli. Metab. Eng. 13, 76−81. (14) Leonard, E., Ajikumar, P., Thayer, K., Xiao, W., Mo, J., Tidor, B., Stephanopoulos, G., and Prather, K. (2010) Combining metabolic and protein engineering of a terpenoid biosynthetic pathway for overproduction and selectivity control. Proc. Natl. Acad. Sci. U. S. A. 107, 13654−13659. (15) Zelcbuch, L., Antonovsky, N., Bar-Even, A., Levin-Karp, A., Barenholz, U., Dayagi, M., Liebermeister, W., Flamholz, A., Noor, E., Amram, S., Brandis, A., Bareia, T., Yofe, I., Jubran, H., and Milo, R. (2013) Spanning high-dimensional expression space using ribosomebinding site combinatorics. Nucleic Acids Res. 41 (9), e98−e98. (16) Juminaga, D., Baidoo, E. E., Redding-Johanson, A. M., Batth, T. S., Burd, H., Mukhopadhyay, A., Petzold, C. J., and Keasling, J. D. (2012) Modular engineering of L-tyrosine production in Escherichia coli. Appl. Environ. Microbiol. 78 (1), 89−98. (17) Anthony, J., Anthony, L., Nowroozi, F., Kwon, G., Newman, J., and Keasling, J. (2009) Optimization of the mevalonate-based isoprenoid biosynthetic pathway in Escherichia coli for production of the anti-malarial drug precursor amorpha-4,11-diene. Metab. Eng. 11 (1), 13−19. (18) Bokinsky, G., Peralta-Yahya, P. P., George, A., Holmes, B. M., Steen, E. J., Dietrich, J., Soon Lee, T., Tullman-Ercek, D., Voigt, C. A., Simmons, B. A., and Keasling, J. D. (2011) Synthesis of three advanced biofuels from ionic liquid-pretreated switchgrass using engineered Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 108 (50), 19949−54. (19) Salis, H., Mirsky, E., and Voigt, C. (2009) Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27 (10), 946−950. (20) Ajikumar, P. K., Xiao, W.-H., Tyo, K. E. J., Wang, Y., Simeon, F., Leonard, E., Mucha, O., Phon, T. H., Pfeifer, B., and Stephanopoulos, G. (2010) Isoprenoid Pathway Optimization for Taxol Precursor Overproduction in Escherichia coli. Science 330 (6000), 70−74. (21) Wu, J., Du, G., Zhou, J., and Chen, J. (2013) Metabolic engineering of Escherichia coli for (2S)-pinocembrin production from glucose by a modular metabolic strategy. Metab. Eng. 16, 48−55. (22) Xu, P., Gu, Q., Wang, W., Wong, L., Bower, A. G. W., Collins, C. H., and Koffas, M. A. G. (2013) Modular optimization of multi-gene pathways for fatty acids production in E. coli. Nat. Commun. 4, 1409.

(23) Chen, B., Lee, D.-Y., and Chang, M. W. (2015) Combinatorial metabolic engineering of Saccharomyces cerevisiae for terminal alkene production. Metab. Eng. 31, 53−61. (24) Na, D., Yoo, S. M., Chung, H., Park, H., Park, J. H., and Lee, S. Y. (2013) Metabolic engineering of Escherichia coli using synthetic small regulatory RNAs. Nat. Biotechnol. 31 (2), 170−174. (25) Santos, C. N. S., Xiao, W., and Stephanopoulos, G. (2012) Rational, combinatorial, and genomic approaches for engineering Ltyrosine production in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 109 (34), 13538−13543. (26) Yadav, V. G., De Mey, M., Giaw Lim, C., Kumaran Ajikumar, P., and Stephanopoulos, G. (2012) The future of metabolic engineering and synthetic biology: Towards a systematic practice. Metab. Eng. 14 (3), 233−241. (27) Boock, J. T., Gupta, A., and Prather, K. L. J. (2015) Screening and modular design for metabolic pathway optimization. Curr. Opin. Biotechnol. 36, 189−198. (28) Xu, P., Ding, Z., Qian, Z., Zhao, C., and Zhang, K. (2008) Improved production of mycelial biomass and ganoderic acid by submerged culture of Ganoderma lucidum SB97 using complex media. Enzyme Microb. Technol. 42 (4), 325−331. (29) Mandenius, C.-F., and Brundin, A. (2008) Bioprocess optimization using design-of-experiments methodology. Biotechnol. Prog. 24 (6), 1191−1203. (30) Alonso-Gutierrez, J., Kim, E.-M., Batth, T. S., Cho, N., Hu, Q., Chan, L. J. G., Petzold, C. J., Hillson, N. J., Adams, P. D., Keasling, J. D., Garcia Martin, H., and Lee, T. S. (2015) Principal component analysis of proteomics (PCAP) as a tool to direct metabolic engineering. Metab. Eng. 28, 123−133. (31) Lee, M. E., Aswani, A., Han, A. S., Tomlin, C. J., and Dueber, J. E. (2013) Expression-level optimization of a multi-enzyme pathway in the absence of a high-throughput assay. Nucleic Acids Res. 41, 10668. (32) Jeschek, M., Gerngross, D., and Panke, S. (2016) Rationally reduced libraries for combinatorial pathway optimization minimizing experimental effort. Nat. Commun. 7, 11163. (33) Zhou, H., Vonk, B., Roubos, J. A., Bovenberg, R. A. L., and Voigt, C. A. (2015) Algorithmic co-optimization of genetic constructs and growth conditions: application to 6-ACA, a potential nylon-6 precursor. Nucleic Acids Res., gkv1071. (34) Plackett, R. L., and Burman, J. P. (1946) The Design of Optimum Multifactorial Experiments. Biometrika 33 (4), 305−325. (35) Ferreira, S. L. C., Bruns, R. E., Ferreira, H. S., Matos, G. D., David, J. M., Brandão, G. C., da Silva, E. G. P., Portugal, L. A., dos Reis, P. S., Souza, A. S., and dos Santos, W. N. L. (2007) Box−Behnken design: An alternative for the optimization of analytical methods. Anal. Chim. Acta 597 (2), 179−186. (36) Alper, H., Fischer, C., Nevoigt, E., and Stephanopoulos, G. (2005) Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. U. S. A. 102 (36), 12678−12683. (37) Jones, J. A., Vernacchio, V. R., Lachance, D. M., Lebovich, M., Fu, L., Shirke, A. N., Schultz, V. L., Cress, B., Linhardt, R. J., and Koffas, M. A. G. (2015) ePathOptimize: A Combinatorial Approach for Transcriptional Balancing of Metabolic Pathways. Sci. Rep. 5, 11301. (38) Temme, K., Hill, R., Segall-Shapiro, T. H., Moser, F., and Voigt, C. A. (2012) Modular control of multiple pathways using engineered orthogonal T7 polymerases. Nucleic Acids Res. 40 (17), 8773−8781. (39) Adler, M., Mayo, A., and Alon, U. (2014) Logarithmic and Power Law Input-Output Relations in Sensory Systems with Fold-Change Detection. PLoS Comput. Biol. 10 (8), e1003781. (40) Shoval, O., Goentoro, L., Hart, Y., Mayo, A., Sontag, E., and Alon, U. (2010) Fold-change detection and scalar symmetry of sensory input fields. Proc. Natl. Acad. Sci. U. S. A. 107 (36), 15995−16000. (41) Kuhlman, T., Zhang, Z., Saier, M. H., and Hwa, T. (2007) Combinatorial transcriptional control of the lactose operon of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 104 (14), 6043−6048. (42) Kinney, J. B., Murugan, A., Callan, C. G., and Cox, E. C. (2010) Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl. Acad. Sci. U. S. A. 107 (20), 9158−9163. 157

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158

Research Article

ACS Synthetic Biology (43) Brewster, R. C., Jones, D. L., and Phillips, R. (2012) Tuning Promoter Strength through RNA Polymerase Binding Site Design in Escherichia coli. PLoS Comput. Biol. 8 (12), e1002811. (44) Bintu, L., Buchler, N. E., Garcia, H. G., Gerland, U., Hwa, T., Kondev, J., Kuhlman, T., and Phillips, R. (2005) Transcriptional regulation by the numbers: applications. Curr. Opin. Genet. Dev. 15 (2), 125−135. (45) Xu, P., Vansiri, A., Bhan, N., and Koffas, M. (2012) ePathBrick: A Synthetic Biology Platform for Engineering Metabolic Pathways in Ecoli. ACS Synth. Biol. 1 (7), 256−266. (46) Xu, P., Koffas, M. A. G., Polizzi, K., and Kontoravdi, C. (2013) Assembly of Multi-gene Pathways and Combinatorial Pathway Libraries Through ePathBrick Vectors. Methods Mol. Biol. 1073, 107−129.

158

DOI: 10.1021/acssynbio.6b00187 ACS Synth. Biol. 2017, 6, 148−158