Codon Optimization to Enhance Expression Yields ...

Breakthrough Technologies

Codon Optimization to Enhance Expression Yields Insights into Chloroplast Translation1[OPEN] Kwang-Chul Kwon, Hui-Ting Chan, Ileana R. León, Rosalind Williams-Carrier, Alice Barkan, and Henry Daniell* Department of Biochemistry, School of Dental Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6030 (K.-C.K., H.-T.C., H.D.); Global Research, Novo Nordisk, Malov DK-2760, Denmark (I.R.L.); and Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403-1229 (R.W.-C., A.B.) ORCID IDs: 0000-0002-4037-1776 (K.-C.K.); 0000-0001-7319-2080 (I.R.L.); 0000-0003-4485-1176 (H.D.).

Codon optimization based on psbA genes from 133 plant species eliminated 105 (human clotting factor VIII heavy chain [FVIII HC]) and 59 (polio VIRAL CAPSID PROTEIN1 [VP1]) rare codons; replacement with only the most highly preferred codons decreased transgene expression (77- to 111-fold) when compared with the codon usage hierarchy of the psbA genes. Targeted proteomic quantification by parallel reaction monitoring analysis showed a 4.9- to 7.1-fold (FVIII) or 22.5- to 28.1-fold (VP1) increase in expression of codon-optimized genes when normalized with stable isotope-labeled standard peptides (or housekeeping protein peptides), but quantitation using western blots showed a 6.3- to 8-fold or 91- to 125-fold increase of transgene expression from the same batch of materials, due to limitations in quantitative protein transfer, denaturation, solubility, or stability. Parallel reaction monitoring, to our knowledge validated here for the first time for in planta quantitation of biopharmaceuticals, is especially useful for insoluble or multimeric proteins required for oral drug delivery. Northern blots confirmed that the increase in codon-optimized protein synthesis occurs at the translational level rather than through any impact on transcript abundance. Ribosome footprints did not increase proportionately with VP1 translation, and even decreased after FVIII codon optimization, but they are useful in diagnosing additional rate-limiting steps. A major ribosome pause at CTC leucine codons in the native gene of FVIII HC was eliminated upon codon optimization. Ribosome stalls observed at clusters of serine codons in the codon-optimized VP1 gene provide an opportunity for further optimization. In addition to increasing our understanding of chloroplast translation, these new tools should help to advance this concept toward human clinical studies.

1 This work was supported by the National Institutes of Health (grant nos. R01 HL107904, R01 HL109442, and R01 EY 024564), the Bill and Melinda Gates Foundation (grant no. OPP1031406 to H.D.), and the National Science Foundation (grant no. IOS–1339130 to A.B.).
* Address correspondence to [email protected].
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Henry Daniell ([email protected]).
K.-C.K. organized codon tables, created and characterized transplastomic plants, and interpreted and wrote sections of the article; H.-T.C. created and characterized transplastomic plants and contributed data; I.R.L. performed MS and PRM analyses, interpreted data, and wrote this section of the article; R.W.-C. contributed ribosome profiling data analyses; A.B. interpreted ribosome profiling data and wrote this section of the article; H.D. conceived and designed the project, analyzed and interpreted data, and wrote and revised several sections and versions of the article.
[OPEN] Articles can be viewed without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.16.00981

Heterologous gene expression has facilitated our understanding of DNA replication, recombination, transcription, translation, and protein import in chloroplasts. The expression of precursor proteins via the chloroplast genome demonstrated that cleavage of transit peptides takes place in the stroma and not in the chloroplast envelope (Daniell et al., 1998). Most importantly, the role of nucleus-encoded cytosolic


proteins that bind to regulatory sequences and their species specificity were demonstrated using transgenes expressed in chloroplasts (Ruhlman et al., 2010). When the lettuce (Lactuca sativa) psbA regulatory sequence was used to drive transgene expression in tobacco (Nicotiana tabacum) chloroplasts, there was greater than 90% reduction in the accumulation of foreign proteins. This underscores the importance of the species specificity of chloroplast regulatory sequences. Likewise, details of the homologous recombination process and the deletion of mismatched nucleotides were evident using heterologous flanking sequences (Ruhlman et al., 2010). The translation of native polycistrons without the need for processing to monocistrons has been demonstrated (Barkan, 1988; Zoschke and Barkan, 2015), but the similarity of this process using heterologous polycistrons engineered via the chloroplast genome offered even more direct evidence for this process (De Cosa et al., 2001; Quesada-Vargas et al., 2005). The insertion of replication origins into chloroplast vectors offered further insight into minimal sequences required to study this process (Daniell et al., 1990). Therefore, in this study, we use transgenes, chloroplast genome sequences, and cutting-edge tools to understand the process of translation in chloroplasts. Each plant cell contains up to 10,000 copies of the chloroplast genome. Therefore, transgenes inserted into chloroplast genomes are expressed at high levels, up to

Plant Physiology®, September 2016, Vol. 172, pp. 62–77, www.plantphysiol.org © 2016 American Society of Plant Biologists. All rights reserved.

Downloaded from www.plantphysiol.org on October 2, 2016 - Published by www.plantphysiol.org Copyright © 2016 American Society of Plant Biologists. All rights reserved.

New Tools to Study Transgene Expression in Chloroplasts

70% of total leaf protein (De Cosa et al., 2001; Ruhlman et al., 2010). A wide range of proteins, from very small antimicrobial peptides (Lee et al., 2011) or hormones (Boyhan and Daniell, 2011; Kwon et al., 2013) to very large proteins encoded by bacterial, viral, fungal, animal, and human genes, have been expressed successfully in plant chloroplasts (DeGray et al., 2001; Daniell et al., 2009; Verma et al., 2010; Shenoy et al., 2014; Sherman et al., 2014; Shil et al., 2014). Most importantly, expressed proteins are highly stable when lyophilized plant cells are stored at ambient temperature (Kwon et al., 2013; Lakshmi et al., 2013; Kohli et al., 2014; Jin and Daniell, 2015). Therefore, oral delivery of proinsulin or exendin-4 reduced blood sugar levels similar to injected proteins (Boyhan and Daniell, 2011; Kwon et al., 2013). Oral delivery of angiotensin and ANGIOTENSIN-CONVERTING ENZYME2 expressed in chloroplasts reversed or prevented pulmonary hypertension by shifting the renin-angiotensin system to its protective axis, resulting in a decrease in fibrosis, improvement in cardiopulmonary structure and function, and restoration of right heart function (Shenoy et al., 2014). Furthermore, ocular inflammation caused by decreased activity of the protective axis of the renin-angiotensin system was improved significantly (Shil et al., 2014). Likewise, oral delivery of myelin basic protein reduced Aβ plaques in advanced mouse and human Alzheimer’s brains (Kohli et al., 2014). Delivery of coagulation factors to hemophilic mice induced oral tolerance and suppressed inhibitor formation and anaphylaxis (Verma et al., 2010; Sherman et al., 2014; Wang et al., 2015a). The aforementioned examples illustrate the significance of this novel, cost-effective protein drug-delivery concept. However, a major limitation in the clinical translation of human therapeutic proteins in chloroplasts is their low-level expression.
Prokaryotic or shorter human genes are highly expressed in chloroplasts (De Cosa et al., 2001; Arlen et al., 2007; Daniell et al., 2009; Ruhlman et al., 2010). However, expression of larger human proteins is a major challenge. For example, cholera nontoxic B subunit (CNTB)-fused native human blood-clotting factor VIII heavy chain (FVIII HC; 86.4 kD) or ANGIOTENSIN CONVERTING ENZYME2 (92.5 kD) were expressed at very low levels (Shenoy et al., 2014; Sherman et al., 2014). Likewise, the expression of viral vaccine antigens is quite unpredictable, with high, moderate, or extremely low expression levels (Birch-Machin et al., 2004; Lenzi et al., 2008; Waheed et al., 2011a, 2011b; Inka Borchers et al., 2012; Hassan et al., 2014). Furthermore, viral antigens are highly unstable, with expression observed in youngest leaves but not in mature leaves (McCabe et al., 2008). It is well known that high doses of vaccine antigens stimulate high-level immunity and confer greater protection against pathogens; therefore, higher level expression in chloroplasts is a key requirement for vaccine development (Chan and Daniell, 2015; Chan et al., 2016). Such challenges in transgene expression have been addressed by the use of optimal regulatory sequences (promoters and 5ʹ and 3ʹ untranslated regions [UTRs]),

especially species-specific endogenous elements (Ruhlman et al., 2010). In vitro assays of inserted genes with several synonymous codons show that translation efficiency does not always correlate with codon usage in plastid mRNAs (Nakamura and Sugiura, 2007), but such codon usage data have nonetheless been used in several codon optimization studies (Lutz et al., 2001; Ye et al., 2001; Franklin et al., 2002; Lenzi et al., 2008; Jabeen et al., 2010; Madesis et al., 2010; Gisby et al., 2011; Wang et al., 2015b; Boehm et al., 2016; Nakamura et al., 2016). While some studies achieved significant increases in expression (75- to 80-fold) after codon optimization (Franklin et al., 2002; Gisby et al., 2011), other studies observed negligible enhancement (Ye et al., 2001; Lenzi et al., 2008; Daniell et al., 2009; Wang et al., 2015b; Nakamura et al., 2016). However, translation initiation and the elongation efficiency of codon-optimized sequences were enhanced when chloroplast gene N-terminal sequences were inserted downstream of 5ʹ UTRs (Ye et al., 2001; Lenzi et al., 2008). In a recent study (Nakamura et al., 2016), the importance of compatibility between the psbA 5ʹ UTR and its 5ʹ coding sequence was shown using codon-optimized heterologous genes. The aforementioned codon optimization studies used only smaller eukaryotic coding sequences (less than 30 kD), but there is a great need to express larger human genes (e.g. FVIII; greater than 200 kD) that would require not only the optimization of codons but also compatibility with regulatory sequences for optimal translation initiation, elongation, and greater understanding of tRNAs encoded by the chloroplast genome or imported from the cytosol. However, no systematic study has been done to utilize the extensive knowledge gathered by sequencing several hundred chloroplast genomes (Daniell et al., 2016a) to understand codon usage and the frequency of highly expressed chloroplast genes.
Another major challenge is the lack of reliable methods to quantify insoluble proteins; the only reliable method (ELISA) cannot be used due to the aggregation or formation of multimeric structures that are required for oral drug delivery. Although the FDA accepts ELISA for the quantitation of purified protein drugs, it is not suitable for quantifying protein drugs from impure extracts due to cross-reacting proteins, autoantibodies (Kim and You, 2013), or for the quantitation of insoluble, multimeric, or membrane proteins. Similarly, immunoblots used for quantitation also have several limitations (i.e. aggregation of proteins at high protein concentrations trapped in wells, alteration of mobility by incomplete solubilization or secondary structures, saturation of antibody-binding sites, and inefficient transfer of large proteins to membranes and variable quantitation due to short or long exposure to films). However, peptide-centric quantitation strategies (e.g. targeted mass spectrometry quantitation by parallel reaction monitoring [PRM]) can overcome most of the limitations mentioned above. In the preparation of protein samples for PRM, strong denaturing and reducing conditions are used (e.g. higher concentrations of SDS and DTT) in combination with optimal enzymatic proteolysis conditions (e.g. sodium-deoxycholate; León

Plant Physiol. Vol. 172, 2016


Kwon et al.

et al., 2013), especially suitable for insoluble, multimeric, or membrane proteins (Savas et al., 2011). Moreover, PRM can be used for relative and absolute protein quantitation of target proteins present in highly complex protein backgrounds based on its high specificity and sensitivity (Domon and Aebersold, 2010; Gallien et al., 2012; Picotti and Aebersold, 2012). In addition, PRM offers high specificity and multiplexing characteristics, which allow for specific monitoring of up to several hundred peptides in a single analysis (Gallien et al., 2012). Determination of protein drug dose in planta, especially of insoluble proteins without purification, is an unexplored area of research, and to our knowledge, we investigate this concept for the first time to quantify recombinant protein drugs made in chloroplasts.

This study explores heterologous gene expression utilizing chloroplast genome sequences, ribosome profiling, and targeted mass spectrometry (PRM) to enhance our understanding of the translation of foreign genes in chloroplasts. We developed a codon optimizer program based on the analysis of psbA genes from 133 plant species to compare the translational efficiencies of native and codon-optimized genes driven by identical regulatory sequences. PRM using peptides selected from the N or C terminus was used to study the complete or incomplete synthesis of proteins and to validate this approach to quantify the dosage of protein drugs made in plant cells when compared with current methods. The codon optimizer program was evaluated in chloroplasts from two different species to identify any species specificity. Ribosome profiling was evaluated for its suitability to diagnose limiting steps in transgene expression. These observations provide new insight into limitations in the translation of heterologous genes and approaches to address them in future studies.

RESULTS

Codon Optimization of Human/Viral Transgenes

Differences in codon usage by chloroplasts frequently decrease translation. We observed that plants expressing native sequences of FVIII HC or VIRAL CAPSID PROTEIN1 (VP1) from polio virus showed very low levels of expression, less than 0.05% for FVIII and approximately 0.1% for VP1 (see below). The psbA gene is among the most highly expressed genes in chloroplasts, and the translation efficiency of the psbA gene is greater than 200 times higher than that of the rbcL gene (Eibl et al., 1999). The 5ʹ UTR of psbA also showed the highest translation activity in vitro among 11 5ʹ UTRs investigated (Yukawa et al., 2007). Therefore, among 140 transgenes expressed in chloroplasts, more than 75% use the psbA regulatory sequences (Jin and Daniell, 2015; Daniell et al., 2016a, 2016b). Most importantly, compatibility between the 5ʹ UTR of psbA and its coding region is important for efficient translation initiation (Nakamura et al., 2016). For these reasons, a new codon optimization program was developed using codon

usage of the psbA genes from 133 sequenced chloroplast genomes (Fig. 1A). We first investigated the expression of synthetic genes using only the most highly preferred codon for each amino acid, which is referred to as the old algorithm in this study. When this resulted in even lower levels of expression than the native gene (see below), a new codon optimizer algorithm was developed using the codon usage hierarchy observed among sequenced psbA genes. Therefore, most of the rare codons in heterologous genes were modified based on codons with greater than 5% frequency of use in the psbA genes. Synonymous codons for each amino acid were ranked according to their frequency of use (Fig. 1B). In this study, native sequences for FVIII HC (2,262 bp) and VP1 (906 bp) were codon optimized using the old or new algorithm and synthesized. After codon optimization, the AT content of FVIII HC increased slightly, from 56% to 62%, and 406 codons out of 754 amino acids were optimized. For the VP1 sequence from the Sabin 1 polio virus strain, the 906-bp-long native sequence was codon optimized, which slightly increased the AT content from 52% to 59%, and 187 codons out of 302 amino acids were optimized. However, the CNTB coding sequence was not codon optimized because of its prokaryotic origin and high AT content (65.4%). Most importantly, the expression level of CNTB (native sequence) fused with proinsulin reached up to 72% of total leaf protein in tobacco chloroplasts (Ruhlman et al., 2010) and 53% of total leaf protein in lettuce chloroplasts (Boyhan and Daniell, 2011), indicating that there is no limitation on translation of the CNTB coding sequence in chloroplasts. All sequences, including native and codon-optimized synthetic genes (new and old algorithms), are shown in Supplemental Figure S1; rare codons in native genes are shown in red and modified codons are highlighted in yellow in Supplemental Figure S2. 
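The contrast between the two strategies described above can be illustrated with a minimal Python sketch. It is not the authors' Java program. The usage fractions in `TOY_USAGE` are made-up placeholders for two amino acids (they are not the psbA values of Fig. 1B), and the replacement rule in `optimize_new` (keep any codon at or above the 5% cutoff, replace rarer codons with the top-ranked synonym) is an assumption, because the text does not specify how replacement codons are chosen.

```python
# Toy synonymous-codon usage fractions (illustrative placeholders,
# NOT the psbA-derived values reported in Fig. 1B).
TOY_USAGE = {
    "L": {"TTA": 0.40, "CTT": 0.30, "TTG": 0.20,
          "CTA": 0.06, "CTC": 0.03, "CTG": 0.01},
    "S": {"TCT": 0.45, "AGT": 0.30, "TCC": 0.15,
          "AGC": 0.06, "TCA": 0.03, "TCG": 0.01},
}

# Reverse lookup: codon -> one-letter amino acid code.
CODON_TO_AA = {codon: aa for aa, table in TOY_USAGE.items() for codon in table}


def split_codons(seq):
    """Split an in-frame coding sequence into a list of codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def optimize_old(seq):
    """'Old' strategy: every codon becomes the single most-used synonym."""
    out = []
    for codon in split_codons(seq):
        table = TOY_USAGE[CODON_TO_AA[codon]]
        out.append(max(table, key=table.get))
    return "".join(out)


def optimize_new(seq, cutoff=0.05):
    """Hierarchy-aware sketch (assumed rule): keep codons used at or above
    the cutoff; replace only rarer codons with the top-ranked synonym."""
    out = []
    for codon in split_codons(seq):
        table = TOY_USAGE[CODON_TO_AA[codon]]
        out.append(codon if table[codon] >= cutoff else max(table, key=table.get))
    return "".join(out)


def at_content(seq):
    """Percentage of A and T bases in the sequence."""
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


native = "CTCCTCTCACTTAGT"  # Leu-Leu-Ser-Leu-Ser, hypothetical fragment
for label, seq in (("native", native),
                   ("old", optimize_old(native)),
                   ("new", optimize_new(native))):
    print(label, seq, round(at_content(seq), 1))
```

In this toy fragment both strategies remove the rare CTC and TCA codons, but only the hierarchy-aware version leaves the moderately used CTT and AGT codons in place, so the codon mix of the output stays closer to a natural usage profile.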
When the psbA-based codon table is compared with total chloroplast codon usage tables, which are generated based on all chloroplast genes of lettuce (57,528 codons from 189 coding sequences) or tobacco (34,756 codons from 137 coding sequences; Nakamura et al., 2000), there was no significant difference in AT content of coding sequences: it varied between 59.59% and 61.76%. However, there are striking differences between psbA-based and total chloroplast gene-based codon tables when individual codons are compared. Native FVIII HC used the CTC Leu codon 11 times, but codon-optimized (new algorithm) HC eliminated all CTC codons. If the total chloroplast codon table were used instead, codon-optimized HC would still use five CTC codons. As seen in ribosome profiles, discussed below, a tandem repeat of CTC-CTC in the native FVIII HC sequence resulted in major stalling sites that were completely eliminated by psbA-based codon optimization (new algorithm). Likewise, another rare codon, TCA (Ser), is used 16 times in the FVIII HC and seven times in VP1 coding sequences. The TCA rare codon was eliminated completely in both genes after codon optimization using the new algorithm. However, if the total codon table is used for codon optimization,


Figure 1. Development of a codon optimization algorithm for the expression of heterologous genes in plant chloroplasts. A, Process to develop the codon optimization algorithm. Sequence data of psbA genes from 133 plant species collected from the National Center for Biotechnology Information, and their codon preferences, were analyzed. Finally, the codon optimizer was developed using Java. B, Codon preference table. Codon preference is indicated by the percentage of use for each amino acid. Black and underlined codons indicate codons that were not used when optimizing sequences due to their low usage frequency among synonymous codons (less than 5% use or, for amino acids with six synonymous codons [Leu, Ser, and Arg], the two codons used least frequently).

FVIII HC and VP1 would still contain 12 and five TCA codons, respectively. Collectively, the new codon optimization algorithm eliminated 105 and 59 rare codons from FVIII HC and VP1, respectively, resulting in enhanced expression of both genes. However, if the total codon table is used, there will be 75 and 35 rare codons in codon-optimized FVIII HC and VP1 coding sequences, respectively. All 13 codons (GCG [Ala], GGG [Gly], CTG [Leu], CTC [Leu], CCG [Pro], CCC [Pro], AGG [Arg], CGG [Arg], TCA [Ser], TCG [Ser], ACG [Thr], GTC [Val], and GTG [Val]) rarely used in the psbA gene were eliminated using our codon-optimized table (new algorithm). More detailed information on the codon distribution between different codon tables is included in Supplemental Figure S3. Synthetic gene cassettes were inserted into the chloroplast transformation vector, pLSLF for lettuce or pLD-utr for tobacco (Fig. 2A). Native and synthetic

genes were fused to the native CNTB sequence, which is used for efficient transmucosal delivery of fused proteins via monosialotetrahexosylganglioside receptors present on intestinal epithelial cells. To eliminate possible steric hindrance caused by the fusion of two proteins and facilitate the release of tethered proteins into the circulation after internalization, nucleotide sequences for a hinge (Gly-Pro-Gly-Pro) and a furin cleavage site (Arg-Arg-Lys-Arg) were engineered between CNTB and fused proteins. Fusion genes were placed under the control of identical psbA promoters, 5ʹ UTR and 3ʹ UTR regulatory sequences, for specific evaluation of codon optimization (Fig. 2A). To select transformants, the aminoglycoside-3ʹʹ-adenylyl-transferase gene was driven by the rRNA promoter to confer resistance to spectinomycin in transformed cells. Expression cassettes were flanked by sequences for isoleucyl-tRNA synthetase and alanyl-tRNA synthetase, which are


Figure 2. Construction of chloroplast vectors using native or codon-optimized genes, and evaluation of homoplasmy and transgene expression. A, Lettuce or tobacco chloroplast vector maps. aadA, Aminoglycoside 3ʹ-adenylyltransferase gene; CNTB, coding sequence of cholera nontoxic B subunit; FVIII HC, factor 8 heavy chain native (N) or codon optimized (CN) using the new algorithm; PpsbA, promoter and 5ʹ UTR of the psbA gene; Prrn, rRNA operon promoter; SB-P, BamHI fragment; TpsbA, 3ʹ UTR of the psbA gene; trnA, alanyl-tRNA; trnI, isoleucyl-tRNA. B and C, Southern-blot analysis of homoplasmic lines. Total genomic DNA (3 µg) from untransformed (UT), native (N), or codon-optimized CNTB-FVIII HC (new algorithm; CN) was digested with HindIII and separated on a 0.8% agarose gel, blotted onto a Nytran membrane, and probed with a BamHI fragment. Lanes 1 to 4 show four independent transplastomic lines. L.s., Lactuca sativa. D, Comparison of the expression level of CNTB-VP1 between transplastomic lines expressing the native (N) or codon-optimized genes using the old (CO) or new (CN) algorithm. Total extracted proteins were loaded at the indicated protein concentrations and were probed with anti-CNTB antibody. CNTB, Standard protein of cholera nontoxic B subunit; IDV, integrated density values; N.t., Nicotiana tabacum.

identical to endogenous chloroplast genome sequences, leading to efficient double homologous recombination and optimal processing of introns with flanking sequences (Fig. 2A). Transformation vectors containing the native or synthetic FVIII HC and VP1 sequences were used to create transplastomic lettuce or tobacco plants. To confirm homoplasmy, Southern-blot analysis was performed on four independent lettuce and tobacco lines expressing native or codon-optimized FVIII HC and VP1. For lettuce plants expressing either native or codon-optimized CNTB-FVIII HC, chloroplast genomic DNA was digested with HindIII and probed with digoxigenin (DIG)-labeled probe spanning the flanking region. All selected lines showed the expected distinct hybridizing fragments and no untransformed fragment (Fig. 2, B and C). The homoplasmic tobacco lines expressing native or codon-optimized CNTB-VP1 sequences were confirmed already in a previous study (Chan et al., 2016). These data confirm the homoplasmy of all transplastomic lines; therefore, transgene expression levels should be attributed to translation efficiency and not transgene copy number.

Translation Efficiency of Native and Codon-Optimized Genes in Lettuce and Tobacco Chloroplasts

Expression levels between native and codon-optimized genes in chloroplasts were compared using immunoblot and densitometry assays. Early studies in this project compared the translation efficiency of the old algorithm (using only the most preferred codons) with that of the new algorithm (using the psbA codon hierarchy) quantified by integrated density values of western blots (Fig. 2D). The CNTB-VP1 expression level in transplastomic plants using the old algorithm for codon optimization was 2.7- to 3.1-fold lower than that of the native VP1 viral gene sequence, and the increase in VP1 expression was 77- to 111-fold higher using the new algorithm (Fig. 2D). Therefore, the new algorithm of the codon optimizer program was used in all subsequent studies. In order to correct for overexposure or


underexposure of western blots to x-ray film, data on variable exposures were collected. In order to account for extreme variation in the expression levels of native and codon-optimized genes, serial dilutions of extracted proteins were loaded on each blot (Fig. 3, A and B; Supplemental Fig. S4). In a densitometry assay of lettuce expressing native and codon-optimized CNTB-FVIII HC, which also was used for PRM, the concentration of FVIII HC from the codon-optimized gene (108.8–137.5 µg g⁻¹ dry weight) was 6.3- to 8-fold higher than that of the native FVIII HC gene (16.9–17.4 µg g⁻¹ dry weight; Supplemental Fig. S4). For tobacco plants expressing CNTB-VP1, the batch used for PRM mass spectrometry showed a 91- to 125-fold difference between codon-optimized (11.3–18.1 µg mg⁻¹) and native sequence (0.12–0.15 µg mg⁻¹; Fig. 3C; Supplemental Fig. S4). Based on these data, codon-optimized sequences obtained from our newly developed codon optimizer program improved the translation of transgenes to different extents, depending on the coding sequence. To investigate the impact of codon optimization on transcript stability, northern blotting was performed using a probe for the psbA 5ʹ sequence (Fig. 4). Although loading controls show equal amounts of total RNA in each lane based on ethidium bromide staining, higher or lower levels of the endogenous psbA transcript are observed among samples, suggesting subtle differences in RNA loading. The mRNA levels of codon-optimized or native sequences for CNTB-FVIII HC and CNTB-VP1

Figure 3. Quantitation of native or codon-optimized CNTB-FVIII HC or CNTB-VP1 gene expression using western blots. Extracted leaf proteins were resolved on gradient (4%–20%) SDS-PAGE and probed with anti-CNTB antibody (1:10,000). For a loading control, the same membranes were stripped and reprobed with anti-RbcL antibody (1:5,000). A, Lettuce leaf protein extracts (5 or 10 µg) expressing CNTB-FVIII HC or untransformed. For loading controls, Ponceau S staining of membrane prior to western blot or reprobed blot with the large subunit of Rubisco (RbcL) is provided. B, Serial dilution of the native (5–20 µg) or codon-optimized (1–4 µg) CNTB-FVIII HC lettuce leaf extracts. C, Serial dilution of the native (2–8 µg) or codon-optimized (0.1–0.4 µg) CNTB-VP1 tobacco leaf extracts. CO or CN, Codon optimized with old algorithm (CO) or new algorithm (CN); L.s., Lactuca sativa; N, native sequence; N.t., Nicotiana tabacum; UT, untransformed wild type.


were normalized to endogenous psbA transcripts using densitometry, and the normalized ratios in each sample were compared. Northern blots indicated that the increase of codon-optimized CNTB-FVIII HC and CNTB-VP1 accumulation is at the translational level rather than RNA transcript accumulation. Several previous studies on the expression of foreign genes have shown a lack of variation or modest increases in transcript abundance but significant variation in translation efficiency (Franklin et al., 2002; Gisby et al., 2011; Nakamura et al., 2016). Franklin et al. (2002) reported a lack of variation in transcript abundance for GFP expression in Chlamydomonas reinhardtii chloroplasts despite an 80-fold increase in GFP protein accumulation of the codon-optimized sequence. Even though there was a 3-fold increase in mRNA levels of codon-optimized TGF-β3 when compared with the native sequence (Gisby et al., 2011), the greater part of the 75-fold increase in synthetic TGF-β sequence was attributed to enhanced translation. A recent study also showed that the compatibility of the 5ʹ UTR and its coding sequence increased the efficient translation of codon-optimized sequences rather than mRNA abundance (Nakamura et al., 2016).

Absolute Quantitation by PRM Analysis

Expression levels of codon-optimized and native gene sequences also were quantified using PRM mass spectrometry (Fig. 5). To select the optimal proteotypic peptides for PRM analysis of the CNTB and FVIII HC sequences, we first performed a standard tandem mass spectrometry analysis (data not shown) of a tryptic digest of lettuce plants expressing CNTB-FVIII HC to choose specific peptides. The expression of codon-optimized FVIII HC was 5.4- or 5.8-fold higher than that of the native sequence when the fold changes were normalized based on the housekeeping protein peptides or stable isotope-labeled standard (SIS) peptides (Figs. 5 and 6A). Peptides chosen from CNTB showed minor variations in fold change based on the locations of peptides and normalized with SIS or housekeeping protein peptides from Rubisco (small or large subunits) or ATP synthase subunit b: 4.9 (or 4.5; IAYLTEAK), 5.2 (or 4.8; IFSYTESLAGK), or 6.6 (or 6.1; LCVWNNK). Peptides chosen from FVIII HC also showed minor variations: 5.4 (or 5; FDDDNSPSFIQIR), 5.7 (or 5.2; YYSSFVNMER), or 7.1 (or 6.6; WTVTVEDGPTK; Fig. 6A). The locations of these selected peptides within CNTB-FVIII HC are shown in Supplemental Figure S5. For more details, see the raw data included in Supplemental Data Set S1. The expression of codon-optimized CNTB-VP1 was 25.9- or 26.1-fold higher than that of the native sequence when their fold changes were normalized based on the SIS peptides or housekeeping protein peptides (Figs. 5 and 6B). Peptides chosen from CNTB showed minimal variations in fold changes based on their locations: 22.5 (or 22.5; LCVWNNK) to 26.1 (or 26; IAYLTEAK) to 28.1 (or 28; IFSYTESLAGK; Fig. 6B). The linearity of the quantification range also was investigated by spiking stable SIS peptides in a constant amount of plant digest (1:1:1:1 mix of all four types of plant materials) in a dynamic range covering 220 amol to 170 fmol (values equivalent on column per injection). These results are reported in detail in Supplemental Figure S7. For all six peptides, we observed an r² value over 0.98. Absolute quantitation can be achieved by spiking a known amount of the counterpart SIS peptide into samples. For each counterpart, SIS peptide (34 fmol) was injected on column mixed with protein digest (equivalent to protein extracted from 33.3 mg of lyophilized leaf powder). By calculating ratios of area under the curve of SIS and endogenous peptides, we estimated the endogenous

Figure 4. Northern analysis of transplastomic lines. Transgene transcripts of CNTB-FVIII HC (A) or CNTB-VP1 (B) were probed with 200 bp of lettuce psbA 5ʹ UTR (for FVIII HC) or tobacco psbA 5ʹ UTR (for VP1) regulatory sequences. Bottom and top arrowheads represent the endogenous psbA gene and CNTB-FVIII or CNTB-VP1 transgene, respectively. Ethidium bromide (EtBr)stained gels are included for the evaluation of equal loading. CN, Codon-optimized sequence using the new algorithm; N, native sequence; UT, untransformed wild type. 68

Plant Physiol. Vol. 172, 2016

Downloaded from www.plantphysiol.org on October 2, 2016 - Published by www.plantphysiol.org Copyright © 2016 American Society of Plant Biologists. All rights reserved.

New Tools to Study Transgene Expression in Chloroplasts

Figure 5. PRM mass spectrometry analysis of CNTB-FVIII and CNTB-VP1 proteins at N- to C-terminal protein sequences. The y axis shows molarity (fmol on column) of peptides from CNTB-FVIII HC or CNTB-VP1 in codon-optimized or native genes. CNTB: peptide 1, IFSYTESLAGK; peptide 2, IAYLTEAK; peptide 3, LCVWNNK. FVIII: peptide 4, FDDDNSPSFIQIR; peptide 5, WTVTVEDGPTK; peptide 6, YYSSFVNMER. The median of four technical replicates is presented for each sample. Circles represent native sequences, and squares represent codon-optimized (c.o.) sequences using the new algorithm. CV, Coefficient of variation.

peptide molarity, expressed as femtomoles on column (Fig. 6). The mean of all calculated ratios of femtomoles on column (six and three peptides for CNTB-FVIII HC and CNTB-VP1, respectively) for codon-optimized and native sequences is reported as the fold increase of protein expression in codon-optimized constructs. The high reproducibility of the sample preparation and PRM analysis is shown in Figure 5. All peptide measurements were the result of four technical replicates, two sample preparation replicates (from leaf powder to extraction to protein digestion), and two mass spectrometry technical replicates. Coefficients of variation among the four measurements per peptide ranged from 0.5% to 10% in all but two cases, where they were 17% and 22%.
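The absolute quantitation described above reduces to ratio arithmetic, which the following Python sketch illustrates. It is a minimal example under stated assumptions: the peak areas and replicate values are invented for illustration (they are not taken from Supplemental Data Set S1), and the function names are hypothetical. Only the 34 fmol SIS spike comes from the text.

```python
from statistics import mean, stdev

SIS_SPIKE_FMOL = 34.0  # amount of each SIS peptide on column (from the text)


def endogenous_fmol(area_endogenous, area_sis, sis_fmol=SIS_SPIKE_FMOL):
    """Estimate the on-column amount of the endogenous (light) peptide from
    the light/heavy peak-area ratio and the known SIS spike."""
    return area_endogenous / area_sis * sis_fmol


def fold_increase(optimized_fmol, native_fmol):
    """Mean of the per-peptide ratios (codon-optimized / native)."""
    return mean(optimized_fmol[p] / native_fmol[p] for p in native_fmol)


def cv_percent(values):
    """Coefficient of variation (%) across replicate measurements."""
    return stdev(values) / mean(values) * 100.0


# Invented peak areas in arbitrary units: (endogenous, SIS).
native_areas = {"IAYLTEAK": (2.0e6, 3.4e7), "LCVWNNK": (1.0e6, 3.4e7)}
optimized_areas = {"IAYLTEAK": (1.0e7, 3.4e7), "LCVWNNK": (6.0e6, 3.4e7)}

native = {p: endogenous_fmol(*a) for p, a in native_areas.items()}
optimized = {p: endogenous_fmol(*a) for p, a in optimized_areas.items()}
fold = fold_increase(optimized, native)

# Invented fmol values for four technical replicates of one peptide.
replicate_cv = cv_percent([9.0, 10.0, 11.0, 10.0])
```

Because both the endogenous and the SIS peptide are measured in the same injection, the ratio cancels injection-to-injection variation, which is one reason the per-peptide coefficients of variation can stay low.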

Kwon et al.

Figure 6. Fold change (increase) of CNTB-FVIII HC or CNTB-VP1 proteins based on targeted mass spectrometry analysis of CNTB and HC peptides. The reported data represent medians of the results from six and three peptides from CNTB-FVIII HC (A) and CNTB-VP1 (B), respectively. The y axis represents the fold change increase (based on measured fmol on column) of peptides from plant materials expressing genes codon optimized using the new algorithm (CO) with respect to plant materials expressing native sequence (N). CNTB: peptide 1, IFSYTESLAGK; peptide 2, IAYLTEAK; peptide 3, LCVWNNK. FVIII HC: peptide 4, FDDDNSPSFIQIR; peptide 5, WTVTVEDGPTK; peptide 6, YYSSFVNMER. SIS-normalized values represent fold change as a ratio to each spiked SIS peptide. Housekeeping (HK) protein normalization values represent fold change as a normalized ratio to Rubisco large or small subunit and ATP synthase CF1 β-subunit protein peptides. For peptide ratio results for CNTB-FVIII and CNTB-VP1, see Supplemental Data Set S1.

Ribosome Profiling Studies

Ribosome profiling uses deep sequencing to map ribosome footprints, mRNA fragments that are protected by ribosomes from exogenous nuclease attack. The method provides a genome-wide, high-resolution, and quantitative snapshot of mRNA segments occupied by ribosomes in vivo (Ingolia et al., 2009). Total ribosome footprint abundance within an open reading frame can provide an estimate of translational output, and positions at which ribosomes slow or stall are marked by regions of particularly high ribosome occupancy.

To examine how codon optimization influenced ribosome behavior, we profiled ribosomes from plants expressing the native and codon-optimized CNTB-FVIII HC and CNTB-VP1 transgenes. Figure 7 shows the abundance of ribosome footprints as a function of position in each transgene; footprint coverage on the endogenous chloroplast psbA and rbcL genes is shown as a means to normalize the transgene data between the optimized and native constructs. Ribosome footprint coverage was much higher in the codon-optimized VP1 sample than in the native VP1 sample (Fig. 7A). However, the magnitude of this increase varies depending upon how the data are normalized (Fig. 7C): the increase is 5-, 16-, or 1.5-fold when normalized to total chloroplast ribosome footprints, psbA ribosome footprints, or rbcL ribosome footprints, respectively. These numbers are considerably lower than the 22.5- to 28.1-fold increase in VP1 protein abundance inferred from the quantitative mass spectrometry data.

The topography of ribosome profiles is generally highly reproducible among biological replicates that are at the same developmental stage and grown under the same conditions (see rbcL and psbA in Fig. 7B). In that context, it is noteworthy that the peaks and valleys in the endogenous psbA and rbcL genes are quite different in the native and optimized tobacco VP1 lines. In principle, competition with the endogenous psbA 5ʹ UTR could reduce the translation of the endogenous psbA open reading frame. However, no such competition was observed for the lettuce construct. In addition, the degree of competition would depend on the abundance of the transgene mRNA, which was similar in the native and codon-optimized constructs, so competition via the psbA 5ʹ UTR is unlikely to contribute to differences in psbA ribosome occupancy in these lines. Many of the large peaks (presumed ribosome pauses) observed in these endogenous genes, specifically in the native VP1 line, map to paired Ala codons (asterisks in Fig. 7A). This suggests a limitation of Ala tRNA specifically in the native VP1 line. Although the basis for this is unclear, it is conceivable that it has to do with minor differences in the age of the plants used for the analyses (2.5 versus 2 months). It is also conceivable that introduction of the transgene had an unanticipated effect on the expression of the nearby gene encoding Ala tRNA. In the same vein, ribosome pause sites in the CNTB region would be expected to be similar, but those of the native and optimized VP1 constructs were not. This global difference in ribosome behavior at Ala codons may well contribute to differential transgene expression in the native and codon-optimized lines.

The total number of ribosome footprints in the FVIII gene decreased approximately 2-fold in the codon-optimized line, whereas protein accumulation increased 4.5- to 6.6-fold. However, a major ribosome pause can be observed near the 3ʹ end of the native transgene, followed by a region of very low ribosome occupancy (see bracketed region in Fig. 7B). This ribosome pause maps to a pair of CTC Leu codons, a codon that is almost never used in native psbA genes (Fig. 1). These results strongly suggest that the stalling of ribosomes at these Leu codons limits the translation of the downstream sequences and overall protein output while also causing a buildup of ribosomes on the upstream sequences. Thus, overall ribosome occupancy does not reflect translational output in this case. Modification of those Leu codons in the codon-optimized variant eliminated this ribosome stall and resulted in a much more even ribosome distribution over the transgene (Fig. 7B, right).

Taken together, the ribosome-profiling data revealed dramatic differences in ribosome dynamics between codon-optimized and native transgenes. Although total ribosome occupancy did not reliably predict protein output from transgenes expressed in chloroplasts, the detection of strong ribosome pauses at specific sites can provide insight into rate-limiting steps that could be mitigated through sequence modifications.
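The dependence of the apparent fold change on the normalization reference, noted above for VP1, can be made concrete with a short Python sketch. The footprint counts below are invented for illustration only (they are not the values in Figure 7C), and the function name is hypothetical.

```python
def normalized_fold_change(target_opt, target_nat, ref_opt, ref_nat):
    """Fold change of transgene footprints (optimized / native) after each
    count is divided by a reference count from the same library."""
    return (target_opt / ref_opt) / (target_nat / ref_nat)


# Invented read counts per library; "total" is all chloroplast footprints.
counts = {
    "native": {"transgene": 1_000, "psbA": 400_000,
               "rbcL": 50_000, "total": 2_000_000},
    "optimized": {"transgene": 8_000, "psbA": 200_000,
                  "rbcL": 200_000, "total": 4_000_000},
}

fold_by_reference = {
    ref: normalized_fold_change(
        counts["optimized"]["transgene"], counts["native"]["transgene"],
        counts["optimized"][ref], counts["native"][ref],
    )
    for ref in ("total", "psbA", "rbcL")
}
```

With these made-up counts the same raw 8-fold difference becomes roughly 4-, 16-, or 2-fold depending on the reference, because the reference genes themselves differ in occupancy between the two libraries. This is the same kind of spread reported in the text.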

Figure 7. Ribosome profiling data from transplastomic plants expressing native and codon-optimized VP1 or FVIII HC. Read coverage for native transgenes (N), codon-optimized transgenes with new algorithm (CN), and the endogenous psbA and rbcL genes are displayed with the Integrative Genomics Viewer. A, Data from tobacco leaves expressing native and codon-optimized VP1 transgenes. Asterisks mark each pair of consecutive Ala codons in the data from the native line. The + symbol marks three consecutive Ala codons. Many strong ribosome pause sites in the plants expressing native VP1 map to paired Ala codons, whereas this is not observed in the codon-optimized line. Triangles mark each pair of consecutive Ser codons in the codon-optimized line. A major ribosome stall maps to a region harboring five closely spaced Ser codons in the codon-optimized VP1 gene. nt, Nucleotides. B, Data from lettuce plants expressing the native and codon-optimized FVIII HC transgenes. A major ribosome stall in the native FVIII HC gene maps to a pair of adjacent CTC Leu codons, a codon that is not used in the native psbA gene. Ribosome footprint coverage is much more uniform on the codon-optimized transgene. C, Absolute and relative ribosome footprint counts.

DISCUSSION

Past studies on transgene expression in chloroplasts reported abundant transcripts but variable levels of translation based on the origin of the coding sequence. Prokaryotic genes were translated more efficiently than eukaryotic genes. Transcript abundance is attributed to the high copy number of transgenes and the strength of the psbA promoter. Among more than 150 transgenes expressed in chloroplasts, more than 75% utilized psbA regulatory sequences (Jin and Daniell, 2015; Daniell et al., 2016a, 2016b). In addition, three ribosome-binding regions in the 5ʹ UTR of psbA recruit ribosomes and efficiently form the translational initiation complex (Zou et al., 2003). Therefore, it is expected that improvement of translation elongation in heterologous genes should increase transgene expression. A codon table based on all chloroplast genes has a drawback: it assumes that all tRNA species are equally abundant. However, such translational selection is not possible (Surzycki et al., 2009). Therefore, in this study, we developed a codon optimizer program based on the codon usage of psbA genes across 133 plant species to increase the expression of heterologous genes in chloroplasts.

Codon Optimization Significantly Enhances Translation in Chloroplasts

The psbA promoter and 5ʹ UTR are the most widely used regulatory sequences for transgene expression in chloroplasts. Among more than 115 transgenes expressed via the chloroplast genome, 84 use the psbA regulatory sequence (Jin and Daniell, 2015; Daniell et al., 2016a, 2016b). A recent study showed the absence of any detectable translation when codons for the tat coding sequence of HIV-1 were optimized using all 79 tobacco chloroplast mRNAs and regulated by the psbA 5ʹ UTR, whereas the same sequence was expressed well using the phage T7 GENE10 5ʹ UTR (Nakamura et al., 2016). However, when the 5ʹ psbA coding sequence was inserted between the psbA 5ʹ UTR and the tat sequence, translation was initiated. Therefore, compatibility between the psbA regulatory element and codons is vital for initiation and elongation during the translation of heterologous genes (Nakamura et al., 2016). Consequently, when heterologous genes are regulated by psbA, codon optimization based on psbA codon usage should facilitate the movement of ribosomes from the translational initiation complex more efficiently than codon optimization based on any other chloroplast genes.

In this study, we developed and tested two new codon optimizer programs based on the codon preference of psbA genes to improve the expression of heterologous genes in chloroplasts in concert with the psbA regulatory elements. The first (old) algorithm of the codon optimizer was programmed to use only the most highly used codons, resulting in lower expression than the native gene. The increase in expression of VP1 in chloroplasts between the old and new algorithm is 77- to 111-fold. Therefore, removal of rare codons and replacement with only highly preferred codons did not help in enhancing translation when tRNA pools were limited. Thus, the new algorithm of the codon optimizer program was used in all subsequent studies. The new algorithm used the codon distribution hierarchy observed among psbA genes. As a result, 105 rare codons out of 754 codons in the FVIII HC gene and 59 rare codons out of 302 codons in the VP1 gene were replaced with psbA preferentially used codons. However, many of the replaced codons are not identified as rare codons in codon tables based on all chloroplast genes. Therefore, optimization with the total chloroplast codon table would have retained 75 rare codons in FVIII HC and 35 rare codons in the VP1 coding sequence.

Although we used a psbA-based codon optimization program to improve translation in chloroplasts, many other factors, including the size and origin of heterologous genes and the compatibility of the 5ʹ UTR and its 5ʹ coding region, are important. The CNTB-fused native sequence of human proinsulin (approximately 22 kD) was expressed up to 72% of total leaf protein (Ruhlman et al., 2010), and the expression of ZZTEV-IGF-1 (Staphylococcus aureus Z domains and TEV cleavage site fused to native human insulin-like growth factor-1 gene; approximately 26 kD) was up to 32.4% of TSP (Daniell et al., 2009). However, human TGF-β3 (13 kD, 56% GC) was expressed in up to 12% of leaf protein only after codon optimization (Gisby et al., 2011). Also, the expression of GFP (approximately 26 kD) increased approximately 80-fold after codon optimization (Franklin et al., 2002). Therefore, proteins with shorter coding sequences are not ideal to evaluate codon optimization concepts and other limitations in translation.

Consequently, a better understanding of codon usage and other rate-limiting steps (compatibility with regulatory sequences, efficiency of translation initiation, elongation, and availability of tRNAs) in translation is essential for the successful expression of human or other eukaryotic coding sequences. The preferred Arg, Asn, Gly, His, Leu, and Phe codons in psbA (our program) differ from those reported for 79 tobacco chloroplast mRNAs based on in vitro studies (Nakamura and Sugiura, 2007). Preferred codons are decoded more rapidly than nonpreferred codons, presumably due to higher concentrations of corresponding tRNAs that recognize preferred codons, which speeds up the elongation rate of protein synthesis (Yu et al., 2015). Higher plant chloroplast genomes code for a conserved set of 30 tRNAs. This set is believed to be sufficient to support the translation machinery in chloroplasts (Lung et al., 2006).

In the ribosome profiling data for codon-optimized VP1, two major ribosome stalling sites correlated with an unusually high concentration of Ser codons (Fig. 7A). Five Ser codons were clustered at codons 71, 73, 75, 76, and 79, and three other Ser codons were found at codons 178, 179, and 182. Two adjacent Ser residues in each cluster, codons 75 and 76 (UCU-AGU) and codons 178 and 179 (UCC-UCU; see triangles in Fig. 7A), show a high level of ribosome stalling. Thus, it may be possible to further increase the expression of the codon-optimized VP1 transgene by replacing these codons with codons for a different but similar amino acid.

As seen in this study, the AT content of codon-optimized VP1 was increased marginally, but the protein level of the optimized CNTB-VP1 increased significantly, up to 22.5- to 28.1-fold (by PRM) and 91- to 125-fold (by western blot), over the native sequence when expressed in chloroplasts. Therefore, several other factors play key roles in regulating the efficiency of translation. As observed in ribosome profiling studies of CNTB-VP1, the availability and density of specific codons could severely impact translation. Similarly, FVIII HC ribosome footprint results showed that ribosome pauses mapped to CTC Leu codons, which are almost never used in psbA genes. This codon also is rarely used in the lettuce rbcL gene (2.44%) and is never used in tobacco rbcL. Native FVIII HC uses the CTC codon at a frequency of 15.28%, but the CTC codon was eliminated from the codon-optimized sequence based on psbA codon usage.
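Candidate pause sites of the kinds discussed above (paired Ala codons, paired CTC Leu codons, and dense runs of Ser codons) can be located directly from a coding sequence. The Python sketch below is illustrative only: the toy sequence is invented, the function names are hypothetical, and the default window of nine codons containing at least five hits is simply chosen to match the Ser cluster at codons 71 to 79 described in the text. Amino acid assignments follow the standard genetic code, which plastids share.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Codons are enumerated in TCAG order, matching the AMINO string.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues(cds):
    """Translate a coding sequence (DNA or RNA) codon by codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3)]


def adjacent_pairs(cds, amino_acid):
    """1-based codon positions where two consecutive codons encode the
    given amino acid (position of the first codon of each pair)."""
    aa = residues(cds)
    return [i + 1 for i in range(len(aa) - 1)
            if aa[i] == amino_acid and aa[i + 1] == amino_acid]


def dense_windows(cds, amino_acid, window=9, min_count=5):
    """(start codon, count) for every window of `window` codons that holds
    at least `min_count` codons for the given amino acid."""
    aa = residues(cds)
    hits = []
    for start in range(max(len(aa) - window + 1, 0)):
        n = aa[start:start + window].count(amino_acid)
        if n >= min_count:
            hits.append((start + 1, n))
    return hits


toy_cds = "ATGTCTAGTGCTGCATCCAAATAA"  # invented: M S S A A S K stop
ser_pairs = adjacent_pairs(toy_cds, "S")
ala_pairs = adjacent_pairs(toy_cds, "A")
ser_dense = dense_windows(toy_cds, "S", window=5, min_count=3)
```

A scan of this kind only nominates sites. Whether ribosomes actually stall there still has to be read from the footprint data.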
More detailed analysis of the codon frequency of the native FVIII HC and the psbA gene reveals further insight into rare codons: GGG for Gly is used 2.3% in psbA but 11.63% in native HC; CTG for Leu is 3.7% in psbA but 26.39% in native HC; CCC for Pro is 1.9% versus 11.9%; CGG for Arg is 0.5% versus 10.81%; and GTG for Val is 1.7% versus 25.49%. Thus, similar to the CTC codon, several other rare codons in native human genes should have reduced translational efficiency in chloroplasts. In the process of developing the codon optimizer, the cutoff value for eliminating rare codons was set at 5%. Thus, there is room to further modify the codon optimizer program.
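The comparison above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' analysis: it assumes the quoted percentages are relative usage among synonymous codons, it uses only the four psbA values quoted in the text plus one invented entry (TTA), and the sequences and function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def synonymous_usage(cds):
    """Percent usage of each codon among the codons for its amino acid."""
    codons = split_codons(cds)
    per_codon = Counter(codons)
    per_aa = Counter(CODON_TABLE[c] for c in codons)
    return {c: 100.0 * n / per_aa[CODON_TABLE[c]]
            for c, n in per_codon.items()}


def count_rare(cds, reference_usage, cutoff=5.0):
    """Number of codons in `cds` whose reference usage is below `cutoff`.
    Codons absent from the reference table are not counted."""
    return sum(1 for c in split_codons(cds)
               if reference_usage.get(c, cutoff) < cutoff)


# psbA usage (%) quoted in the text; the TTA value is invented.
PSBA_USAGE = {"GGG": 2.3, "CTG": 3.7, "CCC": 1.9, "CGG": 0.5, "TTA": 30.0}

usage = synonymous_usage("CTGCTGCTCTTA")          # invented Leu-only toy
rare = count_rare("CTGGGGTTACCC", PSBA_USAGE)     # invented toy sequence
```

With a complete reference table in place of the partial one used here, `count_rare` would give the kind of tally reported in the text (for example, 105 rare codons in FVIII HC), subject to the same cutoff.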

New Solution for the Quantitation of Insoluble Multimeric Proteins

A major challenge is the quantitation of insoluble proteins, because the commonly used method (ELISA) cannot be applied when proteins aggregate or form multimeric structures. CNTB fusion proteins expressed in chloroplasts form pentameric structures that are highly resistant to detergents, and solubilization is hampered by tight interactions between CNTB monomers, mediated by 30 hydrogen bonds, seven salt bridges, and hydrophobic interactions (Miyata et al., 2012). In our previous studies (Boyhan and Daniell, 2011; Kwon et al., 2013; Kohli et al., 2014; Shil et al., 2014), multimeric forms persisted even after treatment with DTT, detergents (SDS), and boiling. Also, acid (pH 2) could not completely dissociate CNTB pentamers due to the reformation of multimeric structures. Although such stability of pentamers is ideal for the oral drug delivery of CNTB fusion proteins, quantitation of the dose continues to be a major challenge. Delivering accurate doses of protein drugs is a fundamental requirement for their clinical use. Therefore, in this study, we carried out PRM analysis for the absolute quantitation of CNTB-FVIII HC and CNTB-VP1 in plants carrying codon-optimized and native sequences.

Limitations in quantitation using western blots, including protein aggregation, inefficient transfer of large proteins to membranes, inadequate solubilization, and differential exposure to films, were quite evident, resulting in unreliable quantification of drug dosage in planta. Use of strong denaturing and reducing conditions in combination with optimal enzymatic proteolysis conditions maximized the solubilization of multimeric CNTB proteins. PRM analysis has been broadly adopted in quantitative proteomics studies (e.g. biomarker discovery in plasma), due to its high sensitivity, specificity, and precise quantitation of specific protein targets within complex protein matrices (Gallien et al., 2012). These qualities clearly show the advantage of using PRM in the quantification of specific protein targets, independent of the protein matrix source (e.g. plant extracts from tobacco or lettuce) or complexity. Moreover, the development of a PRM assay for a handful of proteins can be achieved in a relatively short time and at low cost (not considering the mass spectrometry instrumentation).
As a peptide-centric quantitation methodology, it also offers robustness and versatility of protein extraction methods, and keeping the protein of interest in a native conformation is not required. However, it is intrinsically biased by the accessibility of cleavage sites to the enzymes used for digestion. To overcome this bias, we used strong denaturing conditions (i.e. 2% SDS) and buffers that favor the activity of the proteolytic enzymes (i.e. sodium deoxycholate-based buffers; León et al., 2013). For FVIII HC (Figs. 5 and 6), the fold increases of codon-optimized over native sequences did not vary significantly among the peptides chosen for quantification. In addition, the fold increases were very similar between the two normalization approaches. Three peptides selected from the CNTB region (N terminus of the fusion protein) showed that the range of the fold increase was from 4.5 to 6.6, while the range was 5 to 7.1 for the peptides chosen from FVIII regions (C terminus of the fusion protein). Therefore, quantification results obtained from PRM analysis are consistent, irrespective of the selected region of the fusion protein (N or C terminus) or the component protein (CNTB or FVIII HC).


By using absolutely quantified SIS peptides at identical concentrations in all samples and by examining the entire length from N to C terminus, one can accurately quantify the absolute amount of the target protein (Streng et al., 2016). Furthermore, the accuracy of PRM assays in this study was corroborated by using two different normalization methods: SIS peptides and peptides from housekeeping proteins (large or small subunit of Rubisco and ATP synthase β-subunit). Incomplete or cleaved proteins can be detected using targeted peptides located closer to the C or N terminus or in the midregions. Quantification results obtained from PRM analysis of both CNTB fusion proteins in our study are consistent, irrespective of the selected region of the fusion protein (N or C terminus or elsewhere), and offer data for reliable quantitation. Also, the same three CNTB peptides for CNTB-VP1 showed consistent fold increases, ranging from 22.5 to 28.1. PRM analysis is better than western blotting because it eliminates variation introduced by the mobility and transfer of different-sized proteins and by the saturation of antibody probes. Overall, the PRM workflow included selection of the proteotypic peptides from CNTB and FVIII HC sequences and synthesis of the counterpart SIS peptides (Supplemental Fig. S6). Six peptides were selected and scheduled for PRM analysis on the Q Exactive mass spectrometer, based on observed retention time on the chromatograph with a window of ±5 min and mass-to-charge ratio (m/z) of the double and/or triple charge state of these peptides. This dual targeting of precursor ions, in addition to the high resolution of the Q Exactive mass spectrometer, contributes to the high specificity of the assay. The PRM data analysis, postacquisition, also offers a high specificity to the assay.
The five most intense fragment ions, with no clear contaminant contribution from the matrix, are then selected for the quantification of the peptide. The confidence of the fragment ion assignment by the bioinformatics tool used (i.e. Skyline; MacLean et al., 2010) is finally achieved by the comparison of the reference tandem mass spectrometry spectra and the retention time profiles, generated with each of the counterpart SIS peptides. The high sensitivity, specificity, versatility, and robustness of PRM offer a new opportunity for characterizing translational systems in plants.
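The precursor m/z values used for targeting follow from the peptide sequence and charge state. As a minimal sketch (not the Skyline implementation), the Python below sums standard monoisotopic residue masses and adds the mass shift of a heavy C-terminal Lys (13C6, 15N2) or Arg (13C6, 15N4) for the SIS counterpart. It ignores all modifications, including any Cys alkylation, which the text does not specify; the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide (terminal H and OH)
PROTON = 1.007276
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}  # 13C6,15N2 / 13C6,15N4


def precursor_mz(peptide, charge, heavy_c_term=False):
    """m/z of an unmodified peptide at the given positive charge state.
    With heavy_c_term=True the C-terminal Lys/Arg carries the SIS label."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    if heavy_c_term:
        mass += HEAVY_SHIFT[peptide[-1]]
    return (mass + charge * PROTON) / charge


light_2plus = precursor_mz("IAYLTEAK", 2)
heavy_2plus = precursor_mz("IAYLTEAK", 2, heavy_c_term=True)
```

At charge 2+, the light and heavy forms of a Lys-terminated peptide are separated by about 4 m/z units, so each can be isolated with the 2-m/z window given in Materials and Methods while co-eluting at the same retention time.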

CONCLUSION

This study explored heterologous gene expression utilizing chloroplast genome sequences, ribosome profiling, and targeted mass spectrometry to enhance our understanding of the synthesis of valuable biopharmaceuticals in chloroplasts. Targeted proteomic quantification by mass spectrometry showed that codon optimization increases translation efficiency 4.5- to 28.1-fold based on the coding sequence, validating this approach, to our knowledge, for the first time for the quantitation of protein drug dosage in plant cells. The lack of reliable methods to quantify insoluble proteins due to the aggregation or formation of multimeric structures is a major challenge. Both biopharmaceuticals used in this study are CNTB fusion proteins that form pentamers, which is a requirement for their binding to intestinal epithelial monosialotetrahexosylganglioside receptors. Such a multimeric structure excludes the commonly used ELISA for the quantitation of dosage. However, delivering accurate doses of protein drugs is a fundamental requirement for their clinical use, and this important goal was accomplished in this study. Indeed, plant biomass generated in this study has resulted in the development of a polio booster vaccine that has been validated by the Centers for Disease Control and Prevention, a timely invention to meet the World Health Organization requirement to withdraw, in April 2016, the current oral polio vaccine, which causes severe polio in outbreak areas (Chan et al., 2016).

The increase in accumulation of codon-optimized proteins occurs at the translational level rather than through any impact on transcript abundance. The codon optimizer program increases transgene expression in chloroplasts in both tobacco and lettuce with no species specificity. In contrast to previous in vitro studies, these in-depth in vivo studies of heterologous gene expression using a wealth of newly sequenced chloroplast genomes helped us to understand the codon optimization process. While the removal of rare codons is very important, replacing those with only the most highly used psbA codons decreased translation efficiency. Therefore, the key factor in enhancing translation is the replacement of rare codons following the hierarchy of a highly expressed gene. Ribosome footprints obtained using profiling studies did not increase proportionately with VP1 translation or even decreased after FVIII codon optimization, but ribosome profiling is a valuable tool for diagnosing rate-limiting steps in translation. A major ribosome pause at CTC Leu codons, a rarely used codon in chloroplasts, was eliminated upon codon optimization of the native gene. Ribosome stalls observed at clusters of other codons in codon-optimized genes provide opportunities for further optimization. These observations provide further insight into limitations in chloroplast translation and approaches to address these in future studies.

MATERIALS AND METHODS

Codon Optimization

To maximize the expression of heterologous genes in chloroplasts, a chloroplast codon optimizer program was developed based on the codon preference of psbA genes across 133 seed plant species. All sequences were downloaded from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=plastid). The usage preference among synonymous codons for each amino acid was determined by analyzing a total of 46,500 codons from 133 psbA genes. The optimization algorithm (Chloroplast Optimizer version 2.1), written in Java, was designed to replace rare codons with codons that are frequently used in chloroplasts.
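The difference between the two strategies compared in this study (using only the most preferred codon versus following the usage hierarchy of psbA) can be illustrated with a short Python sketch. This is not the Chloroplast Optimizer code, whose logic is not given here; it is one plausible reading under loud assumptions. The usage table is invented (apart from the 3.7% value for CTG quoted in the Discussion) and covers only Leu and Ser, the hierarchy strategy is modeled as replacing only codons below a 5% cutoff with a usage-weighted draw from the remaining synonymous codons, and all names are hypothetical.

```python
import random

# Invented relative usage (%) among synonymous codons, Leu and Ser only.
REFERENCE_USAGE = {
    "L": {"TTA": 34.0, "CTT": 26.0, "TTG": 20.0,
          "CTA": 14.0, "CTG": 3.7, "CTC": 2.3},
    "S": {"TCT": 30.0, "AGT": 22.0, "TCC": 20.0,
          "TCA": 16.0, "AGC": 8.0, "TCG": 4.0},
}
CODON_TO_AA = {codon: aa
               for aa, table in REFERENCE_USAGE.items() for codon in table}


def split_codons(cds):
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def optimize_most_preferred(cds):
    """Old-style strategy: every codon becomes the single most used
    synonymous codon. Codons outside the toy table are left unchanged."""
    out = []
    for codon in split_codons(cds):
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            out.append(codon)
        else:
            table = REFERENCE_USAGE[aa]
            out.append(max(table, key=table.get))
    return "".join(out)


def optimize_by_hierarchy(cds, cutoff=5.0, seed=0):
    """Hierarchy-style strategy (assumed): only codons below `cutoff` are
    replaced, by a draw weighted by usage among the non-rare codons."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    out = []
    for codon in split_codons(cds):
        aa = CODON_TO_AA.get(codon)
        if aa is None or REFERENCE_USAGE[aa][codon] >= cutoff:
            out.append(codon)
            continue
        allowed = {c: u for c, u in REFERENCE_USAGE[aa].items()
                   if u >= cutoff}
        out.append(rng.choices(list(allowed),
                               weights=list(allowed.values()), k=1)[0])
    return "".join(out)


toy = "ATGCTCCTGTTATCGAGTAAA"  # invented: M L L L S S K
greedy = optimize_most_preferred(toy)
graded = split_codons(optimize_by_hierarchy(toy))
```

The greedy version funnels every Leu and Ser position onto one codon each, which is the situation the study associates with limiting tRNA pools; the graded version spreads replacements across several well-used codons while leaving acceptable codons alone.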

Creation of Transplastomic Lines

The native sequence of FVIII HC was amplified using the pAAV-TTR-hF8 mini plasmid (Sherman et al., 2014) as the PCR template. The codon-optimized HC sequence obtained using Codon Optimizer version 2.1 was synthesized by GenScript. The native VP1 gene (906 bp) of Sabin 1 (provided by Dr. Konstantin Chumakov, Food and Drug Administration) was used as the template for PCR amplification. The codon-optimized VP1 sequence also was synthesized by GenScript. Amplified and synthetic gene sequences were cloned into chloroplast transformation vectors pLSLF and pLD-utr for lettuce (Lactuca sativa) and tobacco (Nicotiana tabacum ‘Petite Havana’), respectively. Sequence-confirmed plasmids were used for bombardment to create transplastomic plants as described previously (Verma et al., 2008). Transplastomic lines were confirmed using Southern-blot analysis as described previously (Verma et al., 2008), except for probe labeling and detection, for which the DIG High Prime DNA Labeling and Detection Starter Kit II (Roche; catalog no. 11585624910) was used.

All protein extracts (100 µL) were enzymatically digested with 10 µg of trypsin/Lys-C (Promega) on a centrifugal device with a filter cutoff of 10 kD (Vivacon) in the presence of 0.5% sodium deoxycholate, as described previously (León et al., 2013). After digestion, sodium deoxycholate was removed by acid precipitation with 1% (final concentration) trifluoroacetic acid. SIS peptides (greater than 97% purity, C-terminal Lys and Arg as Lys U-13C6;U-15N2 and Arg U-13C6;U-15N4; JPT Peptide Technologies) were spiked into the samples prior to desalting. Samples were desalted prior to mass spectrometry analysis with OligoR3 stage tips (Applied Biosystems). The initial protein extract (10 µL) was desalted on an OligoR3 stage tip column. Desalted material was then dried on a speed vacuum device and suspended in 6 µL of 0.1% formic acid in water. Mass spectrometry analysis was performed in duplicate by injecting 2 µL of desalted material into the column.

Evaluation of Translation

To compare the level of protein expression between native and codon-optimized sequences, immunoblot and densitometric assays were performed using anti-CNTB antibody. For total plant protein, powdered lyophilized plant cells were suspended in extraction buffer (100 mM NaCl, 10 mM EDTA, 200 mM Tris-Cl, pH 8, 0.05% [v/v] Tween 20, 0.1% SDS, 14 mM β-mercaptoethanol, 400 mM Suc, 2 mM phenylmethylsulfonyl fluoride, and proteinase inhibitor cocktail) in a ratio of 10 mg per 500 µL and incubated on ice for 1 h for rehydration. Suspended cells were sonicated (pulse on for 5 s and pulse off for 10 s; sonicator 3000; Misonix) after vortexing (approximately 30 s). After Bradford assay, equal amounts of homogenized protein were loaded and separated on SDS-polyacrylamide gels with known amounts of CNTB protein standard. To detect CNTB fusion proteins, anti-CNTB polyclonal antibody (GenWay Biotech) was diluted 1:10,000 in 1× phosphate-buffered saline + 0.1% Tween 20, and then membranes were probed with goat anti-rabbit IgG-horseradish peroxidase secondary antibody (Southern Biotechnology; 4030-05) diluted 1:4,000 in 1× phosphate-buffered saline + 0.1% Tween 20. For loading controls, protein-blotted membrane was stained with Ponceau S (Sigma; P-3504) prior to immunoprobing with anti-CNTB antibody, and anti-RbcL antibody (Agrisera; AS03 037; 1:5,000) was used on the same blots after stripping anti-CNTB antibody. Chemiluminescent signals were developed on x-ray films, which were used for quantitative analysis with ImageJ software (IJ 1.46r; National Institutes of Health).

PRM Mass Spectrometry Analysis and Data Analysis

Liquid chromatography-coupled targeted mass spectrometry analysis was performed by injecting 2 µL of peptide digest onto the column, corresponding to the total protein extracted and digested from 33.3 µg of lyophilized leaf powder, with 34 fmol of each SIS peptide spiked in. Peptides were separated using the Easy-nLC 1000 (Thermo Scientific) on a home-made 30-cm × 75-µm i.d. C18 column (1.9 µm particle size; ReproSil; Dr. Maisch HPLC). Mobile phases consisted of an aqueous solution of 0.1% formic acid (A) and 90% acetonitrile and 0.1% formic acid (B), both HPLC grade (Fluka). Peptides were loaded on the column at 250 nL min−1 with an aqueous solution of 4% solvent B. Peptides were eluted by applying a nonlinear gradient of 4%-7%-27%-36%-65%-80% B in 2-50-10-10-5 min, respectively. Mass spectrometry analysis was performed using the PRM mode on a Q Exactive mass spectrometer (Thermo Scientific) equipped with a nanospray Flex ion source (Gallien et al., 2012). Isolation of targets from the inclusion list involved a 2-m/z window, a resolution of 35,000 (at m/z 200), a target AGC value of 1 × 10⁶, and a maximum filling time of 120 ms. Normalized collision energy was set at 29. Retention time schedules were determined by the analysis of SIS peptides under identical nano-liquid chromatography conditions. A list of target precursor ions and a retention time schedule are reported in Supplemental Data Set S1. PRM data analysis was performed using Skyline software (MacLean et al., 2010).
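The elution program above lists six solvent B breakpoints (4%-7%-27%-36%-65%-80%) joined by five segments of 2, 50, 10, 10, and 5 min. A small Python sketch can turn that description into a lookup of %B at any time point. It assumes, as an interpretation that the text does not state explicitly, that each segment is a linear ramp between consecutive breakpoints; the function name is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

PERCENT_B = [4, 7, 27, 36, 65, 80]   # breakpoints from the text
SEGMENT_MIN = [2, 50, 10, 10, 5]     # segment durations from the text
TIMES = [0] + list(accumulate(SEGMENT_MIN))  # [0, 2, 52, 62, 72, 77]


def percent_b(t_min):
    """Solvent B (%) at time t_min, assuming linear ramps per segment."""
    if not TIMES[0] <= t_min <= TIMES[-1]:
        raise ValueError("time outside the gradient program")
    # Index of the segment containing t_min (the end point maps to the
    # last segment rather than past it).
    i = min(bisect_right(TIMES, t_min) - 1, len(SEGMENT_MIN) - 1)
    fraction = (t_min - TIMES[i]) / SEGMENT_MIN[i]
    return PERCENT_B[i] + fraction * (PERCENT_B[i + 1] - PERCENT_B[i])


total_runtime_min = TIMES[-1]
midpoint_of_main_ramp = percent_b(27)
```

Under this reading the gradient lasts 77 min in total, and most peptides elute during the shallow 50-min ramp from 7% to 27% B, which is where the retention time scheduling matters most.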

Evaluation of Transcripts

Total RNA was extracted from leaves of plants grown in agar medium in a tissue culture room using the easy-BLUE Total RNA Extraction Kit (iNtRON; catalog no. 17061). For the RNA gel blot, equal amounts of total RNA were separated on a 0.8% agarose gel (containing 1.85% formaldehyde and 1× MOPS) and blotted onto a nylon membrane (Nytran SPC; Whatman). For northern blot, the PCR-amplified product from the psbA 5ʹ UTR of the chloroplast transformation plasmid was used as the probe. Hybridization signals on membranes were detected using a DIG labeling and detection kit as described above.

Ribosome Profiling

Second and third leaves from the top of the plant were harvested for ribosome profiling. Lettuce plants were approximately 2 months old. Tobacco plants were 2.5 or 2 months old, for native and codon-optimized VP1 constructs, respectively. Leaves were harvested at noon and flash frozen in liquid nitrogen. Ribosome footprints were prepared as described by Zoschke et al. (2013), except that RNase I was substituted for micrococcal nuclease. Ribosome footprints were converted to a sequencing library with the NEXTflex Illumina Small RNA Sequencing Kit version 2 (BIOO Scientific; 5132-03). rRNA contaminants were depleted by subtractive hybridization after first-strand cDNA synthesis using biotinylated oligonucleotides corresponding to abundant rRNA contaminants observed in pilot experiments. Samples were sequenced at the University of Oregon Genomics Core Facility. Sequence reads were processed with cutadapt to remove adapter sequences and bowtie2 with default parameters to align reads to the engineered chloroplast genome sequence.
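The read-processing step (adapter removal with cutadapt, then alignment with bowtie2 at default settings) can be sketched as two command lines. The helper below only assembles the argument lists and does not run anything; the file names, index name, and adapter value are hypothetical placeholders, and the exact options used in this study are not stated in the text.

```python
import shlex

def build_commands(raw_fastq, adapter, index_prefix, sample):
    """Return cutadapt and bowtie2 argument lists for one footprint library."""
    trimmed = f"{sample}.trimmed.fastq"
    # cutadapt: -a gives the 3' adapter to remove, -o the output file.
    cutadapt_cmd = ["cutadapt", "-a", adapter, "-o", trimmed, raw_fastq]
    # bowtie2: -x index, -U unpaired reads, -S SAM output.
    # No further options are passed, so alignment uses bowtie2 defaults.
    bowtie2_cmd = ["bowtie2", "-x", index_prefix, "-U", trimmed,
                   "-S", f"{sample}.sam"]
    return cutadapt_cmd, bowtie2_cmd

cutadapt_cmd, bowtie2_cmd = build_commands(
    raw_fastq="footprints.fastq",         # hypothetical input file
    adapter="ADAPTER_SEQUENCE",           # placeholder: use the kit's 3' adapter
    index_prefix="engineered_cp_genome",  # hypothetical bowtie2 index name
    sample="vp1_native",                  # hypothetical sample label
)
print(shlex.join(cutadapt_cmd))
print(shlex.join(bowtie2_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real values.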

Lyophilization

Confirmed homoplasmic lines were transferred to a temperature- and light-controlled greenhouse. Mature leaves from fully grown transplastomic plants were harvested and stored at −80°C before lyophilization. To freeze dry plant leaf materials, frozen, crumbled small leaf pieces were sublimated under 400 mTorr vacuum while increasing the chamber temperature from −40°C to 25°C for 3 d (Genesis 35XL; VirTis SP Scientific). Dehydrated leaves were powdered using a coffee grinder (Hamilton Beach) at maximum speed; tobacco was ground three times for 10 s each, and lettuce was ground three times for 5 s. Powdered leaves were stored in containers under air-tight and moisture-free conditions at room temperature with silica gel.

Protein Extraction and Sample Preparation for Mass Spectrometry Analysis

Total protein was extracted from 10 mg of lyophilized leaf powder by adding 1 mL of extraction buffer (2% SDS, 100 mM DTT, and 20 mM TEAB). Lyophilized leaf powder was incubated for 30 min at room temperature with sporadic vortexing to allow rehydration of plant cells. Homogenates were then incubated for 1 h at 70°C, followed by overnight incubation at room temperature under constant rotation. Cell wall/membrane debris was pelleted by centrifugation at 14,000 rpm (approximately 20,800 rcf). The procedure was performed in duplicate.
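The stated equivalence between 14,000 rpm and approximately 20,800 rcf depends on the rotor radius, which the text does not give. The sketch below applies the standard conversion, RCF = 1.118 × 10⁻⁵ × r × rpm² with r in cm; the 9.5-cm radius is an assumption chosen because it reproduces the stated value.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) from rotor speed and radius in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2

# 9.5 cm is an assumed rotor radius, not a value given in the text.
rcf = rcf_from_rpm(14_000, 9.5)
```

With a different rotor, the same speed setting gives a proportionally different force, so the rcf value is the one to match when repeating the protocol.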

Accession Numbers

Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers NM_000132.3 for FVIII HC and AY184219 for VP1.

Supplemental Data

The following supplemental materials are available.

Supplemental Figure S1. Sequences of native and codon-optimized FVIII HC and VP1 genes.
Supplemental Figure S2. Comparison of native and codon-optimized (new and old) sequences.
Supplemental Figure S3. Three different codon tables for the expression of heterologous genes in chloroplasts.

Plant Physiol. Vol. 172, 2016

Downloaded from www.plantphysiol.org on October 2, 2016 - Published by www.plantphysiol.org Copyright © 2016 American Society of Plant Biologists. All rights reserved.


Supplemental Figure S4. Plot of integrated density values for the quantification of CNTB-FVIII HC and CNTB-VP1 based on standard curves.
Supplemental Figure S5. Peptide sequences used for targeted mass spectrometry.
Supplemental Figure S6. Comparison of CNTB-FVIII HC and VP1 by PRM analysis.
Supplemental Figure S7. Evaluation of PRM assay linearity.
Supplemental Data Set S1. Codon usage table and mass spectrometry data.

ACKNOWLEDGMENTS

We thank Mark Yarmarkovich for help with developing the codon optimization algorithms, Nick Stiffler for help with the bioinformatic analysis of ribosome profiling data, and Non Chotewutmontri for helpful discussions.

Received June 20, 2016; accepted July 25, 2016; published July 27, 2016.

LITERATURE CITED

Arlen PA, Falconer R, Cherukumilli S, Cole A, Cole AM, Oishi KK, Daniell H (2007) Field production and functional evaluation of chloroplast-derived interferon-alpha2b. Plant Biotechnol J 5: 511–525
Barkan A (1988) Proteins encoded by a complex chloroplast transcription unit are each translated from both monocistronic and polycistronic mRNAs. EMBO J 7: 2637–2644
Birch-Machin I, Newell CA, Hibberd JM, Gray JC (2004) Accumulation of rotavirus VP6 protein in chloroplasts of transplastomic tobacco is limited by protein stability. Plant Biotechnol J 2: 261–270
Boehm CR, Ueda M, Nishimura Y, Shikanai T, Haseloff J (2016) A cyan fluorescent reporter expressed from the chloroplast genome of Marchantia polymorpha. Plant Cell Physiol 57: 291–299
Boyhan D, Daniell H (2011) Low-cost production of proinsulin in tobacco and lettuce chloroplasts for injectable or oral delivery of functional insulin and C-peptide. Plant Biotechnol J 9: 585–598
Chan HT, Daniell H (2015) Plant-made oral vaccines against human infectious diseases: are we there yet? Plant Biotechnol J 13: 1056–1070
Chan HT, Xiao Y, Weldon WC, Oberste SM, Chumakov K, Daniell H (2016) Cold chain and virus free chloroplast-made booster vaccine to confer immunity against different polio virus serotypes. Plant Biotechnol J (in press) doi/10.1111/pbi.12575
Daniell H, Chan HT, Pasoreck EK (2016b) Vaccination through chloroplast genetics: affordable protein drugs for the prevention and treatment of inherited or infectious diseases. Annu Rev Genet 50: (in press)
Daniell H, Datta R, Varma S, Gray S, Lee SB (1998) Containment of herbicide resistance through genetic engineering of the chloroplast genome. Nat Biotechnol 16: 345–348
Daniell H, Lin CS, Yu M, Chang WJ (2016a) Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol 17: 134
Daniell H, Ruiz G, Denes B, Sandberg L, Langridge W (2009) Optimization of codon composition and regulatory elements for expression of human insulin like growth factor-1 in transgenic chloroplasts and evaluation of structural identity and function. BMC Biotechnol 9: 33
Daniell H, Vivekananda J, Nielsen BL, Ye GN, Tewari KK, Sanford JC (1990) Transient foreign gene expression in chloroplasts of cultured tobacco cells after biolistic delivery of chloroplast vectors. Proc Natl Acad Sci USA 87: 88–92
De Cosa B, Moar W, Lee SB, Miller M, Daniell H (2001) Overexpression of the Bt cry2Aa2 operon in chloroplasts leads to formation of insecticidal crystals. Nat Biotechnol 19: 71–74
DeGray G, Rajasekaran K, Smith F, Sanford J, Daniell H (2001) Expression of an antimicrobial peptide via the chloroplast genome to control phytopathogenic bacteria and fungi. Plant Physiol 127: 852–862
Domon B, Aebersold R (2010) Options and considerations when selecting a quantitative proteomics strategy. Nat Biotechnol 28: 710–721
Eibl C, Zou Z, Beck A, Kim M, Mullet J, Koop HU (1999) In vivo analysis of plastid psbA, rbcL and rpl32 UTR elements by chloroplast transformation: tobacco plastid gene expression is controlled by modulation of transcript levels and translation efficiency. Plant J 19: 333–345

Franklin S, Ngo B, Efuet E, Mayfield SP (2002) Development of a GFP reporter gene for Chlamydomonas reinhardtii chloroplast. Plant J 30: 733–744
Gallien S, Duriez E, Crone C, Kellmann M, Moehring T, Domon B (2012) Targeted proteomic quantification on quadrupole-Orbitrap mass spectrometer. Mol Cell Proteomics 11: 1709–1723
Gisby MF, Mellors P, Madesis P, Ellin M, Laverty H, O’Kane S, Ferguson MW, Day A (2011) A synthetic gene increases TGFβ3 accumulation by 75-fold in tobacco chloroplasts enabling rapid purification and folding into a biologically active molecule. Plant Biotechnol J 9: 618–628
Hassan SW, Waheed MT, Müller M, Clarke JL, Shinwari ZK, Lössl AG (2014) Expression of HPV-16 L1 capsomeres with glutathione-S-transferase as a fusion protein in tobacco plastids: an approach for a capsomere-based HPV vaccine. Hum Vaccin Immunother 10: 2975–2982
Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324: 218–223
Inka Borchers AM, Gonzalez-Rabade N, Gray JC (2012) Increased accumulation and stability of rotavirus VP6 protein in tobacco chloroplasts following changes to the 5′ untranslated region and the 5′ end of the coding region. Plant Biotechnol J 10: 422–434
Jabeen R, Khan MS, Zafar Y, Anjum T (2010) Codon optimization of cry1Ab gene for hyper expression in plant organelles. Mol Biol Rep 37: 1011–1017
Jin S, Daniell H (2015) The engineered chloroplast genome just got smarter. Trends Plant Sci 20: 622–640
Kim JW, You J (2013) Protein target quantification decision tree. Int J Proteomics 2013: 701247
Kohli N, Westerveld DR, Ayache AC, Verma A, Shil P, Prasad T, Zhu P, Chan SL, Li Q, Daniell H (2014) Oral delivery of bioencapsulated proteins across blood-brain and blood-retinal barriers. Mol Ther 22: 535–546
Kwon KC, Nityanandam R, New JS, Daniell H (2013) Oral delivery of bioencapsulated exendin-4 expressed in chloroplasts lowers blood glucose level in mice and stimulates insulin secretion in beta-TC6 cells. Plant Biotechnol J 11: 77–86
Lakshmi PS, Verma D, Yang X, Lloyd B, Daniell H (2013) Low cost tuberculosis vaccine antigens in capsules: expression in chloroplasts, bio-encapsulation, stability and functional evaluation in vitro. PLoS ONE 8: e54708
Lee SB, Li B, Jin S, Daniell H (2011) Expression and characterization of antimicrobial peptides Retrocyclin-101 and Protegrin-1 in chloroplasts to control viral and bacterial infections. Plant Biotechnol J 9: 100–115
Lenzi P, Scotti N, Alagna F, Tornesello ML, Pompa A, Vitale A, De Stradis A, Monti L, Grillo S, Buonaguro FM, et al (2008) Translational fusion of chloroplast-expressed human papillomavirus type 16 L1 capsid protein enhances antigen accumulation in transplastomic tobacco. Transgenic Res 17: 1091–1102
León IR, Schwämmle V, Jensen ON, Sprenger RR (2013) Quantitative assessment of in-solution digestion efficiency identifies optimal protocols for unbiased protein analysis. Mol Cell Proteomics 12: 2992–3005
Lung B, Zemann A, Madej MJ, Schuelke M, Techritz S, Ruf S, Bock R, Hüttenhofer A (2006) Identification of small non-coding RNAs from mitochondria and chloroplasts. Nucleic Acids Res 34: 3842–3852
Lutz KA, Knapp JE, Maliga P (2001) Expression of bar in the plastid genome confers herbicide resistance. Plant Physiol 125: 1585–1590
MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler DC, MacCoss MJ (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26: 966–968
Madesis P, Osathanunkul M, Georgopoulou U, Gisby MF, Mudd EA, Nianiou I, Tsitoura P, Mavromara P, Tsaftaris A, Day A (2010) A hepatitis C virus core polypeptide expressed in chloroplasts detects anticore antibodies in infected human sera. J Biotechnol 145: 377–386
McCabe MS, Klaas M, Gonzalez-Rabade N, Poage M, Badillo-Corona JA, Zhou F, Karcher D, Bock R, Gray JC, Dix PJ (2008) Plastid transformation of high-biomass tobacco variety Maryland Mammoth for production of human immunodeficiency virus type 1 (HIV-1) p24 antigen. Plant Biotechnol J 6: 914–929
Miyata T, Oshiro S, Harakuni T, Taira T, Matsuzaki G, Arakawa T (2012) Physicochemically stable cholera toxin B subunit pentamer created by peripheral molecular constraints imposed by de novo-introduced intersubunit disulfide crosslinks. Vaccine 30: 4225–4232
Nakamura M, Hibi Y, Okamoto T, Sugiura M (2016) Cooperation between the chloroplast psbA 5′-untranslated region and coding region is important for translational initiation: the chloroplast translation machinery cannot read a human viral gene coding region. Plant J 85: 772–780


Nakamura M, Sugiura M (2007) Translation efficiencies of synonymous codons are not always correlated with codon usage in tobacco chloroplasts. Plant J 49: 128–134
Nakamura Y, Gojobori T, Ikemura T (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28: 292
Picotti P, Aebersold R (2012) Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat Methods 9: 555–566
Quesada-Vargas T, Ruiz ON, Daniell H (2005) Characterization of heterologous multigene operons in transgenic chloroplasts: transcription, processing, and translation. Plant Physiol 138: 1746–1762
Ruhlman T, Verma D, Samson N, Daniell H (2010) The role of heterologous chloroplast sequence elements in transgene integration and expression. Plant Physiol 152: 2088–2104
Savas JN, Stein BD, Wu CC, Yates JR III (2011) Mass spectrometry accelerates membrane protein analysis. Trends Biochem Sci 36: 388–396
Shenoy V, Kwon KC, Rathinasabapathy A, Lin S, Jin G, Song C, Shil P, Nair A, Qi Y, Li Q, et al (2014) Oral delivery of Angiotensin-converting enzyme 2 and Angiotensin-(1-7) bioencapsulated in plant cells attenuates pulmonary hypertension. Hypertension 64: 1248–1259
Sherman A, Su J, Lin S, Wang X, Herzog RW, Daniell H (2014) Suppression of inhibitor formation against FVIII in a murine model of hemophilia A by oral delivery of antigens bioencapsulated in plant cells. Blood 124: 1659–1668
Shil PK, Kwon KC, Zhu P, Verma A, Daniell H, Li Q (2014) Oral delivery of ACE2/Ang-(1-7) bioencapsulated in plant cells protects against experimental uveitis and autoimmune uveoretinitis. Mol Ther 22: 2069–2082
Streng AS, de Boer D, Bouwman FG, Mariman EC, Scholten A, van Dieijen-Visser MP, Wodzig WK (2016) Development of a targeted selected ion monitoring assay for the elucidation of protease induced structural changes in cardiac troponin T. J Proteomics 136: 123–132
Surzycki R, Greenham K, Kitayama K, Dibal F, Wagner R, Rochaix JD, Ajam T, Surzycki S (2009) Factors effecting expression of vaccines in microalgae. Biologicals 37: 133–138
Verma D, Moghimi B, LoDuca PA, Singh HD, Hoffman BE, Herzog RW, Daniell H (2010) Oral delivery of bioencapsulated coagulation factor IX prevents inhibitor formation and fatal anaphylaxis in hemophilia B mice. Proc Natl Acad Sci USA 107: 7101–7106

Verma D, Samson NP, Koya V, Daniell H (2008) A protocol for expression of foreign genes in chloroplasts. Nat Protoc 3: 739–758
Waheed MT, Thönes N, Müller M, Hassan SW, Gottschamel J, Lössl E, Kaul HP, Lössl AG (2011a) Plastid expression of a double-pentameric vaccine candidate containing human papillomavirus-16 L1 antigen fused with LTB as adjuvant: transplastomic plants show pleiotropic phenotypes. Plant Biotechnol J 9: 651–660
Waheed MT, Thönes N, Müller M, Hassan SW, Razavi NM, Lössl E, Kaul HP, Lössl AG (2011b) Transplastomic expression of a modified human papillomavirus L1 protein leading to the assembly of capsomeres in tobacco: a step towards cost-effective second-generation vaccines. Transgenic Res 20: 271–282
Wang X, Su J, Sherman A, Rogers GL, Liao G, Hoffman BE, Leong KW, Terhorst C, Daniell H, Herzog RW (2015a) Plant-based oral tolerance to hemophilia therapy employs a complex immune regulatory response including LAP+CD4+ T cells. Blood 125: 2418–2427
Wang YP, Wei ZY, Zhong XF, Lin CJ, Cai YH, Ma J, Zhang YY, Liu YZ, Xing SC (2015b) Stable expression of basic fibroblast growth factor in chloroplasts of tobacco. Int J Mol Sci 17: E19
Ye GN, Hajdukiewicz PTJ, Broyles D, Rodriguez D, Xu CW, Nehra N, Staub JM (2001) Plastid-expressed 5-enolpyruvylshikimate-3-phosphate synthase genes provide high level glyphosate tolerance in tobacco. Plant J 25: 261–270
Yu CH, Dang Y, Zhou Z, Wu C, Zhao F, Sachs MS, Liu Y (2015) Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol Cell 59: 744–754
Yukawa M, Kuroda H, Sugiura M (2007) A new in vitro translation system for non-radioactive assay from tobacco chloroplasts: effect of pre-mRNA processing on translation in vitro. Plant J 49: 367–376
Zoschke R, Barkan A (2015) Genome-wide analysis of thylakoid-bound ribosomes in maize reveals principles of cotranslational targeting to the thylakoid membrane. Proc Natl Acad Sci USA 112: E1678–E1687
Zoschke R, Watkins KP, Barkan A (2013) A rapid ribosome profiling method elucidates chloroplast ribosome behavior in vivo. Plant Cell 25: 2265–2275
Zou Z, Eibl C, Koop HU (2003) The stem-loop region of the tobacco psbA 5’UTR is an important determinant of mRNA stability and translation efficiency. Mol Genet Genomics 269: 340–349


Peptide sequence   Part of construct   Replicate name   fmol SIS on column   Ratio AUC (SIS/endogenous)
IFSYTESLAGK        CNTB                SD1_Rep1         170                  3.037
IFSYTESLAGK        CNTB                SD1_Rep2         170                  2.740
IFSYTESLAGK        CNTB                SD2_Rep1         34                   0.616
IFSYTESLAGK        CNTB                SD2_Rep2         34                   0.573
IFSYTESLAGK        CNTB                SD3_Rep1         11.3                 0.171
IFSYTESLAGK        CNTB                SD3_Rep2         11.3                 0.165
IFSYTESLAGK        CNTB                SD4_Rep1         3.4                  0.062
IFSYTESLAGK        CNTB                SD4_Rep2         3.4                  0.063
IFSYTESLAGK        CNTB                SD5_Rep1         1.13                 0.005
IFSYTESLAGK        CNTB                SD5_Rep2         1.13                 0.005
IFSYTESLAGK        CNTB                SD6_Rep1         0.45                 0.003
IFSYTESLAGK        CNTB                SD6_Rep2         0.45                 0.004
IFSYTESLAGK        CNTB                SD7_Rep1         0.22                 0.001
IFSYTESLAGK        CNTB                SD7_Rep2         0.22                 0.002
IAYLTEAK           CNTB                SD1_Rep1         170                  3.848
IAYLTEAK           CNTB                SD1_Rep2         170                  3.700
IAYLTEAK           CNTB                SD2_Rep1         34                   0.792
IAYLTEAK           CNTB                SD2_Rep2         34                   0.803
IAYLTEAK           CNTB                SD3_Rep1         11.3                 0.265
IAYLTEAK           CNTB                SD3_Rep2         11.3                 0.267
IAYLTEAK           CNTB                SD4_Rep1         3.4                  0.084
IAYLTEAK           CNTB                SD4_Rep2         3.4                  0.083
IAYLTEAK           CNTB                SD5_Rep1         1.13                 0.025
IAYLTEAK           CNTB                SD5_Rep2         1.13                 0.032
IAYLTEAK           CNTB                SD6_Rep1         0.45                 0.017
IAYLTEAK           CNTB                SD6_Rep2         0.45                 0.012
IAYLTEAK           CNTB                SD7_Rep1         0.22                 0.011
IAYLTEAK           CNTB                SD7_Rep2         0.22                 0.010
LCVWNNK            CNTB                SD1_Rep1         170                  9.506
LCVWNNK            CNTB                SD1_Rep2         170                  10.040
LCVWNNK            CNTB                SD2_Rep1         34                   1.848
LCVWNNK            CNTB                SD2_Rep2         34                   1.936
LCVWNNK            CNTB                SD3_Rep1         11.3                 0.712
LCVWNNK            CNTB                SD3_Rep2         11.3                 0.681
LCVWNNK            CNTB                SD4_Rep1         3.4                  0.226
LCVWNNK            CNTB                SD4_Rep2         3.4                  0.231
LCVWNNK            CNTB                SD5_Rep1         1.13                 0.064
LCVWNNK            CNTB                SD5_Rep2         1.13                 0.058
LCVWNNK            CNTB                SD6_Rep1         0.45                 0.022
LCVWNNK            CNTB                SD6_Rep2         0.45                 0.022
LCVWNNK            CNTB                SD7_Rep1         0.22                 0.006
LCVWNNK            CNTB                SD7_Rep2         0.22                 0.005
FDDDNSPSFIQIR      FVIII HC            SD1_Rep1         170                  18.116
FDDDNSPSFIQIR      FVIII HC            SD1_Rep2         170                  18.657
FDDDNSPSFIQIR      FVIII HC            SD2_Rep1         34                   4.158
FDDDNSPSFIQIR      FVIII HC            SD2_Rep2         34                   4.083
FDDDNSPSFIQIR      FVIII HC            SD3_Rep1         11.3                 0.978
FDDDNSPSFIQIR      FVIII HC            SD3_Rep2         11.3                 1.028
FDDDNSPSFIQIR      FVIII HC            SD4_Rep1         3.4                  0.327
FDDDNSPSFIQIR      FVIII HC            SD4_Rep2         3.4                  0.329
FDDDNSPSFIQIR      FVIII HC            SD5_Rep1         1.13                 0.046
FDDDNSPSFIQIR      FVIII HC            SD5_Rep2         1.13                 0.048
FDDDNSPSFIQIR      FVIII HC            SD6_Rep1         0.45                 0.064
FDDDNSPSFIQIR      FVIII HC            SD6_Rep2         0.45                 0.055
FDDDNSPSFIQIR      FVIII HC            SD7_Rep1         0.22                 0.069
FDDDNSPSFIQIR      FVIII HC            SD7_Rep2         0.22                 0.060
WTVTVEDGPTK        FVIII HC            SD1_Rep1         170                  16.556
WTVTVEDGPTK        FVIII HC            SD1_Rep2         170                  15.480
WTVTVEDGPTK        FVIII HC            SD2_Rep1         34                   3.334
WTVTVEDGPTK        FVIII HC            SD2_Rep2         34                   3.315
WTVTVEDGPTK        FVIII HC            SD3_Rep1         11.3                 0.927
WTVTVEDGPTK        FVIII HC            SD3_Rep2         11.3                 0.880
WTVTVEDGPTK        FVIII HC            SD4_Rep1         3.4                  0.286
WTVTVEDGPTK        FVIII HC            SD4_Rep2         3.4                  0.300
WTVTVEDGPTK        FVIII HC            SD5_Rep1         1.13                 0.062
WTVTVEDGPTK        FVIII HC            SD5_Rep2         1.13                 0.059
WTVTVEDGPTK        FVIII HC            SD6_Rep1         0.45                 0.021
WTVTVEDGPTK        FVIII HC            SD6_Rep2         0.45                 0.020
WTVTVEDGPTK        FVIII HC            SD7_Rep1         0.22                 0.008
WTVTVEDGPTK        FVIII HC            SD7_Rep2         0.22                 0.008
YYSSFVNMER         FVIII HC            SD1_Rep1         170                  33.784
YYSSFVNMER         FVIII HC            SD1_Rep2         170                  33.003
YYSSFVNMER         FVIII HC            SD2_Rep1         34                   8.319
YYSSFVNMER         FVIII HC            SD2_Rep2         34                   8.032
YYSSFVNMER         FVIII HC            SD3_Rep1         11.3                 1.806
YYSSFVNMER         FVIII HC            SD3_Rep2         11.3                 1.641
YYSSFVNMER         FVIII HC            SD4_Rep1         3.4                  0.613
YYSSFVNMER         FVIII HC            SD4_Rep2         3.4                  0.623
YYSSFVNMER         FVIII HC            SD5_Rep1         1.13                 0.056
YYSSFVNMER         FVIII HC            SD5_Rep2         1.13                 0.070
YYSSFVNMER         FVIII HC            SD6_Rep1         0.45                 0.013
YYSSFVNMER         FVIII HC            SD6_Rep2         0.45                 0.037
YYSSFVNMER         FVIII HC            SD7_Rep1         0.22                 0.036
YYSSFVNMER         FVIII HC            SD7_Rep2         0.22                 0.024

[Calibration plots, two panels per peptide: ratio AUC (SIS/endogenous) versus SIS (fmol on column). The R² values shown in the 12 panels range from 0.9859 to 0.9999.]

Supplemental Figure S7. Evaluation of PRM assay linearity. Stable isotope-labeled standard (SIS) peptides were spiked in a dynamic range (0.22 to 170 fmol per injection on column) in a constant amount of plant digest (1:1:1:1 mixture of all four types of plant materials: CNTB-FVIII HC (N), CNTB-FVIII HC (CN), CNTB-VP1 (N), and CNTB-VP1 (CN)). MS measurements were performed in duplicate. R² values are shown in each graph over the full dynamic range from 0.22 to 170 fmol (left panel) and from 0.22 to 34 fmol (right panel). N, native sequence; CN, codon-optimized sequence obtained from new optimizer algorithm.
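The linearity check described in this legend can be illustrated with the IFSYTESLAGK dilution series listed for this figure. The sketch below fits an ordinary least-squares line over the full range and over the 0.22 to 34 fmol range and computes R² for each; it assumes an unweighted fit with a free intercept, which may differ from the fit used for the published panels, so the values need not match the figure exactly.

```python
import numpy as np

# IFSYTESLAGK dilution series from this figure (duplicate injections).
sis_fmol = np.array([170, 170, 34, 34, 11.3, 11.3, 3.4, 3.4,
                     1.13, 1.13, 0.45, 0.45, 0.22, 0.22])
ratio = np.array([3.037, 2.740, 0.616, 0.573, 0.171, 0.165, 0.062, 0.063,
                  0.005, 0.005, 0.003, 0.004, 0.001, 0.002])

def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

r2_full = r_squared(sis_fmol, ratio)           # 0.22 to 170 fmol
low = sis_fmol <= 34
r2_low = r_squared(sis_fmol[low], ratio[low])  # 0.22 to 34 fmol
```

Restricting the fit to the lower range matters because the two 170-fmol points dominate an unweighted fit and can mask deviations near the limit of quantification.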

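A. Construct map (left to right): 16S trnI, Prrn, aadA, PpsbA, CNTB-FVIII HC, TpsbA, trnA. Sequences follow in the order CNTB (N) with FVIII HC (N), FVIII HC (CN), FVIII HC (CO).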

ATGACACCTCAAAATATTACTGATTTGTGTGCAGAATACCACAACACACAAATACATACGCTAAATGATAAGATATTTTCGTATACAGAATCTCTAGCTGGAAAAAGAGAGATGGCTATCATTACTTTTAAGAATGGTGCAACTTTTCAAGTAGAAGTACCA GGTAGTCAACATATAGATTCACAAAAAAAAGCAATTGAAAGGATGAAGGATACCCTGAGGATTGCATATCTTACTGAAGCTAAAGTCGAAAAGTTATGTGTATGGAATAATAAAACGCCTCATGCGATTGCCGCAATTAGTATGGCAAATgggcccgggcccc ggcgtaaacgttctgtt  GCCACCAGAAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGAGCTGCCTGTGGACGCAAGATTTCCTCCTAGAGTGCCAAAATCTTTTCCATTCAACACCTCAGTCGTGTACAAAAAGACTCTGTTTGTAGAATT CACGGATCACCTTTTCAACATCGCTAAGCCAAGGCCACCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTACACTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGC TTCTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTCCCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGACCCACTGTGCCTTACCTACTCATATCTTTCTCATG TGGACCTGGTAAAAGACTTGAATTCAGGCCTCATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGCACAAATTTATACTACTTTTTGCTGTATTTGATGAAGGGAAAAGTTGGCACTCAGAAACAAAGAACTC CTTGATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCTAAAATGCACACAGTCAATGGTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGCATGTGATTGGAATGGGCACCACTCCTGAAGTGCACTCA ATATTCCTCGAAGGTCACACATTTCTTGTGAGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCTCAAACACTCTTGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCATGGAAGCTTA TGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATGAAGAAGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGTTTGATGATGACAACTCTCCTTCCTTTATCCAAATTCGCTCAGTTGCCAAGA AGCATCCTAAAACTTGGGTACATTACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGACAGAAGTTATAAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTACAAAAAAGTCCGATTTATGGC ATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAATCAGGAATCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTTAAGAATCAAGCAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCC GTCCTTTGTATTCAAGGAGATTACCAAAAGGTGTAAAACATTTGAAGGATTTTCCAATTCTGCCAGGAGAAATATTCAAATATAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGACCCGCTATTACTCTAGTTTCGTTA 
ATATGGAGAGAGATCTAGCTTCAGGACTCATTGGCCCTCTCCTCATCTGCTACAAAGAATCTGTAGATCAAAGAGGAAACCAGATAATGTCAGACAAGAGGAATGTCATCCTGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATA CAACGCTTTCTCCCCAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGCACAGCATCAATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATACTGGTACATTCTAAGCATTGGAGCACAG ACTGACTTCCTTTCTGTCTTCTTCTCTGGATATACCTTCAAACACAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTGTCTTCATGTCGATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACA GAGGCATGACCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGACAGTTATGAAGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGCTTCTCCCAGAATCCACCAGTCTTGAAACGCCATCAA CGCtaa

GCAACTCGTCGTTACTATTTAGGAGCCGTTGAACTAAGTTGGGATTATATGCAATCTGATCTAGGTGAATTACCAGTAGACGCTCGTTTCCCTCCTCGTGTTCCTAAATCTTTTCCTTTTAACACATCCGTTGTTTACAAAAAGACTCTATTTGTTGAGTTCAC TGATCACCTATTCAACATTGCTAAACCACGTCCTCCATGGATGGGCCTACTTGGCCCTACTATTCAAGCTGAAGTATATGATACTGTTGTAATTACCCTAAAGAACATGGCTTCCCACCCTGTTTCTTTACATGCAGTTGGTGTTTCTTACTGGAAAGCTAGT GAGGGTGCTGAATACGATGATCAGACTTCCCAACGAGAAAAAGAAGATGATAAAGTTTTCCCTGGTGGCTCTCACACCTACGTTTGGCAAGTTTTAAAAGAAAACGGACCTATGGCCTCCGATCCATTATGTCTAACTTACAGTTATCTATCTCATGTTGA TTTGGTTAAAGATTTGAATAGTGGTCTAATTGGTGCTCTATTAGTATGTCGTGAAGGTTCTCTTGCAAAAGAAAAAACACAAACTCTTCACAAATTCATCCTTTTATTTGCTGTATTTGATGAAGGAAAAAGCTGGCACAGTGAAACTAAAAATTCTTTGAT GCAAGATCGTGATGCTGCAAGCGCTCGCGCTTGGCCAAAAATGCACACTGTAAATGGTTACGTAAATAGATCTTTGCCTGGTCTTATTGGCTGTCACCGTAAAAGCGTATATTGGCATGTAATTGGTATGGGTACCACTCCTGAGGTACACTCCATCTTCT Codon‐ TAGAAGGACATACTTTCTTAGTACGCAATCACAGACAGGCTTCTCTTGAAATTTCTCCAATCACTTTTCTTACAGCTCAGACCTTGTTAATGGACTTAGGACAGTTCTTACTATTTTGTCACATCAGCTCTCATCAACATGACGGTATGGAAGCATACGTAAA optimized  GGTTGATAGCTGCCCAGAGGAACCTCAATTGCGTATGAAAAACAACGAAGAAGCTGAAGATTATGACGATGATCTAACTGATTCTGAGATGGATGTTGTTCGTTTCGATGATGACAATTCTCCAAGCTTCATACAAATTAGAAGCGTAGCAAAGAAACA FVIII HC TCCAAAAACTTGGGTACACTACATTGCTGCAGAAGAAGAGGATTGGGATTATGCCCCTTTGGTTCTTGCTCCAGACGATCGTAGTTATAAATCTCAATATTTGAACAACGGTCCTCAACGCATCGGTCGAAAATACAAAAAAGTTAGATTTATGGCTTACA CCGATGAAACTTTCAAGACCCGTGAAGCTATTCAGCATGAATCTGGAATTCTTGGTCCTCTATTATATGGTGAAGTTGGTGATACTCTTCTAATTATTTTCAAGAACCAAGCTAGCCGTCCTTACAACATTTATCCTCATGGCATCACTGATGTACGCCCTTT (CN) GTATTCTCGACGTTTACCTAAAGGAGTAAAACACTTAAAGGATTTCCCTATCCTTCCAGGTGAAATTTTCAAATATAAATGGACCGTAACCGTAGAGGATGGTCCAACCAAATCTGACCCTCGCTGTCTAACTCGTTACTACTCTAGCTTCGTAAATATGGA ACGTGATCTTGCTAGTGGTTTGATCGGTCCATTACTAATCTGTTACAAAGAGTCCGTTGACCAAAGAGGCAACCAAATTATGAGTGATAAACGTAATGTTATACTATTCAGTGTTTTCGATGAAAATCGTTCTTGGTATCTAACTGAAAATATTCAACGATT TTTACCTAACCCTGCTGGTGTTCAACTAGAGGATCCTGAATTCCAAGCCAGTAATATCATGCATAGCATTAATGGATATGTATTCGATAGTTTACAATTATCCGTTTGTTTGCATGAAGTTGCTTACTGGTATATTCTATCTATCGGTGCTCAAACTGACTTC 
CTATCTGTATTCTTCTCTGGTTATACCTTCAAACACAAAATGGTATACGAGGATACCTTGACCCTTTTTCCTTTCAGTGGTGAAACAGTTTTCATGAGTATGGAAAACCCAGGCCTTTGGATCCTAGGTTGTCACAATTCTGATTTCCGTAATCGCGGTATGA CTGCTTTGCTAAAAGTATCTTCTTGCGATAAAAACACTGGTGATTACTATGAGGATAGTTATGAAGATATATCTGCTTATTTGCTATCCAAAAACAATGCTATTGAGCCTCGTTCTTTCTCTCAAAATCCACCTGTTTTAAAACGTCACCAACGCTAA GCTACTAGAAGATATTATTTAGGTGCTGTTGAATTATCATGGGATTATATGCAAAGTGATTTAGGTGAATTACCTGTTGATGCTAGATTTCCTCCTAGAGTTCCAAAATCTTTTCCATTTAATACTTCAGTTGTTTATAAAAAAACTTTATTTGTAGAATTTA CTGATCATTTATTTAATATTGCTAAACCAAGACCACCTTGGATGGGTTTATTAGGTCCTACTATTCAAGCTGAAGTTTATGATACAGTTGTTATTACATTAAAAAATATGGCTTCTCATCCTGTTAGTTTACATGCTGTTGGTGTATCTTATTGGAAAGCTTC TGAAGGAGCTGAATATGATGATCAAACTAGTCAAAGAGAAAAAGAAGATGATAAAGTTTTTCCTGGTGGATCTCATACTTATGTTTGGCAAGTTTTAAAAGAAAATGGTCCAATGGCTTCTGATCCATTATGTTTAACTTATTCATATTTATCTCATGTTG ATTTAGTAAAAGATTTAAATTCAGGTTTAATTGGAGCTTTATTAGTATGTAGAGAAGGTAGTTTAGCTAAAGAAAAAACACAAACTTTACATAAATTTATATTATTATTTGCTGTATTTGATGAAGGTAAAAGTTGGCATTCAGAAACAAAAAATTCTTTA ATGCAAGATAGAGATGCTGCTTCTGCTAGAGCTTGGCCTAAAATGCATACAGTTAATGGTTATGTAAATAGATCTCTTCCAGGTTTAATTGGATGTCATAGAAAATCAGTTTATTGGCATGTTATTGGAATGGGTACTACTCCTGAAGTTCATTCAATATT TTTAGAAGGTCATACATTTTTAGTTAGAAATCATAGACAAGCTTCTTTAGAAATTTCTCCAATAACTTTTTTAACTGCTCAAACATTATTAATGGATTTAGGACAATTTTTATTATTTTGTCATATTTCTTCTCATCAACATGATGGTATGGAAGCTTATGTTA Codon‐ AAGTAGATTCTTGTCCAGAAGAACCTCAATTACGAATGAAAAATAATGAAGAAGCTGAAGATTATGATGATGATTTAACTGATTCTGAAATGGATGTTGTTAGATTTGATGATGATAATTCTCCTTCTTTTATTCAAATTAGATCAGTTGCTAAAAAACAT optimized  CCTAAAACTTGGGTACATTATATTGCTGCTGAAGAAGAAGATTGGGATTATGCTCCTTTAGTTTTAGCTCCTGATGATAGAAGTTATAAAAGTCAATATTTAAATAATGGTCCTCAAAGAATTGGTAGAAAATATAAAAAAGTTCGATTTATGGCTTATA FVIII HC CAGATGAAACTTTTAAAACTCGTGAAGCTATTCAACATGAATCAGGAATTTTAGGACCTTTATTATATGGTGAAGTTGGAGATACATTATTAATTATATTTAAAAATCAAGCTAGCAGACCATATAATATTTATCCTCATGGAATTACTGATGTTCGTCCTT TATATTCAAGAAGATTACCAAAAGGTGTAAAACATTTAAAAGATTTTCCAATTTTACCAGGAGAAATATTTAAATATAAATGGACAGTTACTGTAGAAGATGGTCCAACTAAATCAGATCCTAGATGTTTAACTAGATATTATTCTAGTTTTGTTAATATG (CO) 
GAAAGAGATTTAGCTTCAGGATTAATTGGTCCTTTATTAATTTGTTATAAAGAATCTGTAGATCAAAGAGGAAATCAAATAATGTCAGATAAAAGAAATGTTATTTTATTTTCTGTATTTGATGAAAATCGATCTTGGTATTTAACAGAAAATATACAAAG ATTTTTACCTAATCCAGCTGGAGTTCAATTAGAAGATCCAGAATTTCAAGCTTCTAATATTATGCATTCTATTAATGGTTATGTTTTTGATAGTTTACAATTATCAGTTTGTTTACATGAAGTTGCTTATTGGTATATTTTATCTATTGGAGCTCAAACTGATT TTTTATCTGTTTTTTTTTCTGGATATACTTTTAAACATAAAATGGTTTATGAAGATACATTAACTTTATTTCCATTTTCAGGAGAAACTGTTTTTATGTCTATGGAAAATCCAGGTTTATGGATTTTAGGTTGTCATAATTCAGATTTTAGAAATAGAGGTATG ACTGCTTTATTAAAAGTTTCTAGTTGTGATAAAAATACTGGTGATTATTATGAAGATAGTTATGAAGATATTTCAGCTTATTTATTAAGTAAAAATAATGCTATTGAACCAAGATCTTTTTCTCAAAATCCACCAGTTTTAAAAAGACATCAAAGATAA

CNTB: Native sequence of cholera non-toxic B subunit (312 bp)
FVIII HC: Human blood clotting factor VIII heavy chain (2265 bp)
N: Native sequence
CN: Codon-optimized sequence (new algorithm of optimizer)
CO: Codon-optimized sequence (old algorithm of optimizer)

gggcccgggccccggcgtaaacgttctgtt (GPGPRRKRSV): GPGP-furin cleavage site-SV. Two amino acids (SV) were added after the furin cleavage site (RRKR) to facilitate cleavage.

B. Construct map (left to right): 16S trnI, Prrn, aadA, PpsbA, CNTB-VP1, TpsbA, trnA. Sequences follow in the order CNTB (N) with VP1 (N), VP1 (CN), VP1 (CO).

ATGACACCTCAAAATATTACTGATTTGTGTGCAGAATACCACAACACACAAATACATACGCTAAATGATAAGATATTTTCGTATACAGAATCTCTAGCTGGAAAAAGAGAGATGGCTATCATTACTTTTAAGAAT GGTGCAACTTTTCAAGTAGAAGTACCAGGTAGTCAACATATAGATTCACAAAAAAAAGCAATTGAAAGGATGAAGGATACCCTGAGGATTGCATATCTTACTGAAGCTAAAGTCGAAAAGTTATGTGTATGG AATAATAAAACGCCTCATGCGATTGCCGCAATTAGTATGGCAAATgggcccgggccccggcgtaaacgttctgtt  GGGTTAGGTCAGATGCTTGAAAGCATGATTGACAACACAGTCCGTGAAACGGTGGGGGCGGCAACGTCTAGAGACGCTCTCCCAAACACTGAAGCCAGTGGACCAGCACACTCCAAGGAAATTCCGGCACTCACCG CAGTGGAAACTGGGGCCACAAATCCACTAGTCCCTTCTGATACAGTGCAAACCAGACATGTTGTACAACATAGGTCAAGGTCAGAGTCTAGCATAGAGTCTTTCTTCGCGCGGGGTGCATGCGTGGCCATTATAACCG TGGATAACTCAGCTTCCACCAAGAATAAGGATAAGCTATTTACAGTGTGGAAGATCACTTATAAAGATACTGTCCAGTTACGGAGGAAATTGGAGTTCTTCACCTATTCTAGATTTGATATGGAATTTACCTTTGTGGT TACTGCAAATTTCACTGAGACTAACAATGGGCATGCCTTAAATCAAGTGTACCAAATTATGTACGTACCACCAGGCGCTCCAGTGCCCGAGAAATGGGACGACTACACATGGCAAACCTCATCAAATCCATCAATCTTT TACACCTACGGAACAGCTCCAGCCCGGATCTCGGTACCGTATGTTGGTATTTCGAACGCCTATTCACACTTTTACGACGGTTTTTCCAAAGTACCACTGAAGGACCAGTCGGCAGCACTAGGTGACTCCCTCTATGGTG CAGCATCTCTAAATGACTTCGGTATTTTGGCTGTTAGAGTAGTCAATGATCACAACCCGACCAAGGTCACCTCCAAAATCAGAGTGTATCTAAAACCCAAACACATCAGAGTCTGGTGCCCGCGTCCACCGAGGGCAG TGGCGTACTACGGCCCTGGAGTGGATTACAAGGATGGTACGCTTACACCCCTCTCCACCAAGGATCTGACCACATATTGA GGTTTAGGACAAATGTTGGAATCTATGATTGATAACACAGTACGTGAAACTGTTGGTGCTGCAACTTCTCGTGATGCTCTACCTAATACTGAAGCTAGTGGTCCTGCTCATAGCAAAGAAATTCCAGCTCTTACCGCTG TTGAGACCGGTGCTACTAACCCTCTAGTTCCTTCTGATACTGTACAAACACGTCATGTAGTTCAACATAGAAGTCGTAGCGAATCTAGTATCGAGTCCTTCTTTGCTCGCGGTGCTTGTGTTGCAATCATTACCGTAGAT AACTCTGCTTCCACTAAAAATAAAGATAAGCTATTCACTGTATGGAAGATTACCTACAAAGATACTGTTCAATTACGTCGAAAATTAGAGTTCTTTACTTACTCCCGCTTTGATATGGAATTCACCTTCGTAGTTACTGCT AATTTCACCGAAACTAACAATGGTCACGCTTTGAATCAGGTATATCAAATCATGTACGTACCACCTGGAGCTCCTGTACCAGAAAAATGGGATGACTATACTTGGCAGACTTCCTCTAACCCTTCTATTTTTTATACATA CGGTACCGCACCTGCTCGTATTAGCGTTCCATACGTAGGTATTAGTAACGCTTACTCTCACTTCTATGATGGTTTCTCTAAAGTACCATTAAAAGATCAAAGTGCTGCACTAGGTGACTCTCTATATGGTGCTGCATCTCT 
AAATGATTTCGGTATTTTAGCTGTACGTGTTGTAAACGATCACAATCCAACCAAAGTAACCTCTAAAATCCGCGTTTATCTTAAACCTAAGCATATTAGAGTATGGTGTCCTCGCCCACCTCGAGCTGTTGCTTATTACG GTCCTGGAGTAGATTACAAAGATGGCACACTAACTCCATTAAGCACAAAGGACTTGACCACTTATTAA GGTTTAGGTCAAATGTTAGAATCTATGATTGATAATACTGTTCGTGAAACTGTTGGTGCTGCTACTTCTAGGGATGCTTTACCAAATACTGAAGCTAGTGGTCCTGCTCATTCTAAAGAAATTCCTGCTTTAACTGCTGT TGAAACTGGTGCTACAAATCCATTAGTTCCTTCTGATACAGTTCAAACTAGACATGTTGTACAACATAGATCAAGATCAGAATCTTCTATAGAATCTTTTTTTGCTAGAGGTGCTTGTGTTGCTATTATAACTGTTGATAA TTCAGCTTCTACTAAAAATAAAGATAAATTATTTACAGTTTGGAAAATTACTTATAAAGATACTGTTCAATTAAGAAGAAAATTAGAATTTTTTACTTATTCTAGGTTTGATATGGAATTTACTTTTGTTGTTACTGCTAAT TTTACTGAAACTAATAATGGTCATGCTTTAAATCAAGTTTATCAAATTATGTATGTACCACCAGGTGCTCCAGTTCCTGAAAAATGGGATGATTATACATGGCAAACTTCATCAAATCCATCAATTTTTTATACTTATGGA ACAGCTCCAGCTAGAATTTCTGTACCTTATGTTGGTATTTCTAATGCTTATTCACATTTTTATGATGGTTTTTCTAAAGTACCATTAAAAGATCAATCTGCTGCATTAGGTGATTCTTTATATGGTGCTGCATCTTTAAATG ATTTTGGTATTTTAGCTGTTAGAGTAGTTAATGATCATAATCCTACTAAAGTTACTTCTAAAATTAGAGTTTATCTAAAACCTAAACATATTAGAGTTTGGTGTCCTCGTCCACCTAGAGCAGTTGCTTATTATGGTCCTG GAGTTGATTATAAAGATGGTACTTTAACACCTTTATCTACTAAAGATTTAACTACATATTAA

CNTB: Native sequence of cholera non-toxic B subunit (312 bp)
VP1: Polio viral capsid protein 1 (909 bp)
N: Native sequence
CN: Codon-optimized sequence (new algorithm of optimizer)
CO: Codon-optimized sequence (old algorithm of optimizer)

gggcccgggccccggcgtaaacgttctgtt (GPGPRRKRSV): GPGP-furin cleavage site-SV. Two amino acids (SV) were added after the furin cleavage site (RRKR) to facilitate cleavage.
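The linker annotation can be checked by translating the nucleotide sequence with the standard genetic code. The following is a minimal sketch; the helper name `translate` and the table construction are illustrative assumptions and are not taken from the original work.

```python
# Standard genetic code laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string; stop codons are shown as '*'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Linker between CNTB and VP1, copied from the figure legend.
LINKER = "gggcccgggccccggcgtaaacgttctgtt"
linker_peptide = translate(LINKER)
print(linker_peptide)
```

The same helper can be applied to any in-frame stretch of the native or codon-optimized sequences to confirm that optimization leaves the encoded protein unchanged.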

Supplemental Figure S1. Sequences of native and codon-optimized FVIII HC or VP1 genes. A, Nucleotide sequences of native (N) and codon-optimized (CN and CO) FVIII HC genes. B, Nucleotide sequences of native (N) and codon-optimized (CN and CO) VP1 genes. CNTB, Native sequence of cholera non-toxic B subunit; N, native sequence; CN, codon-optimized sequence obtained from new optimizer algorithm; CO, codon-optimized sequence obtained from old optimizer algorithm.

A

HC N seq HC CN seq HC CO seq

GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA TGG GAC TAT ATG CAA AGT GAT CTC GGT GAG CTG CCT GTG GAC GCA ACT CGT CGT TAC TAT TTA GGA GCC GTT GAA CTA AGT TGG GAT TAT ATG CAA TCT GAT CTA GGT GAA TTA CCA GTA GAC GCT ACT AGA AGA TAT TAT TTA GGT GCT GTT GAA TTA TCA TGG GAT TAT ATG CAA AGT GAT TTA GGT GAA TTA CCT GTT GAT A T R R Y Y L G A V E L S W D Y M Q S D L G E L P V D

81 81 81

HC N seq HC CN seq HC CO seq

GCA AGA TTT CCT CCT AGA GTG CCA AAA TCT TTT CCA TTC AAC ACC TCA GTC GTG TAC AAA AAG ACT CTG TTT GTA GAA TTC GCT CGT TTC CCT CCT CGT GTT CCT AAA TCT TTT CCT TTT AAC ACA TCC GTT GTT TAC AAA AAG ACT CTA TTT GTT GAG TTC GCT AGA TTT CCT CCT AGA GTT CCA AAA TCT TTT CCA TTT AAT ACT TCA GTT GTT TAT AAA AAA ACT TTA TTT GTA GAA TTT A R F P P R V P K S F P F N T S V V Y K K T L F V E F

162 162 162

HC N seq HC CN seq HC CO seq

ACG GAT CAC CTT TTC AAC ATC GCT AAG CCA AGG CCA CCC TGG ATG GGT CTG CTA GGT CCT ACC ATC CAG GCT GAG GTT TAT ACT GAT CAC CTA TTC AAC ATT GCT AAA CCA CGT CCT CCA TGG ATG GGC CTA CTT GGC CCT ACT ATT CAA GCT GAA GTA TAT ACT GAT CAT TTA TTT AAT ATT GCT AAA CCA AGA CCA CCT TGG ATG GGT TTA TTA GGT CCT ACT ATT CAA GCT GAA GTT TAT T D H L F N I A K P R P P W M G L L G P T I Q A E V Y

243 243 243

HC N seq HC CN seq HC CO seq

GAT ACA GTG GTC ATT ACA CTT AAG AAC ATG GCT TCC CAT CCT GTC AGT CTT CAT GCT GTT GGT GTA TCC TAC TGG AAA GCT GAT ACT GTT GTA ATT ACC CTA AAG AAC ATG GCT TCC CAC CCT GTT TCT TTA CAT GCA GTT GGT GTT TCT TAC TGG AAA GCT GAT ACA GTT GTT ATT ACA TTA AAA AAT ATG GCT TCT CAT CCT GTT AGT TTA CAT GCT GTT GGT GTA TCT TAT TGG AAA GCT D T V V I T L K N M A S H P V S L H A V G V S Y W K A

324 324 324

HC N seq HC CN seq HC CO seq

TCT GAG GGA GCT GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT GAT AAA GTC TTC CCT GGT GGA AGC CAT ACA AGT GAG GGT GCT GAA TAC GAT GAT CAG ACT TCC CAA CGA GAA AAA GAA GAT GAT AAA GTT TTC CCT GGT GGC TCT CAC ACC TCT GAA GGA GCT GAA TAT GAT GAT CAA ACT AGT CAA AGA GAA AAA GAA GAT GAT AAA GTT TTT CCT GGT GGA TCT CAT ACT S E G A E Y D D Q T S Q R E K E D D K V F P G G S H T

405 405 405

HC N seq HC CN seq HC CO seq

TAT GTC TGG CAG GTC CTG AAA GAG AAT GGT CCA ATG GCC TCT GAC CCA CTG TGC CTT ACC TAC TCA TAT CTT TCT CAT GTG TAC GTT TGG CAA GTT TTA AAA GAA AAC GGA CCT ATG GCC TCC GAT CCA TTA TGT CTA ACT TAC AGT TAT CTA TCT CAT GTT TAT GTT TGG CAA GTT TTA AAA GAA AAT GGT CCA ATG GCT TCT GAT CCA TTA TGT TTA ACT TAT TCA TAT TTA TCT CAT GTT Y V W Q V L K E N G P M A S D P L C L T Y S Y L S H V

486 486 486

HC N seq HC CN seq HC CO seq

GAC CTG GTA AAA GAC TTG AAT TCA GGC CTC ATT GGA GCC CTA CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA GAT TTG GTT AAA GAT TTG AAT AGT GGT CTA ATT GGT GCT CTA TTA GTA TGT CGT GAA GGT TCT CTT GCA AAA GAA AAA ACA GAT TTA GTA AAA GAT TTA AAT TCA GGT TTA ATT GGA GCT TTA TTA GTA TGT AGA GAA GGT AGT TTA GCT AAA GAA AAA ACA D L V K D L N S G L I G A L L V C R E G S L A K E K T

567 567 567

HC N seq HC CN seq HC CO seq

CAG ACC TTG CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG AAA AGT TGG CAC TCA GAA ACA AAG AAC TCC TTG CAA ACT CTT CAC AAA TTC ATC CTT TTA TTT GCT GTA TTT GAT GAA GGA AAA AGC TGG CAC AGT GAA ACT AAA AAT TCT TTG CAA ACT TTA CAT AAA TTT ATA TTA TTA TTT GCT GTA TTT GAT GAA GGT AAA AGT TGG CAT TCA GAA ACA AAA AAT TCT TTA Q T L H K F I L L F A V F D E G K S W H S E T K N S L

648 648 648

HC N seq HC CN seq HC CO seq

ATG CAG GAT AGG GAT GCT GCA TCT GCT CGG GCC TGG CCT AAA ATG CAC ACA GTC AAT GGT TAT GTA AAC AGG TCT CTG CCA ATG CAA GAT CGT GAT GCT GCA AGC GCT CGC GCT TGG CCA AAA ATG CAC ACT GTA AAT GGT TAC GTA AAT AGA TCT TTG CCT ATG CAA GAT AGA GAT GCT GCT TCT GCT AGA GCT TGG CCT AAA ATG CAT ACA GTT AAT GGT TAT GTA AAT AGA TCT CTT CCA M Q D R D A A S A R A W P K M H T V N G Y V N R S L P

729 729 729

HC N seq HC CN seq HC CO seq

GGT CTG ATT GGA TGC CAC AGG AAA TCA GTC TAT TGG CAT GTG ATT GGA ATG GGC ACC ACT CCT GAA GTG CAC TCA ATA TTC GGT CTT ATT GGC TGT CAC CGT AAA AGC GTA TAT TGG CAT GTA ATT GGT ATG GGT ACC ACT CCT GAG GTA CAC TCC ATC TTC GGT TTA ATT GGA TGT CAT AGA AAA TCA GTT TAT TGG CAT GTT ATT GGA ATG GGT ACT ACT CCT GAA GTT CAT TCA ATA TTT G L I G C H R K S V Y W H V I G M G T T P E V H S I F

810 810 810

HC N seq HC CN seq HC CO seq

CTC GAA GGT CAC ACA TTT CTT GTG AGG AAC CAT CGC CAG GCG TCC TTG GAA ATC TCG CCA ATA ACT TTC CTT ACT GCT CAA TTA GAA GGA CAT ACT TTC TTA GTA CGC AAT CAC AGA CAG GCT TCT CTT GAA ATT TCT CCA ATC ACT TTT CTT ACA GCT CAG TTA GAA GGT CAT ACA TTT TTA GTT AGA AAT CAT AGA CAA GCT TCT TTA GAA ATT TCT CCA ATA ACT TTT TTA ACT GCT CAA L E G H T F L V R N H R Q A S L E I S P I T F L T A Q

891 891 891

HC N seq HC CN seq HC CO seq

ACA CTC TTG ATG GAC CTT GGA CAG TTT CTA CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT GAT GGC ATG GAA GCT TAT GTC ACC TTG TTA ATG GAC TTA GGA CAG TTC TTA CTA TTT TGT CAC ATC AGC TCT CAT CAA CAT GAC GGT ATG GAA GCA TAC GTA ACA TTA TTA ATG GAT TTA GGA CAA TTT TTA TTA TTT TGT CAT ATT TCT TCT CAT CAA CAT GAT GGT ATG GAA GCT TAT GTT T L L M D L G Q F L L F C H I S S H Q H D G M E A Y V

972 972 972

HC N seq HC CN seq HC CO seq

AAA GTA GAC AGC TGT CCA GAG GAA CCC CAA CTA CGA ATG AAA AAT AAT GAA GAA GCG GAA GAC TAT GAT GAT GAT CTT ACT AAG GTT GAT AGC TGC CCA GAG GAA CCT CAA TTG CGT ATG AAA AAC AAC GAA GAA GCT GAA GAT TAT GAC GAT GAT CTA ACT AAA GTA GAT TCT TGT CCA GAA GAA CCT CAA TTA CGA ATG AAA AAT AAT GAA GAA GCT GAA GAT TAT GAT GAT GAT TTA ACT K V D S C P E E P Q L R M K N N E E A E D Y D D D L T

1053 1053 1053

HC N seq HC CN seq HC CO seq

GAT TCT GAA ATG GAT GTG GTC AGG TTT GAT GAT GAC AAC TCT CCT TCC TTT ATC CAA ATT CGC TCA GTT GCC AAG AAG CAT GAT TCT GAG ATG GAT GTT GTT CGT TTC GAT GAT GAC AAT TCT CCA AGC TTC ATA CAA ATT AGA AGC GTA GCA AAG AAA CAT GAT TCT GAA ATG GAT GTT GTT AGA TTT GAT GAT GAT AAT TCT CCT TCT TTT ATT CAA ATT AGA TCA GTT GCT AAA AAA CAT D S E M D V V R F D D D N S P S F I Q I R S V A K K H

1134 1134 1134

HC N seq HC CN seq HC CO seq

CCT AAA ACT TGG GTA CAT TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC TTA GTC CTC GCC CCC GAT GAC AGA CCA AAA ACT TGG GTA CAC TAC ATT GCT GCA GAA GAA GAG GAT TGG GAT TAT GCC CCT TTG GTT CTT GCT CCA GAC GAT CGT CCT AAA ACT TGG GTA CAT TAT ATT GCT GCT GAA GAA GAA GAT TGG GAT TAT GCT CCT TTA GTT TTA GCT CCT GAT GAT AGA P K T W V H Y I A A E E E D W D Y A P L V L A P D D R

1215 1215 1215

HC N seq HC CN seq HC CO seq

AGT TAT AAA AGT CAA TAT TTG AAC AAT GGC CCT CAG CGG ATT GGT AGG AAG TAC AAA AAA GTC CGA TTT ATG GCA TAC ACA AGT TAT AAA TCT CAA TAT TTG AAC AAC GGT CCT CAA CGC ATC GGT CGA AAA TAC AAA AAA GTT AGA TTT ATG GCT TAC ACC AGT TAT AAA AGT CAA TAT TTA AAT AAT GGT CCT CAA AGA ATT GGT AGA AAA TAT AAA AAA GTT CGA TTT ATG GCT TAT ACA S Y K S Q Y L N N G P Q R I G R K Y K K V R F M A Y T

1296 1296 1296

HC N seq HC CN seq HC CO seq

GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA TCA GGA ATC TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC GAT GAA ACT TTC AAG ACC CGT GAA GCT ATT CAG CAT GAA TCT GGA ATT CTT GGT CCT CTA TTA TAT GGT GAA GTT GGT GAT GAT GAA ACT TTT AAA ACT CGT GAA GCT ATT CAA CAT GAA TCA GGA ATT TTA GGA CCT TTA TTA TAT GGT GAA GTT GGA GAT D E T F K T R E A I Q H E S G I L G P L L Y G E V G D

1377 1377 1377

HC N seq HC CN seq HC CO seq

ACA CTG TTG ATT ATA TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT CAC GGA ATC ACT GAT GTC CGT CCT TTG ACT CTT CTA ATT ATT TTC AAG AAC CAA GCT AGC CGT CCT TAC AAC ATT TAT CCT CAT GGC ATC ACT GAT GTA CGC CCT TTG ACA TTA TTA ATT ATA TTT AAA AAT CAA GCT AGC AGA CCA TAT AAT ATT TAT CCT CAT GGA ATT ACT GAT GTT CGT CCT TTA T L L I I F K N Q A S R P Y N I Y P H G I T D V R P L

1458 1458 1458

HC N seq HC CN seq HC CO seq

TAT TCA AGG AGA TTA CCA AAA GGT GTA AAA CAT TTG AAG GAT TTT CCA ATT CTG CCA GGA GAA ATA TTC AAA TAT AAA TGG TAT TCT CGA CGT TTA CCT AAA GGA GTA AAA CAC TTA AAG GAT TTC CCT ATC CTT CCA GGT GAA ATT TTC AAA TAT AAA TGG TAT TCA AGA AGA TTA CCA AAA GGT GTA AAA CAT TTA AAA GAT TTT CCA ATT TTA CCA GGA GAA ATA TTT AAA TAT AAA TGG Y S R R L P K G V K H L K D F P I L P G E I F K Y K W

1539 1539 1539

HC N seq HC CN seq HC CO seq

ACA GTG ACT GTA GAA GAT GGG CCA ACT AAA TCA GAT CCT CGG TGC CTG ACC CGC TAT TAC TCT AGT TTC GTT AAT ATG GAG ACC GTA ACC GTA GAG GAT GGT CCA ACC AAA TCT GAC CCT CGC TGT CTA ACT CGT TAC TAC TCT AGC TTC GTA AAT ATG GAA ACA GTT ACT GTA GAA GAT GGT CCA ACT AAA TCA GAT CCT AGA TGT TTA ACT AGA TAT TAT TCT AGT TTT GTT AAT ATG GAA T V T V E D G P T K S D P R C L T R Y Y S S F V N M E

1620 1620 1620

HC N seq HC CN seq HC CO seq

AGA GAT CTA GCT TCA GGA CTC ATT GGC CCT CTC CTC ATC TGC TAC AAA GAA TCT GTA GAT CAA AGA GGA AAC CAG ATA ATG CGT GAT CTT GCT AGT GGT TTG ATC GGT CCA TTA CTA ATC TGT TAC AAA GAG TCC GTT GAC CAA AGA GGC AAC CAA ATT ATG AGA GAT TTA GCT TCA GGA TTA ATT GGT CCT TTA TTA ATT TGT TAT AAA GAA TCT GTA GAT CAA AGA GGA AAT CAA ATA ATG R D L A S G L I G P L L I C Y K E S V D Q R G N Q I M

1701 1701 1701

HC N seq HC CN seq HC CO seq

TCA GAC AAG AGG AAT GTC ATC CTG TTT TCT GTA TTT GAT GAG AAC CGA AGC TGG TAC CTC ACA GAG AAT ATA CAA CGC TTT AGT GAT AAA CGT AAT GTT ATA CTA TTC AGT GTT TTC GAT GAA AAT CGT TCT TGG TAT CTA ACT GAA AAT ATT CAA CGA TTT TCA GAT AAA AGA AAT GTT ATT TTA TTT TCT GTA TTT GAT GAA AAT CGA TCT TGG TAT TTA ACA GAA AAT ATA CAA AGA TTT S D K R N V I L F S V F D E N R S W Y L T E N I Q R F

1782 1782 1782

HC N seq HC CN seq HC CO seq

CTC CCC AAT CCA GCT GGA GTG CAG CTT GAG GAT CCA GAG TTC CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT TTA CCT AAC CCT GCT GGT GTT CAA CTA GAG GAT CCT GAA TTC CAA GCC AGT AAT ATC ATG CAT AGC ATT AAT GGA TAT GTA TTA CCT AAT CCA GCT GGA GTT CAA TTA GAA GAT CCA GAA TTT CAA GCT TCT AAT ATT ATG CAT TCT ATT AAT GGT TAT GTT L P N P A G V Q L E D P E F Q A S N I M H S I N G Y V

1863 1863 1863

HC N seq HC CN seq HC CO seq

TTT GAT AGT TTG CAG TTG TCA GTT TGT TTG CAT GAG GTG GCA TAC TGG TAC ATT CTA AGC ATT GGA GCA CAG ACT GAC TTC TTC GAT AGT TTA CAA TTA TCC GTT TGT TTG CAT GAA GTT GCT TAC TGG TAT ATT CTA TCT ATC GGT GCT CAA ACT GAC TTC TTT GAT AGT TTA CAA TTA TCA GTT TGT TTA CAT GAA GTT GCT TAT TGG TAT ATT TTA TCT ATT GGA GCT CAA ACT GAT TTT F D S L Q L S V C L H E V A Y W Y I L S I G A Q T D F

1944 1944 1944

HC N seq HC CN seq HC CO seq

CTT TCT GTC TTC TTC TCT GGA TAT ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC CTA TTC CCA TTC TCA GGA CTA TCT GTA TTC TTC TCT GGT TAT ACC TTC AAA CAC AAA ATG GTA TAC GAG GAT ACC TTG ACC CTT TTT CCT TTC AGT GGT TTA TCT GTT TTT TTT TCT GGA TAT ACT TTT AAA CAT AAA ATG GTT TAT GAA GAT ACA TTA ACT TTA TTT CCA TTT TCA GGA L S V F F S G Y T F K H K M V Y E D T L T L F P F S G

2025 2025 2025

HC N seq HC CN seq HC CO seq

GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA GGT CTA TGG ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC ATG GAA ACA GTT TTC ATG AGT ATG GAA AAC CCA GGC CTT TGG ATC CTA GGT TGT CAC AAT TCT GAT TTC CGT AAT CGC GGT ATG GAA ACT GTT TTT ATG TCT ATG GAA AAT CCA GGT TTA TGG ATT TTA GGT TGT CAT AAT TCA GAT TTT AGA AAT AGA GGT ATG E T V F M S M E N P G L W I L G C H N S D F R N R G M

2106 2106 2106

HC N seq HC CN seq HC CO seq

ACC GCC TTA CTG AAG GTT TCT AGT TGT GAC AAG AAC ACT GGT GAT TAT TAC GAG GAC AGT TAT GAA GAT ATT TCA GCA TAC ACT GCT TTG CTA AAA GTA TCT TCT TGC GAT AAA AAC ACT GGT GAT TAC TAT GAG GAT AGT TAT GAA GAT ATA TCT GCT TAT ACT GCT TTA TTA AAA GTT TCT AGT TGT GAT AAA AAT ACT GGT GAT TAT TAT GAA GAT AGT TAT GAA GAT ATT TCA GCT TAT T A L L K V S S C D K N T G D Y Y E D S Y E D I S A Y

2187 2187 2187

HC N seq HC CN seq HC CO seq

TTG CTG AGT AAA AAC AAT GCC ATT GAA CCA AGA AGC TTC TCC CAG AAT CCA CCA GTC TTG AAA CGC CAT CAA CGC TAA TTG CTA TCC AAA AAC AAT GCT ATT GAG CCT CGT TCT TTC TCT CAA AAT CCA CCT GTT TTA AAA CGT CAC CAA CGC TAA TTA TTA AGT AAA AAT AAT GCT ATT GAA CCA AGA TCT TTT TCT CAA AAT CCA CCA GTT TTA AAA AGA CAT CAA AGA TAA L L S K N N A I E P R S F S Q N P P V L K R H Q R

2265 2265 2265

B

VP1 N seq VP1 CN seq VP1 CO seq

GGG TTA GGT CAG ATG CTT GAA AGC ATG ATT GAC AAC ACA GTC CGT GAA ACG GTG GGG GCG GCA ACG TCT AGA GAC GCT CTC GGT TTA GGA CAA ATG TTG GAA TCT ATG ATT GAT AAC ACA GTA CGT GAA ACT GTT GGT GCT GCA ACT TCT CGT GAT GCT CTA GGT TTA GGT CAA ATG TTA GAA TCT ATG ATT GAT AAT ACT GTT CGT GAA ACT GTT GGT GCT GCT ACT TCT AGG GAT GCT TTA G L G Q M L E S M I D N T V R E T V G A A T S R D A L

81 81 81

VP1 N seq VP1 CN seq VP1 CO seq

CCA AAC ACT GAA GCC AGT GGA CCA GCA CAC TCC AAG GAA ATT CCG GCA CTC ACC GCA GTG GAA ACT GGG GCC ACA AAT CCA CCT AAT ACT GAA GCT AGT GGT CCT GCT CAT AGC AAA GAA ATT CCA GCT CTT ACC GCT GTT GAG ACC GGT GCT ACT AAC CCT CCA AAT ACT GAA GCT AGT GGT CCT GCT CAT TCT AAA GAA ATT CCT GCT TTA ACT GCT GTT GAA ACT GGT GCT ACA AAT CCA P N T E A S G P A H S K E I P A L T A V E T G A T N P

162 162 162

VP1 N seq VP1 CN seq VP1 CO seq

CTA GTC CCT TCT GAT ACA GTG CAA ACC AGA CAT GTT GTA CAA CAT AGG TCA AGG TCA GAG TCT AGC ATA GAG TCT TTC TTC CTA GTT CCT TCT GAT ACT GTA CAA ACA CGT CAT GTA GTT CAA CAT AGA AGT CGT AGC GAA TCT AGT ATC GAG TCC TTC TTT TTA GTT CCT TCT GAT ACA GTT CAA ACT AGA CAT GTT GTA CAA CAT AGA TCA AGA TCA GAA TCT TCT ATA GAA TCT TTT TTT L V P S D T V Q T R H V V Q H R S R S E S S I E S F F

243 243 243

VP1 N seq VP1 CN seq VP1 CO seq

GCG CGG GGT GCA TGC GTG GCC ATT ATA ACC GTG GAT AAC TCA GCT TCC ACC AAG AAT AAG GAT AAG CTA TTT ACA GTG TGG GCT CGC GGT GCT TGT GTT GCA ATC ATT ACC GTA GAT AAC TCT GCT TCC ACT AAA AAT AAA GAT AAG CTA TTC ACT GTA TGG GCT AGA GGT GCT TGT GTT GCT ATT ATA ACT GTT GAT AAT TCA GCT TCT ACT AAA AAT AAA GAT AAA TTA TTT ACA GTT TGG A R G A C V A I I T V D N S A S T K N K D K L F T V W

324 324 324

VP1 N seq VP1 CN seq VP1 CO seq

AAG ATC ACT TAT AAA GAT ACT GTC CAG TTA CGG AGG AAA TTG GAG TTC TTC ACC TAT TCT AGA TTT GAT ATG GAA TTT ACC AAG ATT ACC TAC AAA GAT ACT GTT CAA TTA CGT CGA AAA TTA GAG TTC TTT ACT TAC TCC CGC TTT GAT ATG GAA TTC ACC AAA ATT ACT TAT AAA GAT ACT GTT CAA TTA AGA AGA AAA TTA GAA TTT TTT ACT TAT TCT AGG TTT GAT ATG GAA TTT ACT K I T Y K D T V Q L R R K L E F F T Y S R F D M E F T

405 405 405

VP1 N seq VP1 CN seq VP1 CO seq

TTT GTG GTT ACT GCA AAT TTC ACT GAG ACT AAC AAT GGG CAT GCC TTA AAT CAA GTG TAC CAA ATT ATG TAC GTA CCA CCA TTC GTA GTT ACT GCT AAT TTC ACC GAA ACT AAC AAT GGT CAC GCT TTG AAT CAG GTA TAT CAA ATC ATG TAC GTA CCA CCT TTT GTT GTT ACT GCT AAT TTT ACT GAA ACT AAT AAT GGT CAT GCT TTA AAT CAA GTT TAT CAA ATT ATG TAT GTA CCA CCA F V V T A N F T E T N N G H A L N Q V Y Q I M Y V P P

486 486 486

VP1 N seq VP1 CN seq VP1 CO seq

GGC GCT CCA GTG CCC GAG AAA TGG GAC GAC TAC ACA TGG CAA ACC TCA TCA AAT CCA TCA ATC TTT TAC ACC TAC GGA ACA GGA GCT CCT GTA CCA GAA AAA TGG GAT GAC TAT ACT TGG CAG ACT TCC TCT AAC CCT TCT ATT TTT TAT ACA TAC GGT ACC GGT GCT CCA GTT CCT GAA AAA TGG GAT GAT TAT ACA TGG CAA ACT TCA TCA AAT CCA TCA ATT TTT TAT ACT TAT GGA ACA G A P V P E K W D D Y T W Q T S S N P S I F Y T Y G T

567 567 567

VP1 N seq VP1 CN seq VP1 CO seq

GCT CCA GCC CGG ATC TCG GTA CCG TAT GTT GGT ATT TCG AAC GCC TAT TCA CAC TTT TAC GAC GGT TTT TCC AAA GTA CCA GCA CCT GCT CGT ATT AGC GTT CCA TAC GTA GGT ATT AGT AAC GCT TAC TCT CAC TTC TAT GAT GGT TTC TCT AAA GTA CCA GCT CCA GCT AGA ATT TCT GTA CCT TAT GTT GGT ATT TCT AAT GCT TAT TCA CAT TTT TAT GAT GGT TTT TCT AAA GTA CCA A P A R I S V P Y V G I S N A Y S H F Y D G F S K V P

648 648 648

VP1 N seq VP1 CN seq VP1 CO seq

CTG AAG GAC CAG TCG GCA GCA CTA GGT GAC TCC CTC TAT GGT GCA GCA TCT CTA AAT GAC TTC GGT ATT TTG GCT GTT AGA TTA AAA GAT CAA AGT GCT GCA CTA GGT GAC TCT CTA TAT GGT GCT GCA TCT CTA AAT GAT TTC GGT ATT TTA GCT GTA CGT TTA AAA GAT CAA TCT GCT GCA TTA GGT GAT TCT TTA TAT GGT GCT GCA TCT TTA AAT GAT TTT GGT ATT TTA GCT GTT AGA L K D Q S A A L G D S L Y G A A S L N D F G I L A V R

729 729 729

VP1 N seq VP1 CN seq VP1 CO seq

GTA GTC AAT GAT CAC AAC CCG ACC AAG GTC ACC TCC AAA ATC AGA GTG TAT CTA AAA CCC AAA CAC ATC AGA GTC TGG TGC GTT GTA AAC GAT CAC AAT CCA ACC AAA GTA ACC TCT AAA ATC CGC GTT TAT CTT AAA CCT AAG CAT ATT AGA GTA TGG TGT GTA GTT AAT GAT CAT AAT CCT ACT AAA GTT ACT TCT AAA ATT AGA GTT TAT CTA AAA CCT AAA CAT ATT AGA GTT TGG TGT V V N D H N P T K V T S K I R V Y L K P K H I R V W C

810 810 810


VP1 N seq VP1 CN seq VP1 CO seq

CCG CGT CCA CCG AGG GCA GTG GCG TAC TAC GGC CCT GGA GTG GAT TAC AAG GAT GGT ACG CTT ACA CCC CTC TCC ACC AAG CCT CGC CCA CCT CGA GCT GTT GCT TAT TAC GGT CCT GGA GTA GAT TAC AAA GAT GGC ACA CTA ACT CCA TTA AGC ACA AAG CCT CGT CCA CCT AGA GCA GTT GCT TAT TAT GGT CCT GGA GTT GAT TAT AAA GAT GGT ACT TTA ACA CCT TTA TCT ACT AAA P R P P R A V A Y Y G P G V D Y K D G T L T P L S T K

891 891 891

VP1 N seq VP1 CN seq VP1 CO seq

GAT CTG ACC ACA TAT TGA GAC TTG ACC ACT TAT TAA GAT TTA ACT ACA TAT TAA D L T T Y *

909 909 909

Supplemental Figure S2. Comparison of native and codon-optimized (new and old) sequences. A, Sequence alignments of native (N) and codon-optimized (CN and CO) FVIII HC genes. B, Sequence alignments of native (N) and codon-optimized (CN and CO) VP1 genes. Any nucleotides that differ between codon-optimized sequences and the native sequence are marked in yellow. Red font indicates rare codons in the native sequence, which are eliminated in synthetic genes; N, native sequence; CN, codon-optimized sequence obtained from the new optimizer algorithm; CO, codon-optimized sequence obtained from the old optimizer algorithm.
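The highlighting described in this caption amounts to a codon-by-codon comparison of each optimized sequence with the native one. A minimal sketch of that comparison follows, using the final VP1 alignment block (codons ending at position 909) as input; the function names are hypothetical and chosen only for illustration.

```python
def split_codons(seq):
    """Remove whitespace and split an in-frame sequence into codons."""
    seq = "".join(seq.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def differing_codons(native, optimized):
    """Return 1-based codon positions where the two sequences differ."""
    n, o = split_codons(native), split_codons(optimized)
    if len(n) != len(o):
        raise ValueError("sequences must contain the same number of codons")
    return [pos for pos, (a, b) in enumerate(zip(n, o), start=1) if a != b]


# Last VP1 alignment block, copied from panel B.
VP1_N = "GAT CTG ACC ACA TAT TGA"
VP1_CN = "GAC TTG ACC ACT TAT TAA"
VP1_CO = "GAT TTA ACT ACA TAT TAA"

diff_cn = differing_codons(VP1_N, VP1_CN)
diff_co = differing_codons(VP1_N, VP1_CO)
print("N vs CN:", diff_cn)
print("N vs CO:", diff_co)
```

Positions are numbered within the supplied block, so an offset would need to be added to map them onto full-gene coordinates.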

psbA (46,500 codons from 133 psbA genes)

TTT (71.9%) F     TCT (43.3%) S     TAT (52.6%) Y     TGT (85.8%) C
TTC (28.1%) F     TCC (12.8%) S     TAC (47.4%) Y     TGC (14.2%) C
TTA (26.2%) L     TCA (5.6%) S      TAA (100%) STOP   TGA (0%) STOP
TTG (22.5%) L     TCG (2%) S        TAG (0%) STOP     TGG (100%) W
CTT (20.5%) L     CCT (65.8%) P     CAT (47.9%) H     CGT (54.1%) R
CTC (0.1%) L      CCC (1.9%) P      CAC (52.2%) H     CGC (17.8%) R
CTA (27.1%) L     CCA (27.8%) P     CAA (80.4%) Q     CGA (8.4%) R
CTG (3.7%) L      CCG (4.5%) P      CAG (19.6%) Q     CGG (0.5%) R
ATT (57.5%) I     ACT (58.7%) T     AAT (47.4%) N     AGT (22.0%) S
ATC (34.0%) I     ACC (30.9%) T     AAC (52.6%) N     AGC (14.7%) S
ATA (8.6%) I      ACA (9.6%) T      AAA (84.4%) K     AGA (12.3%) R
ATG (100%) M      ACG (0.8%) T      AAG (15.6%) K     AGG (6.9%) R
GTT (44.8%) V     GCT (68.7%) A     GAT (81.0%) D     GGT (67.2%) G
GTC (2.2%) V      GCC (7.0%) A      GAC (19.0%) D     GGC (13.0%) G
GTA (51.3%) V     GCA (19.4%) A     GAA (75.0%) E     GGA (17.6%) G
GTG (2%) V        GCG (4.9%) A      GAG (25.0%) E     GGG (2.3%) G

Chloroplast: Lactuca sativa (57,528 codons from 189 CDS's)

TTT (64.6%) F     TCT (29.8%) S     TAT (80.9%) Y     TGT (74.5%) C
TTC (35.4%) F     TCC (16.2%) S     TAC (19.1%) Y     TGC (25.5%) C
TTA (32.1%) L     TCA (19.6%) S     TAA (54.1%) STOP  TGA (18.0%) STOP
TTG (21.6%) L     TCG (8.7%) S      TAG (27.8%) STOP  TGG (100%) W
CTT (21.9%) L     CCT (38.9%) P     CAT (75.2%) H     CGT (22.5%) R
CTC (6.4%) L      CCC (17.0%) P     CAC (24.8%) H     CGC (7.3%) R
CTA (13.4%) L     CCA (27.9%) P     CAA (75.7%) Q     CGA (21.7%) R
CTG (6.7%) L      CCG (14.4%) P     CAG (24.3%) Q     CGG (7.6%) R
ATT (48.9%) I     ACT (41.6%) T     AAT (77.3%) N     AGT (20.2%) S
ATC (20.0%) I     ACC (18.2%) T     AAC (22.7%) N     AGC (5.5%) S
ATA (31.1%) I     ACA (30.6%) T     AAA (74.4%) K     AGA (29.6%) R
ATG (100%) M      ACG (9.6%) T      AAG (25.6%) K     AGG (11.4%) R
GTT (36.6%) V     GCT (44.8%) A     GAT (80.9%) D     GGT (33.3%) G
GTC (12.2%) V     GCC (15.8%) A     GAC (19.1%) D     GGC (11.4%) G
GTA (37.1%) V     GCA (18.4%) A     GAA (73.9%) E     GGA (38.6%) G
GTG (14.1%) V     GCG (11.0%) A     GAG (26.1%) E     GGG (16.7%) G

Chloroplast: Nicotiana tabacum (34,756 codons from 137 CDS's)

TTT (62.4%) F     TCT (28.2%) S     TAT (78.1%) Y     TGT (72.7%) C
TTC (37.6%) F     TCC (16.4%) S     TAC (21.9%) Y     TGC (27.3%) C
TTA (29.6%) L     TCA (19.2%) S     TAA (50.4%) STOP  TGA (25.6%) STOP
TTG (21.1%) L     TCG (10.3%) S     TAG (26.0%) STOP  TGG (100%) W
CTT (21.6%) L     CCT (40.6%) P     CAT (75.3%) H     CGT (20.5%) R
CTC (7.5%) L      CCC (17.3%) P     CAC (24.7%) H     CGC (6.7%) R
CTA (13.0%) L     CCA (28.8%) P     CAA (74.3%) Q     CGA (23.9%) R
CTG (7.1%) L      CCG (13.3%) P     CAG (25.7%) Q     CGG (8.3%) R
ATT (48.5%) I     ACT (39.5%) T     AAT (74.1%) N     AGT (19.0%) S
ATC (21.3%) I     ACC (19.8%) T     AAC (25.9%) N     AGC (6.9%) S
ATA (30.2%) I     ACA (29.9%) T     AAA (72.0%) K     AGA (29.2%) R
ATG (100%) M      ACG (10.8%) T     AAG (28.0%) K     AGG (11.4%) R
GTT (35.4%) V     GCT (45.4%) A     GAT (78.7%) D     GGT (33.0%) G
GTC (12.7%) V     GCC (17.1%) A     GAC (21.3%) D     GGC (11.4%) G
GTA (37.6%) V     GCA (27.3%) A     GAA (73.0%) E     GGA (38.4%) G
GTG (14.3%) V     GCG (10.2%) A     GAG (27.0%) E     GGG (17.3%) G

Supplemental Figure S3. Three different codon tables for the expression of heterologous genes in chloroplasts. The numbers of codons and genes used to create each codon table are indicated. In the psbA-based codon table, black and underlined codons indicate codons that are not used in optimized sequences due to their low usage frequency among synonymous codons: less than 5% usage or, for amino acids with six synonymous codons (leucine, serine, and arginine), the two codons used least frequently. The chloroplast-wide codon tables of lettuce (Lactuca sativa) and tobacco (Nicotiana tabacum) were produced from data obtained through http://www.kazusa.or.jp/codon/ (Nakamura et al., 2000).
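The exclusion rule stated in this caption can be sketched in a few lines. The sketch below is an interpretation, not the authors' optimizer: it assumes that six-codon families drop their two least-used codons and that all other families drop codons under 5%, and it renormalizes the remaining frequencies as one plausible way to keep the usage hierarchy. The function names are hypothetical; the percentages are the psbA values for proline, alanine, and arginine from the table.

```python
# psbA usage (%) for three amino acids, copied from the psbA codon table.
PSBA_USAGE = {
    "P": {"CCT": 65.8, "CCC": 1.9, "CCA": 27.8, "CCG": 4.5},
    "A": {"GCT": 68.7, "GCC": 7.0, "GCA": 19.4, "GCG": 4.9},
    "R": {"CGT": 54.1, "CGC": 17.8, "CGA": 8.4, "CGG": 0.5,
          "AGA": 12.3, "AGG": 6.9},
}


def excluded_codons(usage, threshold=5.0):
    """Codons dropped for one amino acid under the assumed reading of the rule."""
    if len(usage) == 6:
        # Six-codon family: drop the two least frequently used codons.
        ranked = sorted(usage, key=usage.get)
        return set(ranked[:2])
    # Other families: drop every codon below the usage threshold.
    return {codon for codon, pct in usage.items() if pct < threshold}


def retained_usage(usage, threshold=5.0):
    """Remaining codons with frequencies rescaled to sum to 100%."""
    dropped = excluded_codons(usage, threshold)
    kept = {c: p for c, p in usage.items() if c not in dropped}
    total = sum(kept.values())
    return {c: 100.0 * p / total for c, p in kept.items()}


for aa, usage in PSBA_USAGE.items():
    print(aa, sorted(excluded_codons(usage)), retained_usage(usage))
```

Under this reading, arginine loses CGG and AGG even though AGG exceeds 5%, because the six-codon rule takes precedence over the threshold.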

[Figure residue, standard-curve plots (panel B shown). Axis labels: Integrated density value (IDV); CNTB (ng). Fitted regression lines: y = 1599.5x - 1091 (R² = 0.990); y = 520.01x - 1029.3 (R² = 0.911); y = 2198.1x - 1435.8 (R² = 0.994).]

Augmentation of  exposure time  (