MINIREVIEW

Wheat germ cell-free platform for eukaryotic protein production

Dmitriy A. Vinarov, Carrie L. Loushin Newman and John L. Markley

Center for Eukaryotic Structural Genomics, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA

Keywords
cell-free extract; in vitro; isotopic labeling; NMR screening; NMR structure determination; protein production; protein structure; transcription; translation; wheat germ

Correspondence
J. L. Markley, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
Fax: +1 608 262 3759
Tel: +1 608 263 9349
E-mail: [email protected]
Website: http://uwstructuralgenomics.org

(Received 2 May 2006, revised 13 July 2006, accepted 26 July 2006)

doi:10.1111/j.1742-4658.2006.05434.x

We describe a platform that utilizes wheat germ cell-free technology to produce protein samples for NMR structure determinations. In the first stage, cloned DNA molecules coding for proteins of interest are transcribed and translated on a small scale (25 µL) to determine levels of protein expression and solubility. The amount of protein produced (typically 2–10 µg) is sufficient to be visualized by polyacrylamide gel electrophoresis. The fraction of soluble protein is estimated by comparing gel scans of total protein and soluble protein. Targets that pass this first screen by exhibiting high protein production and solubility move to the second stage. In the second stage, the DNA is transcribed on a larger scale, and labeled proteins are produced by incorporation of [15N]-labeled amino acids in a 4 mL translation reaction that typically produces 1–3 mg of protein. The [15N]-labeled proteins are screened by 1H-15N correlated NMR spectroscopy to determine whether the protein is a good candidate for solution structure determination. Targets that pass this second screen are then translated in a medium containing amino acids doubly labeled with 15N and 13C. We describe the automation of these steps and their application to targets chosen from a variety of eukaryotic genomes: Arabidopsis thaliana, human, mouse, rat, and zebrafish. We present protein yields and costs and compare the wheat germ cell-free approach with alternative methods. Finally, we discuss remaining bottlenecks and approaches to their solution.

Introduction

One of the most important tasks in biotechnology today is the development of improved systems and strategies for synthesizing any desired protein or protein fragment in its folded, soluble form on a preparative scale. This task is fundamental to the success of structural genomics projects, which promise to capitalize upon numerous advances in science and technology to change the appreciation and understanding of biological systems. Structural genomics implies a move away from hypothesis-driven research to a system of solving structures first and using these structures and other structures modeled from them as the source of hypotheses for further research. The medical incentives for understanding protein structure are great. Many diseases are caused by defects in a single protein that alter its folding, stability, or activity. The structures of proteins involved in diseases will move us a step closer to improving disease treatment, diagnosis, and prevention. Beyond their specific medical applications, structural genomics projects are teaching fundamental lessons about the structural basis of life on this planet.

Abbreviations CESG, Center for Eukaryotic Structural Genomics; GST, glutathione S-transferase; HSQC, heteronuclear single-quantum correlation spectroscopy; IMAC, immobilized metal affinity chromatography; PDB, Protein Data Bank; [U-15N]-, uniform labeling with nitrogen-15; SAIL, stereo-array isotope labeled; Se-Met, selenomethionine.


FEBS Journal 273 (2006) 4160–4169 © 2006 The Authors Journal compilation © 2006 FEBS

D. A. Vinarov et al.

Protein production remains a bottleneck in proteomics, for both structural and functional studies. Most structural biology groups and structural genomics centers utilize cell-based, heterologous protein production from Escherichia coli. However, this approach fails with many individual proteins, particularly those from eukaryotes. Failures result from no or low expression, low solubility, or degradation. Expression levels can be improved by producing the protein of interest as a cleavable fusion with a highly expressing protein. Low solubility can result from failure of the protein to fold properly, aggregation of folded protein, or from unfavorable properties of the construct (intrinsic insolubility of the native sequence or insolubility introduced by a non-native sequence, such as a purification tag or other cloning artifact). As indicated in TargetDB, the target registration database for structural genomics (http://targetdb.pdb.org/), the proportion of targets that code for ‘unique proteins’ that yield soluble protein is only about one-third for prokaryotic proteins and much lower for eukaryotic proteins. In this context, a unique protein is defined as one with a peptide sequence exhibiting ≤ 30% sequence identity to the sequence of any protein with a three-dimensional structure deposited in the Protein Data Bank. Solubility can be improved greatly by producing the protein of interest as a cleavable fusion with a highly soluble protein. This strategy may enable the protein to fold properly without aggregation so that it stays in solution following cleavage. Many eukaryotic proteins are ‘natively disordered’, that is, they do not adopt a single, stable, folded structure. Some natively disordered proteins require an additional factor for folding: a metal ion, a small molecule cofactor, another peptide chain, or an oligonucleotide. Other proteins may require extensive post-translational modification to achieve their native folded state.
Platforms for structural investigations must support the production of proteins on the scale of 2–10 mg. For efficient structure determination by NMR spectroscopy, the proteins must be labeled with stable isotopes (15N or 13C+15N, or for larger proteins 2H+13C+15N). For X-ray crystallography, proteins normally are labeled with selenomethionine (Se-Met) to support multiwavelength anomalous dispersion data collection for phase determination. Because protein production and labeling on this scale is expensive, it is important to screen targets first on a smaller scale to identify which constructs are expressed, soluble without aggregation, folded, and stable under the conditions used for NMR structure determinations or crystallization trials.

In vitro cell-free methods for protein synthesis with extracts from prokaryotic [1] or eukaryotic [2] cells offer an alternative to the E. coli cell-based platforms. Cell-free approaches have a number of potential advantages over other alternatives to heterologous expression in E. coli cells. Stable isotope or Se-Met labeling is easier with cell-free systems than with yeast, mammalian, or insect cell systems [3–5]. Cell-free systems may permit successful production of proteins that undergo proteolysis [6,7] or accumulate in inclusion bodies [8] in cells. Cell-free systems support selective labeling strategies [9–12] that cannot be achieved in bacterial whole cell systems. An important emerging approach is the incorporation of stereo-array isotope labeled (SAIL) amino acids [13], chemically synthesized amino acids with stereo-specifically arrayed stable isotope (2H and 13C) labeling patterns that are optimal for NMR spectroscopy. SAIL amino acids are being commercialized by a start-up company in Japan (Sail Technologies, Inc., Yokohama, Japan) and when available will raise the threshold for high-throughput NMR structure determinations from 20 kDa to 40 kDa and above [13]. The SAIL amino acids must be incorporated into proteins by in vitro synthesis so as not to disturb the labeling pattern. Cell-free systems have been used for the production of various kinds of proteins, including membrane proteins [14] and proteins that are toxic to cells [8,15]. It is possible to collect NMR spectra of [15N]-labeled proteins prior to isolation from the cell-free protein synthesis mixture [16,17]. One of the features of cell-free protein production is that only the protein of interest is labeled, so that contaminating proteins do not show up in normal multinuclear NMR spectra. Cell-free protein production protocols are streamlined compared to cell-based protocols, in that they do not require cell harvesting or cell lysis. Protein purification is usually simpler, because the protein of interest starts out more concentrated and is isolated from a smaller set of contaminants.
The RIKEN Structural Genomics Center in collaboration with Roche has pioneered the use of cell-free protein production through a coupled transcription-translation system employing E. coli extracts [18–22]. It has been found, however, that most of the proteins that produce well in E. coli cell-free systems are the same ones that are produced successfully from E. coli cells [10]. Thus, despite other potential advantages, the E. coli cell-free approach may not greatly expand the range of proteins that can be produced in a soluble, folded state, although it may be possible to overcome this limitation by redesigning the gene sequence (see below), by adding chaperones or other factors [22,24], or by reengineering ribosomal proteins [25].


One of the first in vitro translation systems to be investigated was prepared from wheat germ extracts, but yields from this eukaryotic extract were low [2]. Y. Endo and his group at Ehime University (Matsuyama, Japan) achieved a breakthrough in this technology by finding that an inhibitor of ribosomal protein synthesis, tritin, is associated with the coat of the wheat embryo [26]. They developed a process for removing this contaminating inhibitor and patented this process along with methods for utilizing the improved wheat germ extract [26–31]. Endo founded a company, CellFree Sciences Co., Ltd. (Yokohama, Japan), to commercialize the technology. We found this approach to be promising and formed a cooperative undertaking with Ehime University and the CellFree Sciences Co., Ltd. with the goal of investigating the potential of wheat germ cell-free protein production as an enabling technology in our structural genomics project, the Center for Eukaryotic Structural Genomics (CESG; Madison, WI). As discussed here, we have found this technology to be robust, and our wheat germ cell-free pipeline now supports high-throughput screening for protein production and solubility and provides stable isotope labeled protein samples for the majority of the NMR structures determined at CESG [32,33].

CESG’s wheat germ cell-free platform

Our detailed protocol for wheat germ cell-free protein production is available elsewhere [34]. In short, the approach consists of four steps (Fig. 1A): (1) creation of a plasmid used for in vitro transcription, (2) small scale (25–50 µL) screening to assay the level of protein production and solubility, (3) larger scale (4–12 mL) production of [U-15N]-protein used to evaluate whether solution conditions can be found that render the target suitable for NMR structure determination (soluble, monodisperse, folded, and stable), and (4) production of sufficient [U-13C,15N]-protein for multidimensional, multinuclear magnetic resonance data collection. We purchase the wheat germ extract from CellFree Sciences, Inc., the RNA polymerase from Promega (Madison, WI), and the labeled amino acids from Cambridge Isotope Laboratories (Andover, MA). Details about these and other reagents and supplies are found in our publications [32–34]. The purification workflow diagram is shown in Fig. 1B.

In step (1), a defined series of cloning procedures is used to create a DNA plasmid containing the target gene and 5′ and 3′ extensions that promote efficient transcription and translation. In step (2), small scale protein expression and purification trials are carried out, generally in a 96 well format. Successful candidates from these screens (those estimated to yield > 0.5 mg·mL⁻¹ target protein with solubility > 75%) are then selected for larger scale protein production with incorporation of [15N]-labeled amino acids. Purified [U-15N]-protein samples produced in step (3) are then assayed by 1H-15N correlation spectroscopy (1H-15N HSQC) for their suitability as structural candidates (they must be folded, monodisperse, and stable at room temperature for at least 14 days). The solution conditions can be refined as part of this step. Targets that pass these tests are then prepared as [U-13C,15N]-protein samples, step (4).

[Figure 1, image not reproduced. Panel A labels: Target selection; Screening (50 µL scale): 1. Cloning (PCR from cDNA, ligation cloning, DNA plasmid prep), 2. Small scale transcription and translation, with analysis of expression level, solubility, and tag cleavage; Production for structural analysis (4–12 mL scale): 3. Production and analysis of [15N]-protein (DNA plasmid preps, transcription, translation on [15N]-amino acids in a 4 mL reaction, isolation and purification with tag removal, HSQC NMR analysis, solubility, stability, and MS analysis), 4. for successful targets, production of [13C,15N]-protein as above but with double labeling (4–12 mL reaction), followed by structure determination by NMR. Panel B labels: Cell-free reaction (4–12 mL); protein product with noncleavable N-(His)6 tag: Ni-HiTrap chelating column, concentrate; protein product with cleavable N-GST tag: first GSTrap column, concentrate, PreScission protease cleavage, second GSTrap column; Superdex75 in NMR buffer; concentration; NMR sample.]

Fig. 1. (A) Workflow diagram showing how the wheat germ cell-free platform is used to screen constructs for the expression of soluble protein, to produce [15N]-labeled protein for NMR screening for suitability as a structural candidate, and for the production of double-labeled [13C,15N]-protein for structure determination. (B) Schematic illustration of the steps involved in isolating and purifying proteins produced by the wheat germ cell-free platform, depending on the type of tag: noncleavable N-(His)6 tag or cleavable N-GST tag.
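The go/no-go decision at the end of the small scale screen can be sketched in a few lines of Python. This is only an illustration of the two thresholds quoted in the text (estimated yield > 0.5 mg·mL⁻¹ and solubility > 75%); the `ScreenResult` record, its field names, and the example band intensities are hypothetical and are not part of CESG's software.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """Hypothetical densitometry readout for one construct (one well)."""
    target_id: str
    total_band: float            # gel-scan intensity of the target band, total protein lane
    soluble_band: float          # gel-scan intensity of the target band, soluble protein lane
    est_yield_mg_per_ml: float   # estimated target protein per mL of reaction mixture


# Thresholds quoted in the text for advancing a target to [15N] production.
MIN_YIELD_MG_PER_ML = 0.5
MIN_SOLUBLE_FRACTION = 0.75


def soluble_fraction(result: ScreenResult) -> float:
    """Soluble band divided by total band; 0 if no protein was detected."""
    if result.total_band <= 0:
        return 0.0
    return min(result.soluble_band / result.total_band, 1.0)


def passes_small_scale_screen(result: ScreenResult) -> bool:
    """Both cut-offs must be exceeded for the target to move on."""
    return (result.est_yield_mg_per_ml > MIN_YIELD_MG_PER_ML
            and soluble_fraction(result) > MIN_SOLUBLE_FRACTION)


if __name__ == "__main__":
    plate = [
        ScreenResult("target_A", 1000.0, 900.0, 0.8),  # expressed and soluble
        ScreenResult("target_B", 1000.0, 400.0, 0.9),  # expressed, mostly insoluble
        ScreenResult("target_C", 500.0, 480.0, 0.3),   # soluble, low expression
    ]
    for r in plate:
        print(r.target_id, round(soluble_fraction(r), 2), passes_small_scale_screen(r))
```

Keeping the thresholds as named constants makes it easy to rerun a plate with different cut-offs, for example when screening larger proteins.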


We have tested the wheat germ cell-free platform in the context of NMR-based structural genomics of eukaryotic proteins and have compared it with our parallel E. coli cell-based platform. Our experience is summarized briefly as follows. (a) Targets can be screened more quickly and more economically for protein expression and solubility by the cell-free approach than by the cell-based approach. The efficiency of this process is important, because we need to screen many targets or multiple constructs of a given target in order to find one that produces a protein that is soluble and well folded. As an example of multiple screening of a given target, we have screened targets with a noncleavable His6 tag, with a cleavable His6 tag, and with a cleavable glutathione S-transferase (GST) tag and have shown complementary success with these [35]. (b) Because of the smaller volumes involved, the isolation and purification of 1–5 mg quantities of labeled protein for NMR structural studies is faster and less labor intensive with proteins prepared by the cell-free approach than the cell-based approach. (c) Proteins produced with the wheat germ extract from CellFree Sciences and labeled amino acids generally show high levels of enrichment by mass spectrometry: > 95% 15N/(14N+15N) or 13C/(12C+13C). These high levels are excellent for NMR spectroscopy. (d) The cell-free system supports the production of proteins with a variety of labeling patterns: uniform labeling with 2H, 13C, and 15N, selective labeling by residue type, and SAIL (discussed above).

We recently carried out a detailed comparison of the wheat germ cell-free and E. coli cell-based approaches to protein production for NMR structure determination [35]. In this study 96 randomly chosen Arabidopsis thaliana targets were carried through CESG’s wheat germ cell-free and E. coli cell pipelines. If possible, [15N]-labeled versions of each protein were produced for analysis by 1H-15N correlation NMR spectroscopy.
Of the 96 targets started with, only eight from the cell-free pipeline and five from the cell-based pipeline were found suitable for NMR structural analysis on the basis of the NMR results. In this comparison, the five targets that proved successful by the E. coli cell-based approach also were successful by the cell-free approach.

Our wheat germ cell-free approach appears to have advantages over published in vitro protein production protocols that utilize E. coli S30 extract. (a) Cell-free protocols utilizing E. coli extract usually call for the testing of multiple plasmids with sequence differences outside the protein coding region to determine one that produces protein in high yield [36]. By contrast, with the wheat germ cell-free protocol we have found no advantage of modifying the plasmid sequence outside the coding region, and hence utilize a single plasmid construct for all targets. (b) Protocols for E. coli S30 cell-free synthesis typically employ additives, such as polyethylene glycol, to improve protein yields [10]. These additives need to be removed prior to NMR structural studies. No such additives are required with the wheat germ cell-free approach, probably because the wheat germ extract contains chaperones and other factors that contribute to higher protein yields. (c) To achieve a high level of label incorporation from E. coli S30 extract it may be necessary to take pains to remove endogenous unlabeled amino acids [10]. (d) Proteins prepared from E. coli S30 extract may be heterogeneous as the result of incomplete cleavage of the N-terminal methionine. This heterogeneity can lead to doubling of NMR peaks [10]. An effective solution is to make all proteins with a cleavable N-terminal sequence. This complication does not occur with proteins produced in vitro from wheat germ extract. (e) Wheat germ extracts contain chaperones, and do not require the addition of chaperones as sometimes needed for high yields from E. coli S30 extract [37,38]. A comparison of protein production from wheat germ extract and E. coli S30 extract [39] demonstrated that a significantly higher proportion of multiple domain eukaryotic proteins were soluble when translated by wheat germ extract.

Automation

All of the cell-free operations can be carried out by hand, and this is how we started using the technology. Because of the small volume requirements for screening (25–50 µL) and protein production for structural studies (4–12 mL), cell-free methods have proved amenable to automation. CESG makes use of a CellFree Sciences GeneDecoder1000™ robotic system (Fig. 2) in automating the small scale screening of constructs for protein production and solubility. This unit makes it possible to carry out as many as 1052 small scale (25 µL) screening reactions per week. CESG has two prototype robotic units developed by CellFree Sciences for larger scale protein production (Fig. 2). The Protemist10™ robotic system requires preparation of the mRNA off-line, whereas the newer Protemist100™ starts with DNA and produces the mRNA transcript prior to the translation step. Each of these systems supports 24 transcription and translation reactions of 4 mL each per week. Typical yields for the Protemist runs are 0.3–0.5 mg purified protein per mL reaction mixture. These robotic systems handle the many steps that are tedious to carry out by hand, and work through the night. They have greatly reduced the manpower requirements of cell-free screening and protein production.

Fig. 2. Fully automated protein synthesizers from CellFree Sciences. (Left) GeneDecoder1000™, which operates in two small scale modes. In the screening mode, it handles up to four 96 well plates per overnight run, produces 2–10 µg protein per well, and uses 1.0–5.0 mL wheat germ extract per plate. In the small scale protein production mode it can handle up to two 96 well plates per overnight run, produces between 10 and 50 µg protein per well, and uses 5.0–10.0 mL wheat germ extract per plate. (Center) Protemist10™ robotic system, which is capable of carrying out 24 translation reactions of 4 mL each per week. The unit produces 1–3 mg protein per reaction and utilizes 3 mL wheat germ extract per reaction. This system requires off-line preparation of the mRNA. (Right) Protemist100™ robotic system, which supports 24 transcription and translation reactions of 4 mL each per week. Its capabilities are similar to those of the Protemist10, but it has the added feature of automated production of mRNA. These robotic systems carry out a variety of operations including solvent extraction, high level multichannel liquid handling, centrifugation, and incubation at various temperatures. An onboard microprocessor interfaced with the computer connected to the database keeps detailed log files that contain information about temperatures, volumes, and operational performance at every step.
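The throughput figures quoted for the Protemist units (24 reactions of 4 mL per week, 0.3–0.5 mg purified protein per mL) translate into a weekly protein budget by simple arithmetic. The sketch below is a back-of-the-envelope calculation under those quoted numbers only; the function names are illustrative, and the 10 mg example is a hypothetical request rather than a CESG planning rule.

```python
import math

# Figures quoted in the text for one Protemist unit.
REACTIONS_PER_WEEK = 24
REACTION_VOLUME_ML = 4.0
YIELD_RANGE_MG_PER_ML = (0.3, 0.5)  # typical purified protein per mL reaction


def yield_per_reaction_mg(volume_ml=REACTION_VOLUME_ML,
                          yield_range=YIELD_RANGE_MG_PER_ML):
    """Return (low, high) mg of purified protein expected from one reaction."""
    low, high = yield_range
    return volume_ml * low, volume_ml * high


def weekly_output_mg(reactions=REACTIONS_PER_WEEK,
                     volume_ml=REACTION_VOLUME_ML,
                     yield_range=YIELD_RANGE_MG_PER_ML):
    """Return (low, high) mg of purified protein per unit per week."""
    low, high = yield_per_reaction_mg(volume_ml, yield_range)
    return reactions * low, reactions * high


def reactions_needed(target_mg, yield_mg_per_ml, volume_ml=REACTION_VOLUME_ML):
    """Number of whole reactions required to reach target_mg of protein."""
    if yield_mg_per_ml <= 0 or volume_ml <= 0:
        raise ValueError("yield and volume must be positive")
    return math.ceil(target_mg / (yield_mg_per_ml * volume_ml))


if __name__ == "__main__":
    print("mg per 4 mL reaction:", yield_per_reaction_mg())
    print("mg per week:", weekly_output_mg())
    # Hypothetical request: 10 mg of one protein at the low end of the yield range.
    print("reactions for 10 mg:", reactions_needed(10.0, 0.3))
```

At the quoted yields a single 4 mL reaction falls inside the 1–3 mg range given in the Fig. 2 caption, so the two sets of numbers in the text are mutually consistent.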

Success rates with eukaryotic targets

The centers involved in the NIH Protein Structure Initiative (USA) are generating information about success rates in going from a selected target gene to a completed and deposited three-dimensional protein structure. The overall success rates still tend to be quite low, in the range of 2% to 20%, depending on the center and the types of targets selected. It is clear from all centers that the yields of structures for eukaryotic targets are much lower than for prokaryotic targets. In the interest of efficiency and cost savings, it is important to analyze where failures occur and to devise strategies to minimize these. The most effective routes for improvement involve a combination of bioinformatics and small scale screening. Bioinformatics relies on prior information and mathematical models for correlating success rates with gene sequences. Small scale screening offers the most economical way of testing whether a cloned and sequenced target will proceed through the critical stages leading to a structure.

The initial screening step determines the level of gene expression and the solubility of the product. As described above, CESG’s wheat germ cell-free platform supports rapid and economical small scale screening for expression and solubility. We currently test constructs with and without an N-terminal tag and have shown success in rescuing failed targets by truncating the N- and/or C-termini. The second screening operation relevant to NMR structure determinations is the screening of the [15N]-labeled protein target by 1H-15N HSQC spectroscopy. This test, which is repeated after one week to determine if the protein is stable in solution, is highly diagnostic for the success of an NMR structure determination. Proteins that pass this test are then produced with [15N+13C]-labeling.

We have accumulated experience in using the cell-free platform to produce proteins from several eukaryotic genomes. These include over 722 different structural genomics targets from human, mouse, and Arabidopsis (Table 1). Most of the targets selected for testing have coded for proteins of less than 25 kDa, because this is the size limit for high-throughput structure determinations by NMR spectroscopy. In addition, we have carried out small scale wheat germ cell-free screening of approximately 150 larger proteins (25–70 kDa), and the success rates for expressing soluble proteins appear to be comparable to our earlier results with smaller targets presented in Table 1.

We define ‘highly soluble’ as ≥ 75% of the total protein being present in the soluble fraction. Of the same proteins produced with N-terminal GST tags and N-terminal (His)6 tags, 9% more were highly soluble with the GST tag. Only approximately 5% of proteins soluble as GST fusions became insoluble following cleavage and removal of the GST tag. Thus the results show that proteins fused to GST can be more highly soluble and that the advantage may persist after the tag is removed (presumably through improved folding of the purified fusion protein prior to cleavage).

We have gathered statistics specific to human proteins. Of 174 human targets (most with unknown function) that were successfully cloned, 135 (78%) showed expression at levels suitable for structural investigations. Of these expressed proteins, 55 (41%) were soluble at levels needed for NMR spectroscopy. Of these, 36 (66%) gave [15N]-labeled samples at levels that could be evaluated by NMR spectroscopy. To date, nine of these human proteins yielded NMR structures. In total, CESG has determined NMR structures of 18 eukaryotic proteins produced by this methodology (Fig. 3). The average yield of purified, labeled, human proteins made for NMR structural studies has been 0.3 mg·mL⁻¹ reaction mixture.

Table 1. Statistics on eukaryotic proteins produced by CESG’s wheat germ cell-free platform.

Step                                       Human        Mouse        Arabidopsis   Total
Targets selected                           191          150          381           722
Targets cloned successfully                174 (91%)(b) 129 (86%)    351 (92%)     654 (91%)
Small scale (µg), automated 96 well format production overnight
  Targets showing acceptable expression    135 (77%)    47 (36%)     269 (77%)     451 (69%)
  Targets showing adequate solubility      55 (41%)     14 (30%)     120 (45%)     189 (42%)
Large scale (mg), automated 8 × 4 mL production overnight
  [15N]-labeled proteins produced          36 (66%)     11 (79%)     76 (63%)      123 (65%)
  Acceptable [15N-1H]-HSQC spectrum        15 (42%)     2 (18%)      17 (22%)      34 (28%)
  Protein stable for > 10 days             10 (67%)     1 (50%)      9 (53%)       20 (59%)
  [13C,15N]-labeled protein made (a)       10 (100%)    1 (100%)     9 (100%)      20 (100%)
  3D structures by NMR                     9 (90%)      1 (100%)     8 (89%)       18 (90%)

(a) Average yield of purified double-labeled proteins used in structural investigations was 0.3 mg·mL⁻¹ reaction mixture.
(b) Percentages represent the number of successful targets at a given step divided by the number coming from the previous step (174/191 = 91% in the case indicated).
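The stepwise percentages in Table 1 follow the rule stated in its footnote: each count is divided by the count from the preceding step. A short Python sketch, using only the counts printed in the table, recomputes them and adds the overall attrition from selected targets to deposited structures. The function and variable names are illustrative. Note that rounding to the nearest integer gives 78% and 65% for two human entries that the table prints as 77% and 66%, so recomputed values may differ from the printed ones by one percentage point.

```python
STEPS = [
    "Targets selected",
    "Targets cloned successfully",
    "Targets showing acceptable expression",
    "Targets showing adequate solubility",
    "[15N]-labeled proteins produced",
    "Acceptable [15N-1H]-HSQC spectrum",
    "Protein stable for > 10 days",
    "[13C,15N]-labeled protein made",
    "3D structures by NMR",
]

# Counts copied from Table 1, in the order of STEPS.
COUNTS = {
    "Human":       [191, 174, 135, 55, 36, 15, 10, 10, 9],
    "Mouse":       [150, 129, 47, 14, 11, 2, 1, 1, 1],
    "Arabidopsis": [381, 351, 269, 120, 76, 17, 9, 9, 8],
    "Total":       [722, 654, 451, 189, 123, 34, 20, 20, 18],
}


def stepwise_rates(counts):
    """Percent of targets surviving each step, relative to the previous step."""
    return [round(100 * after / before) for before, after in zip(counts, counts[1:])]


def overall_rate(counts):
    """Percent of selected targets that reached an NMR structure."""
    return 100 * counts[-1] / counts[0]


if __name__ == "__main__":
    for genome, counts in COUNTS.items():
        print(genome)
        for step, pct in zip(STEPS[1:], stepwise_rates(counts)):
            print(f"  {step}: {pct}%")
        print(f"  overall: {overall_rate(counts):.1f}%")
```

The largest losses occur at the solubility and HSQC steps, which is where the text locates the main remaining bottlenecks.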

[Figure 3, image not reproduced. Panel labels:
(1) Hs.78877, 11 kDa, PDB: 2G2B
(2) At5g39720.1, 19 kDa, PDB: 2G0Q
(3) At5g66040.1, 14 kDa, PDB: 1TQ1
(4) Dr.13312, 12 kDa, PDB: 2FB7
(5) At3g01050.1, 13 kDa, PDB: 1SE9
(6) At2g24940.1, 11 kDa, PDB: 1T0G
(7) At3g51030.1, 14 kDa, PDB: 1XFL
(8) Hs.102419, 13 kDa, PDB: 1ZR9
(9) Hs.157607, 14 kDa, PDB: 2ETT
(10) At2g23090.1, 9 kDa, PDB: 1WVK
(11) P62627 dimer, 22 kDa, PDB: 1Y4O
(12) At2g46140.1, 19 kDa, PDB: 1YYC]

Fig. 3. Examples of three-dimensional solution structures of eukaryotic proteins determined by NMR spectroscopy from labeled samples produced by the wheat germ cell-free platform described here. All structures have been deposited in the Protein Data Bank under the accession codes indicated. The molecular masses of the proteins are indicated; these proteins are relatively small because they were chosen as targets for high-throughput NMR structure determination, which currently has a practical size limit of approximately 25 kDa. (1) Hs.78877 is human allograft inflammatory factor 1. (2) At5g39720.1 is a protein of unknown function from A. thaliana. (3) At5g66040.1 is a single domain sulfurtransferase and is annotated as a senescence-associated protein (sen1-like protein) and ketoconazole resistance protein. (4) Dr.13312 is a protein of unknown function from zebrafish. (5) At3g01050.1 from A. thaliana has a ubiquitin-like fold, and may be prenylated at a putative C-terminal CAAX box motif so as to target the protein and its binding partners to a membrane compartment of the cell [32]. (6) At2g24940.1 from A. thaliana gave a structure with a cytochrome b5-like fold but with some resemblance to steroid binding proteins [42]; a subsequent NMR study showed that the protein binds progesterone. This protein failed to express in the E. coli cell-based pipeline. (7) At3g51030.1 is an h1 thioredoxin from A. thaliana [43]. This protein was also produced from E. coli cells; it gave an acceptable HSQC spectrum but failed to crystallize. (8) Hs.102419 is a human C2H2-type zinc finger protein. (9) Hs.157607 is a human sorting nexin 22 PX domain. (10) At2g23090.1 is an unknown, partially disordered protein from A. thaliana. (11) P62627 from mouse is isoform 1 of Roadblock/LC7, a light chain in the dynein complex [44]. (12) At2g46140.1 from A. thaliana is a late embryogenesis abundant (LEA) protein of a type expressed under conditions of cellular stress, such as desiccation, cold, osmotic stress, and heat [45].

Costs

Labor savings, coupled with the high level of incorporation of labeled amino acids and the high yield of folded protein samples, make the overall cost of the wheat germ cell-free method comparable to that of the E. coli cell-based approach for NMR structure determinations of eukaryotic proteins. One of the main advantages of the automated wheat germ cell-free protein expression system is that the overall process requires much less time and effort compared to our current cell-based methods. Not including the cloning steps, it generally takes 48 h (using the GeneDecoder1000™), or 72 h (manually), to screen 96 targets for expression and solubility on the small scale. The purification protocols also require less time and effort than cell-based protocols because of the smaller volumes (4–12 mL versus 500–1000 mL) and higher initial purity. Using the latest HIS-TRAP purification technology from General Electric Healthcare (Piscataway, NJ), immobilized metal affinity chromatography (IMAC) purification of His tagged proteins requires 40 min of processing time and results in protein samples that are 75–85% pure. Gel filtration adds an additional 3 h and can increase the purity to > 95% for proteins < 15 kDa and to 90% for proteins < 20 kDa. GST purification results in > 95% purity regardless of size; however, the minimal time to process the sample is greater than 10 h.

Because stable isotope labeled amino acids required for NMR structure determinations are expensive, it is important that the protein yield per quantity of amino acid supplied be high. With cell-free systems (E. coli or wheat germ) approximately 10% of the labeled amino acid mixture supplied is incorporated into the protein produced and purified. Although the cell-free approach is much less labor intensive in comparison to our E. coli cell-based pipeline, it requires more expensive reagents and supplies. Current limitations of the method stem from the restricted availability and high cost of highly active wheat germ extract. These problems should ease as the wheat germ cell-free approach becomes more widespread and as increasing demands for cell-free extract stimulate improvements in production technology. The costs of stable isotope labeled amino acids also may be expected to decrease as demand accelerates. Average supply costs currently are: US$47 per target for cloning and expression solubility testing (with unpurified reaction mixture assayed by SDS/PAGE), US$370 per mg for Se-Met protein, US$390 per mg for [15N]-protein, and US$470 per mg for [13C,15N]-protein (with proteins isolated and purified). The major advantages of the wheat germ cell-free method over the E. coli cell-based pipeline are that it supports the production of a larger fraction of targets as folded, soluble protein and that it is much faster to prepare additional samples or truncated samples as needed for successful structure determinations. The E. coli approach has a cost advantage when its protein yields are much higher than cell-free. The overall costs of each approach appear to be similar for NMR structure determinations.
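The per-target and per-milligram supply costs quoted above can be combined into a rough budget for one NMR target. The sketch below uses only the quoted figures; the sample sizes in the example (2 mg of each labeled protein) are hypothetical, the function names are illustrative, and treating the approximately 10% incorporation as a simple mass ratio is an assumption made here for illustration.

```python
# Average supply costs quoted in the text (US dollars).
SCREEN_COST_PER_TARGET_USD = 47.0
PURIFIED_COST_PER_MG_USD = {
    "Se-Met": 370.0,
    "15N": 390.0,
    "13C,15N": 470.0,
}

# Approximate fraction of supplied labeled amino acids that ends up in
# purified protein (treated as a mass ratio; this is an assumption).
LABEL_INCORPORATION = 0.10


def sample_cost_usd(label: str, protein_mg: float) -> float:
    """Supply cost of one purified sample with the given labeling scheme."""
    if protein_mg < 0:
        raise ValueError("protein_mg must be non-negative")
    return PURIFIED_COST_PER_MG_USD[label] * protein_mg


def nmr_target_supply_cost_usd(mg_15n: float, mg_13c15n: float) -> float:
    """Screening plus one [15N] sample plus one [13C,15N] sample."""
    return (SCREEN_COST_PER_TARGET_USD
            + sample_cost_usd("15N", mg_15n)
            + sample_cost_usd("13C,15N", mg_13c15n))


def labeled_amino_acids_needed_mg(protein_mg: float,
                                  incorporation: float = LABEL_INCORPORATION) -> float:
    """Labeled amino acid mixture required to obtain protein_mg of product."""
    if not 0 < incorporation <= 1:
        raise ValueError("incorporation must be in (0, 1]")
    return protein_mg / incorporation


if __name__ == "__main__":
    # Hypothetical target needing 2 mg of each labeled sample.
    print("supplies (US$):", nmr_target_supply_cost_usd(2.0, 2.0))
    print("labeled amino acids for 2 mg protein (mg):",
          labeled_amino_acids_needed_mg(2.0))
```

These figures cover supplies only; the labor savings that the text credits to the cell-free approach are not captured by this calculation.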

Prospects

Because of the complementarity of cell-free and cell-based methods, we envision that it will be most efficient to screen each new target by both methods. Initially, we did not have an easy way to screen targets by the two approaches, because the cell-based pipeline was using ligation-independent cloning technology, whereas the cell-free pipeline used ligation cloning into the pEU vector. To remedy this, we recently implemented a cloning strategy that enables efficient small-scale screening by cell-free and cell-based methods [40]; this approach utilizes Promega’s Flexi Vector technology to transfer the target gene from one plasmid to another. By comparing the small-scale screening results from the two platforms, we can now choose the one more likely to be successful. If the cell-based approach is selected for an NMR target, we make use of a self-induction medium developed for producing [15N]- or [13C,15N]-labeled protein from E. coli cells [41].

The largest remaining bottlenecks associated with the wheat germ cell-free protocol are the limited solubility, aggregation, or limited stability exhibited by many targets. Improvements in any of these areas would greatly lower the costs of structure determinations. Our ongoing research is aimed at investigating the reasons for failures of these types and at developing approaches for rescuing failed targets. Some structural genomics centers start multiple constructs for each target selected (different N- and C-termini, different fusions, or different vectors and hosts) and choose the one that yields the most soluble protein. We have initiated a pilot study to determine whether the initial production of constructs with multiple N- and C-termini for small-scale screening would be more efficient than our current approach of redesigning failed constructs. Currently, CESG’s X-ray structure pipeline requires on the order of 10 mg of Se-Met protein for each target. We anticipate that as reliable small-scale crystallization screening methods become available, the wheat germ cell-free method could become part of the X-ray crystallography pipeline. We have already determined by mass spectrometry that the wheat germ cell-free approach supports high-level incorporation of Se-Met, and we have made small quantities of Se-Met-labeled proteins for use in chip-based crystallization screening (Fluidigm, South San Francisco, CA).

Acknowledgements

We gratefully acknowledge the work of all CESG staff members and collaborators and fruitful interactions with Professor Y. Endo and his group at Ehime University, Matsuyama, Japan, and staff members of CellFree Sciences Co., Ltd. (Yokohama, Japan) in adapting their technology to research and production environments. Supported by NIH grants 1U54 G074901 (which supports CESG) and P41 RR02301 (which supports the National Magnetic Resonance Facility at Madison, where NMR spectroscopy was carried out).

References

1 Kramer G, Kudlicki W, Hardesty B, Higgens SJ & Hames BD (1999) Cell-free coupled transcription-translation systems from Escherichia coli. In Protein Expression. A Practical Approach (Higgens SJ & Hames BD, eds), pp. 201–223. Oxford University Press, Oxford, UK.
2 Clemens MM, Prujin GJ, Higgens SJ & Hames BD (1999) Protein synthesis in eukaryotic cell-free systems. In Protein Expression. A Practical Approach (Higgens SJ & Hames BD, eds), pp. 129–165. Oxford University Press, Oxford, UK.
3 Cubeddu L, Moss CX, Swarbrick JD, Gooley AA, Williams KL, Curmi PM, Slade MB & Mabbutt BC (2000) Dictyostelium discoideum as expression host: isotopic labeling of a recombinant glycoprotein for NMR studies. Protein Expr Purif 19, 335–342.
4 Strauss A, Bitsch F, Cutting B, Fendrich G, Graff P, Liebetanz J, Zurini M & Jahnke W (2003) Amino-acid-type selective isotope labeling of proteins expressed in baculovirus-infected insect cells useful for NMR studies. J Biomol NMR 26, 367–372.
5 Bruggert M, Rehm T, Shanker S, Georgescu J & Holak TA (2003) A novel medium for expression of proteins selectively labeled with 15N-amino acids in Spodoptera frugiperda (Sf9) insect cells. J Biomol NMR 25, 335–348.
6 Goff SA & Goldberg AL (1987) An Increased Content of Protease LA, the Lon Gene-Product, Increases Protein-Degradation and Blocks Growth in Escherichia coli. J Biol Chem 262, 4508–4515.
7 Maurizi MR (1987) Degradation in vitro of bacteriophage lambda N protein by Lon protease from Escherichia coli. J Biol Chem 262, 2696–2703.
8 Chrunyk BA, Evans J, Lillquist J, Young P & Wetzel R (1993) Inclusion-Body Formation and Protein Stability in Sequence Variants of Interleukin-1-Beta. J Biol Chem 268, 18053–18061.
9 Shi J, Pelton JG, Cho HS & Wemmer DE (2004) Protein signal assignments using specific labeling and cell-free synthesis. J Biomol NMR 28, 235–247.
10 Torizawa T, Shimizu M, Taoka M, Miyano H & Kainosho M (2004) Efficient production of isotopically labeled proteins by cell-free synthesis: a practical protocol. J Biomol NMR 30, 311–325.
11 Yabuki T, Kigawa T, Dohmae N, Takio K, Terada T, Ito Y, Laue ED, Cooper JA, Kainosho M & Yokoyama S (1998) Dual Amino Acid-Selective and Site-Directed Stable-Isotope Labeling of the Human c-Ha-Ras Protein by Cell-Free Synthesis. J Biomol NMR 11, 295–306.
12 Kigawa T, Muto Y & Yokoyama S (1995) Cell-Free Synthesis and Amino Acid-Selective Stable Isotope Labeling of Proteins for NMR Analysis. J Biomol NMR 6, 129–134.
13 Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Mei Ono A & Güntert P (2006) Optimal isotope labelling for NMR protein structure determinations. Nature 440, 52–57.
14 Klammt C, Lohr F, Schafer B, Haase W, Doetsch V, Rüterjans H, Glaubitz C & Bernhard F (2004) High level cell-free expression and specific labeling of integral membrane proteins. Eur J Biochem 271, 568–580.
15 Henrich B, Lubitz W & Plapp R (1982) Lysis of Escherichia coli by induction of Cloned Phi-X174 Genes. Mol Gen Gen 185, 493–497.

16 Guignard L, Ozawa K, Pursglove SE, Otting G & Dixon NE (2002) NMR analysis of in vitro-synthesized proteins without purification: a high-throughput approach. FEBS Lett 524, 159–162.
17 Kohno T (2005) Production of proteins for NMR studies using the wheat germ cell-free system. Methods Mol Biol 310, 169–185.
18 Kigawa T & Yokoyama S (2002) [High-throughput cell-free protein expression system for structural genomics and proteomics studies]. Tanpakushitsu Kakusan Koso 47, 1014–1019.
19 Yokoyama S (2003) Protein expression systems for structural genomics and proteomics. Curr Opin Chem Biol 7, 39–43.
20 Kigawa T, Yabuki T, Yoshida Y, Tsutsui M, Ito Y, Shibata T & Yokoyama S (1999) Cell-free production and stable-isotope labeling of milligram quantities of proteins. FEBS Lett 442, 15–19.
21 Kim DM, Kigawa T, Choi CY & Yokoyama S (1996) A Highly Efficient Cell-Free Protein Synthesis System from Escherichia coli. Eur J Biochem 239, 881–886.
22 Yokoyama S, Hirota H, Kigawa T, Yabuki T, Shirouzu M, Terada T, Ito Y, Matsuo Y, Kuroda Y, Nishimura Y, Kyogoku Y, Miki K, Masui R & Kuramitsu S (2000) Structural genomics projects in Japan. Nat Struct Biol 7 (Suppl.), 943–945.
23 Kim DM & Swartz JR (2000) Prolonging cell-free protein synthesis by selective reagent additions. Biotechnol Prog 16, 385–390.
24 Yin G & Swartz JR (2004) Enhancing multiple disulfide bonded protein folding in a cell-free system. Biotechnol Bioeng 86, 188–195.
25 Chumpolkulwong N, Hori-Takemoto C, Hosaka T, Inaoka T, Kigawa T, Shirouzu M, Ochi K & Yokoyama S (2004) Effects of Escherichia coli ribosomal protein S12 mutations on cell-free protein synthesis. Eur J Biochem 271, 1127–1134.
26 Madin K, Sawasaki T, Ogasawara T & Endo Y (2000) A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: Plants apparently contain a suicide system directed at ribosomes. Proc Natl Acad Sci USA 97, 559–564.
27 Endo Y (2001) Genomics to Proteomics: A High-throughput Cell-free Protein Synthesis System for Practical Use. The 3rd ORCS International Symposium on Ribosome Engineering, January 22–23, 2001. Tsukuba, Japan.
28 Kawasaki T, Gouda MD, Sawasaki T, Takai K & Endo Y (2003) Efficient synthesis of a disulfide-containing protein through a batch cell-free system from wheat germ. Eur J Biochem 270, 4780–4786.
29 Morita EH, Sawasaki T, Tanaka R, Endo Y & Kohno T (2003) A wheat germ cell-free system is a novel way to screen protein folding and function. Protein Sci 12, 1216–1221.

30 Sawasaki T, Ogasawara T, Morishita R & Endo Y (2002) A cell-free protein synthesis system for high-throughput proteomics. Proc Natl Acad Sci USA 99, 14652–14657.
31 Sawasaki T, Hasegawa Y, Tsuchimochi M, Kamura N, Ogasawara T, Kuroita T & Endo Y (2002) A bilayer cell-free protein synthesis system for high-throughput screening of gene products. FEBS Lett 514, 102–105.
32 Vinarov DA, Lytle BL, Peterson FC, Tyler EM, Volkman BF & Markley JL (2004) Cell-free protein production and labeling protocol for NMR-based structural proteomics. Nat Methods 1, 149–153.
33 Vinarov DA & Markley JL (2005) High-Throughput Automated Platform for NMR-Based Structural Proteomics. Expert Rev Proteomics 2, 49–55.
34 Vinarov DA, Tyler EM, Loushin Newman CL, Shahan MN & Markley JL (2006) Protein Production using the Wheat Germ Cell-Free Expression System. In Current Protocols in Protein Science (Coligan JE, Dunn BM, Ploegh HL, Speicher DW & Wingfield PT, eds. Series ed. Taylor G), pp. 5.18.1–5.18.18. Unlimited Learning Resources, Winston-Salem, NC.
35 Tyler RC, Aceti DJ, Bingman CA, Cornilescu CC, Fox BG, Frederick RO, Jeon WB, Lee MS, Newman CS, Peterson FC, Phillips GN Jr, Shahan MN, Singh S, Song J, Sreenath H, Tyler EM, Ulrich EL, Vinarov DA, Vojtik FC, Volkman BF, Wrobel RL, Zhao Q & Markley JL (2005) Comparison of cell-based and cell-free protocols for producing target proteins from the Arabidopsis thaliana genome for structural studies. Proteins 59, 633–643.
36 Betton JM (2003) Rapid translation system (RTS): a promising alternative for recombinant protein production. Curr Protein Pept Sci 4, 73–80.
37 Ryabov LA, Desplancq D, Spirin AS & Pluckthun A (1997) Functional antibody production using cell-free translation: effects of protein disulfide isomerase and chaperones. Nat Biotechnol 15, 79–84.
38 Kang SH, Kim DM, Kim HJ, Jun SY, Lee KY & Kim HJ (2005) Cell-free production of aggregation-prone proteins in soluble and active forms. Biotechnol Prog 21, 1412–1419.
39 Hirano N, Sawasaki T, Tozawa Y, Endo Y & Takai K (2006) Tolerance for random recombination of domains in prokaryotic and eukaryotic translation systems: Limited interdomain misfolding in a eukaryotic translation system. Proteins 64, 343–354.
40 Blommel PG, Martin PA, Wrobel RL, Steffen E & Fox BG (2006) High-efficiency single-step production of expression plasmids from cDNA clones using the Flexi Vector cloning system. Protein Expr Purif 47, 562–570.
41 Tyler RC, Sreenath H, Aceti DJ, Bingman CA, Singh S, Markley JL & Fox BG (2005) Auto-Induction Medium for the Production of [U-15N]- and [U-13C, U-15N]-labeled Proteins for NMR Screening and Structure Determination. Protein Expr Purif 40, 268–278.
42 Song J, Vinarov D, Tyler EM, Shahan MN, Tyler RC & Markley JL (2004) Hypothetical protein At2g24940.1 from Arabidopsis thaliana has a cytochrome b5 like fold. J Biomol NMR 30, 215–218.
43 Peterson FC, Lytle BL, Sampath S, Vinarov D, Tyler E, Shahan M, Markley JL & Volkman BF (2005) Solution structure of thioredoxin h1 from Arabidopsis thaliana. Protein Sci 14, 2195–2200.
44 Song J, Tyler RC, Lee MS, Tyler EM & Markley JL (2005) Solution structure of isoform 1 of Roadblock/LC7, a light chain in the dynein complex. J Mol Biol 354, 1043–1051.
45 Singh S, Cornilescu CC, Tyler RC, Cornilescu G, Tonelli M, Lee MS & Markley JL (2005) Solution structure of a late embryogenesis abundant protein (LEA14) from Arabidopsis thaliana, a cellular stress-related protein. Protein Sci 14, 2601–2609.
