Structural investigation of ribosomally synthesized

Structural investigation of ribosomally synthesized natural products by hypothetical structure enumeration and evaluation using tandem MS

Qi Zhang(a,b), Manuel Ortega(a,c), Yanxiang Shi(a,b), Huan Wang(a,b), Joel O. Melby(b,d), Weixin Tang(a,b), Douglas A. Mitchell(b,d,e), and Wilfred A. van der Donk(a,b,c,d,1)

(a) Howard Hughes Medical Institute, Departments of (b) Chemistry, (c) Biochemistry, and (e) Microbiology, and (d) Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801

Ribosomally synthesized and posttranslationally modified peptides (RiPPs) are a growing class of natural products that are found in all domains of life. These compounds possess vast structural diversity and have a wide range of biological activities, promising a fertile ground for exploring novel natural products. One challenging aspect of RiPP research is the difficulty of structure determination due to their architectural complexity. We here describe a method for automated structural characterization of RiPPs by tandem mass spectrometry. This method is based on the combined analysis of multiple mass spectra and evaluation of a collection of hypothetical structures predicted based on the biosynthetic gene cluster and molecular weight. We show that this method is effective in structural characterization of complex RiPPs, including lanthipeptides, glycopeptides, and azole-containing peptides. Using this method, we have determined the structure of a previously structurally uncharacterized lanthipeptide, prochlorosin 1.2, and investigated the order of the posttranslational modifications in three biosynthetic systems.

dehydration | genome mining | lantibiotics | directionality

Ribosomally synthesized and posttranslationally modified peptides (RiPPs) are a major class of natural products as revealed by the genome-sequencing efforts of the past decade (1). RiPPs are biosynthesized from genetically encoded and ribosomally produced precursor peptides, which typically consist of a core peptide that is transformed to the final product and an N-terminal extension called the leader peptide that is usually important for recognition by the posttranslational modification (PTM) enzymes (1). Because of the highly diverse PTMs, these compounds possess vast structural diversity and have a wide range of biological activities, thus representing a fertile ground for exploration. Furthermore, the ribosomal origin of RiPPs makes them particularly well suited for genome mining efforts. By using genome mining to explicitly avoid species harboring biosynthetic gene clusters identical to those that produce known compounds, a combination of strain prioritization and mass spectrometry (MS)-based analysis offers a new route to discovering natural products that can overcome the burden of rediscovery that has increasingly hampered discovery efforts (2, 3).

One challenging aspect of high-throughput genome mining for new natural products is the difficulty of determining their molecular structures in high throughput. We present here a method that allows automated RiPP structure elucidation. In contrast to nonribosomal peptides that have an average molecular weight of less than 1,000 Da, as documented in the NORINE database (4), RiPPs in many cases have molecular weights larger than 2,500 Da. Molecules of this size are difficult to rapidly analyze by NMR spectroscopy, rendering MS the most convenient tool for RiPP structural characterization. Even when the precursor peptide sequences are known and the types of PTMs can be predicted based on the sequences of the biosynthetic enzymes (5–8), multiple possible PTM sites on the

www.pnas.org/cgi/doi/10.1073/pnas.1406418111

precursor peptide typically result in a myriad of structures that are often difficult to differentiate. This challenge is further exacerbated by the frequent occurrence of one or more cross-links in RiPPs, which complicates traditional tandem MS-based structure elucidation. One of the main difficulties is that the spectra only contain a small fraction of informative signals among a large number of less diagnostic signals that cloud spectrum interpretation. As the spectra often also vary significantly with different instrument settings (9), selection of the most suitable spectra for drawing conclusions is time consuming and sometimes introduces bias. Indeed, a number of incorrect structural assignments of RiPPs have been reported based on insufficient information content of tandem MS data (10–15).

Here, we report use of hypothetical structure enumeration and evaluation (HSEE) for automated and unbiased interpretation of tandem MS data. The method is based on the prediction of a collection of hypothetical structures for a RiPP of certain mass and known biosynthetic information. By listing all of the theoretical daughter ions from this enumeration and automated evaluation of their matches with one or several experimental spectra, the most probable RiPP structure can be determined. We demonstrate here for multiple classes of known RiPPs with complex structures that HSEE is highly effective in analyzing tandem MS data and predicting the correct structure. In addition, we used HSEE to characterize a lanthipeptide whose structure was elusive despite

Significance

Ribosomally synthesized and posttranslationally modified peptides (RiPPs) constitute a promising repertoire of natural products with potentially useful properties. Several tools have been developed in recent years to facilitate RiPP discovery and to connect genetic information with the chemical entities produced by microorganisms. Structure elucidation remains challenging for the RiPP field and is a bottleneck for genome mining and/or synthetic biology efforts to identify and characterize new RiPP members. This study presents a method for automated structural analysis of RiPPs, which is based on enumeration of a collection of hypothetical structures predicted based on genomic information and evaluation of each structure with multiple tandem mass spectra. We show that this approach provides a powerful method for structural characterization of complex RiPPs and their biosynthetic intermediates.

Author contributions: Q.Z. and W.A.v.d.D. designed research; Q.Z., M.O., Y.S., H.W., J.O.M., and W.T. performed research; Q.Z., D.A.M., and W.A.v.d.D. analyzed data; and Q.Z., D.A.M., and W.A.v.d.D. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. J.C.V. is a guest editor invited by the Editorial Board.

1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1406418111/-/DCSupplemental.

PNAS | August 19, 2014 | vol. 111 | no. 33 | 12031–12036

BIOCHEMISTRY

Edited by John C. Vederas, University of Alberta, Edmonton, AB, Canada, and accepted by the Editorial Board July 9, 2014 (received for review April 8, 2014)

our previous efforts, and to determine the directionality of thiazole-forming enzymes and lanthipeptide synthetases.

Results and Discussion

The HSEE Concept. Based on the accurate molecular weight obtained by high-resolution MS (16) and the types of PTMs predicted from the genetic and/or biochemical information, the structure of an unknown RiPP can be found within a set of putative structures differing only in the number and location of PTM sites (Fig. 1). Given that every putative structure has its own set of theoretical daughter ions, we reasoned that the structure having the highest number of experimentally observed ions that match the theoretical ions would most likely represent the correct structure. By enumeration of all of the putative structures and comparing the set of theoretical daughter ions for each structure with the experimental data, the PTM sites might thus be located (Fig. 1). This method is akin to shotgun annotation used for identifying reversible PTMs in proteomics (17). To this end, we developed a scoring matrix described as $hs(i,k) = n(i,k)/N(i)$, where hs(i,k) is the score for hypothetical structure i in spectrum k; n(i,k) is the number of matched ions for structure i in spectrum k; and N(i) is the total number of theoretical ions of structure i used in the analysis. With m independently acquired spectra in total, the score for the overall evaluation of structure i is $Hs(i) = \frac{1}{m}\sum_{k=1}^{m} hs(i,k) = \sum_{k=1}^{m} n(i,k) / (m \times N(i))$. This algorithm allows one to consider many tandem MS spectra collected with different experimental settings at the same time, and the hypothetical structure (Hs) score gives an automated, unbiased, and, as we show, accurate prediction of a RiPP structure from tandem MS data (see SI Appendix for details of data analysis).

As a proof of principle of the HSEE approach, we first reexamined the structure of sublancin, an S-linked glycopeptide (Fig. 2A), which until recently was believed to be a lantibiotic (18).
We carried out tandem MS analysis on the tris(2-carboxyethyl) phosphine (TCEP)-reduced full-length sublancin (Fig. 2A),

which contains a glucose moiety that is introduced by the glycosyltransferase SunS. Whereas the high-resolution mass of sublancin rules out other PTMs, seven hypothetical structures can represent reduced sublancin (structures 1–7), each corresponding to glucosylation of a different residue (Cys/Ser/Thr) (Fig. 2A). Calculation of the Hs values for the seven hypothetical structures based on two tandem MS spectra (SI Appendix, Fig. S1; R scripts for data processing are provided in SI Appendix) clearly shows that structure 5 has the highest value (Fig. 2B). This structure corresponds to Cys5 being glucosylated, which indeed is the correct structure.

HSEE Structural Analysis of Additional Known RiPPs. We next performed HSEE analysis on haloduracin β. This compound is a member of the lanthipeptide family in which select Ser and Thr residues are first dehydrated, and a subset of the dehydroamino acids acts as electrophiles in a subsequent Michael-like reaction with Cys thiols to generate thioether rings (19, 20). Seven of the eight Ser/Thr residues in the precursor peptide of haloduracin are dehydrated. The close proximity of the C-terminal Thr/Ser residues makes identifying the location of the residue that escapes dehydration a difficult task. As a result, an incorrect structure was first proposed for haloduracin β in which the sixth Ser/Thr had escaped dehydration (Fig. 3B, black dot) (12). The nondehydrated residue was later shown to be the eighth Ser/Thr residue by site-directed mutagenesis and comparative analysis (Fig. 3B, red star) (13). Using the HSEE algorithm, however, the correct residue that escapes dehydration is readily identified, as this structure afforded the highest Hs value by comparing hypothetical and experimental fragment ions (Fig. 3B). We note that the Hs values are close for several possible structures.
This finding is not surprising because the adjacency of the potential PTM sites results in a relatively large number of common fragment ions among the hypothetical structures (SI Appendix, Fig. S2), as indicated by the hierarchical cluster analysis of the ion list similarity (Fig. 3B).
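The overlap between ion lists of neighboring PTM sites can be illustrated with a minimal Python sketch. It computes singly charged b- and y-ion m/z values for a toy peptide and counts the ions shared by two hypothetical structures that differ only in which of two adjacent Thr residues is dehydrated. The peptide `ATTGK`, the function name `fragment_ions`, and the rounding to two decimals used to compare ions are illustrative assumptions and are not taken from the R scripts in SI Appendix; the residue masses are standard monoisotopic values.

```python
from typing import Dict, Set

# Standard monoisotopic residue masses (Da) for the residues used below.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "T": 101.04768, "C": 103.00919, "K": 128.09496,
}
WATER = 18.01056   # mass lost when Ser/Thr is dehydrated
PROTON = 1.00728


def fragment_ions(seq: str, dehydrated: Set[int]) -> Set[float]:
    """Return singly charged b- and y-ion m/z values, rounded to 2 decimals.

    `dehydrated` holds the 0-based positions of dehydrated Ser/Thr residues.
    """
    masses = [RESIDUE_MASS[aa] - (WATER if i in dehydrated else 0.0)
              for i, aa in enumerate(seq)]
    ions: Set[float] = set()
    for cut in range(1, len(seq)):
        b_ion = sum(masses[:cut]) + PROTON
        y_ion = sum(masses[cut:]) + WATER + PROTON
        ions.add(round(b_ion, 2))
        ions.add(round(y_ion, 2))
    return ions


# Hypothetical toy peptide: the two structures differ only in which Thr is dehydrated.
structure_a = fragment_ions("ATTGK", {1})  # first Thr dehydrated
structure_b = fragment_ions("ATTGK", {2})  # second Thr dehydrated
shared = structure_a & structure_b
print(len(structure_a), len(structure_b), len(shared))  # 8 8 6
```

In this toy case only the b and y ions from the cleavage between the two Thr residues differ, so six of the eight theoretical ions are common to both structures, which is why adjacent candidate sites tend to receive similar Hs values.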

Fig. 1. Workflow for HSEE. The accurate molecular weight obtained from high-resolution MS analysis and the biosynthetic gene content are used to predict the peptide length and types and number of PTMs, allowing generation of a list of hypothetical structures. The “M” (for modification) shown in red and blue in the “Hypothetical structure list” indicate different PTMs on the precursor peptide. Each hypothetical structure (Hs) has its own set of fragment ions, and different sets of ions derived from different hypothetical structures form a fragment ion matrix. Evaluation of each set of ions with the tandem MS data allows identification of the structure that has the highest Hs value (highlighted by a white star in the Hs bar plot), which represents the most likely structure. Although in many cases one mass spectrum can distinguish between the predicted structures, using more than one spectrum recorded at different instrument settings decreases possible biased interpretation. Moreover, when several structures give the same Hs value based on a single tandem mass spectrum, use of additional tandem MS data allows differentiation in many cases. When two or more structures have close Hs values, the confidence in the predicted best structure can be evaluated by inspection of the matched ions.
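The scoring step of this workflow can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation (their R scripts are in SI Appendix): the function names, the fixed matching tolerance `tol`, and all m/z values in the toy data are hypothetical. The score computed is Hs(i), the number of matched ions summed over the m spectra and divided by m × N(i).

```python
from typing import Dict, List


def count_matches(theoretical: List[float], observed: List[float],
                  tol: float = 0.05) -> int:
    """Count theoretical ions that have at least one observed peak within `tol` (m/z units)."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


def hs_scores(ion_matrix: Dict[str, List[float]],
              spectra: List[List[float]],
              tol: float = 0.05) -> Dict[str, float]:
    """Compute Hs(i) = sum over k of n(i,k), divided by m * N(i), for every structure i."""
    m = len(spectra)
    scores: Dict[str, float] = {}
    for name, ions in ion_matrix.items():
        matched = sum(count_matches(ions, spectrum, tol) for spectrum in spectra)
        scores[name] = matched / (m * len(ions))
    return scores


# Toy data (all values invented): two hypothetical structures, two picked-peak lists.
ion_matrix = {
    "structure_1": [155.08, 256.13, 305.18, 388.22],
    "structure_2": [173.09, 256.13, 287.17, 388.22],
}
spectra = [
    [155.07, 256.14, 388.20, 500.00],
    [155.09, 305.19, 256.12],
]
scores = hs_scores(ion_matrix, spectra)
best = max(scores, key=scores.get)
print(scores, best)  # {'structure_1': 0.75, 'structure_2': 0.375} structure_1
```

Neither toy spectrum alone contains every ion of structure_1, but summing matches over both spectra still separates the two candidates, which mirrors the reason given above for using more than one spectrum.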


We next used the method with deschloromicrobisporicin (DC-MIB), a known microbisporicin (MIB) (also called NAI-107) analog in which Trp4 escapes the chlorination seen in MIB (Fig. 3C) (21). The Mib gene cluster (22) encodes a lanthipeptide dehydratase MibB, a lanthipeptide cyclase MibC, a precursor peptide MibA, and a flavin-dependent decarboxylase MibD; the latter protein is likely responsible for introducing an S-[(Z)-2-aminovinyl]-D-cysteine (AviCys) system (23). Furthermore, the Mib cluster also contains a P450 monooxygenase MibO (22). Searching against the SWISS-PROT database (24) using MibO as the query indicated that MibO is similar to several P450 enzymes that act on aliphatic chains, suggesting that Val1, Leu6, Pro9, Pro14, and the aliphatic parts of Trp4 and Phe22 of MibA are potential sites of modification by MibO. On the basis of the sequence of MibA, the proteins encoded in the biosynthetic gene cluster, and the accurate molecular weight of DC-MIB (m/z 2,213.81 Da for [M+H]+), one predicts that C-terminal decarboxylation, various Ser/Thr dehydrations, and mono- or bishydroxylation of Val1, Leu6, Pro9, or Pro14, or monohydroxylation of Trp4 and Phe22 could possibly be involved in DC-MIB maturation. Therefore, 49 hypothetical structures were drawn for DC-MIB, in which the ring topology was not taken into consideration (SI Appendix, Fig. S3). Using HSEE, DC-MIB was deduced to contain a bishydroxylated Pro14 and a nondehydrated Thr12 (Fig. 3D and SI Appendix, Fig. S4), and the predicted PTM sites indeed correspond to the correct structure (21) (Fig. 3C). Using the same procedure, we evaluated HSEE for the structure elucidation of 10 additional previously characterized RiPPs and RiPP derivatives (SI Appendix, Table S1). The analyses predicted the correct sites of PTMs for all, and thus far, no false predictions have been encountered, demonstrating the consistency and robustness of HSEE.
HSEE was also used to evaluate whether heterologous expression of lanthipeptides in Escherichia coli results in the native structure (SI Appendix, Figs. S5–S7), which is a key requisite for synthetic biology applications.

HSEE Structural Characterization of Prochlorosin 1.2. We next investigated the structure of the cyanobacterial lanthipeptide prochlorosin 1.2 (Pcn1.2) to further test the robustness of HSEE. It has been shown that for Pcn1.2, four of the five Ser/Thr residues in its precursor peptide are dehydrated (Fig. 4A), but its structure remains elusive despite our previous efforts that could not distinguish between several potential structures (25). In this work, HSEE with three different tandem mass spectra (SI


Fig. 2. HSEE analysis of sublancin. (A) Structure representation of sublancin (Upper) and TCEP-reduced sublancin peptide sequence showing the seven numbered possible glucosylation sites in blue (Lower). The correct glucosylation site is marked with an asterisk. (B) HSEE scoring for seven hypothetical structures of TCEP-reduced sublancin. Structures 1–7 each represent a peptide glucosylated at a different residue. The correct structure is again denoted with an asterisk. The lower Inset shows the hierarchical cluster analysis of the hypothetical structures based on Euclidean distance (ED) of the fragment ion matrix, which indicates the similarities between different structures. Note that hypothetical structures that are more similar to the correct structure have more fragment ions in common with it and therefore higher Hs values; e.g., with regard to fragmentation pattern, structure 4 is most similar to the real structure of sublancin (structure 5) and has the second highest Hs value in HSEE analysis.

Appendix, Fig. S8) allowed localization of the nondehydrated residue to Ser2 (Fig. 4B). Unlike haloduracin β and microbisporicin, determination of the ring topology of Pcn1.2 is not aided by comparative analysis with other known lanthipeptides (12, 26). To further explore the scope, we used HSEE to investigate the topology of cross-links between two residues, which normally poses a great challenge for structural analysis because cross-link formation for lanthipeptides does not involve a change in mass and because cross-link formation generally decreases the fragmentation of the peptide. To this end, we divided Pcn1.2 into six regions (i–vi) according to the position of dehydroamino acids and Cys residues (Fig. 4A). To observe a fragment ion from the region that is covered by a thioether cross-link, a peptide bond and a thioether bond must both be broken, which is a low-probability event in the MS collision cell. Accordingly, comparing the Hs value within each region can provide valuable insights into the ring structures: a region that falls within a cyclic peptide would be expected to have a very small Hs value, whereas a linear region is expected to have a relatively high Hs value. For Pcn1.2, the tandem MS data did not display any fragment ions in regions i, iii, iv, and vi, whereas several matched ions are observed in regions ii and v (Fig. 4C), suggesting a ring pattern of Pcn1.2 shown in Fig. 4D. To validate this assignment, Pcn1.2 was subjected to trypsin digestion. Subsequent MALDI-TOF MS analysis of the tryptic fragments clearly showed protease cleavage within region v (SI Appendix, Fig. S9), supporting the proposed ring structure of Pcn1.2. We note that none of the three independently recorded spectra contained the full fragmentation information of Pcn1.2

Fig. 3. Structural analysis of haloduracin β and deschloromicrobisporicin (DC-MIB). (A) Structure of haloduracin β and explanations of the shorthand notations used for each PTM. The Cys-, Ser-, and Thr-derived structures are shown in red, blue, and purple, respectively. (B) HSEE analysis of haloduracin β. Each number represents a structure with a different Ser/Thr escaping dehydration, with the numbering used indicated in the peptide sequence. The correct and originally proposed (incorrect) nondehydrated residues in haloduracin β are indicated by a red asterisk and a black dot, respectively. The correct structure in the Hs bar plot is highlighted by a white asterisk. (C) Structure of DC-MIB and the shorthand notations used for its specific PTMs; other PTMs are shown in the same manner as in A. (D) HSEE analysis of DC-MIB. The nondehydrated Thr residue and the dihydroxylated Pro residue are shown by red and blue asterisks, respectively. The hypothetical structures 1–49 are shown in SI Appendix, Fig. S3. After determination of the dehydration pattern by HSEE, the lanthionine ring topologies were predicted by sequence alignment with known lanthipeptides and by analysis of the tandem MS data as explained in the section on prochlorosin 1.2 (Pcn1.2).


Fig. 4. Structural analysis of Pcn1.2. (A) Peptide sequence of Pcn1.2. The nondehydrated Ser residue is shown by a red asterisk; i–vi represent different peptide regions, which are defined according to the specific fragmentation sites (e.g., an ion resulting from cleavage N-terminal to the Ser labeled 1 belongs to region i, whereas an ion resulting from cleavage C-terminal to this Ser belongs to region ii). For the detailed grouping of ions in regions i–vi, please see SI Appendix, Fig. S9A. (B) HSEE analysis revealed that the Ser labeled 2 is not dehydrated. (C) HSEE analysis focusing on different peptide regions, which revealed that regions ii and v are unlikely to contain lanthionines. (D) Proposed structure of Pcn1.2. The conclusion is based on the observation that the second Ser escapes dehydration and that regions ii and v are unlikely to contain lanthionines because fragmentation within these regions is observed. The fractions below or above the blue arrows indicate the number of spectra, out of the three tandem MS spectra, that contain these fragment ions. The trypsin cleavage site is indicated by a red arrow. The bars in B and C are colored green and brown to indicate that the analyses were focused on dehydration and cyclization pattern, respectively.
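The region-wise comparison shown in panel C can be sketched in Python as below. Only the qualitative pattern comes from the text (no fragment ions in regions i, iii, iv, and vi; several matched ions in regions ii and v); the specific matched-ion counts, the per-region totals of theoretical ions, the function name `classify_regions`, and the zero threshold are hypothetical values chosen for illustration.

```python
from typing import Dict, Tuple


def classify_regions(matched: Dict[str, int], total: Dict[str, int],
                     threshold: float = 0.0) -> Dict[str, Tuple[float, str]]:
    """Return (regional Hs, label) per region.

    A regional Hs at or below `threshold` is read as "likely within a ring",
    because fragmenting a cross-linked region requires breaking two bonds.
    """
    result: Dict[str, Tuple[float, str]] = {}
    for region, n_total in total.items():
        hs = matched.get(region, 0) / n_total
        label = "likely within a ring" if hs <= threshold else "likely linear"
        result[region] = (hs, label)
    return result


# Hypothetical counts reproducing the reported pattern for Pcn1.2.
matched = {"i": 0, "ii": 3, "iii": 0, "iv": 0, "v": 4, "vi": 0}
total = {region: 6 for region in matched}  # assumed number of theoretical ions per region
result = classify_regions(matched, total)
linear = [region for region, (_, label) in result.items() if label == "likely linear"]
print(linear)  # ['ii', 'v']
```

A zero threshold is the strictest possible choice; with noisier data a small positive cutoff might be more appropriate, and as in the text the resulting assignment would still need independent validation such as the trypsin digestion.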

(SI Appendix, Fig. S8), and only the ability of HSEE to consider several spectra at once allowed determination of its structure, demonstrating the utility of this method.

Directionality of Thiazole Formation on BalhA2. Another area of research in which HSEE may be valuable is in the analysis of the temporal installation of multiple PTMs, which provides important mechanistic insights into RiPP biosynthesis. To test this application, we first used HSEE to analyze the in vitro intermediates of the azole-forming enzymes BalhC/BalhD/BcerB (1, 27). In this biosynthetic system, BalhC and BalhD catalyze the heterocyclization of three Cys thiols of the BalhA2 peptide onto the carbonyl groups of the preceding amino acids, followed by a net dehydration to form thiazoline rings. The thiazolines are subsequently dehydrogenated by the flavin mononucleotide (FMN)-containing enzyme BcerB to form thiazole heterocycles (Fig. 5A) (28, 29). HSEE analysis indicated that for the intermediate containing one thiazole ring (loss of 20 Da) the structure containing a C-terminal thiazole has the highest Hs values (Fig. 5C), whereas for the intermediate having two thiazole rings (loss of 40 Da), the structure having two thiazoles installed at the C terminus has the highest Hs value (Fig. 5D). This result indicates that thiazole formation on BalhA2 is C- to N-directional (Fig. 5B), which is consistent with a previous study (28) and is distinct from the N-to-C azole-forming direction of microcin B17 biosynthesis (30).

Directionality of ProcM Catalysis. We next investigated the directionality of ProcM, a highly promiscuous class II lanthipeptide synthetase (25). ProcM acts on a large number of precursor peptides ProcA that consist of conserved N-terminal leader peptides and highly diverse C-terminal core peptides, and produces a diverse array of lanthipeptides termed prochlorosins (Pcns) (25), including Pcn1.2, which has been structurally characterized in this study as mentioned above. Our ProcM directionality study focused on one of the substrates, ProcA3.2 (Fig. 6A), which has four Ser/Thr residues but is dehydrated only three times (25, 31). HSEE analysis of the onefold and twofold dehydrated ProcA3.2 indicated that ProcM initially dehydrated the third Thr in the

core peptide followed by the second Thr (Fig. 6 B and C). However, the catalytic order is not strict, as some ions corresponding to the dehydration at other sites are also found in the spectra, both for the onefold and twofold dehydrated intermediates (SI Appendix, Figs. S10 and S11). HSEE analysis of the threefold dehydrated ProcA3.2 suggests that the most N-terminally located Thr in the core peptide is the last dehydration site, whereas the Ser in the core peptide is not dehydrated at all (Fig. 6D and SI Appendix, Fig. S12). Thus, the dehydration of the core peptide of ProcA3.2 by ProcM generally follows a C- to N-terminal direction, but the directionality is not strict. This result is distinct from the directionality of the class II lanthipeptide synthetase HalM2, which carries out dehydration in a generally N- to C-terminal direction (32), suggesting that the directionality of dehydration by lanthipeptide synthetases is nonuniform.

Mechanistic Investigation of NisB Catalysis by HSEE. We next applied HSEE to investigate the directionality of catalysis by the nisin dehydratase NisB (33), a prototypical class I lanthipeptide synthetase (19, 26). The activity of NisB was recently reconstituted in vitro (33), allowing for a detailed interrogation of its catalytic mechanism. Dehydration by NisB first involves glutamylation of Ser and Thr, followed by more rapid elimination of glutamate to generate the dehydroalanine and dehydrobutyrine residues, respectively. NisA has nine Ser/Thr residues, but the Ser labeled 8 in Fig. 7A is not dehydrated in wild-type nisin. Previous in vivo studies showed that this Ser was dehydrated when formation of the nisin D and/or E rings (Fig. 7A) was disrupted by site-directed mutagenesis, suggesting that NisB and the cyclase NisC may work in a coordinated fashion (34). However, all nine Ser/Thr residues were dehydrated by NisB in vitro, although the major product was eightfold dehydrated NisA (33).
To investigate whether NisB had any selectivity for different dehydration sites on NisA, we performed HSEE analysis on the eightfold dehydrated NisA produced in vitro. The result shows that the structure with the Ser labeled 8 escaping dehydration has the highest Hs value (Fig. 7B). In fact, close examination of the experimentally matched ions showed that no ions were produced that indicate dehydration of this Ser (SI Appendix, Fig. S13), suggesting that NisB prefers not to dehydrate this Ser even in the absence of NisC.

Fig. 5. Use of HSEE for investigation of the directionality of thiazole formation. (A) Thiazole formation on BalhA2. (B) Fragment of BalhA2 that is generated after trypsin treatment. The thiazole-forming Cys residues are highlighted in red. The black arrow shows the direction of thiazole installment on BalhA2 catalyzed by the BalhC/BalhD/BcerB complex. (C) HSEE analysis of BalhA2 containing one thiazole ring. Structures 1–3 correspond to BalhA2 containing a single thiazole derived from Cys1–3, respectively. (D) HSEE analysis of BalhA2 containing two thiazole rings. Structures 1–3 correspond to BalhA2 in which the indicated Cys (Cys1–3) is not converted to a thiazole, whereas the other two Cys residues are converted to thiazoles.

To investigate the processing direction of NisB, we first established the time dependence of NisA dehydration in vitro, which clearly showed that in vitro NisB dehydration is a distributive rather than a processive process (SI Appendix, Fig. S14). HSEE analysis of onefold dehydrated NisA clearly showed that NisB dehydration starts at the most N-terminal Thr (Fig. 7C and SI Appendix, Fig. S15). Investigation of twofold dehydrated NisA showed that structures St12, St13, and St14 have similar Hs values (SI Appendix, Fig. S16, in this notation, the nine Ser/Thr residues that can be dehydrated are numbered 1–9 from N to C terminus, and St12 represents the structure in which the residues labeled 1 and 2 are dehydrated; similar representations are used for St13, St14, etc.). Examination of the spectra showed that ions characteristic of St12, St13, and St14 are all found (Fig. 7D), suggesting that the second dehydration site is not completely specific and involves Ser2, Ser3, or Thr4 (Fig. 7E). Intriguingly, the third dehydration appears to take place at the residue labeled Thr6 (Fig. 7E), as structures St126, St136, and St146 have similar Hs values (SI Appendix, Figs. S17 and S18). The fourth dehydration site is less specific and takes place at Ser2, Ser3, Thr4, or Thr5 (Fig. 7E), as several structures (e.g., St1246, St1256, St1356) share similarly high Hs values (SI Appendix, Fig. S19), and characteristic ions for each structure are observed (SI Appendix, Fig. S20). The fifth dehydration appears to be highly selective for Thr7 (Fig. 7E and SI Appendix, Figs. S21 and S22), whereas the sixth dehydration takes place at the N terminus again and involves Ser2, Ser3, Thr4, or Thr5 (Fig. 7E and SI Appendix, Figs. S23 and S24). The seventh dehydration occurs specifically at Ser9 (SI Appendix, Figs. S25 and S26), whereas the eighth dehydration involves Ser3, Thr4, or Thr5 (Fig. 7B and SI Appendix, Fig. S13), leaving Ser8 as the only nondehydrated residue in the eightfold dehydrated NisA. Thus, NisB dehydration is an overall N- to C-terminal process, but the directionality is not strict (Fig. 7E). Empirically Ser escapes dehydration more often than Thr in known class I lanthipeptides, and therefore the relatively nonspecific dehydration of the Ser labeled 2 and 3 might be the result of the difficulty of Ser dehydration. Similarly, the Thr labeled 4 appears to be dehydrated relatively slowly, which potentially may be because of the flanking Pro. Given the large number of potential PTM sites in NisA, detailed investigation of the dehydration pattern of various biosynthetic intermediates would have been impractical, if not impossible, without the assistance of the HSEE algorithm (e.g., NisA intermediates that are dehydrated fourfold or fivefold can have as many as 126 possible structures).


Fig. 6. HSEE investigations of ProcM dehydration directionality. (A) The sequence of the ProcA3.2 core peptide after treatment with endoproteinase GluC. The Ser/Thr residues are highlighted in purple. The black arrow shows the dehydration direction on ProcA3.2 catalyzed by ProcM. (B) HSEE analysis of onefold dehydrated ProcA3.2. Each number represents a structure with a different Ser/Thr being dehydrated using the numbering of A. The highest Hs value was found for the Thr labeled 3 (green asterisk in A and B). (C) HSEE analysis of twofold dehydrated ProcA3.2. Each number represents a structure with two different Ser/Thr residues escaping dehydration (e.g., 12 represents a structure in which Thr1 and Thr2 escape dehydration). The highest Hs value was found for the structure in which Thr1 and Ser4 were not dehydrated (orange asterisks). (D) HSEE analysis of threefold dehydrated ProcA3.2. Each number represents a structure with the numbered Ser/Thr escaping dehydration. The highest Hs value was observed for the structure in which Thr1, Thr2, and Thr3 were dehydrated (blue asterisks).
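The best-scoring structures in panels B–D can be converted into an order of dehydration by taking set differences between successive intermediates, as in the minimal Python sketch below. Site numbers 1–4 follow the numbering of panel A; the function name `modification_order` is hypothetical. The sketch assumes that each intermediate's best-scoring structure contains all sites of the previous one, which holds for the top-scoring structures here but is only an approximation because the directionality is not strict.

```python
from typing import List, Set


def modification_order(best_sets: List[Set[int]]) -> List[List[int]]:
    """Given the best-scoring set of modified sites for each successive intermediate,
    return the site(s) newly modified at each step."""
    order: List[List[int]] = []
    previous: Set[int] = set()
    for current in best_sets:
        if not previous <= current:
            raise ValueError("intermediates are not nested; no single order exists")
        order.append(sorted(current - previous))
        previous = current
    return order


# Best-scoring ProcA3.2 structures as described in the caption above:
# onefold: Thr3 dehydrated; twofold: Thr1 and Ser4 escape dehydration
# (so sites 2 and 3 are dehydrated); threefold: Thr1, Thr2, and Thr3 dehydrated.
all_sites = {1, 2, 3, 4}
onefold = {3}
twofold = all_sites - {1, 4}
threefold = {1, 2, 3}
steps = modification_order([onefold, twofold, threefold])
print(steps)  # [[3], [2], [1]]
```

The resulting order 3, 2, 1 runs from the C-terminal side of the core peptide toward the N terminus.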

Conclusion

In summary, we demonstrate a simple, easily implemented, and effective method for interpreting tandem MS data of RiPPs. Analysis of several spectra simultaneously and automated evaluation of all the hypothetical structures without any bias significantly improves the accuracy and efficiency of interpretation of the tandem MS data. As with all mass spectrometry-based methods, HSEE cannot provide stereochemical information, but the sites and connectivity of PTMs of all RiPPs investigated herein were correctly identified. Some structures with overlapping rings may require an additional chemical step of ring opening, as demonstrated previously (35, 36). The automated structural information provided by HSEE complements recently developed tools for genome mining such as peptidogenomics (37), BAGEL (5), antiSMASH (6), NaPDoS (7), and RiPPquest (8). Although HSEE is particularly well suited

Fig. 7. HSEE investigation of NisB catalysis. (A) NisA sequence and the structure of nisin. The Ser that escapes dehydration (Ser8) is shown in orange. (B) HSEE analysis of eightfold dehydrated NisA. Structures 1–9 correspond to peptides in which one of the Ser/Thr residues 1–9 escapes dehydration. (C) HSEE analysis of onefold dehydrated NisA. Structures 1–9 correspond to peptides in which one of the Ser/Thr residues 1–9 is dehydrated. (D) Tandem MS analysis of twofold dehydrated NisA (collected using trap collision energy linearly increased from 20 to 35 V; parent ions are in the +4 charge state). The ions in the spectra that are characteristic for a certain structure are shown in the same color as the structural drawings in D. Other ions are common to all of the structures and are shown in black. The presence of St14 is not immediately clear from the MS data shown, but it must be present based on the data for threefold dehydrated NisA (b5 ion in SI Appendix, Fig. S18). (E) Dehydration directionality of NisB catalysis determined by HSEE. Each number represents a structure identifying the dehydration sites (e.g., 126 represents a structure in which the residues labeled Thr1, Ser2, and Thr6 are dehydrated). The new dehydration site(s) in each successive round of dehydration are shown in red.
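The enumeration of hypothetical structures for partially dehydrated NisA can be sketched with Python's `itertools.combinations`. The function name `enumerate_structures` is hypothetical; the labels follow the "St" notation used in the text (e.g., St126 for sites 1, 2, and 6 dehydrated), and simple digit concatenation is only unambiguous for up to nine sites, as is the case for NisA. Ring topology is not considered.

```python
from itertools import combinations
from typing import List


def enumerate_structures(n_sites: int, n_modified: int) -> List[str]:
    """List every way to modify `n_modified` of `n_sites` candidate residues.

    Labels concatenate the 1-based site numbers, so they are unambiguous
    only when n_sites <= 9.
    """
    return ["St" + "".join(str(site) for site in combo)
            for combo in combinations(range(1, n_sites + 1), n_modified)]


twofold = enumerate_structures(9, 2)
fourfold = enumerate_structures(9, 4)
fivefold = enumerate_structures(9, 5)
print(len(twofold), len(fourfold), len(fivefold))  # 36 126 126
print(twofold[:3])  # ['St12', 'St13', 'St14']
```

The counts for the fourfold and fivefold intermediates reproduce the 126 possible structures mentioned in the text for NisA, which has nine Ser/Thr residues.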

PNAS | August 19, 2014 | vol. 111 | no. 33 | 12035

Materials and Methods

High-resolution liquid chromatography (LC)-MS and LC-MS/MS were carried out either using a Synapt electrospray ionization quadrupole TOF mass spectrometry system (Waters) equipped with an Acquity Ultra Performance Liquid Chromatography (UPLC) system (Waters), or using a ThermoFisher Scientific LTQ-FT hybrid linear ion trap connected directly to an Agilent 1200 HPLC system with an autosampler. Deisotoping and deconvolution of tandem MS spectra were performed using the MaxEnt3 program (Waters) or the Qualbrowser application of Xcalibur (ThermoFisher Scientific). The resulting raw data were first processed to list all of the picked ions, which were selected if their intensities were above a certain level (S/N threshold) defined based on the average ion intensity in a certain range. The picked ion list was then compared with the fragment ion matrix, allowing for summing of the experimentally observed ions for each hypothetical structure and generation of the Hs bar plot. For details of instrumental settings, procedures for gene cloning, peptide expression and purification, in vitro biochemical assays, lists of hypothetical structures, and methods for data processing, please see SI Appendix. SI Appendix also contains the R scripts for HSEE analysis.
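The ion-picking and scoring steps described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R scripts provided in SI Appendix. The noise level is taken as the mean intensity over the whole peak list, because the range used by the authors is not specified here. The score is taken as the count of predicted fragment ions observed within a fixed m/z tolerance. The function names, threshold, tolerance, and all m/z values are hypothetical.

```python
def pick_ions(peaks, sn_threshold):
    """Keep m/z values whose intensity exceeds sn_threshold x mean intensity.

    peaks: list of (mz, intensity) tuples. Using the mean over the whole
    list as the noise estimate is an assumption made for this sketch.
    """
    if not peaks:
        return []
    mean_intensity = sum(intensity for _, intensity in peaks) / len(peaks)
    return [mz for mz, intensity in peaks if intensity > sn_threshold * mean_intensity]


def score_structures(picked_mz, fragment_matrix, tol=0.02):
    """Count, for each hypothetical structure, its predicted ions that were observed.

    fragment_matrix: dict mapping a structure label to its predicted fragment m/z list.
    """
    scores = {}
    for label, predicted in fragment_matrix.items():
        scores[label] = sum(
            any(abs(observed - mz) <= tol for observed in picked_mz)
            for mz in predicted
        )
    return scores


# Toy data with made-up m/z values (not experimental results).
peaks = [(300.10, 30.0), (410.20, 20.0), (500.25, 900.0),
         (612.30, 50.0), (713.35, 40.0), (826.40, 1200.0)]
fragment_matrix = {
    "1": [500.25, 826.40, 940.45],
    "2": [482.24, 826.40, 940.45],
    "3": [482.24, 808.39, 940.45],
}

picked = pick_ions(peaks, sn_threshold=2.0)
scores = score_structures(picked, fragment_matrix)
```

In this toy case the candidate labeled "1" has the most matched ions. Plotting such scores per structure would give a bar plot analogous to the Hs plot, although the exact definition of Hs should be taken from SI Appendix.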

ACKNOWLEDGMENTS. We thank Stefano Donadio (New Anti-Infectives Consortium Srl) for a sample of deschloromicrobisporicin and Mr. Subha Mukherjee (University of Illinois at Urbana–Champaign) for providing the MS/MS data of Pcn2.8 and Pcn3.3 derivatives. This work was supported by National Institutes of Health Grants GM58822 (to W.A.v.d.D.) and GM097142 (to D.A.M.).

1. Arnison PG, et al. (2013) Ribosomally synthesized and post-translationally modified peptide natural products: Overview and recommendations for a universal nomenclature. Nat Prod Rep 30(1):108–160.
2. Baltz RH (2006) Marcel Faber Roundtable: Is our antibiotic pipeline unproductive because of starvation, constipation or lack of inspiration? J Ind Microbiol Biotechnol 33(7):507–513.
3. Li JW, Vederas JC (2009) Drug discovery and natural products: End of an era or an endless frontier? Science 325(5937):161–165.
4. Caboche S, et al. (2008) NORINE: A database of nonribosomal peptides. Nucleic Acids Res 36(Database issue):D326–D331.
5. van Heel AJ, de Jong A, Montalban-Lopez M, Kok J, Kuipers OP (2013) BAGEL3: Automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides. Nucleic Acids Res 41(Web Server issue):W448–W453.
6. Blin K, et al. (2013) antiSMASH 2.0—a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res 41(Web Server issue, W1):W204–W212.
7. Ziemert N, et al. (2012) The natural product domain seeker NaPDoS: A phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS One 7(3):e34064.
8. Mohimani H, et al. (2014) Automated genome mining of ribosomal peptide natural products. ACS Chem Biol 9(7):1545–1551.
9. Duncan MW, Aebersold R, Caprioli RM (2010) The pros and cons of peptide-centric proteomics. Nat Biotechnol 28(7):659–664.
10. Begley M, Cotter PD, Hill C, Ross RP (2009) Identification of a novel two-peptide lantibiotic, lichenicidin, following rational genome mining for LanM proteins. Appl Environ Microbiol 75(17):5451–5460.
11. Shenkarev ZO, et al. (2010) Isolation, structure elucidation, and synergistic antibacterial activity of a novel two-component lantibiotic lichenicidin from Bacillus licheniformis VK21. Biochemistry 49(30):6462–6472.
12. McClerren AL, et al. (2006) Discovery and in vitro biosynthesis of haloduracin, a two-component lantibiotic. Proc Natl Acad Sci USA 103(46):17243–17248.
13. Cooper LE, McClerren AL, Chary A, van der Donk WA (2008) Structure-activity relationship studies of the two-component lantibiotic haloduracin. Chem Biol 15(10):1035–1045.
14. Castiglione F, et al. (2007) A novel lantibiotic acting on bacterial cell wall synthesis produced by the uncommon actinomycete Planomonospora sp. Biochemistry 46(20):5884–5895.
15. Maffioli SI, et al. (2009) Structure revision of the lantibiotic 97518. J Nat Prod 72(4):605–607.
16. Böcker S, Letzel MC, Lipták Z, Pervukhin A (2009) SIRIUS: Decomposing isotope patterns for metabolite identification. Bioinformatics 25(2):218–224.
17. Pesavento JJ, Kim YB, Taylor GK, Kelleher NL (2004) Shotgun annotation of histone modifications: A new approach for streamlined characterization of proteins by top down mass spectrometry. J Am Chem Soc 126(11):3386–3387.
18. Oman TJ, Boettcher JM, Wang H, Okalibe XN, van der Donk WA (2011) Sublancin is not a lantibiotic but an S-linked glycopeptide. Nat Chem Biol 7(2):78–80.
19. Knerr PJ, van der Donk WA (2012) Discovery, biosynthesis, and engineering of lantipeptides. Annu Rev Biochem 81:479–505.
20. Piper C, Cotter PD, Ross RP, Hill C (2009) Discovery of medically significant lantibiotics. Curr Drug Discov Technol 6(1):1–18.
21. Castiglione F, et al. (2008) Determining the structure and mode of action of microbisporicin, a potent lantibiotic active against multiresistant pathogens. Chem Biol 15(1):22–31.
22. Foulston LC, Bibb MJ (2010) Microbisporicin gene cluster reveals unusual features of lantibiotic biosynthesis in actinomycetes. Proc Natl Acad Sci USA 107(30):13461–13466.
23. Sit CS, Yoganathan S, Vederas JC (2011) Biosynthesis of aminovinyl-cysteine-containing peptides and its application in the production of potential drug candidates. Acc Chem Res 44(4):261–268.
24. Bairoch A, Apweiler R (1996) The SWISS-PROT protein sequence data bank and its new supplement TREMBL. Nucleic Acids Res 24(1):21–25.
25. Li B, et al. (2010) Catalytic promiscuity in the biosynthesis of cyclic peptide secondary metabolites in planktonic marine cyanobacteria. Proc Natl Acad Sci USA 107(23):10430–10435.
26. Zhang Q, Yu Y, Vélasquez JE, van der Donk WA (2012) Evolution of lanthipeptide synthetases. Proc Natl Acad Sci USA 109(45):18361–18366.
27. Melby JO, Nard NJ, Mitchell DA (2011) Thiazole/oxazole-modified microcins: Complex natural products from ribosomal templates. Curr Opin Chem Biol 15(3):369–378.
28. Melby JO, Dunbar KL, Trinh NQ, Mitchell DA (2012) Selectivity, directionality, and promiscuity in peptide processing from a Bacillus sp. Al Hakam cyclodehydratase. J Am Chem Soc 134(11):5309–5316.
29. Dunbar KL, Melby JO, Mitchell DA (2012) YcaO domains use ATP to activate amide backbones during peptide cyclodehydrations. Nat Chem Biol 8(6):569–575.
30. Kelleher NL, Hendrickson CL, Walsh CT (1999) Posttranslational heterocyclization of cysteine and serine residues in the antibiotic microcin B17: Distributivity and directionality. Biochemistry 38(47):15623–15630.
31. Shi Y, Yang X, Garg N, van der Donk WA (2011) Production of lantipeptides in Escherichia coli. J Am Chem Soc 133(8):2338–2341.
32. Lee MV, et al. (2009) Distributive and directional behavior of lantibiotic synthetases revealed by high-resolution tandem mass spectrometry. J Am Chem Soc 131(34):12258–12264.
33. Garg N, Salazar-Ocampo LM, van der Donk WA (2013) In vitro activity of the nisin dehydratase NisB. Proc Natl Acad Sci USA 110(18):7258–7263.
34. Lubelski J, Khusainov R, Kuipers OP (2009) Directionality and coordination of dehydration and ring formation during biosynthesis of the lantibiotic nisin. J Biol Chem 284(38):25962–25972.
35. Martin NI, et al. (2004) Structural characterization of lacticin 3147, a two-peptide lantibiotic with synergistic activity. Biochemistry 43(11):3049–3056.
36. Lohans CT, Vederas JC (2014) Structural characterization of thioether-bridged bacteriocins. J Antibiot (Tokyo) 67(1):23–30.
37. Kersten RD, et al. (2011) A mass spectrometry-guided genome mining approach for natural product peptidogenomics. Nat Chem Biol 7(11):794–802.
38. Tsur D, Tanner S, Zandi E, Bafna V, Pevzner PA (2005) Identification of post-translational modifications by blind search of mass spectra. Nat Biotechnol 23(12):1562–1567.
39. Nguyen DD, et al. (2013) MS/MS networking guided analysis of molecule and gene cluster families. Proc Natl Acad Sci USA 110(28):E2611–E2620.
40. Guthals A, Watrous JD, Dorrestein PC, Bandeira N (2012) The spectral networks paradigm in high throughput mass spectrometry. Mol Biosyst 8(10):2535–2544.
41. Yang JY, et al. (2013) Molecular networking as a dereplication strategy. J Nat Prod 76(9):1686–1699.
42. Rasche F, Svatos A, Maddula RK, Böttcher C, Böcker S (2011) Computing fragmentation trees from tandem mass spectrometry data. Anal Chem 83(4):1243–1251.
43. Rasche F, et al. (2012) Identifying the unknowns by aligning fragmentation trees. Anal Chem 84(7):3417–3426.

for genome mining for new RiPPs because of the direct link between final structure and the gene-encoded precursor peptide, the method in principle can also be used for structural analysis of other compounds such as nonribosomal peptides, and is complementary to other tandem MS-based methods that aim to ultimately determine structures in high throughput, including blind search (38), network analysis (39–41), and fragmentation tree construction (42, 43). Moreover, HSEE greatly facilitates investigations of RiPP biosynthetic mechanisms, as illustrated by several examples in this study.

12036 | www.pnas.org/cgi/doi/10.1073/pnas.1406418111
