Anti-Infective Agents in Medicinal Chemistry, 2007, 6, 000-000

Mass Spectrometry, Proteomics, Data Mining Strategies and Their Applications in Infectious Disease Research

Andreas Evangelou1, Limor Gortzak-Uzan1, Igor Jurisica2,3,4 and Thomas Kislinger1,3,*

Ontario Cancer Institute, Divisions of Cancer Genomics and Proteomics1 and Signaling Biology2, Toronto Medical Discovery Tower, Toronto, Canada; University of Toronto, Departments of Medical Biophysics3 and Computer Science4, Toronto, Canada

Abstract: The ultimate goal of proteome research is the comprehensive description of all proteins present in a given sample using qualitative, quantitative and functional metrics. Traditionally, protein mixtures were first separated by two-dimensional gel electrophoresis, and spots of interest were excised, in-gel digested and analyzed by mass spectrometry (MS). In most cases, protein identification was done by MALDI-TOF-MS (matrix-assisted laser desorption/ionization time-of-flight). This methodology is time consuming and rarely leads to a comprehensive description of the analyzed proteome. In recent years, shot-gun expression profiling methodologies have been developed that can identify thousands of proteins in complex biological samples in a single experiment. We provide a short historical overview of proteome research and of the mass spectrometry technologies currently used in the systems biology community. In particular, we summarize the developments and applications of shot-gun proteomics and allied computational data mining tools in medical research and infectious disease research.

*Address correspondence to this author at the Ontario Cancer Institute, Toronto Medical Discovery Tower, 101 College Street, Toronto, Ontario M5G 1L7, Canada; Tel: 416-581-7627; Fax: 416-581-7629; E-mail: [email protected]

1871-5214/07 $50.00+.00 © 2007 Bentham Science Publishers Ltd.

INTRODUCTION

Proteomics studies the structure and function of proteins in a biological setting. The proteome is the entire collection of proteins expressed or encoded by a genome. The contents of a proteome can differ between tissue types and organelles, and change as a result of aging, disease, drug treatment, or environmental effects [1-3]. The aim of systems biology is to use proteomics to identify all the proteins made in a given cell, tissue or organism, and to determine how these proteins form networks, in order to develop better therapeutic strategies to target them in disease conditions [4-8]. In proteomics, proteins are identified with mass spectrometers, which measure the masses of peptides and of the amino acids, the individual subunits making up proteins. The proteins are first digested into smaller pieces (peptides) that are identified based on the masses of their constituent amino acids, allowing researchers to determine their sequence and identify the protein from which they derive.

SUMMARY AND HISTORY OF PROTEOMICS

In proteomics, mass spectrometers (MS) are analytical tools used for measuring the molecular mass of a protein or peptide. Mass spectrometers are used in biotechnology for the analysis of proteins, peptides, and oligonucleotides [9]; in pharmaceutical research for drug discovery, combinatorial chemistry, pharmacokinetics, and drug metabolism [10]; in clinical research for neonatal screening, haemoglobin analysis, and drug testing; in environmental sciences for pollutants (PAHs and PCBs), water quality, and food contamination [11]; and in geology for oil composition determination [12]. In general, mass spectrometry has become a valuable and powerful tool in proteomics research for identifying proteins by database searching from a proteolytic fragment, and in structural analysis for determining protein folding and protein-ligand complex formation [13, 14].

Although the concept of measuring all the proteins produced in an organism had been proposed in the early 1980s [15], the word “proteome” was not coined until late 1994 by Marc R. Wilkins [16], vice president and head of bioinformatics at Proteome Systems in Sydney, Australia. To understand the proteome, researchers needed not only to identify all of its protein constituents but also to better understand the characteristics of all these proteins [17]. Several methods have been employed in the past 10 years to identify the proteins contained in a sample mixture [17-19]. For example, each protein has a characteristic size and charge and therefore shows up as a discrete spot in 2-dimensional gel electrophoresis (2DE). Mass spectrometers, on the other hand, which employ magnetic or electrical fields to resolve distinct proteins according to the masses of their constituent atoms, display peaks on a graph. However, neither 2DE nor MS alone was ideal, as very large and very small proteins were difficult to distinguish, and MS sometimes failed to detect low-abundance proteins. To tackle this problem, the peptide mass fingerprint hypothesis was established in 1993 [20,21]. Since a protein comprises a set of amino acids arranged in a specific sequence, cutting it in a predictable manner results in pieces whose masses form its fingerprint. Furthermore, in 1988, a rapid evolution began in mass spectrometry with the introduction of new ionization techniques, electrospray ionization (ESI; [22]) and matrix assisted laser desorption/ionization (MALDI;

[23,24]). These new techniques allowed detection of proteins down to the femtomolar range and across a mass range of over 100 kDa. However, it was not until 1991 that the first commercial instruments made these new techniques readily available. The impact of peptide mass fingerprinting on protein identification in sequence databases was momentous, as the quantity of protein required was significantly reduced [20,21,25]. Then, in 1993, the introduction of capillary liquid chromatography-MS (LC-MS) significantly increased the number of peptides observed, yielding greater sequence coverage and improved protein identification [26,27]. The advent of these two new techniques also led to the identification of proteins from a single peptide, based on cross-correlation of a predicted spectrum with the actual fragment ion spectrum, using a computer program named SEQUEST that allowed completely automated protein identification from a set of tandem mass (MS/MS) spectra [28]. This ability became the basis for “shot-gun” proteomics years later, in 2001 [29-31], and demonstrated the importance of MS as a tool for the characterization of novel proteins. Thus, by the mid 1990s, the common approach to protein identification included the following steps: one or more methods of protein separation, protein digestion, peptide separation, mass analysis, and database searching [32].

MASS SPECTROMETRY-BASED PROTEOMICS

MS has greatly aided efforts to accurately identify all the proteins expressed in a cell or tissue and to determine which proteins are expressed in cancer cells but not in healthy cells, thereby furthering the understanding of diseases and influencing the development of drugs that target these proteins. One of the first techniques used in protein separation was two-dimensional gel electrophoresis (2DE) [33,34]. Separated proteins formed spots on the gel that were then cut out individually, digested and analyzed by MS. High-performance

liquid chromatography (HPLC) has also been employed, in which a mixture of proteins is separated by being passed through a column containing inert beads, which slow the proteins to different extents based on their chemical properties [35]. Furthermore, unlike 2DE, chromatography allows for continuous processing of cellular samples, reducing the need for sample handling and thus speeding up analysis. However, measuring the molecular mass of a protein or peptide by MS requires that ions form in, or are transferred into, the gas phase, where they are then analyzed by a mass analyzer Fig. (1). Mass spectrometry has also been used to determine whether proteins have been modified by phosphorylation [36,37], glycosylation [38-41], ubiquitination [42], or sumoylation [43,44].

Mass Spectrometers

Mass spectrometers can be divided into three fundamental parts: the ionization source, the mass analyzer, and the detector Fig. (1). The sample is introduced into the ionization source of the instrument, where the sample molecules are ionized. These ions are extracted into the analyzer region of the MS, where they are separated according to their mass-to-charge (m/z) ratios [45]. The separated ions are detected and the signal is sent to a computer system, where the m/z ratios are stored together with their relative abundances for presentation as an m/z spectrum [45]. The sample is usually either inserted directly into the ionization source or subjected to some type of chromatography en route to the ionization source [45]. The latter method of sample introduction involves coupling the mass spectrometer directly to a high pressure liquid chromatography (HPLC), gas chromatography (GC) or capillary electrophoresis (CE) separation column, so that the sample is separated into a series of components, which then enter the mass spectrometer sequentially for individual analysis [45].

Fig. (1). Mass spectrometry is an analytical technique used to measure the mass-to-charge ratio (m/z) of ions. The ion source ionizes the material under analysis (analyte). The ions are transported by magnetic or electrical fields to the mass analyzer. Two techniques used with liquid or solid biological samples include electrospray ionization (ESI) and matrix assisted laser desorption/ionization (MALDI). Mass Analyzers separate ions according to their m/z. Detectors record the charge induced or current produced when an ion passes by or hits a surface.
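As a worked illustration of the mass-to-charge ratio described above, the following minimal sketch computes the m/z at which a positively charged ion would be recorded, assuming the ion is formed by attaching z protons to a neutral molecule. The function names and the example mass of 1500 Da are hypothetical and chosen only for illustration; only the proton mass is a physical constant.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(observed_mz: float, charge: int) -> float:
    """Invert mz(): recover the neutral mass from an observed m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (observed_mz - PROTON_MASS)


if __name__ == "__main__":
    peptide_mass = 1500.0  # hypothetical neutral peptide mass in Da
    for z in (1, 2, 3):
        # The same peptide appears at a different m/z for each charge state.
        print(f"z = {z}: m/z = {mz(peptide_mass, z):.4f}")
```

The same peptide therefore gives rise to several peaks when it carries different numbers of charges, which is why the charge state must be known before an m/z value can be converted back into a molecular mass.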

Ionization occurs by adding protons (yielding positively charged ions) or removing protons (yielding negatively charged ions), allowing the analyte to be accelerated toward an opposite charge. Although a number of mass analyzers are available, the most commonly used are quadrupoles, time-of-flight (TOF), magnetic sectors, Fourier transform, and ion traps [45]. The main function of the mass analyzer is to separate (resolve) the ions formed in the ionization source of the MS according to their m/z ratios. MS detectors monitor the ion current, amplify it and transmit the signal to the data system, where it is recorded in the form of mass spectra. The m/z values of the ions are plotted against their intensities to show the number of components in the sample and the molecular mass of each component.

Methods of Sample Ionization

The ionization method used depends on the type of sample under investigation and the MS available. Mass spectrometer ionization methods include atmospheric pressure chemical ionization (APCI), chemical ionization (CI), electron impact ionization (EI), electrospray ionization (ESI), fast atom bombardment (FAB), field desorption / field ionization (FD/FI), matrix assisted laser desorption ionization (MALDI), surface-enhanced laser desorption ionization (SELDI), and thermospray ionization (TSP) (Washington University Center for Biomedical and Bioorganic Mass Spectrometry: An NIH supported Resource Center, http://www.chemistry.wustl.edu/~msf/). The ionization methods used for the majority of biochemical analyses are ESI [46,47] and MALDI [48].

MALDI-TOF

Matrix assisted laser desorption ionization (MALDI) [48] deals well with thermolabile, non-volatile organic compounds, especially those of high molecular mass, and is used successfully for proteins and peptides. The technique is based on the bombardment of sample molecules with a pulsed laser to bring about sample ionization.
The sample is pre-mixed with a highly absorbing matrix compound such as 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid), α-cyano-4-hydroxycinnamic acid (alpha-cyano or alpha-matrix) or 2,5-dihydroxybenzoic acid (DHB). The matrix transforms the laser energy into excitation energy, leading to a sputtering of analyte (e.g., protein sample) and matrix ions from the surface of the mixture Fig. (2A). The MALDI ion source is mainly coupled with TOF mass analyzers. MALDI-TOF mass spectrometers can determine the mass of a protein or peptide with a high degree of accuracy. The masses of the various peptides generated by digestion of an isolated protein with an enzyme of known cleavage specificity can identify the protein. Computational algorithms such as MASCOT, used together with MALDI-TOF mass spectrometry peptide analysis, enable proteins to be characterized based on their peptide mass fingerprint. A target protein is identified by comparing the list of peptide masses extracted from the collected MS spectra with the masses calculated for the same proteolytic digestion of each entry in a sequence database [49]. More recently, TOF/TOF analyzers have been developed that allow the measurement of tandem mass spectra in MALDI-MS. This significantly increased the confidence in protein identification compared to simple peptide fingerprinting. For a more detailed overview of MALDI-TOF-MS, we encourage the reader to consult more specific reviews or book chapters [50-52].

SELDI-TOF

Another technology used in the MS analysis of protein mixtures is surface-enhanced laser desorption ionization time-of-flight (SELDI-TOF) [53,54]. This method uses stainless steel or aluminum-based supports, or chips, engineered with chemical (hydrophilic, hydrophobic, pre-activated, normal-phase, immobilized metal affinity, and cationic or anionic exchange) or biological (antibody, antigen binding fragments, DNA, enzyme, or receptor) bait surfaces [53-56]. These surfaces allow for differential capture of proteins based on their intrinsic properties. Solubilized tissue or body fluids in µL volumes are applied directly to these surfaces, where proteins with affinity for the bait surface bind. The bound proteins are laser desorbed and ionized for MS analysis. As mixtures of proteins are analyzed within different samples, a unique fingerprint or signature results for each sample tested [53,57]. Thus, SELDI analysis produces patterns of masses rather than protein identifications. These mass spectral patterns can be used to differentiate patient samples from one another, such as normal versus diseased. Although this technology was initially used specifically to determine the mass-to-charge ratio of proteins within a sample, more recently SELDI-TOF instruments coupled to tandem mass spectrometers have been developed to enable protein identification and quantitation [58-60].

ESI

During standard electrospray ionization, the sample is dissolved in a polar, volatile solvent and pumped through a narrow capillary (75-150 µm) at a flow rate of 1 µL/min to 1 mL/min [46,47]. A high voltage of 2-4 kV is applied to the tip of the capillary, situated within the ionization source of the MS [61].
As a consequence of this strong electric field, the sample emerging from the tip is dispersed into an aerosol of highly charged droplets. This electrospray emerging from the capillary is directed towards the MS by a co-axially placed N2 source (drying gas), which passes across the ionization source. Eventually the charged ions are released and enter the MS analyzer [62]. Triple-quadrupole, ion-trap, and hybrid quadrupole-TOF mass analyzers are most frequently used with ESI. More recently, high mass accuracy analyzers such as the FT-ICR and Orbitrap have been introduced. In addition to measuring peptide mass, both the ESI and MALDI-TOF/TOF methods can also be used to isolate specific ions from a mixture on the basis of their m/z ratio [63]. The isolated ions are fragmented in the gas phase within the instrument by collision with an inert gas such as N2 or He, a process called collision-induced dissociation (CID), allowing the recording of MS/MS spectra. The MS/MS spectrum of a peptide is characteristic of its amino acid sequence Fig. (2B).

Nanospray-ESI

Nanospray ionization [64] is a low flow rate version of electrospray ionization. A small volume (1-4 μL) of the

Fig. (2). Two techniques used to ionize liquid or solid biological materials include matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI). Both are soft ionization techniques allowing for the ionization of biomolecules such as proteins, peptides and sugars. A) In MALDI, ionization is triggered by a laser beam (normally a nitrogen-laser). A matrix material is used to protect the biomolecule from being destroyed by direct contact with the laser beam. This matrix-analyte solution is spotted onto a metal plate (target). The solvent vaporizes, leaving only the recrystallized matrix with proteins spread throughout the crystals. The laser is fired at the co-crystals on the MALDI target. The matrix material absorbs the energy and transfers part of its charge to the analyte (e.g., protein or peptide) and thus ionizing them. MALDI usually produces singly and doubly charged ions. B) In ESI, a volatile liquid containing the analyte is passed through a micro-capillary. The analyte exists as ions in solution (protonated or anions). As the liquid is passed out of the capillary it forms an aerosol of small droplets. As the small droplets evaporate, the charged analyte molecules are forced closer together and the droplets disperse. The ions continue along to the mass analyzer of a mass spectrometer.
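The peptide mass fingerprinting procedure described for MALDI-TOF can be sketched in a few lines: each database entry is digested in silico, the theoretical peptide masses are computed, and candidates are ranked by how many observed masses they explain. This is a minimal sketch and not the algorithm used by MASCOT or any other search engine. It assumes that the observed values have already been converted to neutral monoisotopic masses, that trypsin cleaves after K or R except before P with no missed cleavages, and that a 50 ppm tolerance is appropriate; the protein names and sequences are hypothetical.

```python
# Monoisotopic amino acid residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # Da, added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, protein_sequence, tol_ppm=50.0):
    """Number of observed masses explained by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein_sequence)]
    hits = 0
    for obs in observed_masses:
        if any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical):
            hits += 1
    return hits


def rank_candidates(observed_masses, database, tol_ppm=50.0):
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = {name: count_matches(observed_masses, seq, tol_ppm)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-entry database and a simulated fingerprint of protA.
    database = {"protA": "MKTAYIAKQR", "protB": "GGSLVEKDDFR"}
    observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
    for name, score in rank_candidates(observed, database):
        print(name, score)
```

Counting matches is the simplest possible score. Because leucine and isoleucine have identical masses, and because unrelated peptides can share a mass within tolerance, real search engines weight matches statistically rather than simply counting them.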

sample is dissolved in a suitable volatile solvent at ~1-10 pmol/μL and transferred to a miniature sample vial. The flow rate is 30-1000 nL/min, and therefore far less sample is consumed than in standard ESI. Hence, unseparated peptide mixtures are sprayed into the mass spectrometer at very low flow rates and detected at sensitivities not achieved by ESI-MS [65,66]. Later, this approach was combined with chromatographic separations at low flow, and mixtures of peptides were supplied to the mass spectrometer from an online, capillary, high-pressure liquid chromatography (HPLC) system, an approach referred to as LC-MS. Nanospray-ESI is the method of choice for the majority of proteomic applications because it significantly increases sensitivity and reduces the amount of sample required compared with ESI.

LC-MS

Many modern proteomics labs use liquid chromatography (LC) ion-trap mass spectrometers [30,31,67]. The groundwork for this gel-independent approach to proteomics was laid by Hunt et al. in 1992, who demonstrated the ability of LC-MS to handle extremely complex peptide mixtures generated by proteolysis of complex protein samples [68]. Narrow fused silica capillary chromatography columns (50-150 µm inner diameter) are pulled to fine tips (about 5 µm in diameter), custom-packed with a wide array of chromatography resins, such as reverse phase (RP-18) and ion exchange resins, and placed in line with HPLC pumps [69,70]. The peptide mixtures are separated and eluted directly into the MS, which records the m/z ratios of peptides as they elute from the column over time. The most abundant peaks are isolated and fragmented by CID, resulting in sequence-dependent tandem mass spectra

(MS/MS) [71]. The combination of LC-MS/MS and sequence database searching is widely used for the analysis of complex peptide mixtures generated by the proteolysis of samples containing mixtures of proteins. This approach is referred to as “shot-gun” proteomics and can catalog thousands of components in samples isolated from very different sources. However, the method is limited by the difficulty of detecting and analyzing all of the peptides in a sample and by the challenge of processing the tens of thousands of CID spectra that are generated. In general, a protein digested with trypsin will generate 30-50 different peptides, so a tryptic digest of a cell proteome will generate a mixture containing hundreds of thousands of peptides. A major challenge is therefore how to resolve and analyze such complexity in a reasonable amount of time.

Multidimensional Protein Identification Technology (MudPIT)
Shot-gun proteomic analysis is relatively high throughput compared to other technologies such as intact protein sequencing or 2D-gel-based protein analysis. The elimination of 2DE separation simplifies sample handling and increases overall data throughput. Yates and colleagues developed an alternative gel-free protein profiling technology, termed Multidimensional Protein Identification Technology (MudPIT), that can routinely identify up to thousands of proteins in a single run [29-31]. Briefly, complex protein samples are digested with endoproteinase Lys-C and trypsin into complex peptide mixtures that are separated by 2-dimensional microcapillary chromatography and analyzed by tandem mass spectrometry Fig. (3). The microcapillary

column is packed with two independent chromatography phases, a strong cation exchanger (SCX) and a reversed-phase matrix material (RP-18). As the peptides elute from the column, they are directed into an ion-trap MS, where they are mass selected and fragmented [72,73]. The resulting tandem mass spectra are searched against protein sequence databases using computational search algorithms, allowing MudPIT to routinely identify thousands of proteins in a single run and potentially detect hundreds of low-abundance proteins, even in the presence of excess high-abundance proteins [31,74-76]. However, despite this technical advance in proteomics and mass spectrometry, sample complexity remains a fundamental limiting factor in shot-gun profiling. To achieve complete proteome coverage, sample fractionation is necessary in addition to shot-gun profiling.

Quantitative Proteomics

As proteomics technologies become more readily available, comparing the abundances of proteins in two or more complex samples becomes increasingly important. Although various methods are available for measuring the relative abundances of proteins or modifications in two or more samples, quantitative proteomics approaches have been developed to comprehensively identify and quantify proteins [77,78].

1. ISOTOPE BASED METHODS

An important goal of functional proteomics is to globally profile changes in protein abundance. Although comparative label-free methods have recently been established for quantifying proteins by shot-gun proteomics [79], several isotope labeling methods are still used to accurately quantitate proteins by MS Fig. (4). Typically, stable (i.e., non-radioactive) “heavy” isotopes of hydrogen (2H), carbon (13C) or nitrogen (15N) are incorporated into one sample, while the other is labeled with the corresponding “light” isotopes (1H, 12C or 14N). The two samples are mixed before the analysis. Peptides derived from the different samples can be distinguished by their mass difference. Peptides of identical sequence (sister peptides) derived from two differentially labeled protein samples differ in mass and appear as doublets in the acquired MS spectra. The relative abundance of the parental protein is thus derived from the ratio of the ion intensities of the two sister peptides. The most popular methods for isotope labeling are i) SILAC (stable isotope labeling with amino acids in cell culture), ii) ICAT (isotope-coded affinity tagging) and ICPL (isotope-coded protein label), and iii) iTRAQ (isotope tags for relative and absolute quantitation) Fig. (4).
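The way a protein ratio is derived from sister peptide doublets can be illustrated with a short sketch. It makes several assumptions that are not taken from the text: the label is carried by 13C atoms only, each sister pair has already been reduced to a (light, heavy) intensity tuple, and the protein ratio is summarized as the median of the peptide ratios. All function names and the example intensities are hypothetical.

```python
from statistics import median

# Mass difference between one 13C and one 12C atom, in Da.
C13_C12_DIFF = 1.0033548


def doublet_spacing(n_heavy_atoms, charge, delta=C13_C12_DIFF):
    """Expected m/z distance between the light and heavy sister peptides."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return n_heavy_atoms * delta / charge


def protein_ratio(sister_pairs):
    """Heavy/light protein ratio as the median of its peptide ratios.

    `sister_pairs` is a list of (light_intensity, heavy_intensity) tuples;
    pairs with a missing (zero) partner are skipped.
    """
    ratios = [heavy / light for light, heavy in sister_pairs
              if light > 0 and heavy > 0]
    if not ratios:
        raise ValueError("no complete sister peptide pairs")
    return median(ratios)


if __name__ == "__main__":
    # A label carrying six 13C atoms on a doubly charged peptide.
    print(f"doublet spacing: {doublet_spacing(6, 2):.4f} m/z")
    # Three hypothetical sister pairs from one protein.
    pairs = [(100.0, 200.0), (50.0, 110.0), (10.0, 19.0)]
    print(f"heavy/light ratio: {protein_ratio(pairs):.2f}")
```

The median is used here because it is less sensitive than the mean to a single poorly measured peptide; other summaries, such as intensity-weighted means, are equally plausible choices.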

Fig. (3). High-Throughput Proteomics using MudPIT. Peptides from a complex mixture are eluted from a biphasic (SCX + RP-18) microcapillary column directly into an ESI MS. The tandem mass spectra (MS/MS) contain fragmentation patterns specific to the amino acid sequence, generated from peptides after elution into MS. A number of different algorithmic approaches have been described to identify peptides from MS/MS data.
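The sequence-specific fragmentation patterns referred to above can be made concrete with a small sketch. It assumes the commonly used simplification that CID produces mainly singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments) and ignores neutral losses and other ion types. Search algorithms compare theoretical ion series of this kind with recorded MS/MS spectra, although their actual scoring is more elaborate. The function name and example peptides are hypothetical.

```python
# Monoisotopic amino acid residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da
PROTON = 1.007276  # Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] holds the first i+1 residues; y_ions[i] holds the last i+1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []

    running = 0.0
    for m in masses[:-1]:            # N-terminal fragments
        running += m
        b_ions.append(running + PROTON)

    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDEK")  # hypothetical tryptic peptide
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

Because consecutive ions in a series differ by exactly one residue mass, the spacing between peaks encodes the amino acid sequence, which is what makes the MS/MS spectrum characteristic of the peptide.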

i. SILAC

This is a simple and straightforward approach for in vivo incorporation of a label into proteins for MS-based quantitative proteomics. The method relies on the incorporation of amino acids containing substituted stable isotopic nuclei. In a typical experiment, two cell populations are grown in culture media that are identical except that one contains a “light” and the other a “heavy” form of a particular amino acid [80]. When the labeled analog of an amino acid is supplied to cells in culture instead of the natural amino acid, it is incorporated into all newly synthesized proteins. After a number of cell divisions, each instance of this particular amino acid is replaced by its isotope-labeled analog. Since there is hardly any chemical difference between the labeled amino acid and its natural counterpart, the cells behave exactly like the control cell population Fig. (4B).

ii. ICAT/ICPL

The ICAT and ICPL methods are capable of high-throughput quantitative proteome profiling on a global scale. Isotope-coded affinity tagging (ICAT), originally developed by Ruedi Aebersold’s lab, uses stable isotope labeling for quantitative analysis of paired protein samples, followed by separation and identification of proteins by LC-MS [81]. In the ICAT approach, differential isotope labeling of cysteine residues within proteins with biotin-containing tags allows the isolation of modified peptides by avidin affinity chromatography [81,82]. Each tag consists of a biotin affinity tag, an isotopically labelled linker chain in “light” and “heavy” forms, and a reactive group that binds and modifies cysteine residues of the protein (iodoacetamide alkylation). The labeled cysteinyl peptides are captured and analyzed by LC-MS/MS to determine the relative abundance for each peptide pair.
The strength of this technique lies in its ability to allow quantification and identification within a single analysis, and it can be applied to samples from any source because it does not require metabolic labeling [78]. However, this procedure only targets cysteine residues, so certain proteins and peptides can be missed Fig. (4A). The ICPL approach is a more recent method based on isotope labeling of all free amino groups in intact proteins by an amine-specific reagent (N-nicotinoyloxy-succinimide) to increase the sensitivity of MS [83,84]. Protein mixtures to be analyzed are first reduced and alkylated, then differentially labelled with the light or heavy isotope form of the ICPL reagent, and subjected to high-throughput MS/MS [83]. Compared to the ICAT reagent, ICPL results in higher sequence coverage, and thus more information about posttranslational modifications and isoforms is obtained [83].

iii. iTRAQ

Recently, an improved approach analogous to ICAT or ICPL, called iTRAQ, was developed by Ross et al. (2004) and is commercially available from Applied Biosystems (Foster City, CA) [85]. The technique is based upon chemically tagging the peptides generated from protein digests of multiple different samples. Each individual sample is reduced, alkylated, and enzymatically digested with trypsin. The resulting peptide pools are then each labelled with one

member of a multiplex set of iTRAQ™ reagents (Applied Biosystems) in a parallel set of reactions, combined, fractionated by LC, and subsequently analyzed by tandem mass spectrometry [86]. The iTRAQ™ reagents are isobaric tags consisting of a charged reporter mass group (114-117 Da), an amine-specific peptide reactive group, and a neutral balance group (31-28 Da) that compensates for the reporter so that every tag has the same overall mass of 145 Da. When reacted with a peptide, the tag forms an amide linkage to any peptide amine (N-terminal or ε-amino group of lysine). Four tags are available, enabling four different conditions to be multiplexed in one experiment. Fragmentation of the tag attached to a peptide generates a low molecular mass reporter ion that is unique to the tag used to label each of the digests. Because the tags are isobaric, identical peptides from the different samples appear as a single, unresolved precursor ion in MS; the signals are additive, which enhances the detection of proteins that may be of low abundance in any given sample. Upon fragmentation, the four reporter group ions appear as distinct masses between m/z 114 and 117, while the remaining fragment ions appear as additive isobaric signals. Database searching of the fragmentation data identifies the labeled peptides and their corresponding proteins. Measurement of the intensity of each reporter ion enables relative quantification of the peptides in each digest and hence quantitation of the proteins from which they originate Fig. (4C).

2. LABEL-FREE METHODS

Despite the reliability and accuracy of isotope labeling methods in detecting and quantitating low-abundance proteins, their disadvantages include the cost of isotopic labels and the requirement for pairwise comparisons between samples [79]. Recently, label-free protein quantitation methods have become promising alternatives.
These quantitative proteomic strategies do not involve stable isotopes but instead take advantage of highly reproducible liquid chromatography [87] to extract quantitative data on peptide abundances across multiple samples based on peptide correlation profiling [88]. Distribution curves are generated from the intensities of tens of thousands of peptides across parallel analyses of consecutive fractions. Two methods have evolved, Mass Spectral Peak Integration and Mass Spectral Counting, both of which correlate well with the relative abundance of proteins in complex samples.

i. Mass Spectral Peak Integration

Protein ratios can be determined from spectral peak area intensities by calculating the ratios of ion intensities for peptides matched between different experiments [79]. Bondarenko et al. (2002) and others have shown that the mass spectral peak area intensities of peptide ions correlate well with protein abundances in complex samples by demonstrating linear responses of peptide ion peak areas [87,89,90].

ii. Spectral Counting

A sensitive method for detecting proteins that undergo changes in abundance, termed spectral counting, compares the number of MS/MS spectra assigned to each protein [79].
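Spectral counting as introduced above can be sketched as follows. The sketch assumes that each identified MS/MS spectrum has already been assigned to exactly one protein; normalizing by the total number of spectra per run and adding a pseudocount are choices made here for illustration and are not prescribed by the text. Function names and protein identifiers are hypothetical.

```python
from collections import Counter
from math import log2


def spectral_counts(assigned_proteins):
    """Count MS/MS spectra per protein from a list of protein identifiers,
    one identifier per identified spectrum."""
    return Counter(assigned_proteins)


def log2_fold_change(counts_a, counts_b, pseudocount=0.5):
    """log2 of (sample B / sample A) for each protein.

    Counts are divided by the total number of spectra in each run so that
    runs of different depth are comparable. The pseudocount avoids taking
    the logarithm of zero for proteins seen in only one sample.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples need at least one spectrum")
    result = {}
    for protein in set(counts_a) | set(counts_b):
        norm_a = (counts_a.get(protein, 0) + pseudocount) / total_a
        norm_b = (counts_b.get(protein, 0) + pseudocount) / total_b
        result[protein] = log2(norm_b / norm_a)
    return result


if __name__ == "__main__":
    # Hypothetical spectrum-to-protein assignments from two LC-MS/MS runs.
    run_a = spectral_counts(["P1"] * 10 + ["P2"] * 10)
    run_b = spectral_counts(["P1"] * 30 + ["P2"] * 10)
    for protein, change in sorted(log2_fold_change(run_a, run_b).items()):
        print(f"{protein}: log2 fold change = {change:+.2f}")
```

No peak integration is required, which is the practical appeal of the method; the price is coarse resolution for proteins represented by only a few spectra.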

Fig. (4). Work flow of three advanced proteomic methodologies that use stable isotope technology for quantitative protein profiling by mass spectrometry. A) Isotope Coded Affinity Tags (ICAT) can be used to label two protein samples with chemically identical tags that differ only in isotopic composition (heavy and light pairs) and contain a thiol-reactive group, which links covalently to cysteine residues, and a biotin moiety. The ICAT-labeled fragments can be separated and quantified by LC-MS analysis. B) A similar approach to quantifying proteins in mammalian cells is Stable Isotope Labeling by Amino Acids in Culture (SILAC). Isotopic labels are incorporated into proteins by metabolic labeling in cell culture. Cell samples to be compared are grown separately in media containing either a heavy (green) or light (blue) form of an essential amino acid, such as L-lysine, that cannot be synthesized by the cell. C) iTRAQ can be used to label protein samples with four independent tag reagents of the same mass that give rise to four unique reporter ions (m/z = 114-117) upon fragmentation in MS/MS. The recorded reporter ion intensities are then used to quantify the four samples.
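The reporter ion read-out used in iTRAQ can be illustrated with a minimal sketch that sums the signal found near each reporter mass in one MS/MS spectrum and expresses it as a fraction of the total reporter signal. The nominal reporter positions (m/z 114 to 117) are taken from the text; the tolerance window of 0.25 m/z, the representation of a spectrum as (m/z, intensity) pairs, and the example peak list are assumptions made only for illustration.

```python
# Nominal reporter ion positions quoted in the text (m/z 114-117).
REPORTER_MZ = (114.0, 115.0, 116.0, 117.0)


def reporter_intensities(peaks, tol=0.25):
    """Sum the intensity found within `tol` m/z of each reporter position.

    `peaks` is an iterable of (mz, intensity) pairs from one MS/MS spectrum.
    """
    totals = {target: 0.0 for target in REPORTER_MZ}
    for mz, intensity in peaks:
        for target in REPORTER_MZ:
            if abs(mz - target) <= tol:
                totals[target] += intensity
    return totals


def relative_abundance(totals):
    """Express each reporter signal as a fraction of the summed reporter signal."""
    summed = sum(totals.values())
    if summed == 0:
        raise ValueError("no reporter ion signal in this spectrum")
    return {target: value / summed for target, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical MS/MS peak list: four reporter ions plus one sequence ion.
    spectrum = [(114.11, 100.0), (115.11, 200.0), (116.11, 100.0),
                (117.11, 400.0), (175.12, 999.0)]
    fractions = relative_abundance(reporter_intensities(spectrum))
    for target, fraction in fractions.items():
        print(f"reporter {target:.0f}: {fraction:.3f}")
```

In practice the fractions from all spectra assigned to one protein would be combined, and corrections for isotopic impurities of the reagents may be applied; both steps are omitted here.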

The spectral counts for peptides shared between isoforms are considered separately because of ambiguity in protein assignment [79]. Spectral counts of standard proteins added to yeast extracts, for example, showed linearity over 2 orders of magnitude with high correlation to the relative protein concentration [91]. The advantage of spectral counting is that relative abundances of different proteins can be measured without the use of expensive isotope labels or sophisticated peak integration algorithms. More recently, the group of Washburn has demonstrated good correlation between quantitative results obtained by spectral counting and by metabolic labeling with stable isotopes [92].

BIOINFORMATICS AND DATA MINING IN PROTEOMICS

Tandem mass spectrometry combined with database searching has become the most widely used method for high-throughput peptide and protein identification. However, proteomics generates large datasets, and their comprehensive mining and interpretation is a major bottleneck in modern systems biology. Today, mass spectrometers are operated with data-dependent acquisition of MS/MS fragmentation spectra, which are then searched against protein sequence databases using diverse computational algorithms, such as SEQUEST [28,93], MASCOT [94], X!Tandem [95], or OMSSA [96]. This leads to the confident identification of several hundred proteins in a single sample.

1. DATABASE SEARCH ALGORITHMS

First, MS data have to be searched to identify the peptides and proteins. Different search algorithms and sequence databases can be used. The databases and search algorithms used in high-throughput proteomics are summarized in Tables 1 and 2, respectively.

Table 1. Summary of Popular Bioinformatics Resources with Protein Sequence Databases

| Resource | Institute | Database(s) | Website |
|---|---|---|---|
| ExPASy Proteomics Server (Expert Protein Analysis System) | Swiss Institute of Bioinformatics | Swiss-Prot (UniProtKB), TrEMBL (UniProtKB), PROSITE, SWISS-2DPAGE, ENZYME, SWISS-MODEL | http://ca.expasy.org/ |
| EMBL-EBI | European Bioinformatics Institute of the European Molecular Biology Laboratory | Pfam (InterPro), PRINTS (InterPro), Ensembl, Vega, PANTHER (InterPro) | http://www.ebi.ac.uk/IPI/Databases.html |
| NCBI | National Center for Biotechnology Information | ENTREZ, RefSeq | http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi |
| MRC | Medical Research Council, Imperial College London | MSDB (Mass Spectrometry protein Sequence DataBase) | http://cscfserve.hh.med.ic.ac.uk/msdb.html |
| Sanger Institute | Wellcome Trust Sanger Institute | Ensembl, Pfam, Vega | http://www.sanger.ac.uk/ |

Table 2. Summary of Database Search/Scoring Algorithms Used to Identify Proteins by Mass Spectrometry in Bioinformatics and Computational Biology. The large numbers of MS/MS peptide spectra generated in proteomics experiments require efficient, sensitive and specific algorithms for peptide identification.

| Algorithm | Description | Website |
|---|---|---|
| SEQUEST | Correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases. | http://www.sequest.org, http://fields.scripps.edu/sequest/ |
| MASCOT | Probability-based protein identification by searching sequence databases using mass spectrometry data. | http://www.matrixscience.com |
| X! TANDEM | Matches tandem mass spectra with peptide sequences from protein databases to identify proteins. | http://www.thegpm.org/TANDEM/ |
| OMSSA | Open Mass Spectrometry Search Algorithm | http://pubchem.ncbi.nlm.nih.gov/omssa/ |

i. SEQUEST

SEQUEST correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from any protein or nucleotide database [28,93]. It determines the amino acid sequence, and thus the protein(s), corresponding to the mass spectrum being analyzed. Its final output is a list of putative peptide matches and their associated scores: i) Xcorr (cross-correlation), based on the spectral fit between the recorded and the generated spectrum; ii) ΔCn (normalized difference between the best match and the second-highest-scoring match); and iii) a preliminary ranking based on the number of matched ion peaks [28].

ii. MASCOT

MASCOT is a commercially available search engine that can be used to interrogate MS data in order to identify proteins from primary sequence databases. The MASCOT search engine was initially developed by Perkins et al. [94] and can search any nucleic acid or amino acid database in FASTA format. MASCOT uses a probability-based algorithm to guard against false positives.

iii. X! TANDEM

X! Tandem is open-source software that matches tandem mass spectra with peptide sequences to identify proteins from an enzymatic digest of a mixture of proteins [95,97].

iv. OMSSA

The Open Mass Spectrometry Search Algorithm (OMSSA) efficiently identifies tandem mass peptide spectra by searching protein sequence databases [96]. It takes experimental tandem mass spectra, filters out noise peaks, extracts their m/z values, and compares them to calculated values derived from peptides produced by an in silico digestion of a protein sequence library [96].

2. DATA VALIDATION

Several computer programs have been developed to rigorously assess and statistically validate the quality of the results obtained from database search algorithms. As instruments become more sensitive and identify thousands of proteins, more emphasis must be placed on confidently separating signal from noise. Statistical validation of search results is very important to minimize the false discovery rate and to report only high-confidence data. Programs such as STATQUEST and PeptideProphet/ProteinProphet have been developed to reduce false positive protein identifications by post-processing the output from search algorithms. The search against reverse databases further improves the process. Table 3 summarizes the validation tools and their associated web pages.

i. STATQUEST

Kislinger et al. (2003) developed this statistical algorithm to provide a more rigorous estimate of the accuracy of SEQUEST predictions [98]. STATQUEST uses an empirical probabilistic method to determine the likelihood of each putative peptide match [98]. It performs a statistical analysis of peptide identification scores to estimate the accuracy of identifications.

ii. Peptide/ProteinProphet

The PeptideProphet and ProteinProphet algorithms (Table 3) automatically validate peptide and protein assignments, respectively, made on the basis of tandem mass spectra by database search programs such as SEQUEST [99-102]. PeptideProphet is applied first, to the tandem mass spectra and the peptides assigned to them by database search engines, and is followed by ProteinProphet analysis. PeptideProphet evaluates the database search engine scores and the peptide properties among correct and incorrect peptides, and computes the probability of correct assignment [100]. ProteinProphet evaluates the probability of correct protein identifications on the basis of the peptides assigned to MS/MS spectra by database search algorithms [99].

iii. Randomized/Reversed Sequence Databases

Study-specific methods are warranted to estimate the accuracy or false positive rates of peptide and protein identification. Methods have been devised for estimating false positive identification rates based on searches of randomized (reversed and reshuffled) databases [103]. In a recent study, Higdon et al. (2005) determined that combined searches of a reshuffled database appended to a forward sequence database are necessary for providing quantitative estimates of false positive identification rates [73,103,104].
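The idea of searching a reversed database appended to the forward database can be sketched in a few lines of Python. This is a minimal illustration only: the `REV_` prefix, the `(protein_id, score)` hit layout and the decoy/target ratio used as the estimator are assumptions chosen for the example, and published studies differ in the exact estimator they apply.

```python
def reversed_decoys(proteins, prefix="REV_"):
    """Build decoy entries by reversing each forward protein sequence.

    `proteins` maps protein identifier -> amino acid sequence. The prefix
    that marks decoy entries is a hypothetical naming convention.
    """
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


def estimate_false_positive_rate(hits, threshold, prefix="REV_"):
    """Estimate the false positive rate among hits scoring >= threshold.

    `hits` is a list of (protein_id, score) pairs from a search against the
    concatenated forward + decoy database. The number of accepted decoy
    hits is taken as an estimate of the number of accepted false forward
    hits (an assumed, commonly used convention).
    """
    accepted = [pid for pid, score in hits if score >= threshold]
    decoys = sum(1 for pid in accepted if pid.startswith(prefix))
    targets = len(accepted) - decoys
    if targets == 0:
        return 0.0
    return decoys / targets


# Toy example: one forward entry, its decoy, and a handful of scored hits.
database = {"P1": "MKTAY"}
database.update(reversed_decoys(database))
hits = [("P1", 3.5), ("P2", 3.1), ("REV_P9", 2.9),
        ("P3", 2.8), ("P4", 2.2), ("REV_P7", 1.5)]
rate_at_2 = estimate_false_positive_rate(hits, threshold=2.0)
```

Raising the score threshold until the estimated rate falls below a chosen limit gives a study-specific cut-off, which is the practical use of such decoy searches.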

3. DATA ANNOTATION AND CLUSTERING TOOLS

Multiple tools have been developed by proteomic researchers to interpret global proteomic datasets by placing the identified proteins into a biological context. Hierarchical clustering combined with large-scale systematic annotation is a useful data analysis technique for discovering and interpreting interesting patterns in global protein expression datasets. Table 3 lists a summary of these proteomic annotation tools and their associated web pages.

i. Clustering

Clustering is used to subgroup proteins with similar expression characteristics or patterns, which can reveal groups of co-expressed or co-regulated proteins. Clustering offers several advantages: i) the identified clusters provide a natural structure for data organization by grouping co-regulated proteins; and ii) it focuses attention on protein function, since co-regulated proteins might be functionally related [105]. Open-source software tools such as Cluster 3.0 and TreeView are readily available to the systems biology community [106] and are capable of handling most genome-wide expression studies. Additional visualization can be supported by, for example, Self-Organizing Maps [107], such as in BTSVQ [50,108].

ii. Annotation

In order to streamline the data analysis process in genome-wide profiling projects, systematic data annotation is extremely important. The identification of enriched groups of functionally related proteins can aid the biologist in interpreting the obtained data and planning hypothesis-driven follow-up experiments. Furthermore, functional prediction for unannotated proteins, based on their co-expression or clustering with known proteins/functions, is a significant component of modern proteomics.
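To make the notion of an "enriched group" concrete, the sketch below computes a one-sided over-representation p-value with the hypergeometric distribution. Using this particular test is an assumption made for illustration; it is a common choice for annotation enrichment, but the sketch does not claim to reproduce the statistics of any specific tool, and the function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value for over-representation.

    population: number of proteins in the reference set
    annotated:  number of reference proteins carrying the annotation term
    sample:     number of proteins in the cluster or list of interest
    hits:       number of proteins in that list carrying the term

    Returns P(X >= hits) when `sample` proteins are drawn at random,
    without replacement, from the reference set.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 4 of 10 reference proteins carry a term, and 2 of the
# 3 proteins in a cluster carry it.
p = enrichment_p_value(population=10, annotated=4, sample=3, hits=2)
```

In practice many terms are tested at once, so the resulting p-values would normally be corrected for multiple testing before any term is called enriched.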
A) GoMiner

GoMiner uses the Gene Ontology schema (http://www.geneontology.org) [109] to aid the biological interpretation of proteomic and genomic datasets by categorizing identified proteins and genes into a standardized nomenclature of biological processes, molecular functions, and subcellular locations [109]. Knowledge of the coherent, biologically relevant groups of gene products present within a large dataset allows biologists to generate hypotheses in an objective and streamlined manner [110].

B) Babelomics

Babelomics (http://babelomics.bioinfo.cipf.es/) integrates a set of web tools for functional annotation and analysis of groups of genes or proteins identified from high-throughput experiments [111-113].

C) BiNGO

BiNGO (Biological Networks Gene Ontology) is an open-source Java tool that determines which GO terms are significantly overrepresented in a set of genes [114]. The output is a graph that can be displayed in Cytoscape [115].

Table 3. Summary of Some Useful Bioinformatics Computational Tools

| Algorithm | Description | Website |
|---|---|---|
| Babelomics | Suite of web tools for functional annotation and analysis of groups of genes/proteins in high-throughput proteomics experiments. | http://fatigo.bioinfo.cipf.es/, http://www.babelomics.org |
| GoMiner | Gene Ontology Annotation Tool | http://discover.nci.nih.gov/gominer/ |
| BiNGO | Biological Networks Gene Ontology Tool. Open-source Java tool to determine which GO terms are significantly overrepresented in a set of genes. | http://www.psb.ugent.be/cbd/papers/BiNGO/ |

iii. Bioinformatics Resources

Other useful resources for proteomic researchers include the following web-based servers and tools:

A) ExPASy

ExPASy (Expert Protein Analysis System) is a proteomics server from the Swiss Institute of Bioinformatics dedicated to analyzing protein sequences and structures (http://ca.expasy.org/). It has links to many other molecular biology databases, including the UniProt Knowledgebase (Swiss-Prot and TrEMBL) for proteins, PROSITE (protein families and domains), SWISS-2DPAGE, ENZYME (enzyme nomenclature), and the SWISS-MODEL Repository (automatically generated protein models). It also provides access to tools and software packages for proteomics and sequence analysis, and links to other major molecular biology servers such as the European Bioinformatics Institute (EBI) and the National Center for Biotechnology Information (NCBI).

B) MGI

Mouse Genome Informatics (MGI) is a web-based resource (http://www.informatics.jax.org/) that provides integrated access to data on the genetics, genomics, and biology of the laboratory mouse useful to proteomic researchers.

C) EMBL-EBI

The European Bioinformatics Institute (EBI), which forms part of the European Molecular Biology Laboratory (EMBL), has a web-based resource (http://www.ebi.ac.uk) for research and services in bioinformatics. The portal manages and analyzes databases of biological data, including DNA and protein sequences and macromolecular structures. EBI also gives access to the International Protein Index (IPI, http://www.ebi.ac.uk/IPI/IPIhelp.html), which describes the proteomes of higher eukaryotic organisms [117]. The IPI database is useful for automated cross-database integration because it uses multiple database identifiers.

D) InterPro

InterPro (http://www.ebi.ac.uk/interpro) is a database of protein families and structural and functional motifs in which features from known proteins can be used to characterize unknown protein sequences [116]. It is a useful resource that can enhance functional protein annotation.

E) Protein-protein Interaction Databases

There are several large resources available for protein interaction data for human and major model organisms, including databases with human-curated known interactions, predicted interactions, and interactions observed in high-throughput experiments (see Table 4 for details). These resources have become indispensable additions to annotation and pathway databases, and are used to guide visualization, integration and interpretation of genomic and proteomic profiles. Multiple graph visualization and analysis software systems are available (e.g., Cytoscape [115], MetNet3D [118], NAViGATOR (http://ophid.utoronto.ca/navigator/), Osprey [119], WebInterViewer [120], and PIMWalker [121]).

Table 4. Summary of Some Useful Protein-Protein Interaction Databases

| Database | Description | Website |
|---|---|---|
| BIND | Biomolecular interaction network | http://bind.ca |
| BIOGRID | A general repository for interaction datasets | http://www.thebiogrid.org |
| DIP | Curated database of interacting proteins | http://dip.doe-mbi.ucla.edu |
| HPRD | Human reference protein interaction database | http://www.hprd.org |
| HPID | Human Protein Interaction Database | http://wilab.inha.ac.kr/hpid |
| INTACT | Molecular interaction database | http://www.ebi.ac.uk/intact |
| MINT | Molecular interaction database | http://cbm.bio.uniroma2.it/mint |
| MIPS | Mammalian Protein-Protein Interaction Database | http://mips.gsf.de/proj/ppi |
| OPHID | Online Predicted Human Interaction Database; comprises predicted, experimental, and high-throughput interactions. | http://ophid.utoronto.ca |
| POINT | Predicted and curated protein interaction database | http://point./bioinformatics.tw |
| STRING | Known and predicted protein interactions and associations | http://string.embl.de |

Additional databases are available in various online lists, including JCB (http://www.imb-jena.de/jcb/ppi/jcb_ppi_databases.html) and NAR (http://www3.oup.co.uk/nar/database/cap), and at http://www.biopax.org/.

PROTEOMICS IN BIOMEDICAL RESEARCH OF INFECTIOUS DISEASES

Proteomics has many applications to clinical issues in biodefense and infectious disease research. The emergence of infectious diseases has increased biomedical research into multiple MS platforms and microbial genomic databases to improve treatment, diagnosis, vaccine development, and understanding of the host immune response to infectious agents [122]. These proteomic strategies have been applied to microbes and viruses, and immunoproteomics has been applied to the development of new vaccine targets. Powerful research approaches that generate a multitude of potential new protein targets are key to counteracting the microorganisms or toxins that lead to infectious diseases. There is a need for the development of new rapid diagnostic tests, vaccines, immunotherapies for prevention, and drugs for treatment. Therefore, it is crucial to develop and use novel technologies such as proteomics in biomedical research to identify the properties of pathogens and of the immune response to them. For example, rather than relying on demonstrating growth of infectious microorganisms, quantitative amplification tests, genomics, and proteomics have been used and have increased our ability to detect and identify proteins or toxins involved in infectious diseases such as Severe Acute Respiratory Syndrome (SARS), Tuberculosis, Influenza, and Anthrax [123-125].

Clinical Applications of Proteomics

The ability of mass spectrometry to identify a large number of proteins in a relatively short time is promising to researchers who deal with biological samples of complex protein content. This is reflected in the number of scientific publications applying proteomic-based analyses to clinical and medical research, which has grown exponentially over the last several years. Medical proteomics research has now expanded into many categories [50]; reviewing all of them is beyond the scope of this article. One of the major categories of interest is cancer research. Many efforts are being put into the search for possible blood-based biomarkers in order to improve the detection of disease. Other studies compare the proteomes of diseased tissue samples and healthy controls in order to distinguish between cancerous and normal samples [126]. Another focus of proteomic research is heart disease [127]. The need for prevention and improved treatment of such diseases has created both a necessity for, and an interest in, applying proteomic methodologies. Several studies have demonstrated altered protein profiles in diseased hearts and defined some of the proteins involved [128-130]. A third focus of proteomic research is common, chronic and non-curable diseases such as diabetes mellitus. The urinary proteome of diabetic patients has been analyzed and compared with that of matched healthy controls, and a disease pattern was established [131]. Alterations in the proteome of the red blood cell membrane of diabetic patients were found by MALDI-TOF-MS [132]. This review will focus on infectious diseases as a topic for proteomic research.

Proteomics and Infectious Diseases

Infectious diseases are a leading cause of death in the world [133]. New strategies for coping with infectious diseases are needed because of the recent emergence of resistant forms or highly virulent pathogens such as SARS, West Nile virus and the avian influenza virus, as well as the threat of anthrax and smallpox as tools of bioterrorism. While these issues are alarming, the introduction of proteomics research in infectious diseases seems to be a promising approach. As pathogens target host cells, an immune response is initiated, creating a proteomic signature in both the host and the pathogen through its survival mechanisms. The proteomes of microorganisms are good candidates for proteomic analysis, as their genomes are usually relatively small and their adaptive processes are less complex. In addition, some of them can be easily genetically manipulated, which makes them good models for studying protein function. Applying proteomic research to infectious disease gives a deeper understanding of the relationship between host and pathogen, allows tracking of the bacterial response to antibiotic therapy, and can uncover possible new target sites for clinical intervention. Proteomic maps of many pathogenic organisms are available at different stages of completion; some of them are accessible via the Internet (http://proteome.biochem.mpg.de/ormd.htm).


TUBERCULOSIS

Mycobacterium tuberculosis is a major cause of morbidity and mortality. It is estimated that 2 million people die from tuberculosis (TB) annually worldwide and many more are infected [134]. In addition to the large number of people who are ill as a result of this pathogen, many others are latently infected [135,136]. This latency provides an opportunity to detect infected individuals before they become clinically ill. Although an active vaccine against the bacterium is available [137,138] and is quite effective in preventing miliary and meningeal disease in babies, its efficacy in preventing pulmonary TB in adults is incomplete [139-141].

Proteomic research has targeted many aspects of Mycobacterium tuberculosis in an attempt to better comprehend the pathogenesis of TB. Comparative proteome analyses have been performed between virulent and non-virulent strains. Mattow et al. (2003) analyzed culture supernatant proteins from Mycobacterium tuberculosis H37Rv and attenuated M. bovis BCG [142]. Using 2DE-MS (MALDI-TOF and ESI-MS/MS) and sequencing by Edman degradation, they identified 27 different M. tuberculosis-specific proteins representing candidate antigens for novel vaccines and diagnostic purposes for TB. Bahk et al. (2004) compared the proteomes of three different strains of mycobacterium: i) M. tuberculosis K-strain, which is the most prevalent among clinical isolates in Korea; ii) M. tuberculosis CDC 1551 strain, which effectively induced a PPD skin test conversion among infected individuals; and iii) M. tuberculosis H37Rv, the laboratory-adapted strain [124]. Proteomic analyses were performed by MALDI-TOF-MS or LC-ESI-MS. The study focused on the differentially expressed proteins of the K-strain. These proteins were cloned, expressed in E. coli and affinity purified. Three of them were selected and examined for their potential as sero-diagnostic antigens using ELISA.
When sera from 100 tuberculosis patients and 100 healthy controls were analyzed, the sensitivity was 60%, 74% and 43% and the specificity was 96%, 97% and 84% for the rRv3369, rRv3874 and rRv0566c proteins, respectively.

Reduction of sample complexity by organelle fractionation has also been employed in studying TB to uncover biological information that Mycobacterium tuberculosis may harbor. Mawuenyega et al. (2005) performed a global functional network analysis of Mycobacterium tuberculosis by subcellular protein profiling [143]. They identified 1,044 proteins by 2DE-LC-MS/MS, corresponding to the cell wall, membrane and cytosol compartments. The membrane proteins, associated with cell signaling and cell-to-cell interactions, may define the pathogenicity of M. tuberculosis as they are involved in lipid metabolism and transport across the membrane [144]. This makes the bacterial membrane a promising therapeutic target. However, membrane proteins have both hydrophobic and hydrophilic regions; therefore, no single solvent can solubilize all membrane proteins, and samples are difficult to prepare. In a comprehensive proteomic profiling study of the membrane constituents of M. tuberculosis H37Rv, 739 proteins were identified using SDS-gel-LC-MS/MS, and further investigation shed light on some of their biological functions [145]. In contrast, Xiong et al. (2005) identified integral membrane proteins of the same TB strain using SDS-PAGE-LC-ESI-MS/MS [144]. They reported 349 protein identifications in total, of which 100 were integral membrane proteins with at least one predicted transmembrane α-helix. To refine the samples that undergo proteomic analysis, Sinha et al. (2005) used a Triton X-114 extraction-based approach for the analysis of M. tuberculosis membrane proteins [146], because they had found that these detergent-soluble membrane proteins of mycobacteria are potent stimulators of human T cells. By applying MALDI-TOF-MS to 116 samples, 105 proteins were identified, of which 9 were new to the M. tuberculosis proteome. The study also suggested that certain ribosomal proteins of the pathogen may serve as potent immunogens, as reflected by the level of interferon-γ they induce.

Another interesting approach was taken in a study that analyzed proteins unique to intraphagosomally grown Mycobacterium tuberculosis. This bacterium is a facultative intracellular organism that persists and replicates within host phagocytes and causes them to arrest their maturation. In this study, mycobacteria were purified from phagosomes of infected murine bone marrow-derived macrophages and analyzed by high-resolution 2DE-MALDI-MS/MS. The protein patterns discovered were compared with those of broth-cultured mycobacteria. The analysis revealed 11 proteins exclusive to the intraphagosomal fraction. These proteins could help in understanding the pathogen's adaptations in response to intracellular conditions [147].

A better understanding of the virulence of Mycobacterium tuberculosis was obtained by genetically engineering a strain that lacks the ESX-1 locus [148]. This locus, which encodes the ESAT-6 and CFP-10 proteins, is critical for full virulence in this bacterium.
By comparing proteomic analyses (LC-MS/MS) of the ESX-1-deleted strain with the native strain, a third protein, EspA, was identified and determined to be secreted solely by the ESX-1-containing strain. Additionally, the secretion of these three proteins was found to be co-dependent [148].

Although MALDI-MS PMF is a powerful tool for protein identification, it sometimes fails to identify low-molecular-mass proteins, protein fragments and protein mixtures reliably. A unique approach was applied to M. tuberculosis H37Rv in order to address this problem. This approach, called minimal protein identifier (MPI), is based on comparing experimentally derived proteolytic peptide mass maps of proteins recorded by MALDI-MS to those of previously identified counterparts. Using this approach, the authors revealed truncated variants of mycobacterial elongation factor EF-Tu that had not previously been identified by PMF. Additionally, they suggested links between the power of the MPI approach and distinct factors such as the complexity of the proteome analyzed and the accuracy of the mass spectrometer used [149].

Isotope-coded affinity tag (ICAT) technology, which enables quantitative assessment in proteomics, has been utilized to analyze the proteome of Mycobacterium tuberculosis by 2DE-MS and by ICAT-LC/MS [150]. Two strains of M. tuberculosis were compared. Both methods demonstrated biases for and against certain types and classes of proteins and quantified proteins at different levels. 2DE-MS complemented ICAT-LC/MS for low Mr and cysteine-free proteins and for protein species separation. Although both the ICAT and 2DE-MS methods detected similar functional protein classes, ICAT-LC/MS was more sensitive for high-molecular-weight membrane proteins.

In searching for candidate antigens for a tuberculosis vaccine, researchers are now using reversed-phase HPLC-MS in order to identify M. tuberculosis proteins shed by the bacteria and eliminated in animal urine during the early phase of the infectious process. A hypothetical M. tuberculosis protein was identified and the recombinant protein was produced in E. coli. Lymphoid cells from both PPD-positive individuals and mice infected with M. tuberculosis recognized this protein. Furthermore, immunization of mice with this protein induced protection against a challenge with virulent M. tuberculosis [151].

Proteomics has been applied not only to study the mycobacterial proteome; attempts have also been made to study the pharmacokinetics of a commonly used antimicrobial regimen against tuberculosis. In order to overcome the resistance that this pathogen has developed to the first-line antibiotic drugs, combination anti-infective therapy is now commonly administered. This has raised the need for a method that quantifies both drugs simultaneously in human plasma. Chen et al. (2005) developed and validated an LC/MS/MS method for the simultaneous quantification of isoniazid and ethambutol in human plasma [152].

SARS

In 2003, the world witnessed the emergence of a new infectious disease called Severe Acute Respiratory Syndrome (SARS). During this epidemic 916 patients died of the disease and 8,422 were infected [153].
A new coronavirus (SARS-CoV) was found to be the etiological agent within 4 months of disease emergence [123,154,155], and its genome was completely sequenced within two weeks. Although the outbreak of this infectious disease currently appears to be under control, there is still great concern about possible future outbreaks. The course of the disease is such that during the first 7-10 days of illness, the viral load in the upper respiratory tract is too low for the virus to be detected directly in body secretions [156]. The long delay between infection and diagnosis is also a concern with indirect antibody detection: the median time to seroconversion in SARS patients is 17-20 days after onset of symptoms [157]. A third modality by which SARS can be diagnosed, the detection of viral RNA by RT-PCR, is expensive and can give false-positive results due to contamination. The difficulties encountered in identifying the virus as the responsible pathogen in upper respiratory tract infections have led to further investigation into innovative proteomic diagnostic techniques.

One of the major goals of mass spectrometry is to detect specific proteins of the pathogen for diagnostic purposes. Yip et al. (2005) performed SELDI-TOF analysis of samples from SARS patients and found a unique proteomic signature of proteins produced by the virus causing SARS and proteins related to the host response to the infection [158]. They identified one of the protein biomarkers as serum amyloid A and showed a correlation between its concentration and the severity of the disease. In another study, Kang et al. (2005) [159] demonstrated a sensitivity and specificity of 97.3% and 97.4%, respectively, sufficient for a rapid diagnostic tool for SARS that provides results within 3 hours of testing in patients whose serum was tested within 24 hours of the onset of fever. Additionally, the proteomic profile of SARS patients' sera could be quantified and correlated with the severity of the disease. Although these investigations are key to the development of better diagnostic methods for this disease, concerns have been raised about the selection of the control group, which consisted of both healthy and non-SARS-infected individuals, and about the need to validate the SELDI-TOF method as a diagnostic tool against other, more common technologies [160].

Ren et al. (2004) used 2DE coupled to MALDI-TOF-MS to detect serum biomarkers in SARS [123]. Serum samples of SARS patients were analyzed upon hospital admission and after one week of medical treatment, and were compared with those of healthy subjects and of patients with pneumonia caused by another pathogen. At least 3 clusters of protein spots were significantly and persistently overexpressed in the sera of patients with SARS. The proteins were identified as truncated α1-antitrypsin, complement 4 fragment and serum amyloid A. A correlation was found between high levels of these proteins and specific clinico-pathological parameters. The significant increase in truncated α1-antitrypsin showed a sensitivity of 100% for SARS patients and a specificity of 92.8% for controls. Furthermore, high levels of truncated α1-antitrypsin may be the result of degradation of α1-antitrypsin, a protein that has a role in protection of the lung.
Hence, its degradation may be an important factor in the pathogenesis of SARS and a possible biological marker for the diagnosis of this disease.

Zeng et al. used 2D-LC-MS/MS to analyze the cytosol of Vero E6 cells infected with the SARS virus [161]. They identified a protein encoded by the ORF3a gene. Further analyses revealed that inter-chain disulfide bonds might be formed between this protein and one of the viral structural proteins, the spike protein. Mutation analysis of SARS-CoV isolates and ELISA of SARS patients' sera indicated that the ORF3a product might function together with the viral spike protein in vivo. Taken together, these results suggest that the 3a protein may serve as a new clinical marker or drug target for SARS.

In another study, Chen et al. (2004) investigated the plasma proteome of SARS patients at different time points [162]. Using 2DE-MALDI-TOF-MS, 38 differential spots were selected for protein identification, and most were found to be acute phase proteins. A comparative proteomic analysis against healthy subjects was also performed, and peroxiredoxin II (natural killer enhancing factor B) was found exclusively in the sera of SARS patients. Although this protein has a known role in the removal of cytotoxic hydrogen peroxide from the cytosol, its function in SARS patients is unknown.

Whereas the above-mentioned accomplishments were achieved through mass spectrometry, the worldwide effort to characterize the SARS virus is multidisciplinary, with other methods also used in the attempt to eliminate the disease.

MALARIA

The lethal species of malaria parasite, Plasmodium falciparum, causes a devastating disease and remains a major challenge to health providers, especially in sub-Saharan Africa. During its complex life cycle, this parasite goes through four main stages: i) sporozoite: the infectious form that is injected into the human host by the mosquito; ii) merozoite: the form released from the liver into the blood stream after maturation, where it invades red blood cells; iii) trophozoite: the form multiplying in the erythrocytes; and iv) gametocyte: the sexual stage that is taken up by mosquitoes and undergoes fertilization to form a zygote. Zygotes mature into ookinetes that form sporozoite-containing oocysts. Later, upon oocyst lysis, sporozoites are released and migrate to the mosquito's salivary gland. Each of these stages requires specialized protein expression, but not all the stages were accessible to expression studies through older methods. It was only in 2002 that Florens et al. performed comparative proteomics throughout the life cycle of the parasite [163]. These proteomes were analyzed by MudPIT, and 2,415 parasite proteins, representing 46% of all gene products, were confidently identified in the four stages of the parasite's life cycle. Furthermore, most surface proteins (including products of the var and rif genes) were found to be more widely expressed than was initially thought. These two gene families were thought to be involved in immune evasion only in the blood stage, but in this study they were found to be expressed at the sporozoite stage as well. Based on the same MudPIT method, a comparison between mRNA transcript and protein abundance levels was made in different stages of the parasite life cycle.
This study discovered significant discrepancies between mRNA and protein abundance in this protozoan, mainly a delay between the maximum detection of an mRNA transcript and that of its cognate protein. Additionally, genes that are possibly post-transcriptionally regulated were identified, and families of functionally related genes were observed to have similar patterns of mRNA and protein accumulation [164]. As previously discussed, analyzing subsections of the proteome has the advantage of simplifying the sample and raising the probability of identifying low-abundance proteins. Khan et al. performed proteomic analysis by LC-MS/MS on the male- and female-specific gametocyte proteomes of Plasmodium berghei. The male proteome contained 236 male-specific proteins and the female proteome 101 female-specific proteins. However, they shared only 69 proteins, emphasizing the divergent features of the sexes [165].
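The comparison of sex-specific and shared gametocyte proteins described above can be illustrated with simple set operations. The following is only a minimal sketch, assuming that a "specific" protein is one detected in one sex and not in the other; the function name `compare_proteomes` and the protein identifiers are hypothetical and are not taken from the cited study, whose actual filtering criteria may differ.

```python
def compare_proteomes(male, female):
    """Split two sets of protein IDs into male-only, female-only and shared groups."""
    return {
        "male_specific": male - female,    # detected only in male gametocytes
        "female_specific": female - male,  # detected only in female gametocytes
        "shared": male & female,           # detected in both sexes
    }


# Hypothetical identifiers used purely for illustration.
male = {"PB_0001", "PB_0002", "PB_0003", "PB_0004"}
female = {"PB_0003", "PB_0004", "PB_0005"}

result = compare_proteomes(male, female)
for group, ids in result.items():
    print(group, len(ids), sorted(ids))
```

With real data, each set would hold the accession numbers of the proteins confidently identified in the corresponding LC-MS/MS analysis.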


CONCLUSIONS AND FUTURE OUTLOOK
In the present review we have provided a short overview of the history of proteomics and mass spectrometry technologies currently used in systems biology research. We have also summarized recent developments and applications in quantitative and shot-gun proteomics, and listed current data mining tools and resources available to proteome researchers in the medical and infectious disease fields. The advancements in new mass spectrometry methods over the past few years have increased our ability to unambiguously identify a greater number of proteins with higher confidence. It is anticipated that the continued advancements in mass spectrometers, proteomic techniques and strategies will greatly increase our ability to fully identify complete proteomes for organisms and organelles, and characterize their tissue- and disease-specific fractions. More importantly, the confident identification of whole proteomes from infectious agents such as those associated with SARS, tuberculosis, and malaria would allow us to design better and more effective strategies to tackle these diseases in the future.

REFERENCES
[1] Kim, T.K. J. Biochem. Mol. Biol., 2004, 37(1), 53.
[2] Patterson, S.D.; Aebersold, R.H. Nat. Genet., 2003, 33(Suppl), 311.
[3] Mootha, V.K.; Bunkenborg, J.; Olsen, J.V.; Hjerrild, M.; Wisniewski, J.R.; Stahl, E.; Bolouri, M.S.; Ray, H.N.; Sihag, S.; Kamal, M.; Patterson, N.; Lander, E.S.; Mann, M. Cell, 2003, 115(5), 629.
[4] Winzeler, E.A. Nat. Rev. Microbiol., 2006, 4(2), 145.
[5] Pal, C.; Papp, B.; Lercher, M.J. Nat. Rev. Genet., 2006, 7(5), 337.
[6] Fischer, H.P. Biotechnol. Annu. Rev., 2005, 11, 1.
[7] Jiang, Z.; Zhou, Y. Am. J. Pharmacogenomics, 2005, 5(6), 387.
[8] Forst, C.V. Mol. Biol. Rep., 2002, 29(3), 265.
[9] Kaufmann, R. J. Biotechnol., 1995, 41(2-3), 155.
[10] Wright, P.; Chassaing, C.; Cussans, N.; Gibson, D.; Green, C.; Gleave, M.; Jones, R.; Macrae, P.; Saunders, K. Biomed. Chromatogr., 2006, 20(6-7), 585.
[11] Saber, D.L.; Mauro, D.; Sirivedhin, T. J. Ind. Microbiol. Biotechnol., 2005, 32(11-12), 665.
[12] Pond, K.L.; Huang, Y.; Wang, Y.; Kulpa, C.F. Environ. Sci. Technol., 2002, 36(4), 724.
[13] Ong, S.E.; Foster, L.J.; Mann, M. Methods, 2003, 29(2), 124.
[14] Lisacek, F.C.; Traini, M.D.; Sexton, D.; Harry, J.L.; Wilkins, M.R. Proteomics, 2001, 1(2), 186.
[15] Taylor, J.; Anderson, N.L.; Scandora, A.E. Jr.; Willard, K.E.; Anderson, N.G. Clin. Chem., 1982, 28(4 Pt 2), 861.
[16] Wilkins, M.R.; Williams, K.L. Experientia, 1995, 51(12), 1189.
[17] Wilkins, M.R.; Appel, R.D.; Van Eyk, J.E.; Chung, M.C.; Gorg, A.; Hecker, M.; Huber, L.A.; Langen, H.; Link, A.J.; Paik, Y.K.; Patterson, S.D.; Pennington, S.R.; Rabilloud, T.; Simpson, R.J.; Weiss, W.; Dunn, M.J. Proteomics, 2006, 6(1), 4.
[18] Arthur, J.W.; Wilkins, M.R. J. Proteome Res., 2004, 3(3), 393.
[19] Harry, J.L.; Wilkins, M.R.; Herbert, B.R.; Packer, N.H.; Gooley, A.A.; Williams, K.L. Electrophoresis, 2000, 21(6), 1071.
[20] James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Biochem. Biophys. Res. Commun., 1993, 195(1), 58.
[21] Pappin, D.J.; Hojrup, P.; Bleasby, A.J. Curr. Biol., 1993, 3(6), 327.
[22] Fenn, J.B.; Mann, M.; Meng, C.K.; Wong, S.F.; Whitehouse, C.M. Science, 1989, 246(4926), 64.
[23] Karas, M.; Hillenkamp, F. Anal. Chem., 1988, 60(20), 2299.
[24] Tanaka, K.; Waki, H.; Ido, Y.; Akita, S.; Yoshida, Y.; Yoshida, T. Rapid Comm. Mass Spectrom., 1988, 2(8), 151.
[25] Mann, M.; Hojrup, P.; Roepstorff, P. Biol. Mass Spectrom., 1993, 22(6), 338.
[26] Stevenson, C.L.; Anderegg, R.J.; Borchardt, R.T. J. Pharm. Biomed. Anal., 1993, 11(4-5), 367.
[27] Merand, V.; Forest, E.; Gagnon, J.; Monnet, C.; Thibault, P.; Neuburger, M.; Douce, R. Biol. Mass Spectrom., 1993, 22(8), 447.
[28] Eng, J.K.; McCormack, A.L.; Yates, J.R. J. Am. Soc. Mass Spectrom., 1994, 5(11), 976.
[29] Wolters, D.A.; Washburn, M.P.; Yates, J.R., 3rd. Anal. Chem., 2001, 73(23), 5683.

[30] Link, A.J.; Eng, J.; Schieltz, D.M.; Carmack, E.; Mize, G.J.; Morris, D.R.; Garvik, B.M.; Yates, J.R., 3rd. Nat. Biotechnol., 1999, 17(7), 676.
[31] Washburn, M.P.; Wolters, D.; Yates, J.R., 3rd. Nat. Biotechnol., 2001, 19(3), 242.
[32] Jungblut, P.; Baumeister, H.; Klose, J. Electrophoresis, 1993, 14(7), 638.
[33] Klose, J. Electrophoresis, 1989, 10(2), 140.
[34] O'Farrell, P.H. J. Biol. Chem., 1975, 250(10), 4007.
[35] Fernandez-Patron, C.; Madrazo, J.; Hardy, E.; Mendez, E.; Frank, R.; Castellanos-Serra, L. Electrophoresis, 1995, 16(6), 911.
[36] Mukherji, M. Expert. Rev. Proteomics, 2005, 2(1), 117.
[37] de Graauw, M.; Hensbergen, P.; van de Water, B. Electrophoresis, 2006, 27(13), 2676.
[38] Harvey, D.J. Expert. Rev. Proteomics, 2005, 2(1), 87.
[39] Baldwin, M.A. Methods Enzymol., 2005, 405, 172.
[40] Wuhrer, M.; Deelder, A.M.; Hokke, C.H. J. Chromatogr., 2005, 825(2), 124.
[41] Morelle, W.; Michalski, J.C. Curr. Pharm. Des., 2005, 11(20), 2615.
[42] Abbott, D.W.; Wilkins, A.; Asara, J.M.; Cantley, L.C. Curr. Biol., 2004, 14(24), 2217.
[43] Pedrioli, P.G.; Raught, B.; Zhang, X.D.; Rogers, R.; Aitchison, J.; Matunis, M.; Aebersold, R. Nat. Methods, 2006, 3(7), 533.
[44] Denison, C.; Rudner, A.D.; Gerber, S.A.; Bakalarski, C.E.; Moazed, D.; Gygi, S.P. Mol. Cell. Proteomics, 2005, 4(3), 246.
[45] Baldwin, M.A. Methods Enzymol., 2005, 402, 3.
[46] Guzzetta, A.W.; Thakur, R.A.; Mylchreest, I.C. Rapid Commun. Mass Spectrom., 2002, 16(21), 2067.
[47] Ishihama, Y.; Katayama, H.; Asakawa, N.; Oda, Y. Rapid Commun. Mass Spectrom., 2002, 16(10), 913.
[48] Hillenkamp, F.; Karas, M.; Beavis, R.C.; Chait, B.T. Anal. Chem., 1991, 63(24), 1193A.
[49] Patterson, S.D.; Aebersold, R. Electrophoresis, 1995, 16(10), 1791.
[50] Kislinger, T.; Jurisica, I. Cancer Genomics and Proteomics, 2006, 3(1), 11.
[51] Siuzdak, G. Proc. Natl. Acad. Sci. USA, 1994, 91(24), 11290.
[52] Kislinger, T.; Emili, A. In Knowledge Discovery in Proteomics. Jurisica, I. and Wigle, D.A. Eds.; CRC Press: Boca Raton, FL, 2006; pp. 39.
[53] Wright Jr, G.L.; Cazares, L.H.; Leung, S.M.; Nasim, S.; Adam, B.L.; Yip, T.T.; Schellhammer, P.F.; Gong, L.; Vlahou, A. Prostate Cancer Prostatic Dis., 1999, 2(5/6), 264.
[54] Merchant, M.; Weinberger, S.R. Electrophoresis, 2000, 21, 1164.
[55] Weinberger, S.R.; Morris, T.S.; Pawlak, M. Pharmacogenomics, 2000, 1(4), 395.
[56] Weinberger, S.R.; Dalmasso, E.A.; Fung, E.T. Curr. Opin. Chem. Biol., 2002, 6(1), 86.
[57] Petricoin, E.F.; Ardekani, A.M.; Hitt, B.A.; Levine, P.J.; Fusaro, V.A.; Steinberg, S.M.; Mills, G.B.; Simone, C.; Fishman, D.A.; Kohn, E.C.; Liotta, L.A. Lancet, 2002, 359(9306), 572.
[58] Kwapiszewska, G.; Meyer, M.; Bogumil, R.; Bohle, R.M.; Seeger, W.; Weissmann, N.; Fink, L. BMC Biotechnol., 2004, 4, 30.
[59] Lin, Z.; Jenson, S.D.; Lim, M.S.; Elenitoba-Johnson, K.S. Mod. Pathol., 2004, 17(6), 670.
[60] Reid, G.; Gan, B.S.; She, Y.M.; Ens, W.; Weinberger, S.; Howard, J.C. Appl. Environ. Microbiol., 2002, 68(2), 977.
[61] Patterson, B.W.; Zhao, G.; Elias, N.; Hachey, D.L.; Klein, S. J. Lipid Res., 1999, 40(11), 2118.
[62] Bruins, A.P. J. Chromatogr., 1991, 554(1-2), 39.
[63] Deutzmann, R. Methods Mol. Med., 2004, 94, 269.
[64] Wilm, M.; Shevchenko, A.; Houthaeve, T.; Breit, S.; Schweigerer, L.; Fotsis, T.; Mann, M. Nature, 1996, 379(6564), 466.
[65] Mann, M.; Wilm, M. Trends Biochem. Sci., 1995, 20(6), 219.
[66] Wilm, M.; Mann, M. Anal. Chem., 1996, 68(1), 1.
[67] Tomer, K.B.; Parker, C.E. J. Chromatogr., 1989, 492, 189.
[68] Hunt, D.F.; Henderson, R.A.; Shabanowitz, J.; Sakaguchi, K.; Michel, H.; Sevilir, N.; Cox, A.L.; Appella, E.; Engelhard, V.H. Science, 1992, 255(5049), 1261.
[69] Nogueira, R.; Lammerhofer, M.; Lindner, W. J. Chromatogr. A, 2005, 1089(1-2), 158.

[70] Kapp, U.; Langowski, J. Anal. Biochem., 1992, 206(2), 293.
[71] Yates, J.R., 3rd; Morgan, S.F.; Gatlin, C.L.; Griffin, P.R.; Eng, J.K. Anal. Chem., 1998, 70(17), 3557.
[72] Kislinger, T.; Emili, A. Expert. Rev. Proteomics, 2005, 2(1), 27.
[73] Kislinger, T.; Cox, B.; Kannan, A.; Chung, C.; Hu, P.; Ignatchenko, A.; Scott, M.S.; Gramolini, A.O.; Morris, Q.; Hallett, M.T.; Rossant, J.; Hughes, T.R.; Frey, B.; Emili, A. Cell, 2006, 125(1), 173.
[74] Kislinger, T.; Emili, A. Curr. Opin. Mol. Ther., 2003, 5(3), 285.
[75] Florens, L.; Liu, X.; Wang, Y.; Yang, S.; Schwartz, O.; Peglar, M.; Carucci, D.J.; Yates, J.R., 3rd; Wu, Y. Mol. Biochem. Parasitol., 2004, 135(1), 1.
[76] Koller, A.; Washburn, M.P.; Lange, B.M.; Andon, N.L.; Deciu, C.; Haynes, P.A.; Hays, L.; Schieltz, D.; Ulaszek, R.; Wei, J.; Wolters, D.; Yates, J.R., 3rd. Proc. Natl. Acad. Sci. USA, 2002, 99(18), 11969.
[77] Ong, S.E.; Mann, M. Nat. Chem. Biol., 2005, 1(5), 252.
[78] Turecek, F. J. Mass Spectrom., 2002, 37(1), 1.
[79] Old, W.M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K.G.; Mendoza, A.; Sevinsky, J.R.; Resing, K.A.; Ahn, N.G. Mol. Cell. Proteomics, 2005, 4(10), 1487.
[80] Ong, S.E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D.B.; Steen, H.; Pandey, A.; Mann, M. Mol. Cell. Proteomics, 2002, 1(5), 376.
[81] Gygi, S.P.; Rist, B.; Gerber, S.A.; Turecek, F.; Gelb, M.H.; Aebersold, R. Nat. Biotechnol., 1999, 17(10), 994.
[82] Gygi, S.P.; Rist, B.; Griffin, T.J.; Eng, J.; Aebersold, R. J. Proteome Res., 2002, 1(1), 47.
[83] Schmidt, A.; Kellermann, J.; Lottspeich, F. Proteomics, 2005, 5(1), 4.
[84] Sarioglu, H.; Brandner, S.; Jacobsen, C.; Meindl, T.; Schmidt, A.; Kellermann, J.; Lottspeich, F.; Andrae, U. Proteomics, 2006, 6(8), 2407.
[85] Ross, P.L.; Huang, Y.N.; Marchese, J.N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D.J. Mol. Cell. Proteomics, 2004, 3(12), 1154.
[86] Zieske, L.R. J. Exp. Bot., 2006, 57(7), 1501.
[87] Wang, G.; Wu, W.W.; Zeng, W.; Chou, C.L.; Shen, R.F. J. Proteome Res., 2006, 5(5), 1214.
[88] Hsu, J.L.; Huang, S.Y.; Chow, N.H.; Chen, S.H. Anal. Chem., 2003, 75(24), 6843.
[89] Bondarenko, P.V.; Chelius, D.; Shaler, T.A. Anal. Chem., 2002, 74(18), 4741.
[90] Wang, W.; Zhou, H.; Lin, H.; Roy, S.; Shaler, T.A.; Hill, L.R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C.H. Anal. Chem., 2003, 75(18), 4818.
[91] Liu, H.; Sadygov, R.G.; Yates, J.R., 3rd. Anal. Chem., 2004, 76(14), 4193.
[92] Zybailov, B.; Coleman, M.K.; Florens, L.; Washburn, M.P. Anal. Chem., 2005, 77(19), 6218.
[93] Yates, J.R., 3rd; Eng, J.K.; McCormack, A.L.; Schieltz, D. Anal. Chem., 1995, 67(8), 1426.
[94] Perkins, D.N.; Pappin, D.J.; Creasy, D.M.; Cottrell, J.S. Electrophoresis, 1999, 20(18), 3551.
[95] Craig, R.; Beavis, R.C. Bioinformatics, 2004, 20(9), 1466.
[96] Geer, L.Y.; Markey, S.P.; Kowalak, J.A.; Wagner, L.; Xu, M.; Maynard, D.M.; Yang, X.; Shi, W.; Bryant, S.H. J. Proteome Res., 2004, 3(5), 958.
[97] Beavis, R.C. Methods Mol. Biol., 2006, 328, 217.
[98] Kislinger, T.; Rahman, K.; Radulovic, D.; Cox, B.; Rossant, J.; Emili, A. Mol. Cell. Proteomics, 2003, 2(2), 96.
[99] Nesvizhskii, A.I.; Keller, A.; Kolker, E.; Aebersold, R. Anal. Chem., 2003, 75(17), 4646.
[100] Keller, A.; Nesvizhskii, A.I.; Kolker, E.; Aebersold, R. Anal. Chem., 2002, 74(20), 5383.
[101] Gan, R.R.; Yi, E.C.; Chiu, Y.; Lee, H.; Kao, Y.C.; Wu, T.H.; Aebersold, R.; Goodlett, D.R.; Ng, W.V. Mol. Cell. Proteomics, 2006, 5(6), 987.
[102] Heller, M.; Ye, M.; Michel, P.E.; Morier, P.; Stalder, D.; Junger, M.A.; Aebersold, R.; Reymond, F.; Rossier, J.S. J. Proteome Res., 2005, 4(6), 2273.

[103] Higdon, R.; Hogan, J.M.; Van Belle, G.; Kolker, E. OMICS, 2005, 9(4), 364.
[104] Peng, J.; Elias, J.E.; Thoreen, C.C.; Licklider, L.J.; Gygi, S.P. J. Proteome Res., 2003, 2(1), 43.
[105] Eisen, M.B.; Spellman, P.T.; Brown, P.O.; Botstein, D. Proc. Natl. Acad. Sci. USA, 1998, 95(25), 14863.
[106] de Hoon, M.J.; Imoto, S.; Nolan, J.; Miyano, S. Bioinformatics, 2004, 20(9), 1453.
[107] Kohonen, T. Self Organizing Maps, Springer-Verlag: Berlin, 1995.
[108] Sultan, M.; Wigle, D.A.; Cumbaa, C.A.; Maziarz, M.; Glasgow, J.; Tsao, M.S.; Jurisica, I. Bioinformatics, 2002, 18(Suppl 1), S111.
[109] Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; Harris, M.A.; Hill, D.P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J.C.; Richardson, J.E.; Ringwald, M.; Rubin, G.M.; Sherlock, G. Nat. Genet., 2000, 25(1), 25.
[110] Zeeberg, B.R.; Feng, W.; Wang, G.; Wang, M.D.; Fojo, A.T.; Sunshine, M.; Narasimhan, S.; Kane, D.W.; Reinhold, W.C.; Lababidi, S.; Bussey, K.J.; Riss, J.; Barrett, J.C.; Weinstein, J.N. Genome Biol., 2003, 4(4), R28.
[111] Al-Shahrour, F.; Diaz-Uriarte, R.; Dopazo, J. Bioinformatics, 2004, 20(4), 578.
[112] Al-Shahrour, F.; Minguez, P.; Tarraga, J.; Montaner, D.; Alloza, E.; Vaquerizas, J.M.; Conde, L.; Blaschke, C.; Vera, J.; Dopazo, J. Nucleic Acids Res., 2006, 34(Web Server issue), W472.
[113] Al-Shahrour, F.; Minguez, P.; Vaquerizas, J.M.; Conde, L.; Dopazo, J. Nucleic Acids Res., 2005, 33(Web Server issue), W460.
[114] Maere, S.; Heymans, K.; Kuiper, M. Bioinformatics, 2005, 21(16), 3448.
[115] Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Genome Res., 2003, 13(11), 2498.
[116] Mulder, N.J.; Apweiler, R.; Attwood, T.K.; Bairoch, A.; Barrell, D.; Bateman, A.; Binns, D.; Biswas, M.; Bradley, P.; Bork, P.; Bucher, P.; Copley, R.R.; Courcelle, E.; Das, U.; Durbin, R.; Falquet, L.; Fleischmann, W.; Griffiths-Jones, S.; Haft, D.; Harte, N.; Hulo, N.; Kahn, D.; Kanapin, A.; Krestyaninova, M.; Lopez, R.; Letunic, I.; Lonsdale, D.; Silventoinen, V.; Orchard, S.E.; Pagni, M.; Peyruc, D.; Ponting, C.P.; Selengut, J.D.; Servant, F.; Sigrist, C.J.; Vaughan, R.; Zdobnov, E.M. Nucleic Acids Res., 2003, 31(1), 315.
[117] Kersey, P.J.; Duarte, J.; Williams, A.; Karavidopoulou, Y.; Birney, E.; Apweiler, R. Proteomics, 2004, 4(7), 1985.
[118] Yang, Y.; Engin, L.; Wurtele, E.S.; Cruz-Neira, C.; Dickerson, J.A. Bioinformatics, 2005, 21(18), 3645.
[119] Breitkreutz, B.J.; Stark, C.; Tyers, M. Genome Biol., 2002, 3(12), PREPRINT0012.
[120] Han, K.; Ju, B.H.; Jung, H. Nucleic Acids Res., 2004, 32(Web Server issue), W89.
[121] Meil, A.; Durand, P.; Wojcik, J. Appl. Bioinformatics, 2005, 4(2), 137.
[122] Drake, R.R.; Deng, Y.; Schwegler, E.E.; Gravenstein, S. Expert. Rev. Proteomics, 2005, 2(2), 203.
[123] Ren, Y.; He, Q.Y.; Fan, J.; Jones, B.; Zhou, Y.; Xie, Y.; Cheung, C.Y.; Wu, A.; Chiu, J.F.; Peiris, J.S.; Tam, P.K. Proteomics, 2004, 4(11), 3477.
[124] Bahk, Y.Y.; Kim, S.A.; Kim, J.S.; Euh, H.J.; Bai, G.H.; Cho, S.N.; Kim, Y.S. Proteomics, 2004, 4(11), 3299.
[125] Drake, W.P.; Pei, Z.; Pride, D.T.; Collins, R.D.; Cover, T.L.; Blaser, M.J. Emerg. Infect. Dis., 2002, 8(11), 1334.
[126] An, H.J.; Kim, D.S.; Park, Y.K.; Kim, S.K.; Choi, Y.P.; Kang, S.; Ding, B.; Cho, N.H. J. Proteome Res., 2006, 5(5), 1082.
[127] Fu, Q.; Van Eyk, J.E. Expert. Rev. Proteomics, 2006, 3(2), 237.
[128] Jin, X.; Xia, L.; Wang, L.S.; Shi, J.Z.; Zheng, Y.; Chen, W.L.; Zhang, L.; Liu, Z.G.; Chen, G.Q.; Fang, N.Y. Proteomics, 2006, 6(6), 1948.
[129] White, M.Y.; Cordwell, S.J.; McCarron, H.C.; Prasan, A.M.; Craft, G.; Hambly, B.D.; Jeremy, R.W. Proteomics, 2005, 5(5), 1395.
[130] Pan, Y.; Kislinger, T.; Gramolini, A.O.; Zvaritch, E.; Kranias, E.G.; MacLennan, D.H.; Emili, A. Proc. Natl. Acad. Sci. USA, 2004, 101(8), 2241.

[131] Meier, M.; Kaiser, T.; Herrmann, A.; Knueppel, S.; Hillmann, M.; Koester, P.; Danne, T.; Haller, H.; Fliser, D.; Mischak, H. J. Diabetes Complications, 2005, 19(4), 223.
[132] Jiang, M.; Jia, L.; Jiang, W.; Hu, X.; Zhou, H.; Gao, X.; Lu, Z.; Zhang, Z. Biochem. Biophys. Res. Commun., 2003, 309(1), 196.
[133] Binder, S.; Levitt, A.M.; Sacks, J.J.; Hughes, J.M. Science, 1999, 284(5418), 1311.
[134] Drobniewski, F.A.; Kent, R.J.; Stoker, N.G.; Uttley, A.H. J. Hosp. Infect., 1994, 28(4), 249.
[135] Dye, C.; Scheele, S.; Dolin, P.; Pathania, V.; Raviglione, M.C. JAMA, 1999, 282(7), 677.
[136] Sudre, P.; ten Dam, G.; Kochi, A. Bull. World Health Organ., 1992, 70(2), 149.
[137] Calmette, A.; Guerin, C. Ann. Inst. Pasteur, 1920, 34, 553.
[138] Calmette, A. Proc. R. Soc. Med., 1931, 24, 85.
[139] Fletcher, H.; McShane, H. Expert. Opin. Emerg. Drugs, 2006, 11(2), 207.
[140] Fine, P.E. Lancet, 1995, 346(8986), 1339.
[141] Colditz, G.A.; Brewer, T.F.; Berkey, C.S.; Wilson, M.E.; Burdick, E.; Fineberg, H.V.; Mosteller, F. JAMA, 1994, 271(9), 698.
[142] Mattow, J.; Schaible, U.E.; Schmidt, F.; Hagens, K.; Siejak, F.; Brestrich, G.; Haeselbarth, G.; Muller, E.C.; Jungblut, P.R.; Kaufmann, S.H. Electrophoresis, 2003, 24(19-20), 3405.
[143] Mawuenyega, K.G.; Forst, C.V.; Dobos, K.M.; Belisle, J.T.; Chen, J.; Bradbury, E.M.; Bradbury, A.R.; Chen, X. Mol. Biol. Cell, 2005, 16(1), 396.
[144] Xiong, Y.; Chalmers, M.J.; Gao, F.P.; Cross, T.A.; Marshall, A.G. J. Proteome Res., 2005, 4(3), 855.
[145] Gu, S.; Chen, J.; Dobos, K.M.; Bradbury, E.M.; Belisle, J.T.; Chen, X. Mol. Cell. Proteomics, 2003, 2(12), 1284.
[146] Sinha, S.; Kosalai, K.; Arora, S.; Namane, A.; Sharma, P.; Gaikwad, A.N.; Brodin, P.; Cole, S.T. Microbiology, 2005, 151(Pt 7), 2411.
[147] Mattow, J.; Siejak, F.; Hagens, K.; Becher, D.; Albrecht, D.; Krah, A.; Schmidt, F.; Jungblut, P.R.; Kaufmann, S.H.; Schaible, U.E. Proteomics, 2006, 6(8), 2485.
[148] Fortune, S.M.; Jaeger, A.; Sarracino, D.A.; Chase, M.R.; Sassetti, C.M.; Sherman, D.R.; Bloom, B.R.; Rubin, E.J. Proc. Natl. Acad. Sci. USA, 2005, 102(30), 10676.
[149] Mattow, J.; Schmidt, F.; Hohenwarter, W.; Siejak, F.; Schaible, U.E.; Kaufmann, S.H. Proteomics, 2004, 4(10), 2927.
[150] Schmidt, F.; Donahoe, S.; Hagens, K.; Mattow, J.; Schaible, U.E.; Kaufmann, S.H.; Aebersold, R.; Jungblut, P.R. Mol. Cell. Proteomics, 2004, 3(1), 24.
[151] Mukherjee, S.; Kashino, S.S.; Zhang, Y.; Daifalla, N.; Rodrigues, V. Jr.; Reed, S.G.; Campos-Neto, A. J. Immunol., 2005, 175(8), 5298.
[152] Chen, X.; Song, B.; Jiang, H.; Yu, K.; Zhong, D. Rapid Commun. Mass Spectrom., 2005, 19(18), 2591.
[153] Chan-Yeung, M.; Xu, R.H. Respirology, 2003, 8(Suppl), S9.
[154] Gibbs, A.J.; Gibbs, M.J.; Armstrong, J.S. Arch. Virol., 2004, 149(3), 621.
[155] Zajkowska, J.M.; Hermanowska-Szpakowicz, T.; Pancewicz, S.; Kondrusik, M.; Grygorczuk, S. Pol. Merkuriusz Lek., 2004, 16(92), 183.
[156] Nicholls, J.; Dong, X.P.; Jiang, G.; Peiris, M. Respirology, 2003, 8(Suppl), S6.
[157] Guan, Y.J.; Tang, X.P.; Zhang, F.C.; Chen, Y.Q.; Yin, C.B.; Li, Y.M.; Zhong, N.S. Zhongguo Wei Zhong Bing Ji Jiu Yi Xue, 2005, 17(6), 332.
[158] Yip, T.T.; Chan, J.W.; Cho, W.C.; Yip, T.T.; Wang, Z.; Kwan, T.L.; Law, S.C.; Tsang, D.N.; Chan, J.K.; Lee, K.C.; Cheng, W.W.; Ma, V.W.; Yip, C.; Lim, C.K.; Ngan, R.K.; Au, J.S.; Chan, A.; Lim, W.W. Clin. Chem., 2005, 51(1), 47.
[159] Kang, X.; Xu, Y.; Wu, X.; Liang, Y.; Wang, C.; Guo, J.; Wang, Y.; Chen, M.; Wu, D.; Wang, Y.; Bi, S.; Qiu, Y.; Lu, P.; Cheng, J.; Xiao, B.; Hu, L.; Gao, X.; Liu, J.; Wang, Y.; Song, Y.; Zhang, L.; Suo, F.; Chen, T.; Huang, Z.; Zhao, Y.; Lu, H.; Pan, C.; Tang, H. Clin. Chem., 2005, 51(1), 56.
[160] Mazzulli, T.; Low, D.E.; Poutanen, S.M. Clin. Chem., 2005, 51(1), 6.

[161] Zeng, R.; Yang, R.F.; Shi, M.D.; Jiang, M.R.; Xie, Y.H.; Ruan, H.Q.; Jiang, X.S.; Shi, L.; Zhou, H.; Zhang, L.; Wu, X.D.; Lin, Y.; Ji, Y.Y.; Xiong, L.; Jin, Y.; Dai, E.H.; Wang, X.Y.; Si, B.Y.; Wang, J.; Wang, H.X.; Wang, C.E.; Gan, Y.H.; Li, Y.C.; Cao, J.T.; Zuo, J.P.; Shan, S.F.; Xie, E.; Chen, S.H.; Jiang, Z.Q.; Zhang, X.; Wang, Y.; Pei, G.; Sun, B.; Wu, J.R. J. Mol. Biol., 2004, 341(1), 271.
[162] Chen, J.H.; Chang, Y.W.; Yao, C.W.; Chiueh, T.S.; Huang, S.C.; Chien, K.Y.; Chen, A.; Chang, F.Y.; Wong, C.H.; Chen, Y.J. Proc. Natl. Acad. Sci. USA, 2004, 101(49), 17039.
[163] Florens, L.; Washburn, M.P.; Raine, J.D.; Anthony, R.M.; Grainger, M.; Haynes, J.D.; Moch, J.K.; Muster, N.; Sacci, J.B.; Tabb, D.L.; Witney, A.A.; Wolters, D.; Wu, Y.; Gardner, M.J.; Holder, A.A.; Sinden, R.E.; Yates, J.R.; Carucci, D.J. Nature, 2002, 419(6906), 520.
[164] Le Roch, K.G.; Johnson, J.R.; Florens, L.; Zhou, Y.; Santrosyan, A.; Grainger, M.; Yan, S.F.; Williamson, K.C.; Holder, A.A.; Carucci, D.J.; Yates, J.R., 3rd; Winzeler, E.A. Genome Res., 2004, 14(11), 2308.
[165] Khan, S.M.; Franke-Fayard, B.; Mair, G.R.; Lasonder, E.; Janse, C.J.; Mann, M.; Waters, A.P. Cell, 2005, 121(5), 675.

Received: September 18, 2006

Revised: October 18, 2006

Accepted: October 20, 2006