Recent advances in immunoinformatics: Application

Current Opinion in Drug Discovery & Development 2008 11(2): © The Thomson Corporation ISSN 1367-6733

Recent advances in immunoinformatics: Application of in silico tools to drug development

Mark C Evans

Address Research Informatics, Preclinical Research and Development, XOMA (US) LLC, 2910 Seventh Street, Berkeley, CA 94710, USA Email: [email protected]

Immunoinformatics is an emerging specialization of bioinformatics that focuses upon the structure, function and interactions of the molecules involved in immunity. Two major cell types, T-cells and B-cells, play significant roles in allergy, inflammation, infection and protective immunity. This review examines recently developed in silico tools and databases that can be used to identify, characterize or predict antigen epitopes recognized by T- and B-cells including the latest generation of B-cell epitope prediction tools that employ peptide-binding information derived from peptide phage display experiments. The application of these tools to facilitate drug development efforts is also discussed.

Keywords B-cell, epitope prediction, immunogenicity, immunoinformatics, major histocompatibility complex, T-cell

Abbreviations IgE immunoglobulin E, MHC major histocompatibility complex, SVM support vector machine

Introduction

Immunoinformatics is a specialization within bioinformatics that uses computational methods to better understand, identify and predict the various interactions between antigens and the receptor components of the immune system, primarily B-cell receptors (antibodies), T-cell receptors (TCRs) and major histocompatibility complex (MHC) receptors. The field of immunoinformatics is relatively new, the term first arising from a 2003 conference [1]. Since then, immunoinformatics and immunomics (immune-related genomics) have gained traction in the bioinformatics and genomics communities, as evidenced by the launch of Immunome Research, the flagship journal of the International Immunomics Society, and the emergence of several international conferences centered on immunoinformatics and immunomics. Early research and development in this field focused on TCR and MHC epitopes because they have a linear conformation, and significant progress has been made toward identifying and predicting these epitopes. The understanding, characterization and prediction of B-cell epitopes is a significantly more challenging task and has lagged behind that of T-cell epitopes because of their inherent complexity as non-linear conformational structures of protein-protein interactions. This review focuses on the progress made in immunoinformatics in recent years.

The human immune system is composed of the innate and the adaptive systems. The innate immune system focuses upon the rapid, non-specific clearance of an infectious foreign agent or antigen. Antigens that escape are caught by the adaptive immune system. This capture is accomplished by the recognition of discrete regions (epitopes) of the protein antigen by either B-cells, which produce antibodies, or T-cells, which are involved in cell-mediated antigen destruction and complement activation. Epitopes can be either linear peptide sequences or discontinuous; a discontinuous epitope is one where amino acids are brought together in 3D space to present a coherent surface. Epitopes recognized by antibodies expressed on B-cells are B-cell epitopes. B-cells recognize antigens in their native conformation, whereas T-cells require antigens to be processed by an antigen-presenting cell (APC). During processing, the antigen is internalized by the APC and cleaved by proteases into small fragments, typically 15-20 amino acids long, which are bound to one of the MHC receptors, then returned to the cell surface and 'presented' to circulating T-cells. These small peptides are known as T-cell epitopes and, since the peptides are bound into the major groove of the MHC receptor protein, the conformation of these epitopes is linear [2].

T-cell epitope prediction

Recently, many new tools have been developed for the prediction and/or characterization of T-cell epitopes. These fall into two general categories: the use of sequence similarity to identify patterns and the use of 3D structures to model interactions with MHC molecules. Computational methods employed include sequence motifs, binding matrices, decision trees, artificial neural networks, hidden Markov models and support vector machines (SVMs). Details of how each of these methods has been implemented were recently reviewed [3••] and a summary is shown in Table 1.
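As a rough illustration of the binding-matrix approach named above, the following sketch scores every overlapping nonamer of a protein sequence against a position-specific matrix. The matrix values, the choice of anchor positions 2 and 9, and the function names are hypothetical and chosen only for illustration; they are not taken from any of the tools listed in Table 1.

```python
from typing import Dict, List, Tuple

# Hypothetical scores: 0-based peptide position -> residue -> score.
# Real matrices are derived from experimental binding data for one MHC allele.
TOY_MATRIX: Dict[int, Dict[str, float]] = {
    1: {"L": 2.0, "M": 1.5},  # assumed anchor at peptide position 2
    8: {"V": 2.0, "L": 1.5},  # assumed anchor at peptide position 9
}


def score_peptide(peptide: str, matrix: Dict[int, Dict[str, float]]) -> float:
    """Sum per-position scores; residues absent from the matrix contribute 0."""
    return sum(matrix.get(i, {}).get(aa, 0.0) for i, aa in enumerate(peptide))


def scan_protein(
    sequence: str, matrix: Dict[int, Dict[str, float]], length: int = 9
) -> List[Tuple[int, str, float]]:
    """Score every overlapping window; return (1-based start, peptide, score),
    highest score first."""
    hits = []
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        hits.append((start + 1, peptide, score_peptide(peptide, matrix)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Calling scan_protein on a full protein sequence returns a ranked list of candidate peptides; in practice the top-ranked candidates would still need to be confirmed experimentally.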

Table 1. T-cell epitope-related tools.

Tool name | Institution | Accessed from
BLAST | National Institutes of Health, USA | http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
EpiJen | The Edward Jenner Institute for Vaccine Research, Oxford, UK | http://www.jenner.ac.uk/EpiJen
EpiMatrix | EpiVax Inc, Rhode Island, USA | http://www.epivax.com
EpiPredict | University of Tübingen, Germany | http://epipredict.de
HLABIND | Bioinformatics and Molecular Analysis Section, Center for Information Technology, National Institutes of Health, USA | http://www-bimas.cit.nih.gov/molbio/hla_bind
Hepitope | Los Alamos National Laboratory, New Mexico, USA | http://www.hiv.lanl.gov/content/immunology/hepitopes/index.html
ELF | Los Alamos National Laboratory, New Mexico, USA | http://www.hiv.lanl.gov/content/sequence/ELF/epitope_analyzer.html
IMTECH | Institute of Microbial Technology, Chandigarh, India | http://imtech.res.in/raghava/mhc
Lib Score | DNA Data Bank of Japan, Shizuoka, Japan | http://libscore.ddbj.nig.ac.jp/cgi-bin/libscore/request.rb?lang=E
MEME | University of California, San Diego, USA | http://meme.sdsc.edu/
MHCBench | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/mhcbench
MHCPred | The Edward Jenner Institute for Vaccine Research, Oxford, UK | http://www.jenner.ac.uk/MHCPred
Multipred | Institute for Infocomm Research, Singapore | http://research.i2r.a-star.edu.sg/multipred
NetCTL | Center for Biological Sequence Analysis, Technical University of Denmark and Institute of Medical Microbiology and Immunology, University of Copenhagen | http://www.cbs.dtu.dk/services/NetCTL
NetMHC | Center for Biological Sequence Analysis, Technical University of Denmark and Institute of Medical Microbiology and Immunology, University of Copenhagen | http://www.cbs.dtu.dk/services/NetMHC
NetMHCII | Center for Biological Sequence Analysis, Technical University of Denmark and Institute of Medical Microbiology and Immunology, University of Copenhagen | http://www.cbs.dtu.dk/services/NetMHCII
NetMHCpan | Center for Biological Sequence Analysis, Technical University of Denmark and Institute of Medical Microbiology and Immunology, University of Copenhagen | http://www.cbs.dtu.dk/services/NetMHCpan
nHLAPred | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/nhlapred
PepDist | The Hebrew University, Jerusalem | http://www.pepdist.cs.huji.ac.il/
PREDEP | The Hebrew University, Jerusalem | http://margalit.huji.ac.il/Teppred/mhc-bind/index.html
ProPred | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/propred
RankPep | Dana Farber Cancer Institute, Massachusetts, USA | http://bio.dfci.harvard.edu/Tools/rankpep.html
SVMHC | University of Tübingen, Germany | http://www-bs.informatik.uni-tuebingen.de/Services/SVMHC
SYFPEITHI | University of Tübingen, Germany | http://www.syfpeithi.de

New T-cell epitope prediction methods are still being developed, such as NetMHCpan (Table 1) [4]. There are currently more than 1500 MHC class I molecules registered, of which less than 10% have been experimentally tested and less than 5% have been well characterized. This new tool takes both antigen peptide and MHC sequence information and generates quantitative predictions of potential peptide-MHC interactions using artificial neural networks. High-throughput computational methods allow epitope searches that are genome-, pathogen- and MHC-wide. One of the major advantages of this approach is that it will predict the binding of any peptide to any known or as yet undiscovered MHC molecule. Since the MHC repertoire represents the diversity of human immune response, this strategy should help facilitate the rational design of vaccines, immunotherapy and diagnostics.

Since there are so many different prediction methodologies, an important consideration centers on the accuracy of their predictions. The predictions are based upon various mathematical models; thus, until they can be tested empirically in the laboratory, caution must be taken in their use. The accuracy of these methods can be improved if the outputs of the individual methods are considered together. In one study, 16 available prediction tools were combined with a heuristic tool developed using several peptide datasets and were evaluated for predicted binding to three MHC alleles [5•]. It was found that the likelihood that a particular peptide was a true binder correlated with the number of tools that agreed and the individual accuracy of those tools.
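The idea that agreement between tools, weighted by the accuracy of each tool, raises confidence in a prediction can be sketched as a simple weighted vote. This is an illustrative scheme only, not the heuristic used in the study cited above; the log-odds weighting, the function name and the example accuracies are assumptions.

```python
import math
from typing import Dict


def weighted_consensus(calls: Dict[str, bool], accuracy: Dict[str, float]) -> float:
    """Return a score in [0, 1]: the accuracy-weighted share of tools that call
    the peptide a binder.

    Each vote is weighted by the log-odds of the tool's accuracy (assumed to lie
    strictly between 0 and 1), so a tool at 50% accuracy carries no weight.
    Tools worse than chance are given zero weight.
    """
    weights = {
        tool: max(0.0, math.log(accuracy[tool] / (1.0 - accuracy[tool])))
        for tool in calls
    }
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("at least one tool must be better than chance")
    return sum(w for tool, w in weights.items() if calls[tool]) / total
```

With three equally accurate tools, two positive calls give a score of 2/3; if one tool is markedly more accurate than the others, its call dominates the score.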

Recently, in silico prediction using EpiMatrix (Table 1) was successfully used to identify T-cell epitopes on a recombinant therapeutic protein candidate being tested in a Phase I clinical trial [6]. The prediction suggested that potential T-cell epitopes within the therapeutic molecule had a high probability of being presented on five different MHC molecules to T-cells, and so of generating an immune response. After 76 volunteers received the recombinant therapeutic protein by intravenous or subcutaneous injection, 37% developed antibodies and it was determined that all of the MHC alleles predicted by the EpiMatrix software were present in various combinations. In addition, seven new alleles were found to be present, illustrating an important application of T-cell prediction tools.

As more protein therapeutics approach the clinic, the desire to predict their immunogenicity will increase. An immune response that alters the pharmacokinetics of the therapeutic could diminish the utility of the molecule as a drug; however, if epitopes mediating immunogenicity can be accurately predicted, they could be engineered out while retaining therapeutic efficacy. In addition to 'de-immunizing' therapeutic protein candidates, the ability to accurately identify T-cell epitopes for therapeutic benefit in the fields of vaccine development, cancer immunotherapy and allergy is beginning to show promise.

Epitope-based vaccines

The traditional development of vaccines employing empirical methods is time consuming, labor intensive and often fails. Rational design of vaccines has the potential to be significantly faster with a higher probability of success. The development of epitope-based vaccines is an area of very active development, with progress being reported for anthrax [7], tuberculosis [8,9•], Streptococcus [10] and tularemia [11].

There are two approaches to this type of vaccine. One is to identify the pathogen-unique epitopes that are capable of eliciting a protective response. These peptides then constitute the core of the vaccine. Tools to identify protective epitopes have been summarized in Table 1. The alternative strategy is to identify a protein containing a unique epitope and use it in a subunit vaccine, which uses some fragment of an organism, rather than an inactivated or attenuated organism, to generate an immune response. One novel method for identifying candidate antigens utilizes an alignment-independent method based on the inherent physicochemical properties of the amino acids present in a protein sequence [12•]. Prediction accuracies of 70-90% have been demonstrated using the publicly available VaxiJen server (Table 4) [13]. Used in an automated fashion, it could screen the entire proteome of a pathogenic organism for the identification of unique candidate antigens.

In epitope-based vaccine design, conserved epitopes are preferred because they will provide broader protection across multiple strains or species. To facilitate this objective, a new tool has been developed as part of the Immune Epitope Database (IEDB, Table 2), called epitope conservancy analysis [14]. It is used to determine the variability of epitopes within a given set of protein sequences. Amino acid residues that are crucial for the retention of protein function are believed to be associated with intrinsically lower variability even when under immune pressure; therefore, such sequences might represent good targets for vaccines, provided they do not also represent host epitopes. This tool will work with either linear or discontinuous epitopes that have at least three identified residues. This program could also be used to monitor the divergence of an epitope during disease progression, thereby providing insight into immune escape mechanisms. Various other T-cell epitope-related databases are shown in Table 2.
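For a linear epitope, the notion of conservancy described above can be sketched as the fraction of sequences in a set that contain the epitope at or above a chosen identity level. This is a minimal sketch under simplifying assumptions (ungapped comparison, linear epitopes only); the function names and the threshold parameter are illustrative and do not describe the IEDB implementation.

```python
from typing import List


def best_identity(epitope: str, protein: str) -> float:
    """Highest fraction of identical residues over all ungapped windows."""
    k = len(epitope)
    best = 0.0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope: str, proteins: List[str], threshold: float = 1.0) -> float:
    """Fraction of sequences that contain the epitope at >= threshold identity."""
    if not proteins:
        raise ValueError("no sequences supplied")
    hits = sum(best_identity(epitope, p) >= threshold for p in proteins)
    return hits / len(proteins)
```

Lowering the threshold from 1.0 shows how many strains carry a near-identical variant of the epitope, which is one simple way to follow epitope divergence across isolates.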

Table 2. T-cell epitope-related databases.

Database name | Institution | Accessed by
AntiJen | The Edward Jenner Institute for Vaccine Research, Oxford, UK | http://www.jenner.ac.uk/antijen
EPIMHC | Dana Farber Cancer Institute, Massachusetts, USA | http://bio.dfci.harvard.edu/epimhc
Epitome | Columbia University Bioinformatics Centre, USA | http://www.rostlab.org/services/epitome
FIMM | Institute for Infocomm Research, Singapore | http://research.i2r.a-star.edu.sg/fimm
IEDB | La Jolla Institute for Allergy and Immunology, California, USA | http://www.immuneepitope.org
IMGT | Centre National de la Recherche Scientifique, Montpellier, France | http://imgt.cines.fr
JenPep | The Edward Jenner Institute for Vaccine Research, Oxford, UK | http://www.jenner.ac.uk/jenpep
MHCBN | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/mhcbn

As an example of how this has been used, the Rv1818c protein was identified as a source of antigenic variation between different strains of Mycobacterium tuberculosis, as playing a role in pathogenesis and as influencing the host-cell response to infection [8]. Rv1818c was analyzed for the presence of T-cell epitopes using the publicly available tools SYFPEITHI and BIMAS (Table 1) by scanning nonamer peptides for binding to class I MHC receptors. Two peptides were selected for synthesis that were later demonstrated to bind the predicted MHC receptor and elicit cell lysis.

In two other studies, EpiVax Inc used their EpiMatrix software (Table 1) to develop a more effective vaccine against tuberculosis [9] and Francisella tularensis [11]. The design goal was to create a vaccine based on T-cell epitopes that represented multiple antigens. To accomplish this, relevant proteins were expressed and selected as candidates and screened for T-cell epitopes using EpiMatrix. The identified peptides were synthesized and empirically tested for MHC-binding properties, then the results were rank-ordered and the best peptides chosen for cell assays. Although all of the selected tuberculosis peptides were shown to be immunogenic and some conferred modest protection to challenge, demonstrating proof of principle, none matched the efficacy of the current standard of care vaccine. However, the result of the F tularensis study illustrates another benefit of immunoinformatics: several of the epitopes identified came from proteins of previously unknown function, demonstrating that prior identification of function is not required for epitope identification.

Similarly, an alternative use for T-cell epitope identification was reported with the analysis of two potential vaccine candidates for Group A Streptococcus (Streptococcus pyogenes, GAS) [10]. In GAS, a major surface antigen, the M-protein, is thought to play a significant role in disease pathogenesis. T-cells responding to M-protein are also cross-reactive to heart proteins, most likely due to structural similarities or mimicry. Vaccines that target M-protein show the greatest promise of efficacy, but the heart cross-reactivity poses a safety concern that has hindered efforts to produce a safe and effective vaccine to date. Two peptides derived from M-proteins that elicit protective antibody responses were identified; however, they also induced cross-reacting antibodies. Recombinant constructs were created that retained protective activity and did not appear to show cross-reactivity. One peptide construct was tested more extensively than the other, so to evaluate whether the second peptide was likely to have a similar safety profile, the investigators used the programs BLAST, ProPred, RankPep, MEME and HLABIND (Table 1) to determine which regions of the peptides may also be present in host proteins and which MHC alleles would be involved. After following up with cell assays, it was concluded that both constructs were equally safe and had no cross-reactivity issues. This study illustrates how T-cell epitope prediction can be used to avoid T-cell promiscuity and, therefore, unwanted immunogenicity to the therapeutic protein.

Allergy immunotherapy

Allergy is another therapeutic area where immunoinformatics has the potential to make an impact. Allergies arise when individuals develop immunoglobulin E (IgE) antibodies to environmental allergens, which are often small, charged proteins. To date, over 500 'certified' protein allergens have been classified by the International Union of Immunological Societies [15]. Methods have been developed to elucidate epitopes that would identify potential allergens and predict cross-reactivity. Both the European Food Safety Authority and the World Health Organization have developed guidelines for using bioinformatics to predict allergenicity and cross-reactivity that rely primarily upon sequence similarity to known allergens [16,17]. A summary of allergy-specific tools and databases is provided in Table 3.
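One simple form of the sequence-similarity screening mentioned above is a search for short stretches of contiguous identical residues shared between a query protein and known allergens. The sketch below uses a default window of six residues, which is an assumption based on the exact-match criterion commonly attributed to the 2001 FAO/WHO consultation; the allergen names and sequences in any call would be supplied by the user, and the function names are illustrative.

```python
from typing import Dict, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """All distinct contiguous substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_hits(
    query: str, allergens: Dict[str, str], k: int = 6
) -> Dict[str, List[str]]:
    """Map allergen name -> sorted k-mers found identically in the query."""
    query_kmers = kmers(query, k)
    hits: Dict[str, List[str]] = {}
    for name, sequence in allergens.items():
        shared = sorted(query_kmers & kmers(sequence, k))
        if shared:
            hits[name] = shared
    return hits
```

Short exact matches of this kind are prone to chance hits, which is one reason such screens are usually combined with a longer-window identity criterion.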

Table 3. Allergy-related tools and databases.

Resource name | Institution | Accessed by
AlgPred | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/algpred
ALLERDB | Institute for Infocomm Research, Singapore | http://sdmc.i2r.a-star.edu.sg/Templar/DB/Allergen
Allergome | Allergy Data Laboratories, Latina, Italy | http://www.allergome.org
Allermatch | Wageningen University and Research Center, Wageningen, The Netherlands | http://www.allermatch.org
APPEL | Computational Science Department, National University of Singapore | http://jing.cz3.nus.edu.sg/cgi-bin/APPEL
EVALLER | National Food Administration, Uppsala, Sweden | http://bioinformatics.bmc.uu.se/evaller.html
SDAP | University of Texas Medical Branch, USA | http://fermi.utmb.edu/SDAP
WebAllergen | Institute for Infocomm Research, Singapore | http://weballergen.bii.a-star.edu.sg

Recently, two new resources have been developed for predicting IgE epitopes. The first is a tool called EVALLER (Table 3) [18], which uses the detection based on filtered length-adjusted allergen peptides (DFLAP) method for evaluating potential protein allergenicity [19]. SVMs are used to determine if a query sequence is likely to be allergenic. Another useful resource is the Structural Database of Allergenic Proteins (SDAP, Table 3) [20,21]. SDAP is an allergen database that contains information on the sequences, 3D structures and epitopes of known antigens from literature and other public databases. SDAP has integrated search programs that allow users to compare the epitopes and molecular properties of allergens; the variety and application of the individual tools that are available as part of SDAP have been reviewed elsewhere [22••]. One obvious use of these tools is for the identification of potential cross-reacting proteins based on structural and physical characteristics that might escape detection by sequence-similarity methods alone. This would allow a clinician, for example, to advise a patient to avoid foods that are not obvious risks but that share potential cross-reacting proteins that may put the patient at risk.

Another system called APPEL (Allergen Protein Prediction E-Lab, Table 3) [23] uses sequence-derived structural and physicochemical properties to predict the identity of allergen proteins. Remarkably, this tool was able to correctly classify 93% of 229 allergens and 99.9% of 6717 non-allergens tested. Because APPEL is based on a statistical method rather than a sequence-motif method, it has the potential to discover novel allergen proteins.
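To put the classification rates quoted above into perspective, the short calculation below converts them into approximate counts and a precision value. The counts are rounded reconstructions from the quoted percentages and set sizes, not figures reported in the original study, and the function name is illustrative.

```python
from typing import Dict


def implied_counts(
    n_pos: int, sensitivity: float, n_neg: int, specificity: float
) -> Dict[str, int]:
    """Approximate confusion-matrix counts implied by the quoted rates."""
    tp = round(n_pos * sensitivity)   # allergens classified correctly
    tn = round(n_neg * specificity)   # non-allergens classified correctly
    return {"tp": tp, "fn": n_pos - tp, "tn": tn, "fp": n_neg - tn}


# 93% of 229 allergens and 99.9% of 6717 non-allergens, as quoted in the text.
counts = implied_counts(229, 0.93, 6717, 0.999)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
```

On these figures roughly 213 allergens are recovered against about 7 false positives, a precision of about 97%. Because the number of false positives scales with the number of non-allergens screened, precision would be lower in a proteome-wide screen where true allergens are rarer.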

In addition to B-cells, allergen-specific T-cells also play a role in allergy. Although it is not currently well understood, immunotherapy with immunodominant T-cell epitope peptides is used as desensitization therapy to induce T-cell non-responsiveness, as reviewed elsewhere [24••]. Current immunotherapy for specific allergens consists of injecting the subject with native allergen extracts; this starts at a low dose and builds up until a maintenance dose is achieved. This treatment regimen has occasional side effects as a result of allergens also binding to cell surface-bound IgE on mast cells and basophils. Treatment with the short peptides of a T-cell epitope would not be able to crosslink IgE and should thereby avoid IgE-mediated side effects. This strategy has been successfully demonstrated for bee venom [25,26] and cat allergy [27].

B-cell epitope prediction

The area of B-cell epitope prediction faces more significant challenges than that of T-cell epitope prediction. B-cell epitopes can be composed of linear amino acids in a peptide chain or of discontinuous amino acids that are brought into close proximity in 3D space. Discontinuous epitopes are termed conformational epitopes, since their ability to be recognized relies entirely upon their 3D structure, although the recognition of linear epitopes can also be conformation dependent. In addition to recognizing protein epitopes, antibodies and B-cells also recognize non-protein epitopes, called haptens. The SuperHapten database currently has information on over 7500 immunogenic small molecules [28]; however, this review focuses on protein epitopes. A collection of B-cell epitope-related tools and databases is shown in Table 4.

Table 4. B-cell epitope-related tools and databases.

Resource name | Institution | Accessed from
ABCpred | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/abcpred
AgAbDb | University of Pune, India | http://202.41.70.51:8080/agabdb2
AntiJen | The Edward Jenner Institute for Vaccine Research, Oxford, UK | http://www.jenner.ac.uk/AntiJen
Bcepred | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/bcepred
Bcipep | Institute of Microbial Technology, Chandigarh, India | http://www.imtech.res.in/raghava/bcipep
BepiPred | Center for Biological Sequence Analysis, Technical University of Denmark and Institute of Medical Microbiology and Immunology, University of Copenhagen | http://www.cbs.dtu.dk/services/BepiPred
CED | Kyoto University, Japan | http://web.kuicr.kyoto-u.ac.jp/~ced
CEP | University of Pune, India | http://bioinfo.ernet.in/cep.htm
DiscoTope | Center for Biological Sequence Analysis, Technical University of Denmark and Institute of Medical Microbiology and Immunology, University of Copenhagen | http://www.cbs.dtu.dk/services/DiscoTope
Epitome | Columbia University Bioinformatics Centre, USA | http://www.rostlab.org/services/epitome
ePitope | ePitope Informatics ltd, Northumberland, UK | http://www.epitope-informatics.com
IEDB | La Jolla Institute for Allergy and Immunology, California, USA | http://www.immuneepitope.com
IEDB B-cell epitope prediction | La Jolla Institute for Allergy and Immunology, California, USA | http://tools.immuneepitope.org/tools/bcell/iedb_input
Mapitope | The Hebrew University, Jerusalem | [email protected]
MEPS | Inter-University Consortium for the Application of Super-Computing for Universities and Research, Rome, Italy | http://www.caspur.it/meps
MIMOP | Centre National de la Recherche Scientifique, Montpellier, France | Available on request from [email protected]
MIMOX | Kyoto University, Japan | http://web.kuicr.kyoto-u.ac.jp/~hjian/mimox
Pepitope | The Bioinformatics Unit, Tel Aviv University, Israel | http://pepitope.tau.ac.il/index.html
VaxiJen | The Edward Jenner Institute for Vaccine Research, Oxford, UK | http://www.jenner.ac.uk/VaxiJen

Predicting antigen surface epitopes is a difficult challenge as it is a specialized form of protein-protein interaction. As such, it shares many of the basic tools and research with the protein-protein interaction field. A recent study evaluated the global shape of proteins in a set of non-redundant co-crystal structures to determine whether an anisotropic shape characteristic for protein-protein complexes exists [29]. It was found that, on average, the binding site residues are closer to the center of mass than non-binding residues. It also was revealed that the smaller of the two proteins in a complex usually binds within a concave cavity in the larger protein. Interestingly, the exception to the rule is antibody-antigen complexes, where antibodies almost always bind to a convex surface on the antigen regardless of the relative sizes of the molecules.

In another example of the overlap between protein modeling and B-cell epitope prediction, distances between the antibody surface-binding residues in the complementarity-determining regions and the surface of the antigen epitope were studied [30]. Since high-affinity binding between the antibody and antigen implies a good fit, the objective was to explore the physical characteristics of the antibody-antigen interaction. Thirty-seven antibody-antigen co-crystal structures were examined and an SVM tool was developed to predict the distances of the residues in the interface. While not having a practical direct use in its present form, it adds to the understanding of how antibody affinity relates to the microstructure of the antibody-antigen interface.

The process of B-cell epitope identification can be viewed from two perspectives. The most common approach is to identify which epitope an existing antibody recognizes on its antigen, while the other approach is to start with an antigen and predict the best regions for immunogenicity or for function modulation. This information can be useful in the design of peptide vaccines and therapeutic antibodies and the processes for identification will be reviewed in more detail below.

One method of predicting continuous B-cell epitopes was developed using recurrent neural networks [31]. The network was trained on 700 non-redundant B-cell epitopes taken from the Bcipep database (Table 4) [32] and an equal number of non-epitopes taken from the Swiss-Prot database [33]. A web-based tool, ABCpred (Table 4), is available based on this software. The prediction accuracy is only 65%, which is not remarkable when compared with the 40 to 68% accuracy observed for previous methods; the tool should therefore be used in conjunction with other methods.

Another method for predicting B-cell epitopes was created based on an 'information structure' approach [34] using specific features of information entropy [35] derived from protein training sets [36]. The analysis of previous epitope prediction methods using 500 different amino acid propensity scales and 50 proteins showed only modest improvement over random selection [37]. This low predictability may be a consequence of B-cells recognizing sites with many different properties, so attempting to make predictions based on a single property will only identify a subset of the available epitopes. Cryptic epitopes are immunogenic regions that differ from the common or immunodominant epitopes typically found on the antigen surface; they may be buried or partially exposed and, ideally, are species-specific immunogenic regions that reduce the risk of cross-reactivity with human structural homologs.
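The single-property propensity-scale methods referred to above generally average a per-residue value over a sliding window and flag the highest-scoring regions. The sketch below illustrates that style of prediction using hydrophilicity values as commonly tabulated for the Hopp-Woods scale; the values should be checked against the primary source before any real use, and the window size and function names are assumptions. It does not implement the information-structure method.

```python
from typing import Dict, List, Tuple

# Hydrophilicity values as commonly tabulated for the Hopp-Woods scale
# (assumed here for illustration; verify before use).
HOPP_WOODS: Dict[str, float] = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_scores(
    sequence: str, scale: Dict[str, float], window: int = 7
) -> List[Tuple[int, float]]:
    """Mean scale value per window, reported at the 1-based centre residue."""
    half = window // 2
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(scale[aa] for aa in segment) / window
        scores.append((start + half + 1, mean))
    return scores


def top_regions(
    sequence: str, scale: Dict[str, float], window: int = 7, n: int = 3
) -> List[Tuple[int, float]]:
    """The n highest-scoring window centres, best first."""
    return sorted(window_scores(sequence, scale, window),
                  key=lambda item: item[1], reverse=True)[:n]
```

Swapping in a different scale dictionary changes which regions are flagged, which is consistent with the observation above that any one property captures only a subset of epitopes.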

As a validation exercise using the aforementioned method, two major allergens from Aspergillus fumigatus were investigated. Features were identified that correlated with the presence of known epitopes and three times as many epitopes were found within the identified features than were randomly distributed. Based on the computational results, several peptides were synthesized and injected into mice. All the test peptides gave a robust immune response, including those that represented cryptic or previously unknown epitopes. It is hoped that this tool might be useful in identifying peptide vaccine candidates for epitope vaccines where the use of an immunodominant epitope has a high risk of undesirable cross-reactivity.

A new database, AgAbDb (Table 4), can be added to those maintained for the purposes of cataloging identified epitopes [38]. AgAbDb consists of antibody binding sites to proteins and peptide antigens that have been characterized by co-crystal structures. It contains information on two levels, molecular and atomic, and uses distance- and geometry-based criteria to extract the interacting residues from the antibody and antigen. This information is incorporated in the database and can be retrieved for any existing structure. It is hoped that this database will provide a resource for improved epitope characterization and provide a benchmark for future epitope prediction methods.
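A distance-based criterion of the kind described above can be sketched as follows: any antibody residue and antigen residue that have atoms within a cutoff distance of each other are treated as interacting. The 4.0 Å cutoff, the residue identifier format and the function name are assumptions for illustration and are not the AgAbDb definitions; geometry-based criteria are not covered.

```python
import math
from typing import List, Tuple

# One entry per atom: (residue identifier, (x, y, z) coordinates in angstroms).
Atom = Tuple[str, Tuple[float, float, float]]


def interface_residues(
    antibody: List[Atom], antigen: List[Atom], cutoff: float = 4.0
) -> Tuple[List[str], List[str]]:
    """Return (antibody residue ids, antigen residue ids) that have at least
    one atom pair within the cutoff distance."""
    paratope, epitope = set(), set()
    for ab_res, ab_xyz in antibody:
        for ag_res, ag_xyz in antigen:
            if math.dist(ab_xyz, ag_xyz) <= cutoff:
                paratope.add(ab_res)
                epitope.add(ag_res)
    return sorted(paratope), sorted(epitope)
```

The all-against-all loop is quadratic in the number of atoms, which is acceptable for a single complex; a spatial index would be preferable for database-scale processing.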

A relatively new category of B-cell epitope prediction methodologies comprises those that incorporate the use of peptide phage display libraries. Screening libraries for their capacity to bind to the antibody of interest generates a pool of peptides, termed mimotopes. Since a mimotope can bind to the antibody, it implies that some information about the epitope must be contained in the target sequence. This information could be part of the actual epitope sequence, if the epitope were linear, or it may be information about the structural or physicochemical nature of the epitope. This area of development is the most promising in terms of the ability to accurately determine the epitope of an antibody. Over the last few years, many tools have emerged to mine the information contained in mimotope pools and derive a predicted epitope.

In late 2006, a tool called MIMOP (Table 4) was published [39]. It predicts the 3D (linear or conformational) epitope of an antibody based on the information content of the mimotope pool derived from peptide phage display. MIMOP consists of two separate programs, MimAlign and MimCons. MimAlign combines the results of four sequence alignments of antigen and mimotope sequences and constructs a frequency matrix at each position. Clusters of high-scoring amino acids are selected from the matrix and considered putative epitopes, then they are scored and ranked. MimCons identifies consensus sequences from the mimotope pool and then, using a 3D structure file for the antigen, it scans the antigen surface residues for these consensus patterns. The highest scoring matches are considered putative epitopes. MIMOP combines the outputs and evaluates them together for a final result. This tool was validated on several known X-ray crystal structures and successfully identified many, but not all, of the residues known to be involved in the epitope. This system was designed to allow the user to intervene in the analysis process and adapt it based on special knowledge they may have about their dataset or target.

Soon after MIMOP was published, another tool, MIMOX (Table 4), was also made public [40]. Using a similar concept to MIMOP, MIMOX will work with either a single mimotope or a mimotope pool. In the case of a mimotope pool, an alignment is made of the peptides and a consensus peptide is generated. The consensus peptide is mapped over the surface residues of the antigen structure file. In strict mode it looks for an exact match, but in permissive mode it looks for amino acids belonging to the same biochemical class at each position. When validated against examples used by other tools, MIMOX did not perform as well using its default parameters in general; however, after parameter optimization, it identified residues from the true epitope. For this program, foreknowledge of the epitope proved quite useful in guiding the process.
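The consensus step and the strict and permissive matching modes described above can be sketched as follows. The sketch assumes the mimotopes are already aligned to equal length and compares the consensus with a candidate residue string rather than mapping it over a 3D surface; the residue class groupings and function names are illustrative assumptions, not those used by MIMOX.

```python
from collections import Counter
from typing import List

# Assumed grouping of residues into biochemical classes (illustrative only).
CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
CLASS_OF = {aa: index for index, group in enumerate(CLASSES) for aa in group}


def consensus(aligned: List[str]) -> str:
    """Most common residue in each column of equal-length, pre-aligned peptides."""
    if len({len(peptide) for peptide in aligned}) != 1:
        raise ValueError("peptides must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def matches(consensus_seq: str, candidate: str, permissive: bool = False) -> bool:
    """Strict mode requires identical residues; permissive mode accepts residues
    from the same class at each position."""
    if len(consensus_seq) != len(candidate):
        return False
    if permissive:
        return all(CLASS_OF[a] == CLASS_OF[b]
                   for a, b in zip(consensus_seq, candidate))
    return consensus_seq == candidate
```

Permissive matching accepts many more candidates than strict matching, which illustrates why parameter choices and prior knowledge of the epitope can strongly influence the result.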

More recently, the introduction of the Pepitope server (Table 4) has significantly advanced the reliability of mimotope-based epitope prediction. Pepitope uses the PepSurf [41••] and Mapitope [42••] algorithms, each with a different methodological approach. PepSurf takes the 3D structure of the antigen and identifies all of the surface residues. It then takes each mimotope and maps it onto the surface of the antigen by essentially searching all possible 3D trajectories for those that have a high similarity to the mimotope sequence. Rather than first aligning the mimotopes and generating a consensus as in MIMOX, it searches each peptide first, then clusters the resulting high-scoring surface paths and assigns the resulting highest-scoring clusters as putative epitopes. The parameters for PepSurf can be tuned to enhance the ability to detect antibody-antigen interactions. It also could be used for other classes of protein-protein interactions; however, there may be a decrease in accuracy because non-antibody protein-protein interactions do not typically have the same level of intra-molecular affinities. Validation tests of the PepSurf program against four antibodies with known epitopes, including trastuzumab, were performed and the program successfully predicted all four epitopes. Tests were also performed against ten other structures with PepSurf successfully identifying the majority of the actual epitope residues. This represents a significant improvement in the state of the art for mimotope-based prediction.

In contrast to PepSurf, Mapitope takes a very different approach to the problem of mimotope-based epitope prediction [42••]. It attempts to utilize information contained within the mimotope pool as a group to find the genuine epitope by taking amino acid pairs (AAPs) as the smallest piece of information contained within a peptide. It assumes the mimotope pool members contain at least a fraction of the true epitope. Mapitope decomposes each peptide into a series of AAPs, which are then searched through the antigen surface amino acid space. The program looks for the presence of the AAPs on the surface within defined distance constraints. The frequency with which an AAP is present in the pool and is found on the antigen surface is scored and a determination is made as to whether the occurrence of the AAP is statistically significant. The highest-scoring AAPs are extracted to represent the putative epitope. The Mapitope program was validated by successfully predicting the correct location and identity of a majority of known epitope residues for two HIV antibodies and trastuzumab. Taken together as Pepitope, PepSurf and Mapitope represent the best methods for de novo prediction of conformational B-cell epitopes currently available.
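The AAP decomposition and surface search described for Mapitope can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: the 9.0 Å cutoff, the once-per-peptide counting and the function names are hypothetical, and the statistical significance test is replaced by a simple ranking on pool frequency.

```python
from collections import Counter
from itertools import combinations
from math import dist


def amino_acid_pairs(peptide):
    """Decompose a peptide into its adjacent amino acid pairs (AAPs)."""
    return [peptide[i:i + 2] for i in range(len(peptide) - 1)]


def pool_pair_counts(pool):
    """Count, for each AAP, how many peptides in the pool contain it."""
    counts = Counter()
    for peptide in pool:
        counts.update(set(amino_acid_pairs(peptide)))
    return counts


def surface_pairs(surface, cutoff):
    """Find residue pairs on the surface that lie within the cutoff.

    surface: list of (residue id, one-letter amino acid, (x, y, z))
    """
    found = {}
    for (id_a, aa_a, xyz_a), (id_b, aa_b, xyz_b) in combinations(surface, 2):
        if dist(xyz_a, xyz_b) <= cutoff:
            found.setdefault(aa_a + aa_b, []).append((id_a, id_b))
            found.setdefault(aa_b + aa_a, []).append((id_b, id_a))
    return found


def rank_pairs(pool, surface, cutoff=9.0):
    """Rank AAPs that occur in the pool and are present on the surface."""
    counts = pool_pair_counts(pool)
    on_surface = surface_pairs(surface, cutoff)
    hits = [(pair, n, on_surface[pair])
            for pair, n in counts.items() if pair in on_surface]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


pool = ["KLDS", "ALDT", "KLG"]
surface = [(1, "K", (0, 0, 0)), (2, "L", (3, 0, 0)),
           (3, "D", (6, 0, 0)), (4, "S", (30, 0, 0))]
ranked = rank_pairs(pool, surface)
```

In this toy example, the pairs KL and LD each occur in two peptides and are both found between neighboring surface residues, so they are retained, whereas DS is discarded because the serine lies too far from the aspartate.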

The MEPS server (Table 4) approaches the problem of mimotope-based epitope prediction by finding a surface region of an antigen that is capable of being mimicked by one or more peptides based on peptide length and the number of side chains necessary for antibody recognition [43]. The system works by taking the antigen structure, identifying the surface residues and constructing a matrix of surface residues within a specified distance from each other. It then extracts all of the possible peptide sequences that are able to mimic portions of the antigen surface. The sequences can be saved to a file in FASTA format and later used for comparison or to search against mimotope sequences. Alternatively, the user can input a protein structure or set of structures, a peptide sequence and a minimum number of allowed mismatches and retrieve the location, if it exists, of the corresponding epitope. Whereas the MEPS server was intended to facilitate identification of antibody epitopes, it also can be used for designing peptides that are able to mimic a portion of two interacting proteins or to identify common surface areas among multiple proteins that may result in cross-reactivity.
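The workflow described for MEPS (a matrix of nearby surface residues, enumeration of the peptides that the surface can present, and a lookup with a mismatch allowance) can be sketched as follows. This is a minimal illustration and not the MEPS code: the 6.0 Å cutoff, the function names and the treatment of the mismatch setting as an upper bound are assumptions made for this sketch.

```python
from math import dist


def contact_matrix(coords, cutoff):
    """Boolean matrix marking residue pairs that lie within the cutoff."""
    n = len(coords)
    return [[i != j and dist(coords[i], coords[j]) <= cutoff
             for j in range(n)] for i in range(n)]


def surface_peptides(residues, coords, length, cutoff=6.0):
    """Enumerate every peptide of the given length that follows contacts.

    residues: string of one-letter codes for the surface residues
    coords:   list of (x, y, z) tuples in the same order as residues
    """
    contacts = contact_matrix(coords, cutoff)
    peptides = {}

    def grow(path):
        if len(path) == length:
            seq = "".join(residues[i] for i in path)
            peptides.setdefault(seq, []).append(tuple(path))
            return
        for j, close in enumerate(contacts[path[-1]]):
            if close and j not in path:
                grow(path + [j])

    for i in range(len(residues)):
        grow([i])
    return peptides


def locate(peptides, query, allowed_mismatches):
    """Return surface peptides that differ from the query at no more
    than allowed_mismatches positions, with their residue indices."""
    hits = {}
    for seq, paths in peptides.items():
        if len(seq) != len(query):
            continue
        mismatches = sum(a != b for a, b in zip(seq, query))
        if mismatches <= allowed_mismatches:
            hits[seq] = paths
    return hits


residues = "KLDS"
coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0)]
peptides = surface_peptides(residues, coords, length=3)
hits = locate(peptides, "KLE", allowed_mismatches=1)
```

Precomputing the peptide set once per antigen is what allows the sequences to be exported and reused for later comparisons, whereas the lookup step corresponds to submitting a peptide together with a mismatch setting.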

Applied immunoinformatics

An emerging area of work in many pharmaceutical and biotechnology companies falls into the category of applied immunoinformatics. In this sub-discipline, existing tools are utilized to map epitopes or design strategies for the generation of potential therapeutic antibodies, rather than to develop new predictive algorithms. In one study, for example, the goal was to identify the epitopes of auto-antibodies that arise against aromatic L-amino acid decarboxylase (AADC) in subjects with autoimmune polyendocrine syndrome type I [44•]. To accomplish this, the protein structure for AADC was downloaded from PDB and modeled in the DeepView 3D molecular viewer available from ExPASy [45]. Information from chimeric and truncated expression constructs was used to narrow the region of AADC where the conformational epitope could be located on the 3D model. Based on the in silico modeling results, site-directed mutagenesis experiments were performed to probe for the epitope. The mutagenesis constructs were interrogated with the immune sera containing auto-antibodies against AADC. The mutagenesis constructs were successful in ablating binding while retaining enzymatic activity, implying that the molecule remained structurally sound and that the epitope identification was accurate. This report illustrates how applied immunoinformatics is making its way into routine laboratory use.

Conclusion

Significant progress is being made in the emerging field of immunoinformatics. More attention is being given to this area at bioinformatics and computational biology conferences and, with the increased cross-fertilization of ideas from mathematics, traditional bioinformatics and protein modeling, it is to be expected that even better tools are on the horizon. Although the current T-cell epitope prediction tools are useful, the recent arrival of PepSurf and Mapitope suggests that B-cell epitope prediction is also maturing and that these epitopes will soon be routinely predicted with a reasonably high level of confidence. As the quality of the open-access tools increases, more companies are likely to follow the applied immunoinformatics model to facilitate the development of therapeutic antibodies, vaccines and allergy treatments.

Acknowledgements

The authors would like to thank Dr Mary Haak-Frendscho and Dr Seema Kantak for valuable discussion and critical review of this manuscript.

References

•• of outstanding interest
• of special interest

1. Foundation N (Eds): Immunoinformatics: Bioinformatic Strategies for Better Understanding of Immune Function, No. 254. Wiley VCH, Weinheim, Germany (2003).

2. Janeway C, Travers P, Walport M, Shlomchik MJ (Eds): Immunobiology 6th Ed. Garland Science Publishing, New York, NY, USA (2005).

3. Tong JC, Tan TW, Ranganathan S: Methods and protocols for prediction of immunogenic epitopes. Brief Bioinform (2007) 8(2):96-108. •• Examines current computational methods used in the prediction and study of epitope/MHC interactions and provides guidelines for epitope prediction based upon the availability of experimental data.

4. Nielsen M, Lundegaard C, Blicher T, Lamberth K, Harndahl M, Justesen S, Røder G, Peters B, Sette A, Lund O, Buus S: NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE (2007) 2(8):e796.

5. Trost B, Bickis M, Kusalik A: Strength in numbers: Achieving greater accuracy in MHC-I binding prediction by combining the results from multiple prediction tools. Immunome Res (2007) 3:5. • Describes a benchmark study of 16 T-cell epitope-prediction algorithms illustrating an improved likelihood of success when multiple tools agree on an epitope.

6. Koren E, De Groot AS, Jawa V, Beck KD, Boone T, Rivera D, Li L, Mytych D, Koscec M, Weeraratne D, Swanson S et al: Clinical validation of the 'in silico' prediction of immunogenicity of a human recombinant therapeutic protein. Clin Immunol (2007) 124(1):26-32.

7. Pelat T, Hust M, Laffly E, Condemine F, Bottex C, Vidal D, Lefranc MP, Dübel S, Thullier P: High-affinity, human antibody-like antibody fragment (single-chain variable fragment) neutralizing the lethal factor (LF) of Bacillus anthracis by inhibiting protective antigen-LF complex formation. Antimicrob Agents Chemother (2007) 51(8):2758-2764.

8. Chaitra MG, Shaila MS, Nayak R: Evaluation of T-cell responses to peptides with MHC class I-binding motifs derived from PE_PGRS 33 protein of Mycobacterium tuberculosis. J Med Microbiol (2007) 56(Pt 4):466-474.

9. McMurry JA, Kimball S, Lee JH, Rivera D, Martin W, Weiner DB, Kutzler M, Sherman DR, Kornfeld H, De Groot AS: Epitope-driven TB vaccine development: A streamlined approach using immunoinformatics, ELISpot assays, and HLA transgenic mice. Curr Mol Med (2007) 7(4):351-368. • Describes a practical path for the development of epitope-based vaccines using immunoinformatics from epitope identification to live challenge.

10. Shaila MS, Nayak R, Prakash SS, Georgousakis M, Brandt E, McMillan DJ, Batzloff MR, Pruksakorn S, Good MF, Sriprakash KS: Comparative in silico analysis of two vaccine candidates for group A streptococcus predicts that they both may have similar safety profiles. Vaccine (2007) 25(18):3567-3573.

11. McMurry JA, Gregory SH, Moise L, Rivera D, Buus S, De Groot AS: Diversity of Francisella tularensis Schu4 antigens recognized by T lymphocytes after natural infections in humans: Identification of candidate epitopes for inclusion in a rationally designed tularemia vaccine. Vaccine (2007) 25(16):3179-3191.

12. Doytchinova IA, Flower DR: Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties. Vaccine (2007) 25(5):856-866. • Describes an alignment-independent method of antigen recognition based upon amino acid chemical properties with a prediction accuracy of 83%.

13. Doytchinova IA, Flower DR: VaxiJen: A server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics (2007) 8:4.

14. Bui HH, Sidney J, Li W, Fusseder N, Sette A: Development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines. BMC Bioinformatics (2007) 8(1):361.

15. Allergen Nomenclature: International Union of Immunological Societies, Allergen Nomenclature Sub-Committee (2007). http://www.allergen.org

16. Codex Alimentarius Commission, Rome, Italy (2007). http://www.codexalimentarius.net

17. European Food Safety Authority: Guidance document of the Scientific Panel on Genetically Modified Organisms for the risk assessment of genetically modified plants and derived food and feed. EFSA J (2006) 99:1-100.

18. Martinez Barrio A, Soeria-Atmadja D, Nistér A, Gustafsson MG, Hammerling U, Bongcam-Rudloff E: EVALLER: A web server for in silico assessment of potential protein allergenicity. Nucleic Acids Res (2007) 35(Web Server issue):W694-W700.

19. Soeria-Atmadja D, Lundell T, Gustafsson MG, Hammerling U: Computational detection of allergenic proteins attains a new level of accuracy with in silico variable-length peptide extraction and machine learning. Nucleic Acids Res (2006) 34(13):3779-3793.

20. Ivanciuc O, Schein CH, Braun W: SDAP: Database and computational tools for allergenic proteins. Nucleic Acids Res (2003) 31(1):359-362.

21. Ivanciuc O, Schein CH, Braun W: Data mining of sequences and 3D structures of allergenic proteins. Bioinformatics (2002) 18(10):1358-1364.

22. Schein CH, Ivanciuc O, Braun W: Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunol Allergy Clin North Am (2007) 27(1):1-27. •• Reviews existing immunoinformatic tools that are applicable to allergy with detailed explanations of the methods involved together with examples of their uses and applications.

23. Cui J, Han LY, Li H, Ung CY, Tang ZQ, Zheng CJ, Cao ZW, Chen YZ: Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. Mol Immunol (2007) 44(4):514-520.

24. Tanabe S: Epitope peptides and immunotherapy. Curr Protein Pept Sci (2007) 8(1):109-118. •• Excellent review of the use of T-cell epitope peptides in current immunotherapy for allergy.

25. Müller U, Akdis CA, Fricker M, Akdis M, Blesken T, Bettens F, Blaser K: Successful immunotherapy with T-cell epitope peptides of bee venom phospholipase A2 induces specific T-cell anergy in patients allergic to bee venom. J Allergy Clin Immunol (1998) 101(6 Pt 1):747-754.

26. Fellrath JM, Kettner A, Dufour N, Frigerio C, Schneeberger D, Leimgruber A, Corradin G, Spertini F: Allergen-specific T-cell tolerance induction with allergen-derived long synthetic peptides: Results of a phase I trial. J Allergy Clin Immunol (2003) 111(4):854-861.

27. Oldfield WL, Larché M, Kay AB: Effect of T-cell peptides derived from Fel d 1 on allergic reactions and cytokine production in patients sensitive to cats: A randomised controlled trial. Lancet (2002) 360(9326):47-53.

28. Günther S, Hempel D, Dunkel M, Rother K, Preissner R: SuperHapten: A comprehensive database for small immunogenic compounds. Nucleic Acids Res (2007) 35(Database issue):D906-D910.

29. Nicola G, Vakser IA: A simple shape characteristic of protein-protein recognition. Bioinformatics (2007) 23(7):789-792.

30. Shi Y, Zhang X, Wan J, Wang Y, Yin W, Cao Z, Guo Y: Predicting the distance between antibody's interface residue and antigen to recognize antigen types by support vector machine. Neural Comput Applic (2007) 16(4-5):481-490.

31. Saha S, Raghava GP: Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins (2006) 65(1):40-48.

32. Saha S, Bhasin M, Raghava GP: Bcipep: A database of B-cell epitopes. BMC Genomics (2005) 6(1):79.

33. Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res (2000) 28(1):45-48.

34. Nekrasov AN: Analysis of information structure of protein sequences: A new method for analyzing the domain organization of proteins. J Biomol Struct Dyn (2004) 21(5):615-623.

35. Nekrasov AN: Entropy of protein sequences: An integral approach. J Biomol Struct Dyn (2002) 20(1):87-92.

36. Alekseeva L, Nekrasov A, Marchenko A, Shevchenko M, Benevolenskii S, Sapozhnikov A, Kurup VP, Svirshchevskaya E: Cryptic B-cell epitope identification through informational analysis of protein sequences. Vaccine (2007) 25(14):2688-2697.

37. Blythe MJ, Flower DR: Benchmarking B cell epitope prediction: Underperformance of existing methods. Protein Sci (2005) 14(1):246-248.

38. Ghate AD, Bhagwat BU, Bhosle SG, Gadepalli SM, Kulkarni-Kale UD: Characterization of antibody-binding sites on proteins: Development of a knowledgebase and its applications in improving epitope prediction. Protein Pept Lett (2007) 14(6):531-535.

39. Moreau V, Granier C, Villard S, Laune D, Molina F: Discontinuous epitope prediction based on mimotope analysis. Bioinformatics (2006) 22(9):1088-1095.

40. Huang J, Gutteridge A, Honda W, Kanehisa M: MIMOX: A web tool for phage display based epitope mapping. BMC Bioinformatics (2006) 7:451.

41. Mayrose I, Shlomi T, Rubinstein ND, Gershoni JM, Ruppin E, Sharan R, Pupko T: Epitope mapping using combinatorial phage-display libraries: A graph-based algorithm. Nucleic Acids Res (2007) 35(1):69-78. •• PepSurf predicts conformational B-cell epitopes by aligning affinity-isolated peptide sequences to computationally derived antigen surface sequences. The method successfully predicted epitopes of four antibody/antigen co-crystal structures in a statistically significant manner.

42. Bublil EM, Freund NT, Mayrose I, Penn O, Roitburd-Berman A, Rubinstein ND, Pupko T, Gershoni JM: Stepwise prediction of conformational discontinuous B-cell epitopes using the Mapitope algorithm. Proteins (2007) 68(1):294-304. •• Mapitope predicts conformational B-cell epitopes using amino acid pairs derived from sequences of affinity-isolated peptides. The method was successfully validated with four antibody/antigen co-crystal structures.

43. Castrignanò T, De Meo PD, Carrabino D, Orsini M, Floris M, Tramontano A: The MEPS server for identifying protein conformational epitopes. BMC Bioinformatics (2007) 8(Suppl 1):S6.

44. Bratland E, Wolff AS, Haavik J, Kämpe O, Sköldberg F, Perheentupa J, Bredholt G, Knappskog PM, Husebye ES: Epitope mapping of human aromatic L-amino acid decarboxylase. Biochem Biophys Res Commun (2007) 353(3):692-698. • Provides an example of applied immunoinformatics using open-access tools to predict a conformational B-cell epitope.

45. DeepView – Swiss-PdbViewer: GlaxoSmithKline R&D & the Swiss Institute of Bioinformatics, Geneva, Switzerland (2006). http://www.expasy.org/spdbv