© 2002 Oxford University Press

Nucleic Acids Research, 2002, Vol. 30, No. 1

361–363

tGRAP, the G-protein coupled receptors mutant database

Øyvind Edvardsen*, Anne Lise Reiersen, Margot W. Beukers¹ and Kurt Kristiansen

Department of Pharmacology, Institute of Pharmacy, Breivika, University of Tromsø, N-9037 Tromsø, Norway and ¹Department of Medicinal Chemistry, Leiden/Amsterdam Center of Drug Research, University of Leiden, The Netherlands

Received August 21, 2001; Revised and Accepted October 12, 2001

ABSTRACT

The searchable mutant database tGRAP (previously called tinyGRAP) at the University of Tromsø contains data on mutated G-protein coupled receptors (GPCRs). All data have been extracted from scientific papers and entered manually into the database. The current version of the tGRAP mutant database (tGRAP.uit.no, release 10, April 2001) contains around 10 500 mutants extracted from almost 1400 research papers containing mutant data on five families of GPCRs, i.e. Family A, rhodopsin-like; Family B, secretin-like; Family C, metabotropic glutamate-like; Family D, pheromone; Family E, cAMP receptors. A query form provides rapid and simple access to relevant mutant information. In addition to this query form, a tool that enables the user to access mutation data via sequence alignments has been introduced. The ability to access mutant data from such alignments increases the usefulness of the mutant database and facilitates comparison of mutagenesis data between receptors. Moreover, this tool allows the construction of tailor-made sequence alignment views from any combination of receptors belonging to the same class. The database is available at http://tGRAP.uit.no/.

INTRODUCTION

G-protein coupled receptors (GPCRs) are important targets for therapeutic drugs. Therefore, their structural and functional features have been extensively characterized. Although all GPCRs are believed to have seven transmembrane α helices, there is a large variation in amino acid sequences among the receptors (1). Furthermore, the parts of the receptors which are involved in binding of transmitter substances, such as neurotransmitters and hormones, differ markedly between different classes of receptors. A large amount of data on structural and functional features of GPCRs has been developed by the creation, expression and subsequent characterization of GPCR mutants.
In order to facilitate access to relevant data on GPCR mutants we have developed a database of GPCR mutants (tGRAP; 2,3) that have been functionally tested in cellular experiments. The database is available at http://tGRAP.uit.no/.

Apart from information about the actual mutations, the database also contains qualitative information about how the mutated receptor was tested including information about the experimental conditions. Hand in hand with technical improvements of the database system itself, we have put much effort into the continuous extraction of mutant data from published papers. All these data are made available to the GPCR research community via a web-based search form, via amino acid sequence alignments and via external services such as snake plots of receptors (4). Mutant database release number 10 contains around 10 500 mutants extracted from around 1400 scientific papers.

METHODS

The database covers all known types of GPCRs but papers dealing exclusively with descriptions of clinical findings related to splice variants and polymorphisms are not included. In other words, all papers describing experiments (such as radioligand binding or functional testing) on mutant GPCRs, including splice variants and polymorphisms, have been incorporated into the database. To guarantee high quality data, the GPCRs that have been subjected to mutagenesis have to be present in the annotated database SWISS-PROT (5). This annotation provides, among other things, the crucial information on the sequence positions of the seven transmembrane regions that are used to create the secondary structure assignments.

Relevant papers are identified by manual and semi-manual literature searches. Subsequently, the data are extracted manually and encoded in a file format developed by some of the authors. This file format allows automatic verification of the data as described below. Apart from the mutant information, data on receptor name, classification, source and splice variant are included along with links to the SWISS-PROT amino acid sequence entries (Fig. 1). Links into OMIM (6) are included based on the OMIM number if present in the SWISS-PROT entry or based on the SWISS-PROT entry gene names.
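As a rough illustration of how annotated transmembrane positions can be used to check a mutated residue and to label the receptor segment it falls in, consider the Python sketch below. It is not the authors' software: the helix coordinates, the segment labels, the toy sequence and the function names `check_residue` and `assign_segment` are all hypothetical and chosen only for the example.

```python
from typing import List, Tuple

# Hypothetical transmembrane (TM) ranges, 1-based and inclusive.
# Real ranges would be read from the annotated sequence entry.
TM_RANGES: List[Tuple[int, int]] = [
    (10, 30), (45, 65), (80, 100), (120, 140),
    (160, 180), (200, 220), (235, 255),
]


def check_residue(sequence: str, position: int, expected: str) -> bool:
    """True if the 1-based position exists and holds the expected residue."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == expected


def assign_segment(position: int,
                   tm_ranges: List[Tuple[int, int]] = TM_RANGES) -> str:
    """Return an illustrative segment label for a 1-based sequence position."""
    if position < 1:
        raise ValueError("sequence positions are 1-based")
    for index, (start, end) in enumerate(tm_ranges, start=1):
        if start <= position <= end:
            return f"TM{index}"
        if position < start:
            # Before the first helix, or between two consecutive helices.
            return "N-terminus" if index == 1 else f"loop {index - 1}-{index}"
    return "C-terminus"


if __name__ == "__main__":
    toy_sequence = "MNGT"  # made-up fragment, not a real receptor
    print(check_residue(toy_sequence, 2, "N"))
    print(assign_segment(50))
```

A mutation whose stated wild-type residue does not match the sequence at the stated position would fail `check_residue` and could be flagged for manual correction before any segment label is assigned.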
Bibliographic information on authors, paper title and journal reference is gathered from PubMed and links to the relevant PubMed entries are included. A cross-reference to all mutants described in that paper is shown below the mutant information. Mutants are encoded as amino acid substitutions, deletions, insertions and chimers, and combinations of these. For

*To whom correspondence should be addressed. Tel: +47 77 64 66 33; Fax: +47 77 64 66 45; Email: [email protected]


Figure 1. A mutant entry example. The AC (accession code), DE (description), NR (name of receptor), OS (organism), ID (identification) and TO (link to OMIM) records describe the receptor that has been mutated. RA (author), RT (title), RL (journal reference) and RX (PubMed code and link to PubMed) records contain information on the literature reference that describes the mutant and the experiments that have been done to characterize the mutant. The PM record indicates that a single amino acid has been substituted and the data show the amino acid substitution and which receptor segment has been altered. The following records, LB (ligand binding), SM (second messenger), OD (origin of data), ES (expression system), AS (assay system), CT (cell type) and VE (vector type) show which experiments were performed and the experimental conditions. Finally, the MT (mutant type) and DR (database release number) records show the classification of the mutant and in which database release it was entered. Underlined phrases contain links to external databases.
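The record structure described in the Figure 1 caption lends itself to a simple label-based parser. The following Python sketch is illustrative only: the line layout (a two-letter label, whitespace, then the value) is assumed by analogy with SWISS-PROT flat files, and the function name `parse_entry` and all example values are made up rather than taken from a real tGRAP entry.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed layout: two capital letters, then optional whitespace and a value.
RECORD_LINE = re.compile(r"^([A-Z]{2})(?:\s+(.*))?$")


def parse_entry(text: str) -> Dict[str, List[str]]:
    """Group the lines of one flat-file entry by their two-letter label."""
    records: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        match = RECORD_LINE.match(line)
        if match is None:
            continue  # skip blank lines and lines without a record label
        label, value = match.group(1), (match.group(2) or "").strip()
        records[label].append(value)
    return dict(records)


# Placeholder entry; every value here is invented for the example.
EXAMPLE = """\
AC   X00000
OS   Example organism
RA   Author A.
RA   Author B.
DR   10
"""

if __name__ == "__main__":
    entry = parse_entry(EXAMPLE)
    print(entry["RA"])
```

Storing each label as a list keeps repeated records, such as several author lines, in their original order.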

chimers, only the end points of the affected region(s) are currently searchable. A set of experimental conditions and functional tests has been defined to be included in the database, and this set is continuously updated and extended as new data are entered. For each mutant entry these relevant data (e.g. expression system, functional assay used, constitutive activation) are included (Fig. 1). As mentioned above, the manually entered input data files are automatically checked by software. In particular, the presence in the parent receptor of the amino acids that were subjected to mutation, and their sequence positions, are verified against the SWISS-PROT entries. From the checked, and if necessary corrected, input files, the full mutant entries (Fig. 1) are constructed. Based on the data in SWISS-PROT, the secondary structure assignments of the mutated amino acids are encoded in the mutant files, hence the need for secondary structure annotation in the SWISS-PROT entries. Mutant information is encoded in a mutant file format that is similar to the one used by SWISS-PROT. Records that have been copied from SWISS-PROT or that carry the same type of information as in SWISS-PROT have the same record labels (e.g. AC for SWISS-PROT accession code; OS for species, etc.). Care has been taken to ensure that records unique to the mutant database have labels that do not occur in SWISS-PROT. Because receptor names are handled unsystematically in SWISS-PROT, we have included the hierarchical receptor name scheme used in GPCRDB (7). Thus, it is

possible to carry out correct and exhaustive name searches among the mutated receptors.

Almost all software was made using the Python programming language (8). The database query engine is freeWAIS-sf which is based on text indexing.

Sequence alignments for the three major (A, B and C) families of GPCRs were constructed with the CLUSTALX 1.81 and Genedoc 2.5.010 programs. All GPCR amino acid sequences were retrieved from the SWISS-PROT database (Release 40 and updates up to March 23, 2001 for the alignments of Family A and C; and Release 38 and updates up to June 1, 2000 for the Family B alignment). Sequence fragments were omitted from the alignments. The number of amino acid sequences included in the alignments is 941 (Family A), 59 (Family B) and 23 (Family C). Full and partial alignment views (Fig. 2) are available. In addition, both a wide and a narrow (printer friendly) format are provided.

RESULTS AND DISCUSSION

tGRAP (tGRAP.uit.no) release 10 is available to the GPCR research community. Around 1400 new mutants were added for this release, bringing the grand total number of mutants to approximately 10 500. The tGRAP mutant data have been extracted from a total of around 1400 scientific papers. tGRAP is offered as a web-based bioinformatics tool for GPCR research. The database is a useful tool for planning


Figure 2. Mutant data can be accessed from alignments of amino acid sequences. One alignment for each of the receptor Families A, B and C is available from the mutant database server. Furthermore, from a selection page for each family alignment the user can choose to view only parts of the alignment by selecting the desired subgroup(s) of receptors within the family. Thus, subsections of the large family alignments can be viewed. Shown here is a part of the Family A alignment that includes transmembrane region 1, as indicated by the characters above the alignment. The transmembrane regions correspond to those in the bovine rhodopsin structure (9). Underlined amino acids in the alignment link to mutant data, while the leftmost underlined receptor labels point directly to sequence entries in SWISS-PROT. Clicking an underlined amino acid produces either a list of mutants involving the amino acid or a mutant entry as shown in Figure 1.
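Looking up mutants from an alignment requires translating an alignment column into the residue number of each receptor, since gaps shift the numbering from row to row. A minimal Python sketch of that bookkeeping is given here. It assumes "-" as the gap character; the function names and the two short sequences are hypothetical and do not describe the server's actual implementation.

```python
from typing import Dict, List, Optional, Tuple


def column_to_position(aligned: str, gap: str = "-") -> List[Optional[int]]:
    """Map each alignment column to the 1-based residue number of the
    ungapped sequence, or None where this row has a gap."""
    positions: List[Optional[int]] = []
    count = 0
    for char in aligned:
        if char == gap:
            positions.append(None)
        else:
            count += 1
            positions.append(count)
    return positions


def residues_in_column(rows: Dict[str, str],
                       column: int) -> Dict[str, Tuple[str, int]]:
    """For a 0-based alignment column, return the residue and its sequence
    number for every receptor that is not gapped at that column."""
    result: Dict[str, Tuple[str, int]] = {}
    for name, aligned in rows.items():
        position = column_to_position(aligned)[column]
        if position is not None:
            result[name] = (aligned[column], position)
    return result


if __name__ == "__main__":
    # Made-up aligned rows for two hypothetical receptors.
    rows = {"receptor_a": "MN--GT", "receptor_b": "M-KLGT"}
    print(residues_in_column(rows, 4))
```

With such a mapping, residues that share an alignment column can be compared across receptors even though their sequence numbers differ.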

mutant experiments, for interpretation of information from such experiments and for receptor modeling. Access to the mutant data is provided through a search form, through amino acid sequence alignments (Fig. 2) or from external services (7). As shown in Figure 1, the basic data on receptor name, classification and secondary structure assignments have been imported from SWISS-PROT (5) and GPCRDB (7). Apart from these data, the database provides detailed information on the amino acids being altered, qualitative information on experiments and experimental conditions, and literature references (Fig. 1).

REFERENCES

1. Kolakowski,L.F. (1994) GCRDB: a G-protein-coupled receptor database. Receptors Channels, 2, 1–7.
2. Kristiansen,K., Dahl,S.G. and Edvardsen,Ø. (1996) A database of mutants and effects of site-directed mutagenesis experiments on G protein-coupled receptors. Proteins, 26, 81–94.
3. Beukers,M.W., Kristiansen,K., IJzerman,A.P. and Edvardsen,Ø. (1999) TinyGRAP database: a bioinformatics tool to mine G-protein-coupled receptor mutant data. Trends Pharmacol. Sci., 20, 475–477.
4. Campagne,F., Jestin,R., Reversat,J.L., Bernassau,J.M. and Maigret,B. (1999) Visualisation and integration of G protein-coupled receptor related information help the modelling: description and applications of the Viseur program. J. Comput. Aided Mol. Des., 13, 625–643.
5. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
6. Hamosh,A., Scott,A.F., Amberger,J., Valle,D. and McKusick,V.A. (2000) Online Mendelian Inheritance in Man (OMIM). Hum. Mutat., 15, 57–61.
7. Horn,F., Weare,J., Beukers,M.W., Horsch,S., Bairoch,A., Chen,W., Edvardsen,Ø., Campagne,F. and Vriend,G. (1998) GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res., 26, 275–279.
8. Watters,A., van Rossum,G. and Ahlstrom,J.C. (1996) Internet Programming with Python. M&T Books, New York, NY.
9. Palczewski,K., Kumasaka,T., Hori,T., Behnke,C.A., Motoshima,H., Fox,B.A., Le Trong,I., Teller,D.C., Okada,T., Stenkamp,R.E. et al. (2000) Crystal structure of rhodopsin: a G protein-coupled receptor. Science, 289, 739–745.