Trans-HHS Workshop: Diet, DNA Methylation ... - Journal of Nutrition

Trans-HHS Workshop: Diet, DNA Methylation Processes and Health

DNA Methylation Database “MethDB”: a User Guide¹,²

Christoph Grunau,³ Eric Renault and Gérard Roizes

Laboratoire de Séquences Répétées et Centromères Humains, Institut de Génétique Humaine, Montpellier, France

ABSTRACT The DNA Methylation Database (MethDB; http://www.methdb.net) is a public database dedicated to DNA methylation. It aims to collect all data about DNA methylation in a common source. MethDB can be searched in different ways, ranging from a simple browse mode to detailed queries for sequence-specific DNA methylation profiles and patterns, total methylation content data or environmental conditions that can influence the methylation state of DNA. Currently, the database contains 2570 values for methylation content and 5278 methylation patterns and profiles for a total of 83 genes or loci. MethDB is an annotated database, and researchers are invited to submit their own data ([email protected]). J. Nutr. 132: 2435S–2439S, 2002.

KEY WORDS: DNA methylation • MethDB • epigenetics • database • cancer • 5-methylcytosine

DNA methylation is a DNA modification that encodes epigenetic information. Methylated bases in DNA have been known for more than 50 years (1), and the importance of DNA methylation for the development and health of an organism was recognized early on (2). Over the past 20 years, findings about DNA methylation have accumulated, and new techniques have led to an explosion of available data in recent years. Many new insights into the biological role of DNA methylation have been gained. In particular, the relationship between abnormal methylation and cancer development might open promising new approaches in diagnosis and therapy (3). Nevertheless, many aspects of DNA methylation remain enigmatic, and the combined expertise of researchers in many fields will be needed to clarify its biological role. Even for experts, comparing experimental results and extracting common motifs is often difficult because experimental strategies and techniques differ considerably. In addition, general publication standards for data representation have never been established.

Bibliographic databases help to locate related information in many cases. However, a user who sets out to scan the literature for methylation data on a particular gene, tissue or disease faces a labor-intensive and time-consuming task. Moreover, many original findings have never been published because of the limitations of printed journals. For this reason we established a database that is exclusively dedicated to DNA methylation: MethDB (4) (http://www.methdb.net). At present, MethDB focuses on the fifth base of eukaryotic DNA, 5-methylcytosine (5-MC).⁴ The structure of MethDB has been described previously (4). In this article we propose strategies for making effective use of the database.

STRUCTURE OF THE METHYLATION DATA

DNA methylation data are heterogeneous. They range from a global estimate of the total 5-MC content to the exact methylation pattern of a single DNA molecule. MethDB therefore holds two types of methylation data: 1) sequence-specific methylation profiles and methylation patterns; and 2) total methylation content. A methylation pattern is the sequence of the five nucleotide bases (including 5-MC) in a single DNA molecule. A methylation profile represents the average methylation along a sequence. Both are, by definition, sequence-specific. Methylation content refers to the relative amount of 5-MC in a DNA sample. This can be sequence-specific if a particular DNA fragment was investigated, but in general total DNA or a subset of the DNA was used to determine the degree of methylation. This distinction should be kept in mind when using MethDB. It also helps to understand that MethDB stores the sample (or experimental target) separately from the experiment and from the methylation data. Three different, but interrelated, tables exist for these logical entities. Each sample may have been analyzed in several experiments, and each experiment may deliver one or more sets of methylation data.

¹ Presented at the “Trans-HHS Workshop: Diet, DNA Methylation Processes and Health” held on August 6–8, 2001, in Bethesda, MD. This meeting was sponsored by the National Center for Toxicological Research, Food and Drug Administration; Center for Cancer Research, National Cancer Institute; Division of Cancer Prevention, National Cancer Institute; National Heart, Lung and Blood Institute; National Institute of Child Health and Human Development; National Institute of Diabetes and Digestive and Kidney Diseases; National Institute of Environmental Health Sciences; Division of Nutrition Research Coordination, National Institutes of Health; Office of Dietary Supplements, National Institutes of Health; American Society for Nutritional Sciences; and the International Life Sciences Institute of North America. Workshop proceedings are published as a supplement to The Journal of Nutrition. Guest editors for the supplement were Lionel A. Poirier, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, and Sharon A. Ross, Nutritional Science Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD.
² Part of this work was supported by the Klaus Tschira Foundation (Heidelberg, Germany).
³ To whom correspondence should be addressed. E-mail: [email protected].
⁴ Abbreviations used: 5-MC, 5-methylcytosine; MethDB, DNA Methylation Database; PCR, polymerase chain reaction.

0022-3166/02 $3.00 © 2002 American Society for Nutritional Sciences.

SEARCHING IN METHDB

MethDB can be accessed from any Web browser (www.methdb.net or www.methdb.de). The front page has a navigation bar on the left-hand side and general information in the central frame. Clicking on “Search” brings the user to a page with different search options (Fig. 1). Using a database for the first time is not as trivial as it seems. An occasional user may be overwhelmed by the sheer number of unspecific hits or disappointed at finding nothing. Well-designed query masks make database access easier, but the very specific needs of each user cannot all be anticipated. We therefore provide two basic access routes to MethDB: 1) a simple search mode; and 2) several query masks for a structured search. A user who has only a vague idea of what he or she is looking for can use the “simple search” option on the search page or browse the entire contents of MethDB. Browsing suits first-time users who want an idea of the sort of data they can expect to find in the database. In the “simple search” field, single search words such as “liver,” “diet” or “human” can be entered. Logical operators cannot be used to combine search terms. For example, “tumor and liver” would be searched for literally as this combination of words and probably would not find anything. The simple search does not cover the entire database. Only the fields “method name,” “sample name,” “restriction enzyme,” “species name,” “phenotype,” “tissue,” “locus” and “submitting author” are searched.
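The literal, operator-free matching of the simple search can be illustrated in a few lines. The sketch below is hypothetical and is not MethDB code. It represents records as plain Python dictionaries keyed by the field names listed above. It also assumes case-insensitive substring matching, which the text does not specify. The sketch shows why a query such as “tumor and liver” usually returns nothing: the whole string is compared against each field.

```python
# Hypothetical sketch of a MethDB-style simple search; not the actual implementation.
# Records are modelled as dicts keyed by the field names listed in the text.
SEARCHED_FIELDS = (
    "method name", "sample name", "restriction enzyme", "species name",
    "phenotype", "tissue", "locus", "submitting author",
)


def simple_search(records: list[dict[str, str]], term: str) -> list[dict[str, str]]:
    """Return the records in which `term` occurs in at least one searched field.

    Assumption: matching is a case-insensitive substring test. No logical
    operators are parsed, so "tumor and liver" is treated as one literal phrase.
    """
    needle = term.strip().lower()
    return [
        record
        for record in records
        if any(needle in record.get(field, "").lower() for field in SEARCHED_FIELDS)
    ]


records = [
    {"species name": "Homo sapiens", "tissue": "liver", "phenotype": "tumor"},
    {"species name": "Mus musculus", "tissue": "brain", "phenotype": "wild type"},
]

print(len(simple_search(records, "liver")))            # 1
print(len(simple_search(records, "tumor and liver")))  # 0: no field contains the phrase
```

Under this model, the closest equivalent of an AND query is to search for one word and then inspect the hits for the other.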
In many cases this basic search facility might be sufficient, but a user who has a precise idea of what he or she is looking for should use the “structured search” option. It offers four query masks: 1) search for methylation patterns and profiles; 2) search for methylation content data; 3) search for environmental conditions that can influence the DNA methylation state; and 4) matching of 5-MC content data with phenotypes.

FIGURE 1 Main search page of MethDB. The upper third of the page holds the gateways to the detailed query masks. The middle of the page contains the field where single search terms can be entered. At the bottom of the page are two buttons to browse either the methylation content lists or the experiments that gave sequence-specific methylation patterns and profiles. 5mC, 5-methylcytosine.

Users interested in total 5-MC content data follow the corresponding link on the search page to a query mask (Fig. 2). There, the desired species, sex, tissue and gene or locus can be entered. For the phenotype, the basic choice is between tumor samples, healthy wild-type samples or both. The DNA type can be total DNA, nonrepetitive genomic regions (unique sequences), repetitive sequences or single genomic fragments. Single genomic fragments are generally parts of genes, regulatory regions or specific hybridization loci.

There are many experimental ways to determine the degree of DNA methylation, and accordingly the 5-MC content is expressed in a wide range of absolute and relative units. Some of these can in principle be converted into each other (e.g., “mol% of all nucleotides” and “mol% of (5-MC + cytosine)”), as long as the common data basis (e.g., the guanine plus cytosine content) is known. For many data sets, however, this basis is not known. Moreover, data from different sources should be compared with care, because systematic errors can apparently be considerable. In general, the policy of MethDB is to provide the raw data as they were published or submitted by the author and to leave their interpretation to the user of the database. Further choices for the 5-MC content units are “DMH” for “differential methylation hybridization” and “MSP” for “methylation-sensitive polymerase chain reaction.” Both are experimental methods used to determine the degree of methylation in specific DNA fragments, and both deliver relative values between 0 and 1. The initial result of the query is a table in which each row represents one experiment.
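The conversion between the two molar units mentioned above, mol% of all nucleotides and mol% of (5-MC + cytosine), can be made explicit. In double-stranded DNA, cytosine (including 5-MC) pairs with guanine. Cytosine residues therefore make up half of the G+C fraction of all nucleotides. Let m be the 5-MC content in mol% of all nucleotides and g the G+C content as a fraction of all nucleotides, with 5-MC counted as C. The content in mol% of (5-MC + cytosine) is then 2m/g. The Python sketch below illustrates this relation and is not a MethDB feature. The function names are hypothetical.

```python
# Hypothetical helpers illustrating the unit conversion; not part of MethDB.
# Assumes double-stranded DNA, where C + 5-MC equals G, i.e. half of the G+C content.


def pct_of_nucleotides_to_pct_of_cytosines(mc_pct: float, gc_fraction: float) -> float:
    """Convert 5-MC from mol% of all nucleotides to mol% of (5-MC + cytosine).

    gc_fraction is the G+C content (5-MC counted as C) as a fraction in (0, 1].
    """
    if not 0 < gc_fraction <= 1:
        raise ValueError("gc_fraction must lie in (0, 1]")
    cytosine_fraction = gc_fraction / 2  # share of all nucleotides that are C or 5-MC
    return mc_pct / cytosine_fraction


def pct_of_cytosines_to_pct_of_nucleotides(mc_pct: float, gc_fraction: float) -> float:
    """Inverse conversion: mol% of (5-MC + cytosine) to mol% of all nucleotides."""
    if not 0 < gc_fraction <= 1:
        raise ValueError("gc_fraction must lie in (0, 1]")
    return mc_pct * gc_fraction / 2


# Illustrative numbers only: 1 mol% 5-MC among all nucleotides at 40% G+C
print(pct_of_nucleotides_to_pct_of_cytosines(1.0, 0.40))  # 5.0
```

The G+C content is the only extra quantity needed. When it is not reported, values given in the two units cannot be compared directly.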
Each row of the result table lists the species, sex, tissue, developmental state, DNA type and 5-MC content of the sample, together with the data source (proof) for the experiment. The Details link leads to more data about the nature of the experiment and the investigated sample, including expression data if a specific gene was investigated. In general, clicking on a hypertext link provides background information about the highlighted term (e.g., a description of the tissue, the title of the publication or a description of the method used).

FIGURE 2 Query mask to search in the experiments that delivered total 5-methylcytosine (5mC) content data. PCR, polymerase chain reaction.

A user who wants to find sequence-specific methylation patterns or profiles follows the corresponding link on the search page, which opens another query mask (Fig. 3). Again, the requested species, sex, tissue, gene and phenotype can be entered. Every sequence for which methylation data exist in MethDB has a unique identification number (sequence ID). If this number is known (e.g., from a reference in a publication), the data can also be accessed directly by searching for it. As for the 5-MC content data, the result of the query is a table in which each row represents one experiment. The Details link brings the user to the data for this experiment, including a graphic representation of the methylation patterns or of the methylation profile (Fig. 4). The raw data are accessible via the hyperlinked pattern/profile ID numbers. A click on an ID number opens either a table (for methylation profiles) or a guanine-adenine-thymine-cytosine-5-MC sequence (for methylation patterns). Clicking on the sequence ID provides information about the analyzed sequence, including the exon-intron structure and regulatory sequences in the analyzed region. Another link leads to a description of the method that was used to generate the methylation data. If polymerase chain reaction was used, the reaction conditions and primer sequences are available.

FIGURE 3 Query mask for the experiments that produced sequence-specific methylation patterns and profiles.

FIGURE 4 Experimental details page. Hypertext links lead to descriptions of the tissue (A), the phenotype (B), the experimental procedure (C), the polymerase chain reaction (PCR) details (D) and the investigated gene (E), and give details about the studied sequence (F). Links to PubMed (G) and the National Center for Biotechnology Information taxonomy browser (H) provide access to background information in external databases. The sample data can be viewed via the sample ID (I), and the guanine-adenine-thymine-cytosine-5-methylcytosine (GATC5-MC) sequence via the pattern ID.

The structure of MethDB allows several experiment entries to exist for one sample entry. For a given sequence, for example, the degree of methylation might have been determined by methylation-sensitive polymerase chain reaction, and the methylation profile might additionally have been generated by bisulfite genomic sequencing. Sample and experiment are therefore placed in different tables in the database. On the experimental details page, the sample ID link gives access to the data of the sample used in this experiment. Different sequence fragments are treated as different samples. The same is true for the complementary DNA strand if a strand-specific method was used to determine the distribution of 5-MC (e.g., bisulfite genomic sequencing). If the investigated sequence section has related 5′ or 3′ neighbors, links are available to the corresponding sample entries.

Certain diets (5), drugs (6) and other environmental agents (7) have been shown to influence the methylation state of DNA. MethDB therefore includes a table describing the environmental conditions to which some of the samples were exposed. This table can be searched (Fig. 5), and matching entries are shown as a list of experiments, following the scheme described above.

FIGURE 5 Query mask to search in the environmental conditions to which some samples were exposed and that potentially influenced their DNA methylation state.

Users who have determined the 5-MC content of a sample may also wish to compare their results with entries in the database. The phenotype search mask was created for this purpose. Users can enter their own value (or a value range) and find experiments that delivered similar values for a similar type of sample.

SUBMITTING DATA TO METHDB

MethDB is a public database, and the scientific community is invited not only to submit data but also to comment on its design and accessibility and to suggest improvements. Data can be sent to an annotator by E-mail ([email protected]), and an E-mail submission form is available on the submission page. Methylation profiles should be submitted as tables of three comma-separated values:
1) the position in a mother sequence;
2) the degree of methylation at this site, between 0 (no methylation) and 1 (complete methylation); and
3) the standard error (if available).

Methylation patterns are sequence files in a modified FASTA format. All nucleotide bases except 5-MC are written in lowercase, and 5-MC residues are written as uppercase C. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (>) symbol in the first column (8). For bisulfite genomic sequencing data (9), methylation patterns in this file format can be conveniently generated with the MethTools Software Suite (10). For help with converting data, or for submitting data in a different format, please send a message to help@methdb.de.

FUTURE ASPECTS

As of April 5, 2002, MethDB contains 2570 methylation content values and 5278 methylation profiles and patterns for a total of 83 genes and 44 species. The number of visits to MethDB has been growing steadily, and 7245 requests were recorded in March 2002. At present, work on MethDB focuses on improving the data basis to ensure maximum quality of the database entries. This means collecting literature data, annotating and entering experimental findings, and communicating with the authors of the original publications. A major part of the bibliographic work will be dedicated to the search for data that substantiate the interrelation of DNA methylation, cancer and environment. Once the data basis of MethDB has grown sufficiently, however, it will also be important to develop new data-mining and analysis tools. MethDB has its own Web server and is maintained at the Institute for Human Genetics, an institute of the French National Center for Scientific Research.

ACKNOWLEDGMENT

The authors are grateful to Jérôme Buard for carefully reading the manuscript.
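The submission formats for methylation patterns and profiles can be produced and checked locally before data are sent. The sketch below is a hypothetical helper, not part of MethDB or MethTools. It reads methylation patterns in the modified FASTA format, with lowercase bases and 5-MC as uppercase C. It assumes that all patterns are aligned to the same stretch of the mother sequence and averages them into a methylation profile of position, degree and standard error triples. Two illustrative choices are not MethDB requirements:
1) positions are 1-based; and
2) the standard error is the binomial estimate sqrt(p(1 − p)/n) over the n molecules.

```python
import math

# Hypothetical helpers; not part of MethDB or MethTools.
PATTERN_ALPHABET = set("acgtC")  # lowercase bases; uppercase C marks 5-MC


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (description, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header = None  # description of the record currently being read
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):  # description line starts a new record
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # sequence data of the current record
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def profile_from_patterns(patterns: list[str], offset: int = 0) -> list[tuple[int, float, float]]:
    """Average aligned methylation patterns into (position, degree, standard error).

    offset shifts the 1-based positions into mother-sequence coordinates.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    length = len(patterns[0])
    for pattern in patterns:
        if len(pattern) != length or not set(pattern) <= PATTERN_ALPHABET:
            raise ValueError(f"invalid or misaligned pattern: {pattern!r}")
    profile = []
    for i in range(length):
        calls = [p[i] for p in patterns if p[i] in "cC"]  # cytosine sites only
        if not calls:
            continue
        n = len(calls)
        degree = calls.count("C") / n  # fraction of molecules methylated at this site
        stderr = math.sqrt(degree * (1 - degree) / n)
        profile.append((offset + i + 1, degree, stderr))
    return profile


def profile_to_csv(profile: list[tuple[int, float, float]]) -> str:
    """Format a profile as lines of 'position,degree,standard error'."""
    return "\n".join(f"{pos},{deg:.3f},{se:.3f}" for pos, deg, se in profile)


example = """>clone 1
aCgtcga
>clone 2
aCgtCga
"""
patterns = [seq for _, seq in parse_fasta(example)]
print(profile_to_csv(profile_from_patterns(patterns)))
# 2,1.000,0.000
# 5,0.500,0.354
```

The hypothetical `offset` parameter is where pattern coordinates would be mapped onto the mother sequence. For generating patterns from bisulfite data, the text points to the MethTools Software Suite (10).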

LITERATURE CITED

1. Hotchkiss, R. D. (1948) The quantitative separation of purines, pyrimidines, and nucleosides by paper chromatography. J. Biol. Chem. 175: 315–332.
2. Holliday, R. & Pugh, J. E. (1975) DNA modification mechanisms and gene activity during development. Science 187: 226–232.
3. Costello, J. F. & Plass, C. (2001) Methylation matters. J. Med. Genet. 38: 285–303.
4. Grunau, C., Renault, E., Rosenthal, A. & Roizes, G. (2001) MethDB: a public database for DNA methylation data. Nucleic Acids Res. 29: 270–274.
5. Poirier, L. A. (1986) The role of methionine in carcinogenesis in vivo. Adv. Exp. Med. Biol. 206: 269–282.
6. Creusot, F., Acs, G. & Christman, J. K. (1982) Inhibition of DNA methyltransferase and induction of Friend erythroleukemia cell differentiation by 5-azacytidine and 5-aza-2′-deoxycytidine. J. Biol. Chem. 257: 2041–2048.
7. Cox, R. (1985) Selenite: a good inhibitor of rat-liver DNA methylase. Biochem. Int. 10: 63–69.
8. Pearson, W. R. & Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85: 2444–2448.
9. Frommer, M., McDonald, L. E., Millar, D. S., Collis, C. M., Watt, F., Grigg, G. W., Molloy, P. L. & Paul, C. L. (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl. Acad. Sci. USA 89: 1827–1831.
10. Grunau, C., Schattevoy, R., Mache, N. & Rosenthal, A. (2000) MethTools: a toolbox to visualize and analyze DNA methylation data. Nucleic Acids Res. 28: 1053–1058.