Transcription Regulation of Eukaryotic Genes - Springer Link

Molecular Biology, Vol. 35, No. 6, 2001, pp. 794–801. Translated from Molekulyarnaya Biologiya, Vol. 35, No. 6, 2001, pp. 934–942. Original Russian Text Copyright © 2001 by Kolchanov, Podkolodnaya, Ananko, Ignatieva, Podkolodnyi, Merkulov, Stepanenko, Pozdnyakov, Belova, Grigorovich, Naumochkin.

UDC 577.113.3:57.087.1

Transcription Regulation of Eukaryotic Genes: Description in the TRRD Database

N. A. Kolchanov, O. A. Podkolodnaya, E. A. Ananko, E. V. Ignatieva, N. L. Podkolodnyi, V. M. Merkulov, I. L. Stepanenko, M. A. Pozdnyakov, O. E. Belova, D. A. Grigorovich, and A. N. Naumochkin

Institute of Cytology and Genetics, Siberian Division, Russian Academy of Sciences, Novosibirsk, 630090 Russia; E-mail: [email protected]

Received May 12, 2001

Abstract—The structure of the Transcription Regulatory Regions Database (TRRD) and the principles of describing transcription regulation of eukaryotic genes in TRRD are considered. Formalized description of the structural and functional organization of the regulatory gene regions is illustrated with examples. By now, TRRD is based on 3500 original works and contains data on transcription regulation of more than 1100 genes known to possess more than 5000 transcription factor binding sites and about 1600 regulatory elements (promoters, enhancers, silencers). TRRD is available at http://www.bionet.nsc.ru/trrd/

Key words: transcription regulation, eukaryotic genes, database

INTRODUCTION

Data on gene expression regulation have accumulated rapidly over recent decades. Expression of eukaryotic genes is a complex multistep process, which includes transcription, mRNA processing, translation, protein posttranslational modification, etc. Transcription is the key event and initiates all others. This explains the great interest in the mechanisms of transcriptional control of gene expression. Each year dozens of journals publish thousands of works with numerous findings obtained with a variety of techniques and characterizing various aspects of transcriptional regulation. It is extremely difficult to summarize and order all these new data without modern information technology. Molecular biological databases have become essential for research in molecular biology, molecular genetics, gene engineering, biotechnology, molecular evolution, molecular physiology, cell biology, bioinformatics, etc.

Several databases with information on gene expression regulation are available. First and foremost, these include the nucleotide sequence databases EMBL [1] and GenBank [2] and the protein databases SWISS-PROT [3], PIR [4], and PDB [5]. Specialized databases, which concern individual aspects of transcription regulation, have also come into broad use, including TRANSFAC for transcription factors [6], COMPEL for composite elements [7], and EPD for eukaryotic promoters [8].

To describe the structural and functional organization of large regulatory regions of eukaryotic genes and to integrate data on the molecular mechanisms of transcription regulation, we have developed the Transcription Regulatory Regions Database (TRRD) [9, 10]. TRRD is a component of the information system GeneExpress-2 [11, 12], which is available from the server of the Institute of Cytology and Genetics (http://wwwmgs.bionet.nsc.ru/) and is linked with other databases and with programs that allow processing and analysis of formalized information. This paper concerns the formalized description of experimental data and the principles of organizing information in TRRD, as well as the structure and content of this database.

DESCRIPTION OF TRANSCRIPTION REGULATION IN TRRD

Regulatory regions of eukaryotic genes are predominantly located in noncoding regions and are complex in structure. A scheme of a 5' regulatory region is shown in Fig. 1. Each regulatory region contains binding sites for various transcription factors. These factors interact with the basal transcription complex to modulate the rate of its function. Groups of their sites form extended regulatory units, such as promoters, enhancers, and silencers [13]. It should be noted that

0026-8933/01/3506-0794$25.00 © 2001 MAIK “Nauka /Interperiodica”

Fig. 1. Organization of the 5'-regulatory region in eukaryotic genes. Labels in the scheme: 5'-region, silencer, enhancer, promoter, basal transcription complex, transcription start.

regulatory elements can occur in the 5' gene region both close to (within 200 nt) and far (several kilobases) away from the transcription start, as well as in gene introns and in the 3' region [14].

TRRD describes the structure of regulatory regions of eukaryotic genes transcribed by RNA polymerase II; the function of regulatory elements; and gene expression as dependent on ontogenetic stage, cell-cycle phase, cell type, cell differentiation, external factors, etc. A unit entry in TRRD corresponds to a gene. The information on transcription regulation is distributed through six interconnected tables:

(1) TRRDGENES, which describes genes and the structure of their regulatory regions;
(2) TRRDUNITS, which describes extended regulatory regions (promoters, enhancers, silencers, etc.);
(3) TRRDSITES, which contains data on transcription factor binding sites;
(4) TRRDFACTORS, which contains data on transcription factors affecting expression of a given gene;
(5) TRRDEXP, which describes specific features of gene expression; and
(6) TRRDBIB, which contains relevant references.

Such data arrangement allows direct access to a TRRD section of interest. Examples of entries from sections (1)–(5) are given below. The format of TRRD has previously been described in detail [10].

A distinctive feature is that TRRD contains only data verified in special experiments (Table 1). For instance, to identify an extended regulatory element (promoter, enhancer, silencer, etc.), a DNA region under study is fused with a reporter gene, and the resulting plasmid constructs are tested for reporter expression in various conditions. Experiments are coded in TRRD. To illustrate, Fig. 2a shows a fragment of TRRDUNITS entry P01195, which describes the promoter region of mouse COX2. The region has been studied in cell line MC3T3-E1 transfected with plasmid constructs in which a gene fragment with the promoter region (see ExperimentCodes, code 6.8) and its deletion derivatives (code 6.1.1) are fused with a reporter gene. In addition, the effect of tumor necrosis factor α on construct expression has been analyzed (code 6.5).
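The coding of experiments described above can be pictured with a minimal Python sketch. It is only an illustration: the class `UnitEntry`, the dictionary `EXPERIMENT_CODES`, and the function `describe_experiments` are hypothetical names introduced here and do not reproduce the actual TRRD record format. The three codes and their meanings are those quoted in the text and in Table 1 for entry P01195.

```python
from dataclasses import dataclass, field

# Experiment codes quoted in the text for entry P01195, with the
# wording of Table 1. This lookup table is illustrative, not the TRRD format.
EXPERIMENT_CODES = {
    "6.8": "Promoter insertion upstream of a reporter gene",
    "6.1.1": "Deletion analysis",
    "6.5": "Analysis of the effects of various agents in the transient transfection assay",
}


@dataclass
class UnitEntry:
    """Hypothetical in-memory stand-in for a TRRDUNITS record."""
    accession: str
    gene: str
    unit_type: str
    experiment_codes: list = field(default_factory=list)


def describe_experiments(entry, codes=EXPERIMENT_CODES):
    """Expand the experiment codes of an entry into readable descriptions."""
    return [f"{code}: {codes.get(code, 'unknown code')}"
            for code in entry.experiment_codes]


# The promoter region of mouse COX2 as described in the text.
cox2_promoter = UnitEntry("P01195", "mouse COX2", "promoter",
                          ["6.8", "6.1.1", "6.5"])

for line in describe_experiments(cox2_promoter):
    print(line)
```

Keeping the codes in one lookup table, separate from the entries, mirrors the idea that each record stores only short codes while their meaning is defined once.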

Fig. 2. Fragments of TRRD entries. (a) An entry describing the promoter region of mouse COX2. Only a fragment is shown of the gene sequence (Sequence) of 621 nt (SeqLength) from EMBL (DNA_BankLink). There are no data on the tissue specificity of the promoter (PromotTisSp). Two TFBS are known (Site:, Position:). (b) An entry describing the retinoic acid receptor-binding site from the promoter of mouse RARβ. The site (Sequence) is in region –54, –34 relative to the transcription start (SequencePosition) and binds with two proteins, RAR and heterodimer RXRα/RXRβ (FactorName), to enhance gene transcription (FactorInfluence). Indicated are positions of the first nucleotide in the EMBL sequences (DNA_BankLink) and a reference to a list of analogous sites in the SAMPLES database [15] (DatabaseReference). Experiments (ExperimentCodes) are described in the text.

Fig. 3. Links of the TRRD SRS tables (bars) with external databases (cylinders) and information processing systems (bricks). Labels in the diagram: GeneExpress-2; TRRD tables TRRDGENES, TRRDUNITS, TRRDSITES, TRRDEXP, TRRDFAC, and TRRDBIB; Site Recognition, BLAST, EMBL, GenBank, TRANSFAC, ACTIVITY, SAMPLES, COMPEL, GeneNet, EPD, GDB, SWISS-PROT, MEDLINE.

Transcription factor binding sites (TFBS) are localized in experiments on DNA protection from nucleases and chemical agents (footprinting) and in variants of the electrophoretic mobility shift assay (EMSA). The functional role of a site is commonly verified by studying the effect of deletions and point mutations on transcription (with a site of interest fused with a heterologous promoter), by in vivo footprinting, etc. As an example, Fig. 2b shows a fragment of TRRDSITES entry S5200, which describes the binding site for the retinoic acid receptor (RAR) from mouse RARβ. The site was initially localized via deletion analysis of a RARβ fragment contained in plasmid constructs (ExperimentCodes, code 6.1.1), which were expressed in cell line CV1. Data have been obtained on expression of plasmids with point mutations in the site (code 6.2) and on cell cotransfection with a vector carrying RARα (code 6.6). EMSA with DNA competitors (code 3.2.2) has been employed in studying the binding of an oligonucleotide corresponding to the site with proteins of CV1 nuclear extracts and the effect of point mutations on their binding (code 3.3). DNA methylation protection (code 4.1) has been assayed with a nuclear extract of S91 cells, and proteins binding to the site have been

Table 1. Examples of experiments providing information for TRRD

Experiment | TRRD code

Identification of transcription start sites
  Primer extension assay | 5
  RNase (T1, A) and nuclease-S1 mapping | 5.5

Detection and initial analysis of extended regulatory regions
  Promoter insertion upstream of a reporter gene | 6.8
  Fusion of a DNA fragment with a homologous or heterologous promoter and a reporter gene | 6.3.1
  Deletion analysis | 6.1.1
  Analysis of the effects of various agents in the transient transfection assay | 6.5

Detection of TFBS
  DNase I footprinting with crude nuclear extracts | 1.1.1
  OP-Cu footprinting with crude nuclear extracts | 1.6
  DNase I footprinting with purified or recombinant proteins | 1.1.5
  In vivo footprinting | 1.5
  Methylation protection assay | 4.1
  EMSA with crude nuclear extracts | 3.1
  EMSA with competitive oligonucleotides | 3.2
  EMSA with mutant oligonucleotides | 3.3

Identification of the TF binding to the site
  DNase I footprinting with purified or recombinant proteins | 1.1.5
  DNase I footprinting with crude nuclear extracts and specific antibodies | 1.1.6
  EMSA with purified or recombinant proteins | 3.5
  EMSA with crude nuclear extracts and specific antibodies | 3.6

Functional analysis of sites
  Insertion of an isolated site in a homologous or heterologous promoter | 6.3.2
  Comprehensive mutant analysis | 6.2
  In situ mutagenesis | 12
  Cotransfection assay with plasmids expressing transcription factors | 6.6
  In vivo footprinting | 1.5

Fig. 4. Graphical map of the human aldolase A gene, which was generated with the TRRD-Viewer program.

identified with antibodies against transcription factors RAR and RXR (code 3.6). In addition, EMSA has shown that the RAR-binding site interacts with isolated RAR (code 3.5). Codes for some experiments that provide data for TRRD are given in Table 1.

INFORMATION CONTENT OF TRRD

As of April 2001, TRRD contains information on transcription regulation of more than 1100 genes. For these genes, there are data on more than 5000 TFBS and more than 1500 regulatory regions, including promoters, enhancers, and silencers. The data have been collected from more than 3500 publications.

Genes of 94 species are described in TRRD, including human (34%), mouse (22%), rat (14%), chicken (4%), frog (1%), and other organisms (25%). Primary emphasis is given to the development of TRRD sections that describe transcription regulation of functionally important groups of genes (Table 2).

TRRD SOFTWARE

The Sequence Retrieval System (SRS) is used to access TRRD via the Internet. It allows effective search of TRRD and of the linked databases and integrates TRRD with the data analysis systems of GeneExpress-2 [11, 12]. SRS supports navigation, interactive search for necessary information, and complex queries to TRRD and the linked databases.
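The kind of combined query mentioned above can be illustrated with a short Python sketch over a few site records. This is not SRS syntax, which the article does not show; `SiteRecord`, `SITES`, and `find_sites` are hypothetical names used only for illustration. The accession numbers, species, genes, and factors are taken from TRRDSITES entries quoted elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    """Hypothetical, simplified stand-in for a TRRDSITES record."""
    accession: str
    species: str
    gene: str
    factors: tuple
    influence: str


# Entries quoted in the article; the field layout is an assumption.
SITES = (
    SiteRecord("S5200", "mouse", "RARβ", ("RAR", "RXRα/RXRβ"), "enhances"),
    SiteRecord("S4817", "human", "VCAM-1", ("IRF-2",), "activates"),
    SiteRecord("S4613", "human", "gp91phox", ("ICSBP", "PU.1", "IRF-1"), "enhances"),
)


def find_sites(records, species=None, factor=None):
    """Return accessions of sites matching every criterion that is given."""
    hits = []
    for rec in records:
        if species is not None and rec.species != species:
            continue
        if factor is not None and factor not in rec.factors:
            continue
        hits.append(rec.accession)
    return hits


print(find_sites(SITES, species="human"))
print(find_sites(SITES, species="human", factor="IRF-1"))
```

Each criterion narrows the result independently, so conditions on species, factor, or other fields can be combined in one query.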

Table 2. Functional groups of genes described in TRRD

Functional group | TRRD section | Total genes
Heat-shock genes | HS-TRRD | 91
Interferon system genes | IIG-TRRD | 111
Genes specifically regulated in erythroid cells | ESRG-TRRD | 63
Lipid metabolism genes | LM-TRRD | 78
Endocrine system genes | ES-TRRD | 115
Glucocorticoid-regulated genes | GR-TRRD | 52
Plant genes | PLANT-TRRD | 136
Cell-cycle genes | CYCLE-TRRD | 55

The interlinks of the TRRD SRS tables with external databases and with data analysis systems are shown in Fig. 3.

Graphical representations of data can be obtained with the TRRD-Viewer program, which generates a map of gene regulatory elements. To illustrate, Fig. 4 shows a map of regulatory elements of the human aldolase A gene, which was constructed with this program on the basis of TRRD data. It is seen that the gene has three independent promoters, one enhancer, and a regulatory region in intron 4 (long bars above the scale). These regions contain ten TFBS (short bars below the scale).

EXAMPLES OF DESCRIPTIONS OF REGULATORY REGIONS AND FEATURES OF TRANSCRIPTION REGULATION IN TRRD

Transcription Factor Binding Sites

Fig. 5. A fragment of the human embryonic β-globin gene and its description in TRRDSITES (entries S1675, S1676).

A TFBS is the simplest regulatory element, 4–20 nt in length. A transcription factor specifically binds to its

Fig. 6. Interaction of the glucocorticoid receptor with the GR-binding site in the rat aspartate aminotransferase gene and its description in TRRDSITES (on the left) and TRRDFACTORS (on the right).

Fig. 7. A fragment of TRRDGENES entry A00264 describing the human apolipoprotein A gene. The description of the composite element is framed.

site in a regulatory gene region and thereby affects the function of the RNA polymerase complex. Some sites are recognized by several transcription factors, which usually belong to one family. These factors specifically regulate transcription of target genes in various situations. For instance, interferon regulatory factors (IRF) bind to similar nucleotide sequences and thereby control interferon-inducible genes [16–18]. The effect on target genes is activating in the case of IRF-1, IRF-3, IRF-4, and IRF-7 (TRRDSITES entries S1360, S1393, S914, etc.) and mostly inhibiting in the case of IRF-2 and ICSBP (TRRDSITES entries S1361, S2773, S2767, etc.). However, IRF-2 activates transcription of human VCAM-1 (TRRDSITES entry S4817) [19], and ICSBP contained in a multimeric complex with PU.1 and IRF-1 enhances transcription of human gp91phox (TRRDSITES entry S4613) [20].

In some cases, one nucleotide sequence is recognized by factors of different families. An illustrative example is the human embryonic β-globin gene (Fig. 5). Two factors, erythroid-specific GATA-1 and ubiquitous YY1, bind to region –281, –257 of this gene. The latter blocks transcription, be it basal or induced by GATA-1 [21, 22], which allows fine tissue-specific regulation of the gene in ontogeny.

A greater extent of transcription activation is often achieved via duplication or multiplication of a binding

Fig. 8. Expression pattern of the human E2F1 gene described as dependent on the cell-cycle phase. Labels in the figure: E2F1 (G0, onset G1); E2F1 (G1/S, S); +p107, +pRb; +CycD1, +Cdk2.

site for the relevant transcription factor. Site copies overlap in some cases. For instance, an unusual binding site for the glucocorticoid receptor (GR) has been found in the rat aspartate aminotransferase gene. Palindromic GR-binding sites (classical glucocorticoid response elements, which bind dimeric GR) are on both DNA strands, which allows simultaneous binding of two GR dimers (Fig. 6) and dramatically increases gene transcription in the rat liver on induction with glucocorticoids [23].

Composite Elements

Various TFBS (simple regulatory elements) can form more complex structures known as composite elements. In this case, transcription regulation depends on the interaction of several transcription factors bound to the element [24]. Notably, their effects are not additive. For instance, HNF-3β and HNF-4 independently enhance transcription of the human apolipoprotein A gene. When both factors bind to the promoter region, transcription increases manyfold, ensuring maximal expression of the gene in terminally differentiated hepatocytes [25]. A description of this gene and of its composite element in TRRD is shown in Fig. 7.

Extended Regulatory Regions

TFBS and composite elements are components of large regulatory regions, such as promoters, enhancers, and silencers [13]. A gene can be alternatively transcribed from two promoters functioning in different tissues at different developmental stages and responding to different inducers (Fig. 4), which allows fine regulation of expression. Many eukaryotic genes have several enhancers differing in functional properties. Thus, the human apolipoprotein A gene contains an enhancer in the 5' region, which functions in the liver and in the intestine, and another, intestine-specific enhancer in the 3' region (Fig. 7).

The highest level of transcription regulation covered in TRRD involves locus control regions (LCR), which determine coordinated tissue- and stage-specific expression of genes in a locus. For instance, the human β-globin gene locus has an LCR about 6 kb upstream of the transcription start of the ε-globin gene [26]. This LCR consists of five functional elements, including three enhancers, and regulates coordinated transcription of the five genes for ε-, Aγ-, β-, δ-, and Gγ-globins [27, 28]. The LCR function has been considered in detail elsewhere [29].

Description of General Expression Features

Expression of some genes markedly varies with tissue, cell type, cell-cycle phase, and external effects. Data on specific expression features of genes are available from the TRRDEXP SRS table. To illustrate, Fig. 8 shows a description of the expression pattern of the human gene for transcription factor E2F1. Expression of E2F1 is low in G0 and in early G1 owing to the suppressive effect of the retinoblastoma gene product and is high in G1/S and in S owing to induction with D-cyclins [30].
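The phase-dependent expression of E2F1 described above lends itself to a small structured record. The Python sketch below is an assumption-laden illustration: `ExpressionState`, `E2F1_PATTERN`, and `expression_in` are hypothetical names, and the layout does not reproduce the actual TRRDEXP format. The phases, levels, and causes are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionState:
    """Hypothetical record: expression level over a set of cell-cycle phases."""
    phases: tuple
    level: str
    cause: str


# Expression pattern of human E2F1 as stated in the text [30].
E2F1_PATTERN = (
    ExpressionState(("G0", "early G1"), "low",
                    "suppression by the retinoblastoma gene product"),
    ExpressionState(("G1/S", "S"), "high",
                    "induction with D-cyclins"),
)


def expression_in(phase, pattern=E2F1_PATTERN):
    """Return the state covering the given phase, or None if not described."""
    for state in pattern:
        if phase in state.phases:
            return state
    return None


state = expression_in("G1/S")
print(state.level, "-", state.cause)
```

Returning `None` for phases that are not described keeps the absence of data distinct from a low expression level.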

CONCLUSION

There are now many databases on expression regulation of eukaryotic genes. A unique feature of TRRD is that its data have been obtained in experimental studies of extended regulatory regions, transcription factor binding sites, and expression patterns of various eukaryotic genes. These data may be useful for research in molecular biology, genetics, and pharmacology, including studies on transcription regulation of individual genes, computer analysis of unknown regulatory sequences, prediction of the effect of mutations on gene function, design of experiments on transgenesis, etc. Further development of TRRD is aimed at improving the format to describe the nucleosomal organization of chromatin; DNA methylation; and changes in gene expression caused by point substitutions, deletions, and insertions in the regulatory regions.

ACKNOWLEDGMENTS

We are grateful to I.V. Lokhova and L.V. Katokhina for bibliographic support of TRRD and to E.V. Maksakov for working out the TRRD-Viewer program. This work was supported by the Russian program Human Genome and the Russian Foundation for Basic Research (project nos. 01-07-90203, 00-04-49229, 00-04-49225, 00-07-90337, 99-07-90203).

REFERENCES

1. Stoesser, G., Baker, W., van den Broek, A., et al., Nucleic Acids Res., 2001, vol. 29, pp. 17–21.
2. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., et al., Nucleic Acids Res., 2000, vol. 28, pp. 15–18.
3. Bairoch, A. and Apweiler, R., Nucleic Acids Res., 2000, vol. 28, pp. 45–48.
4. Barker, W.C., Garavelli, J.S., Hou, Z., et al., Nucleic Acids Res., 2001, vol. 29, pp. 29–32.
5. Bhat, T.N., Bourne, P., Feng, Z., et al., Nucleic Acids Res., 2001, vol. 29, pp. 214–218.
6. Wingender, E., Chen, X., Fricke, E., et al., Nucleic Acids Res., 2001, vol. 29, pp. 281–283.
7. Kel-Margoulis, O.V., Romashchenko, A.G., Kolchanov, N.A., et al., Nucleic Acids Res., 2000, vol. 28, pp. 311–315.
8. Perier, R.C., Praz, V., Junier, T., et al., Nucleic Acids Res., 2000, vol. 28, pp. 302–303.
9. Kolchanov, N.A., Podkolodnaya, O.A., Ananko, E.A., et al., Nucleic Acids Res., 2000, vol. 28, pp. 298–301.
10. Kolchanov, N.A., Ananko, E.A., Podkolodnaya, O.A., et al., Nucleic Acids Res., 1999, vol. 27, pp. 303–306.
11. Kolchanov, N.A., Ponomarenko, M.P., Kel, A.E., et al., Proc. Int. Conf. Intell. Syst. Mol. Biol., 1998, pp. 95–104.
12. Kolchanov, N.A., Ponomarenko, M.P., Frolov, A.S., et al., Bioinformatics, 1999, vol. 15, pp. 669–686.
13. Kel’, A.E., Kolchanov, N.A., Kel’, O.V., et al., Mol. Biol., 1997, vol. 31, pp. 626–636.
14. Carey, M. and Smale, S.T., Transcriptional Regulation in Eukaryotes, Cold Spring Harbor, N.Y.: Cold Spring Harbor Lab. Press, 1999.
15. Vorobiev, D.G., Ponomarenko, J.V., and Podkolodnaya, O.A., Proc. 1st Int. Conf. Bioinformatics Genome Regulation Structure, BGRS’98, Novosibirsk, 1998, pp. 58–61.
16. Darnell, J.E., Jr., Kerr, I.M., and Stark, G.R., Science, 1994, vol. 264, pp. 1415–1421.
17. Harada, H., Takahashi, E.-I., Itoh, S., et al., Mol. Cell. Biol., 1994, vol. 14, pp. 1500–1509.
18. Pitha, P.M., Au, W.C., Lowther, W., et al., Biochimie, 1998, vol. 80, pp. 651–658.
19. Jesse, T.L., LaChance, R., Iademarco, M.F., and Dean, D.C., J. Cell Biol., 1998, vol. 140, pp. 1265–1276.
20. Eklund, E.A., Jalava, A., and Kakar, R., J. Biol. Chem., 1998, vol. 273, pp. 13957–13965.
21. Peters, B., Merezhinskaya, N., Diffley, J.F.X., and Noguchi, C.T., J. Biol. Chem., 1993, vol. 268, pp. 3430–3437.
22. Raich, N., Clegg, C.H., Grofti, J., et al., EMBO J., 1995, vol. 14, pp. 801–809.
23. Garlatti, M., Daheshia, M., Slater, E., et al., Mol. Cell. Biol., 1994, vol. 14, pp. 8007–8017.
24. Kel, O.V., Romaschenko, A.G., Kel, A.E., et al., Nucleic Acids Res., 1995, vol. 23, pp. 4097–4103.
25. Harnish, D.C., Malik, S., and Karathanasis, S.K., J. Biol. Chem., 1994, vol. 269, pp. 28220–28226.
26. Grosveld, F., van Assendelft, G.B., Greaves, D.R., and Kollias, G., Cell, 1987, vol. 51, pp. 975–985.
27. Jackson, J.D., Petrykowska, H., Philipsen, S., et al., J. Biol. Chem., 1996, vol. 271, pp. 11871–11888.
28. Lowrey, C.H., Bodine, D.M., and Nienhuis, A.W., Proc. Natl. Acad. Sci. USA, 1992, vol. 89, pp. 1143–1147.
29. Podkolodnaya, O.A., Levitsky, V.G., and Podkolodnyi, N.L., Mol. Biol., 2001, vol. 35, pp. 943–951.
30. Johnson, D.G., Ohtani, K., and Nevins, J.R., Genes Dev., 1994, vol. 8, pp. 1514–1525.