Collagen Gene - The Journal of Biological Chemistry

THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 256, No. 21, Issue of November 10, pp. 11251-11258, 1981 Printed in U.S.A.

Accurate in Vitro Transcriptional Initiation of the Chick α2 (Type I) Collagen Gene*
(Received for publication, May 4, 1981)

Glenn T. Merlino‡, Gabriel Vogeli, Tadashi Yamamoto, Benoit de Crombrugghe, and Ira Pastan
From the Laboratory of Molecular Biology, Division of Cancer Biology and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20205

Chick genomic DNA containing the extreme 5' end of the α2 (type I) collagen gene has been used as template in an in vitro HeLa cell transcription system. RNA polymerase II-dependent transcription initiates from a specific site on this DNA. The precise location of this site was determined by three types of experiments: sizing of in vitro-synthesized RNA runoff transcripts, comparing the sequence of the in vitro-made RNA transcripts with the structure of the DNA template, and identifying the first and second nucleotides of the in vitro-synthesized transcripts. Transcription was found to initiate 33 base pairs downstream from a canonical Goldberg-Hogness sequence (TATAAATA). This in vitro start site is the same as the initiation site of in vivo-synthesized collagen RNA.

Collagen is a pervasive, ubiquitous protein found in most animal tissues, where it serves as an extracellular matrix constituent (1, 2). Collagen gene expression is altered in humans and animals in a variety of pathological states (3). Cultured cells also exhibit changes in collagen synthesis in response to exposure to various factors (4-8). In cultured chick embryo fibroblasts, Rous sarcoma virus transformation greatly depresses the synthesis of α1(I) and α2(I) collagen (4, 9, 10) and their corresponding mRNAs (11-15). Using intron-specific probes, it has previously been shown that the levels of both precursor and mature α2 collagen RNA change in the same manner upon RSV(1) infection, suggesting that the activity of the α2 collagen gene in RSV-transformed cells is mediated by transcriptional control (16).

We have previously reported the isolation and identification of a series of genomic clones spanning the entire α2(I) collagen gene (17, 18). One of these contains the extreme 5' end of this gene, including the in vivo transcription start site (19). One way to further our understanding of how the α2 collagen gene is regulated is to examine its activity in a cell-free in vitro transcription system. In this report, we show that this collagen DNA will function as an efficient template in a HeLa cell in vitro transcription system (20). In this system, transcription initiates at a discrete site. To determine the precise location of the transcriptional initiation site we have chosen three experimental approaches: 1) determination of the sizes of the α-amanitin sensitive in vitro-synthesized RNA runoff transcripts, 2) comparison of the sequence of the in vitro-made RNA transcripts with the structure of the DNA template, 3) identification of the first and second bases of the in vitro-synthesized RNA transcripts by determining which nucleotides are associated with the cap. The in vitro start site is identical with the site from which transcription starts in vivo (19).

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ Supported by a fellowship from the Cystic Fibrosis Foundation in 1980 and subsequently by a fellowship from the Arthritis Foundation.
(1) The abbreviations used are: RSV, Rous sarcoma virus; PEI-cellulose, polyethyleneimine-cellulose; bp, base pair; kb, kilobase pair.

MATERIALS AND METHODS

Preparation of Whole Cell Extracts - Extracts for in vitro transcription were prepared from HeLa cells (from BRL) using a modification of the procedure of Manley et al. (20). The cells were homogenized in a minimal volume of 10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (pH 7.9), 1 mM EDTA, 5 mM dithiothreitol buffer, usually 2.7× the volume of the packed cells, and then 2.7 packed cell volumes of 20 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (pH 7.9), 10 mM MgCl2, 2 mM dithiothreitol, 50% glycerol, 25% sucrose were added. Saturated (NH4)2SO4 was then added to 10% saturation. The final protein concentration of all extracts was between 22 and 26 mg/ml.

Subcloning of DNA Fragments - DNA fragments from pgCOL323-3 (containing a large portion of the 5' end of the α2 collagen gene) were subcloned into the HindIII site of pBR322 using 32P-labeled HindIII-specific decanucleotide linkers (Collaborative Research) by a procedure modified from Yamamoto et al. (21). HinfI-cleaved DNA fragments were made blunt-ended in a 30-µl reaction containing 250 mM NaCl, 40 mM Na acetate (pH 4.5), 1 mM ZnCl2, 10 µg of tRNA, 0.6 µg of denatured salmon sperm DNA, and 10 units of S1 nuclease. Incubation was at 40 °C for 30 min. After phenol extraction and chloroform extraction, the DNA was recovered by ethanol precipitation. After a 10-min incubation with the Escherichia coli DNA polymerase I Klenow fragment (Boehringer Mannheim), the blunt-ended DNA was ligated to 330 pmol (50-fold molar excess) of 32P-labeled HindIII linkers using 100 units/ml of T4 ligase (BRL). HindIII-digested pBR322 was ligated to the linker-containing fragments as described previously (21), except that T4 ligase was used at 100 units/ml, and the concentration of pBR322 was 6 µg/ml. The resulting molecules were used to transform E. coli HB101. Cells were grown in L-broth to 0.15 absorbance unit, harvested, concentrated in 50 mM CaCl2, and added to a ligated DNA solution in a 2:1 (v/v) ratio (cells:DNA). The DNA had previously been suspended in 10 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 10 mM CaCl2. The cells were subjected to temperature shock (22), incubated for 40 min at 37 °C, and spread over L-agar containing 50 µg/ml of ampicillin. All recombinant DNA experiments were carried out in accordance with the NIH Guidelines for Research Involving Recombinant DNA Molecules.

Preparation of Template DNA - Plasmid DNA was isolated using standard techniques including chloramphenicol amplification and equilibrium density centrifugation in CsCl/ethidium bromide. Supercoiled plasmid DNA was digested with various restriction enzymes under conditions suggested by the manufacturer. The cleaved DNA was phenol-extracted, twice chloroform-extracted, twice ethanol-precipitated, and dissolved in 10 mM Tris-HCl (pH 8.0), 1 mM EDTA. This mixture of DNA fragments was either used for in vitro transcription directly or first subjected to electrophoresis on low melting agarose to isolate a specific fragment. The plasmid pSmaF containing the adenovirus major late promoter was a gift of P. A. Weil (University of Iowa).

In Vitro Transcription Reaction - RNA synthesis reactions were




set up as previously described (23), except that analytical reaction volumes were usually 25 µl, and [α-32P]GTP (ICN, 20 Ci/mmol) was used at 0.5-1.0 µCi/µl. Reactions were terminated and RNA was prepared for polyacrylamide gel electrophoresis using conditions described by Weil et al. (24). For agarose gel electrophoresis, a total of three ethanol precipitations were performed to ensure adequate removal of unincorporated [α-32P]GTP. For preparative RNA synthesis, reaction volumes of 50-100 µl were utilized. Each [α-32P]NTP (Amersham, 2000-3000 Ci/mmol) was at 15-20 µM. RNA prepared in this manner was used for fingerprinting (see below).

Fractionation of RNA Transcripts - Large RNA transcripts (greater than 900 bases) were fractionated as detailed elsewhere (25). Briefly, RNA was denatured using a glyoxal/dimethyl sulfoxide sample buffer and then electrophoresed on 1% agarose. The gel was dried and used for autoradiography (Kodak XAR5 film). Smaller RNA transcripts were fractionated on 4% polyacrylamide, 7 M urea as described by Yamamoto et al. (23) after dissolution in 80% formamide sample buffer. RNA for sequence analysis was recovered after 4% polyacrylamide gel electrophoresis and localization of specific transcripts by autoradiography, by electrophoretic elution in 20 mM Tris/acetate (pH 8.3). These RNAs were filtered through 0.45-µm nitrocellulose and precipitated with ethanol.

RNase Digestion - [32P]RNA transcripts were exhaustively digested at 37 °C with one of the following: 60 units of RNase T1 (Sankyo) in a 3-µl reaction of 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, for 30 min; 0.5 unit of RNase T2 (Sankyo) in a 10-µl reaction of 20 mM Na acetate (pH 4.5), 1 mM EDTA, for 1-2 h; 5 µg of RNase A (Worthington) in a 10-µl reaction of 20 mM Tris-HCl (pH 7.5), 2 mM EDTA, for 1 h; or 5 µg of nuclease P1 (gift from U. Schmeissner, NIH) in a 10-µl reaction of 50 mM Na acetate (pH 6.0) for 2 h. All reactions routinely contained 10-20 µg of carrier tRNA.

RNA Fingerprinting - In vitro-synthesized RNA was sequenced by published procedures (26, 27). RNA was digested with RNase T1 and the resulting oligonucleotides were separated in two dimensions. RNA products were first subjected to electrophoresis on DEAE-paper (pH 3.5) at 5000 V, and after transfer to PEI-cellulose, homochromatographed in the second dimension. Spots were localized by autoradiography, cut out, and eluted with 30% triethylamine bicarbonate. RNase A secondary digestion products were analyzed by electrophoresis on DEAE-paper (pH 3.5) at 1500 V for 2 h. RNase T2 secondary digestion products were analyzed by chromatography in isobutyric acid/NH4OH on Avicel cellulose thin layer plates. Digestion products were visualized by standard fluorographic techniques.

Analysis of Cap Structure - Using Bam HI-cleaved pCa2PRO-3 (containing the extreme 5' end of the α2 collagen gene) as DNA template, four separate preparative in vitro transcription reactions were run, each containing only one [α-32P]NTP (2000-3000 Ci/mmol). After isolation of the appropriate transcript, equal counts per min of each were digested separately with either nuclease P1 or RNase T2 (as detailed above). The resulting digestion products were then separated on PEI-cellulose (10 × 10 cm), as described elsewhere (23, 28). Briefly, the RNA products were chromatographed in the first dimension with 1.4 M Li formate, 7 M urea (pH 3.5) for 5 cm, and then with 2.3 M Li formate, 7 M urea (pH 3.5) to 10 cm. Chromatography in the second dimension was with 0.8 M LiCl, 7 M urea, 20 mM Tris-HCl (pH 8.0).

[Figure 2 image: panel A, lanes a-f, with an endogenous band and the 1.4-kb and 1.13-kb transcripts marked; panel B, lanes g-h, with the 0.47-kb transcript marked.]

FIG. 2. Autoradiogram of in vitro-synthesized runoff RNA. A, RNAs were synthesized using HeLa whole cell extracts and [α-32P]GTP, denatured in glyoxal-dimethyl sulfoxide, and fractionated on 1% agarose. The gels were dried and subjected to autoradiography. DNA templates used for in vitro transcription were: lane a, pgCOL323-3 × Sst I; lane b, same as a + 1 µg/ml of α-amanitin; lane c, pgCOL323-3 × Sst I and Sma I; lane d, pgCOL323-3 × Sma I; lane e, isolated 1.8-kb Sau3a fragment; lane f, same as e + 1 µg/ml of α-amanitin. B, RNAs were synthesized as in A, denatured in 80% formamide, and fractionated on 4% polyacrylamide, 7 M urea. Lane g, reaction products using 1.8-kb Sau3a fragment × BstNI as template; lane h, same as g + 1 µg/ml of α-amanitin.
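The runoff reasoning behind these lanes can be sketched numerically. The snippet below is a minimal illustration, not the authors' procedure. It assumes that transcription runs left to right and ends at the right-hand end of the isolated 1.8-kb Sau3a fragment (map coordinates 900 to 2700, as quoted in the Results), and it subtracts the 1.13-kb runoff length to estimate the start coordinate. The function name `runoff_start` and the explicit check against the 1350 to 1750 HinfI window are illustrative choices.

```python
def runoff_start(template_end_bp, runoff_length_bp):
    """Estimate the start coordinate of a left-to-right runoff transcript.

    Assumes the polymerase runs off the template at template_end_bp, so the
    start lies runoff_length_bp upstream of that end.
    """
    return template_end_bp - runoff_length_bp


# Values quoted in the text: the Sau3a fragment spans 900-2700 on the Fig. 1
# map, and lane e shows a 1.13-kb runoff transcript.
start = runoff_start(2700, 1130)
hinf_window = (1350, 1750)  # HinfI fragment reported to hold the promoter
print(start, hinf_window[0] <= start <= hinf_window[1])
```

Under these assumptions the estimate is coordinate 1570, which falls inside the HinfI window that the text identifies as containing the promoter activity.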

RESULTS

Our goal was to determine if chick collagen DNA is transcribed in an in vitro cell-free system as it is transcribed in vivo. Extracts to be used for in vitro transcription were prepared from HeLa cells by a modification of the procedure of Manley et al. (20) (see "Materials and Methods"). The transcriptional activity of all extracts was tested using DNA templates containing known eukaryotic promoters. Plasmid DNA from pSmaF, containing the major late promoter of adenovirus 2 (24), and pSR1, containing the promoter of RSV (23, 29), were transcribed both actively and accurately in our HeLa extracts. In addition, this transcriptional activity was completely sensitive to low levels of α-amanitin, indicating that eukaryotic RNA polymerase II was responsible for the observed transcription (30). Fifty per cent inhibition was found at about 0.02 µg/ml of α-amanitin (data not shown).

Size of in Vitro-synthesized Runoff Transcripts

DNAs to be utilized as templates in the HeLa in vitro transcription system were truncated at various sites downstream from the putative start site using specific restriction enzymes. The location of the in vitro initiation site was then deduced by comparing the sizes of the resulting runoff transcripts. This general strategy was applied to several different DNA templates.

Transcription of 5' Collagen Genomic DNA - The in vivo transcription start of the α2 collagen gene was located by a variety of methods within a 3.5-kb genomic fragment, which was cloned into pBR322 (18, 19). The resulting plasmid, pgCOL323-3, was used as template in the HeLa system. A partial restriction map of the chick DNA insert of pgCOL323-3 is shown in Fig. 1. The in vivo start site is located 8 bp to the left of the middle Sma I cleavage site (Fig. 1, origin of arrow) (19). For initial in vitro transcription studies, Sst I was utilized to cleave the template DNA (pgCOL323-3) at a single site (Fig. 1, chick insert). Introduction of a discrete cut prevented the production of numerous different sized, end to end RNA polymerase transcripts. Template DNA (final 50 µg/ml) was incubated with [α-32P]GTP and HeLa extract, and the resulting transcription products were denatured and fractionated on a 1% agarose gel. Upon visualization of radioactive material by fluorography, a 1.4-kb α-amanitin-sensitive transcript was observed (Fig. 2A, lane a). Theoretically, this transcript could have initiated from either the 3' or the 5' side of the Sst I site, terminating at the Sst I site. To determine the direction of transcription, this template (Sst I-cleaved pgCOL323-3) was further digested with Sma I, which cuts only to the left of the Sst I site (Fig. 1). When added to the HeLa system, this DNA failed to produce the 1.4-kb band (Fig. 2A, lane c), suggesting that the direction of transcription was from left to right on the map shown in Fig. 1 (arrow). To confirm that this DNA region did indeed contain promoter-like activity, the 1.8-kb Sau3a fragment (from 900 to 2700 in Fig. 1) was isolated on a low melting agarose gel and transcribed in the in vitro HeLa system. Fig. 2A, lane e displays a prominent α-amanitin sensitive transcript of 1.13 kb, confirming the location of the promoter. Furthermore, this Sau3a fragment was restricted with BstNI and the resulting DNA fragments were used as templates in vitro. The products were denatured and fractionated on 4% polyacrylamide. Fig. 2B, lane g shows that a strong band 0.47 kb in length appears. These three results together confirm that the promoter activity lies within the 400-bp region bounded by two HinfI restriction sites (Fig. 1, from 1350 to 1750) in the 3.5-kb chick insert.

[Figure 1 image: linear restriction map with a scale from 0 to 3000 base pairs; sites are marked for Sau3a, HinfI, Sst I, BstNI, and Sma I.]

FIG. 1. Restriction map of pgCOL323-3 α2 collagen DNA insert. The thin lines represent pBR322 sequences, while the thick line is chick genomic DNA. The arrow indicates the direction of transcriptional activity. The origin of the arrow marks the in vivo start site.

Transcription of the Subcloned HinfI Fragment - To determine precisely the in vitro start site, and to confirm the orientation of polymerase II-dependent transcription, it was necessary to generate smaller RNA transcripts which could be measured more accurately. To accomplish this, the 400-bp HinfI fragment (1350-1750, Fig. 1) was subcloned into the HindIII site of pBR322 using HindIII decanucleotide linkers. The resulting hybrid plasmids were used to transform E. coli HB101. Clones containing DNA inserts were screened as AmpʳTetˢ. The desired clones were identified by examination of isolated plasmids: size of plasmid insert and the partial restriction map of that insert. The restriction map of one plasmid containing the HinfI fragment (pCa2PRO-3) is shown in Fig. 3. Using the strategy outlined above, three different restriction enzymes were used to generate three different sized templates. When pCa2PRO-

[Figure 4 image: lanes a-e, run without (-) or with (+) α-amanitin; transcripts of 0.75, 0.46, and 0.35 kb are marked.]

FIG. 4. Autoradiogram of runoff RNAs made from the pCa2PRO-3 DNA template. RNAs were synthesized and fractionated as in Fig. 2B. DNA templates used for in vitro transcription: lane a, pCa2PRO-3 × HincII; lane b, pCa2PRO-3 × Bam HI; lane c, pCa2PRO-3 × Mst I; lane d, pCa2PRO-3 × Eco RI; lane e, pBR322 × Bam HI and Eco RI. A (+) sign designates that 1 µg/ml of α-amanitin has been added to the reaction. A (-) sign indicates the absence of α-amanitin in the reaction. The letters a, b, and c also correspond to the transcript lengths shown in Fig. 3.

[Figure 5 image: lanes a-c; transcripts of 0.75, 0.56, and 0.44 kb are marked.]

FIG. 5. Relative promoter strength. RNAs were synthesized in vitro as described in Fig. 2B. Template DNAs were as follows: lane a, pSmaF × Sma I, adenovirus 2 major late promoter; lane b, pSR1 × HincII, RSV common region promoter; lane c, pCa2PRO-3 × HincII, chick collagen promoter.

[Figure 3 image: circular restriction map of pCa2PRO-3; a Pvu II site is among the labeled sites.]

FIG. 3. Restriction map of pCa2PRO-3. The cleavage map of the plasmid pCa2PRO-3 is shown in circular form. The thin lines are pBR322 sequences, while the thick lines represent the 400-bp HinfI chick-specific DNA fragment. The numbers within the circle are in kilobase pairs. The DNA region active in transcription is expanded. The white box represents the in vivo promoter region and the arrows depict the lengths of three runoff transcripts obtained in vitro. The letters a, b, and c correspond to the bands shown in Fig. 4.
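The comparison between the three runoff transcripts and the lengths predicted from the in vivo start site can be laid out as a small calculation. This is only a sketch: the observed and expected lengths are the values quoted in the text for the HincII, Bam HI, and Mst I templates, while the 5% sizing tolerance and the function names are assumptions introduced here for illustration.

```python
# (observed, expected) runoff lengths in bases, as quoted in the text
RUNOFFS = {
    "HincII": (750, 735),
    "Bam HI": (460, 462),
    "Mst I": (350, 345),
}


def size_deviations(runoffs):
    """Return observed minus expected length for each template."""
    return {enzyme: obs - exp for enzyme, (obs, exp) in runoffs.items()}


def within_tolerance(runoffs, max_fraction=0.05):
    """True if every observed size is within max_fraction of its expected size.

    The 5% default is an assumed gel-sizing tolerance, not a value from the paper.
    """
    return all(abs(obs - exp) / exp <= max_fraction for obs, exp in runoffs.values())


deviations = size_deviations(RUNOFFS)
print(deviations, within_tolerance(RUNOFFS))
```

The deviations are 15, -2, and 5 bases, each about 2% or less of the expected length, which is the sense in which the three sizes are consistent with a single shared start site.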

3 was restricted at two sites with HincII, and the resulting DNA was added to the in vitro transcription system, a single strong α-amanitin-sensitive transcript of about 750 bases was observed (Fig. 4a). Because the in vivo start site was known, the transcript size was expected to be 735 bases in length. Fig. 4b shows that when pCa2PRO-3 was cleaved at one site with Bam HI, a single, strong, α-amanitin-sensitive transcript of 460 bases resulted (expected = 462). Finally, Fig. 4c reveals that when pCa2PRO-3 was restricted at four sites by Mst I,


this template DNA generated a 350-base, α-amanitin-sensitive transcript (expected = 345). The sizes of these three transcripts are consistent with the notion that they all have the same start site. The start site is shown to be just upstream of the Sma I site (Fig. 3, white box). This is in agreement with the known in vivo initiation site. These transcripts all appear to be oriented in the same direction. If transcription had a polarity opposite that shown in Fig. 3, one might expect that use of Eco RI-cleaved pCa2PRO-3 as template would generate a 300-base transcript (see Fig. 3). Fig. 4d shows that when this DNA was used as template, no 300-base transcript appears above background, indicating the unidirectional nature of transcription.

The transcriptional efficiency from the collagen promoter is comparable to that of other known eukaryotic promoters. The chick collagen DNA exhibits strong promoter activity in HeLa extracts. Fig. 5 shows that its activity is at least as strong as the RSV promoter (pSR1), but somewhat weaker than the adenovirus promoter (pSmaF plasmid). Although strong in vitro promotion of transcriptional activity can be detected in that region of the collagen DNA known to contain the in vivo start site, more precise experiments were conducted to determine if the two initiation sites are indeed the same.

Comparison of the Sequence of Template DNA and RNA Runoff Transcripts

By comparing the DNA template sequence with RNase T1 fingerprints of in vitro-synthesized RNA transcripts, one can discover which major oligonucleotides are produced and which are not. In this way, upstream and downstream boundaries can be placed on the in vitro start site, allowing a direct comparison with the known in vivo initiation site.

The DNA corresponding to the 5' end of the RNA coding for chick α2 collagen and approximately 400 bp upstream of the transcription start site have been sequenced (19). Fig. 6 shows the chicken-specific sequence of the template DNA pCa2PRO-3 from -33 to +109, relative to the in vivo start site, A, at +1 (corresponding to the origin of the black arrows in Figs. 1 and 3). From -33 to -26 of this DNA, there is an 8-bp consensus sequence which is similar to sequences found at the same location in other known eukaryotic promoters (31).

[Figure 6 image: annotated nucleotide sequence of the chick insert of pCa2PRO-3 from -33 to +109; labels include the consensus sequence, the Sma I site, and the linker.]

FIG. 6. Nucleotide sequence of the 5' end of chick α2 collagen genomic DNA within pCa2PRO-3. DNA was sequenced by the method of Maxam and Gilbert (34). The Sma I site at +8 corresponds to the Sma I site at the origin of the arrows shown in Figs. 3 and 4. The in vivo start site is at +1. The solid line represents pBR322 sequences. The arrow indicates the direction of transcription. The underlined deoxynucleotides represent those sequences for which a prominent oligonucleotide was identified in the RNase T1 fingerprint. The italicized numbers and letters beneath these oligonucleotides correspond to the spots displayed in Fig. 7.

In order to obtain sufficient amounts of a discrete RNA transcript for fingerprint analysis, Bam HI-cleaved pCa2PRO-3 was used as template in a preparative in vitro transcription reaction (see "Materials and Methods"). For this fingerprint, the RNA was synthesized using [α-32P]ATP, [α-32P]CTP, and [α-32P]UTP (on other occasions, fingerprints were generated using shorter templates and all four labeled NTPs). The

[Figure 7 image: panel A, autoradiograph of the fingerprint; panel B, key to the numbered and lettered spots, with a dotted circle.]

FIG. 7. Fingerprint analysis of in vitro-synthesized RNA. A, RNA was labeled using [α-32P]ATP, [α-32P]CTP, and [α-32P]UTP in a preparative in vitro reaction with Bam HI-cleaved pCa2PRO-3 DNA as template. The 460-base transcript was isolated and digested with RNase T1 (see "Materials and Methods"). The resulting oligonucleotides were subjected to electrophoresis (pH 3.5) in the first dimension and to homochromatography in the second. The autoradiograph of the fingerprint is shown above. B, the spots shown in A are here assigned numbers and letters, which are referred to in the text and in Table I. Dotted circle indicates the absence of the oligonucleotide AUUAAUUUAG.
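The logic of reading a start site from an RNase T1 fingerprint can be sketched in a few lines. The example below is illustrative only. It uses a hypothetical stretch of RNA assembled from the two oligonucleotides named in the text (AUUAAUUUAG followed by CAUCCCG); the leading G and the two trailing G residues are assumptions added so that the fragments are bounded. The +1 position is taken as the only A in AUUAAUUUAG that is followed by G, which is the assignment the cap analysis in this paper arrives at.

```python
def rnase_t1_fragments(rna):
    """Split an RNA string after every G, as RNase T1 cleaves 3' of G residues."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in G
        fragments.append(current)
    return fragments


# Hypothetical stretch: a leading G, the two oligonucleotides named in the
# text, and two trailing G residues (the flanking bases are assumptions).
REGION = "G" + "AUUAAUUUAG" + "CAUCCCG" + "GG"
PLUS_ONE = REGION.index("AGCAUCCCG")  # the A taken as +1

accurate = rnase_t1_fragments(REGION[PLUS_ONE:])  # transcript starting at +1
upstream = rnase_t1_fragments(REGION)  # hypothetical start upstream of the oligo
print(accurate)
print(upstream)
```

A transcript that begins at +1 yields CAUCCCG intact but cannot yield AUUAAUUUAG, whereas a transcript that began farther upstream would yield both. That asymmetry is what the dotted circle in Fig. 7B records.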


expected transcript length using Bam HI-cleaved pCa2PRO-3 is 460 bases (Figs. 3 and 4). This RNA contains chick collagen-specific sequences at its 5' end and pBR322-specific sequences at its 3' end. The RNA product of the preparative reaction was fractionated on 4% polyacrylamide, the 460-base transcript excised from the gel, and the RNA was eluted electrophoretically. This RNA was digested with RNase T1 and the resulting oligonucleotides were electrophoresed on cellulose acetate. After transfer to PEI-cellulose, the oligonucleotides were separated in a second dimension by homochromatography. The labeled fingerprint of this in vitro RNA transcript is shown in Fig. 7A. The compositions of the prominent oligonucleotides displayed in Fig. 7A were determined by visual inspection of their mobility, secondary digestion with RNase A, and secondary digestion with RNase T2 plus RNase A. We have found that the compositions of the prominent oligonucleotides are consistent with the known DNA template sequence. In Fig. 6, the underlined collagen-specific DNA sequences represent those that correspond to a prominent oligonucleotide. Representative spots of the fingerprint have been assigned numbers and letters in Fig. 7B and the analysis of their sequences is shown in Table I. Examination of the DNA immediately surrounding the in vivo initiation site (Fig. 6) reveals that if a proper start is obtained in vitro, the oligonucleotide CAUCCCG would be synthesized in its entirety; however, the oligonucleotide immediately upstream (AUUAAUUUAG) would not. Table I shows that an oligonucleotide with a composition consistent with the sequence CAUCCCG has been identified and assigned the number 3 (see Fig. 7B). However, an oligonucleotide with a mobility consistent with the sequence AUUAAUUUAG has not been found. Based on the mobility

of spots 10-13, 15, and 17 (see base compositions in Table I), an oligonucleotide with the sequence AUUAAUUUAG would be located approximately within the dotted circle shown in Fig. 7B. None of the spots surrounding the dotted circle contains all of the RNase A digestion products expected for the oligonucleotide AUUAAUUUAG. Spots 10 and 11 do not contain an AAU; spots 12, 13, and 15 do not contain an AG; and spot 17 does not contain an AAU, AU, or an AG. Because the oligonucleotide CAUCCCG is contained entirely within the RNA transcript, and the oligonucleotide AUUAAUUUAG is not, the in vitro start site must lie between the second base of AUUAAUUUAG and the first base of CAUCCCG. This is consistent with the known in vivo initiation site.
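The RNase A products expected for a given T1 oligonucleotide follow mechanically from its sequence, since RNase A cleaves 3' of pyrimidines. The sketch below is an illustration under that simple rule and ignores which phosphates carry label; the function name is hypothetical. It reports the set of distinct products, which is how the compositions are listed in Table I.

```python
def rnase_a_products(oligo):
    """Return the set of products after cleavage 3' of every C or U."""
    products, current = set(), ""
    for base in oligo:
        current += base
        if base in "CU":
            products.add(current)
            current = ""
    if current:  # 3'-terminal piece, e.g. the G that ends a T1 oligonucleotide
        products.add(current)
    return products


missing_oligo = rnase_a_products("AUUAAUUUAG")
spot_1 = rnase_a_products("CCAACCACG")
print(sorted(missing_oligo))
print(sorted(spot_1))
```

For AUUAAUUUAG the products are U, AU, AAU, and AG, which is why the absence of AAU, AU, or AG in the neighboring spots rules them out. For CCAACCACG the products are C, AC, AAC, and G, matching the composition listed for that oligonucleotide in Table I.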

Exact Initiation Site of in Vitro Transcripts

To determine whether the in vitro start site is identical with that in vivo, it is necessary to unambiguously identify the first and second nucleotides of in vitro RNA transcripts. This can be achieved by exhaustively digesting [32P]RNA with nuclease P1 or RNase T2 and then subjecting the resulting products to chromatography. In this way, the first initiation nucleotide (within a cap structure) will exhibit chromatographic mobility very different from the other four mononucleotides (GMP, AMP, CMP, UMP). Bam HI-cleaved pCa2PRO-3 was used as template in four separate reactions, each labeled with only one of the four [α-32P]NTPs. The 460-base transcript from each was isolated as described above, each digested exhaustively with nuclease P1, and the resulting products (equal amounts of radioactivity of each) were chromatographed on PEI-cellulose. Nuclease P1 cleaves all 5' phosphodiester bonds (generating 3' hydroxyl ends) except those in the cap structure (GpppN-OH). Because the in vitro HeLa system had previously

TABLE I
RNase T1 oligonucleotides

Spot number^a

1 2 3 4

296 160

5 6 7 8 9 IOa 10b 11 12a 12b 13a 13b 14a 14b 15 16a 16b 17 18

G-number^b

86" 368 9* 69,347 269 110 119,200 341 101 46* 55 221 39 63 329 242 119 77f 26 1

Expected^c T1-oligonucleotide

CCAACCACG ACCACACCCG CAUCCCG CUAACG,ACTACG CACCCG CUCAUCG UCAUCCUCG,UCCAUUCCG CCACUAUCG AAAUCUAACAAUG UUACUCCUCG UUUAUCACAG UCACUAUG CUUUAAUG UUAAAUUG CUACUUG CUAUAUG AUAUCG UAAUAG CAAUUUCUAUG

1.

1 UUAUG I UUUCUG

1

33* UUUG

^a Spot number refers to the numbered spots displayed in Fig. 7B. When two oligonucleotides were unresolvable (spots 7, 10, 12-14, and 16), they were of necessity digested together with RNase A. The combined digestion products are shown above.
^b The G-number refers to the position of the G residue in the DNA sequence. Chick DNA in pCa2PRO-3 is demarcated by an asterisk (*), and the numbers refer to the sequence in Fig. 6. pBR322 sequences are numbered according to Sutcliffe (35).
^c The identity of each oligonucleotide was deduced from its mobility in the T1-fingerprint, and its nucleotide composition, as determined

RNase A products expected

RNase A products formed

C,AC,AAC,G C,AC,G C,AU C,U,AC,AAC,G C,AC,G C,U,AU,G C,U,AU,G C,U,AC,AU,G C,U,AAC,AAU,AAAU,G

C,AC,AAC,G C,AC,G C,AUd C,U,AC,AAC,G C,AC,G C,U,AU,G C,U,AU,G C,U,AC,AU,G C,U,AAC,AAU,AAAU,G

C,U,AC,AU,AG,G

C,U,AC,AU,AG,G

C,U,AC,AU,G

C,U,AC,AU,G

C,U,AAU,AAAU,G

C,U,AAU,AAAU,G

C,U,AC,AU,G

C,U,AC,AU,G

U,AU,AG,AAU,G

C,U,AU,AG,AAU,G~

C,U,AU,AAU,G

C,U,AU,AAU,G

C,U,AU,G

C,U,AU,G

C,U,G U

C,U,G Ud

by RNase T2 secondary digestions.
^d This fingerprint was made with α-32P-labeled ATP, CTP, and UTP only. RNase T1 will generate oligonucleotides with 3' phosphates. For this reason, when a G residue follows a G residue, as is the case for spots number 3 and 18, the oligonucleotide-specific G residue is not radioactive, and therefore not visible.
^e For spot 14a, the C residue preceding the G should not be labeled, yet it appears among the RNase A products. We believe this is due to contamination from other spots.


been shown to cap the 5' ends of RNAs (20) and because the in vivo start site was known (Fig. 6), we expected nuclease P1 digestion to generate pG, pA, pC, pU, the cap core structure GpppA-OH, and/or some methylated form of this cap (* = labeled phosphate in α position). Fig. 8 shows that only RNA transcripts labeled with [α-32P]ATP or [α-32P]GTP generate labeled spots (arrows) which co-migrate with nonmethylated and methylated (m7GpppA-OH or m7GpppAm-OH) cold cap standards. To definitively prove that this HeLa system generates caps, the [α-32P]GTP-labeled 460-base transcript was exhaustively digested with nuclease P1 and alkaline phosphatase. When the products were electrophoresed on DEAE-paper and autoradiographed, a spot was observed co-migrating with a GpppA standard (data not shown). The cap structure is therefore GpppA-OH (or some methylated form of this cap). These results are again in agreement with the start site (A) found for in vivo collagen RNA.

[Figure 8 image: panels A-E; panel E marks the positions of Pi and of the cold cap standards m7GpppAm, m7GpppA, and GpppA; O = origin.]

FIG. 8. Identification of the first base of in vitro-synthesized RNA. The Bam HI-cleaved pCa2PRO-3 template was used in four separate reactions, each containing one of four [α-32P]NTPs. The 460-base transcripts were isolated and 10,000 dpm of each were exhaustively digested with nuclease P1. The resulting products were chromatographed on PEI-cellulose: the first dimension was Li formate, 7 M urea (pH 3.5) and the second LiCl, 7 M urea (pH 8.0) (see "Materials and Methods"). Autoradiographs of digested RNA labeled with [α-32P]ATP (A), [α-32P]UTP (B), [α-32P]CTP (C), [α-32P]GTP (D). Arrows point to various forms of cap core structures. Pi = inorganic orthophosphate. The presence of inorganic orthophosphate is most probably due to a contaminating phosphatase in the nuclease P1. O = origin. The location of orthophosphate and cold cap standards is shown in E.

[Figure 9 image: panels A-D.]

FIG. 9. Identification of the nearest neighbor base on in vitro-synthesized RNA. The 460-base transcripts were synthesized and isolated as described in Fig. 8. Using RNase T2, 10,000 dpm of each were exhaustively digested and the resulting products were chromatographed on PEI-cellulose as described in Fig. 8. Autoradiographs of digested RNA labeled with [α-32P]ATP (A), [α-32P]UTP (B), [α-32P]CTP (C), [α-32P]GTP (D). Arrows point to prominent cap structures containing a 3' phosphate. To help confirm that these structures were caps, spots were eluted, redigested with nuclease P1, and reanalyzed on PEI-cellulose. O = origin.

There are several possible candidates for an A start site within the oligonucleotide AUUAAUUUAG, in which the in vitro start site has been localized (see above). For this reason it was necessary to determine the nearest neighbor of the cap structure. The template Bam HI-cleaved pCa2PRO-3 was again utilized in four separate reactions with four different [α-32P]NTPs and the 460-base RNA products were isolated as above. The four transcripts were separately digested with RNase T2 and the products (equal amounts of radioactivity of each) were chromatographed on PEI-cellulose. RNase T2 cleaves all 3' nonmethylated phosphodiester bonds (generating 3' phosphate ends) except those in the cap structure. From the known in vivo start site (Fig. 6), we expected RNase T2 digestion to generate Gp, Ap, Cp, Up, and a capped structure GpppAp. If the cap structure of some of the in vitro-synthesized transcripts contains a 2'-O-methyl A, then digestion with RNase T2 would generate m7GpppAmpGp as well. Fig. 9 reveals that transcripts labeled with [α-32P]ATP or [α-32P]GTP generate prominently labeled spots (arrows), which now migrate more slowly because they contain an extra phosphate. Most importantly, the GTP-specific spot is more highly labeled than its ATP counterpart, suggesting that the nearest neighbor nucleotide is, in fact, G. The deduced sequence of the cap structure and its nearest neighbor is therefore GpppApGp. If the structure m7GpppAmpGp* exists, it should be labeled with [α-32P]CTP (at the position indicated by *). We did detect this [α-32P]CTP-labeled structure, but it was not as prominent as caps labeled with [α-32P]ATP. As a result,

t

I

Chick (y2 Collagen Gene in Vitro Transcription it is not readily visible in Fig. 9C. We believe these results, when taken together, place the in vitro initiation site unambiguously at +1 in Fig. 6, in agreement with the in vivo RNA start site (19). DISCUSSION
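The nearest-neighbor argument that closes the Results can be made explicit with a small counting model. The sketch below is not the authors' analysis. It assumes an unmethylated cap, assumes that the cap guanosine and the initiating nucleotide each contribute their α phosphate to the triphosphate bridge, and assumes that spot intensity simply scales with the number of labeled phosphates. The function names are hypothetical, and the oligonucleotide is taken at face value as quoted in the text.

```python
# Toy model: how many 32P atoms end up in the capped RNase T2 product
# GpppNp for each [alpha-32P]NTP. All names here are hypothetical.

OLIGO = "AUUAAUUUAG"  # oligonucleotide quoted in the text
NTPS = ("A", "U", "C", "G")


def candidate_starts(oligo, start_base="A"):
    """Return (index, 3' neighbor) for every start_base that has a 3' neighbor."""
    return [(i, oligo[i + 1]) for i in range(len(oligo) - 1)
            if oligo[i] == start_base]


def cap_label_counts(first, second):
    """Count labeled phosphates in the T2 cap product for each labeled NTP.

    Assumptions of this sketch:
      - the cap guanosine donates its alpha phosphate to the bridge,
      - the initiating nucleotide keeps its alpha phosphate in the bridge,
      - the 3' phosphate left by RNase T2 was the alpha phosphate of the
        second nucleotide (nearest-neighbor transfer).
    """
    counts = {}
    for ntp in NTPS:
        n = int(ntp == "G")        # cap guanosine
        n += int(ntp == first)     # initiating nucleotide
        n += int(ntp == second)    # nearest neighbor
        counts[ntp] = n
    return counts


def matches_observation(counts):
    """Pattern described in the text: ATP and GTP label the cap, GTP more strongly."""
    return counts["G"] > counts["A"] > 0 and counts["U"] == 0 and counts["C"] == 0


# Keep only the candidate A residues whose predicted labeling fits the pattern.
consistent = [i for i, neighbor in candidate_starts(OLIGO)
              if matches_observation(cap_label_counts("A", neighbor))]
print(consistent)  # [8]
```

Under these assumptions, only the A that is followed by G (0-based index 8 in the quoted oligonucleotide) gives a cap product that is labeled by both ATP and GTP and more strongly by GTP. The weaker CTP-labeled methylated species mentioned in the text is deliberately left out of the model.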

We have used cloned chicken genomic DNA containing the 5’ end of the α2 (type I) collagen gene as template in a cell-free, in vitro transcription system derived from HeLa cells (20). Strong RNA polymerase II-specific promoter activity was detected with this template. The exact in vitro initiation site was determined by three experimental approaches: sizing of in vitro-synthesized runoff transcripts, comparing the sequence of template DNA with that of in vitro-made RNA, and identifying the first and second nucleotides of the in vitro transcripts by determining which nucleotides are associated with the cap. In a separate study, Vogeli et al. (19) identified the in vivo initiation site for α2 collagen RNA transcription. A comparison of in vitro-synthesized RNA and in vivo-made RNA reveals that both are initiated from the same site, located 33 bp downstream from the Goldberg/Hogness-like sequence TATAAATA. This consensus sequence exhibits strong homology with sequences found at the same location in other eukaryotic promoters (31).

Examination of the DNA sequence at the 5’ end of the α2 collagen gene has revealed several interesting features. There are three distinct ATG codons, at +54, +117, and +134 (19). The first two are followed almost immediately by termination codons. We believe the third ATG (+134) marks the start of translation because it is followed by an open reading frame (19). The amino acids encoded downstream of this protein start signal exhibit strong homology with those of in vitro-synthesized preproα1(I) collagen (32).

We have already discussed the presence of a Goldberg-Hogness box upstream of the transcription start site (between -33 and -26). Further upstream (between -84 and -78) is located a “CAT” box with a sequence (GCCCATT) similar to that found in other eukaryotic RNA polymerase II-dependent promoters (33). This sequence may prove to have regulatory significance.
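The ATG argument made earlier in this Discussion (two ATG codons that run quickly into termination codons, and a third followed by an open reading frame) can be illustrated with a short scan. This is a minimal sketch on a made-up toy sequence, not the α2 collagen sequence. The helper names are hypothetical and the indices are 0-based positions in the toy string, not the +54, +117, and +134 coordinates of the gene.

```python
# Toy illustration: rank ATG codons by the length of the reading frame
# that follows them. The sequence is invented for this example.

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Two short "false start" ATGs followed by an ATG with an open frame.
TOY = "CC" + "ATGGCTTAA" + "GG" + "ATGTGA" + "C" + "ATG" + "GCT" * 10


def atg_sites(seq):
    """Return the 0-based index of every ATG in the sequence."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def codons_before_stop(seq, start):
    """Count in-frame codons from `start` up to the first stop codon,
    or to the end of the sequence if no stop codon is met."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        n += 1
    return n


sites = atg_sites(TOY)
frame_lengths = {i: codons_before_stop(TOY, i) for i in sites}
likely_start = max(sites, key=frame_lengths.get)

print(frame_lengths)  # {2: 2, 13: 1, 20: 11}
print(likely_start)   # 20
```

Picking the ATG with the longest downstream frame is only a heuristic. In the text the assignment of the third ATG is further supported by the homology of the encoded amino acids with preproα1(I) collagen.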
FIG. 10. Schematic representation of three possible hairpin structures around the promoter region of the α2(I) collagen gene. The boxed “TATA” contains the Goldberg-Hogness sequence TATAAATA. The boxed “CAT” represents the sequence GCCCATT. The start of transcription in vivo is marked by +1.

Several inverted repeat sequences have been detected surrounding the α2 collagen promoter region (19). Three of these dyads of symmetry are located upstream of the transcriptional start site (Fig. 10). The sequences involved in the formation of these three dyads overlap each other and are thus mutually exclusive. Depending on which structure is formed, the “CAT” box is localized in different regions of the hairpin (Fig. 10). The “CAT” box can reside within the loop (in dyad A, most proximal to the transcriptional initiation site), buried within
the stem (in the middle dyad B), or at the base of the stem (in dyad C, most distal from the transcriptional start site). The dyads of symmetry could represent potential interaction sites for various regulatory proteins.

One of our major aims is to understand how chick α2 collagen gene expression is regulated in normal and transformed embryo fibroblasts. DNA containing the α2 collagen promoter and associated upstream structures is presently being used as template in cell-free extracts to identify RSV transformation-specific factors and explain how RSV controls gene expression.

Acknowledgments-We wish to acknowledge J. Sivaswami and S. Venkatesan for their useful advice concerning in vitro transcription and analysis of the RNA sequences. We thank P. A. Weil for the plasmid pSmaF and H. Ohkubo for the plasmid pgCOL323-3. We are grateful to R. Steinberg for taking the photographs, and W. Davis and J. Silverman for typing the manuscript. We are indebted to M. Gottesman and M. Sobel for critically reading this manuscript.

REFERENCES

1. Ramachandran, G. N., and Reddi, A. H., eds (1976) Biochemistry of Collagen, Plenum, New York
2. Eyre, D. R. (1980) Science 207, 1315-1322
3. Pinnell, S. R. (1978) in The Metabolic Basis of Inherited Disease (Stanbury, J. B., Wyngaarden, J. B., and Frederickson, D. S., eds) 4th Ed, pp. 1366-1394, McGraw-Hill, New York
4. Green, H., Todaro, G. J., and Goldberg, B. (1966) Nature 209, 916-917
5. Mayne, R., Vail, M. S., and Miller, E. J. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 4511-4515
6. Holtzer, H., Biehl, J., Yeoh, G., Meganathan, R., and Kaji, A. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 4051-4055
7. Deshmukh, K., and Sawyer, B. D. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 3864-3868
8. Kaul, R., Hewitt, A. T., Varner, H., Somerman, M., Martin, G., and Sobel, M. E. (1981) Fed. Proc. 40, 1626
9. Levinson, W., Bhatnagar, R. S., and Liu, T.-Z. (1975) J. Natl. Cancer Inst. 55, 807-810
10. Hata, R., and Peterkofsky, B. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 2933-2937
11. Adams, S. L., Sobel, M. E., Howard, B. H., Olden, K., Yamada, K. M., de Crombrugghe, B., and Pastan, I. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 3399-3403
12. Howard, B. H., Adams, S. L., Sobel, M. E., Pastan, I., and de Crombrugghe, B. (1978) J. Biol. Chem. 253, 5869-5874
13. Rowe, D. W., Moen, R. C., Davidson, J. M., Byers, P. H., Bornstein, P., and Palmiter, R. D. (1978) Biochemistry 17, 1581-1590
14. Adams, S. L., Alwine, J. C., de Crombrugghe, B., and Pastan, I. (1979) J. Biol. Chem. 254, 4935-4938
15. Sobel, M. E., Yamamoto, T., de Crombrugghe, B., and Pastan, I. (1981) Biochemistry 20, 2678-2684
16. Avvedimento, E., Yamada, Y., Lovelace, E., Vogeli, G., de Crombrugghe, B., and Pastan, I. (1981) Nucl. Acids Res. 9, 1123-1131
17. Vogeli, G., Avvedimento, E. V., Sullivan, M., Maizel, J. V., Lozano, G., Adams, S. L., Pastan, I., and de Crombrugghe, B. (1980) Nucl. Acids Res. 8, 1823-1837
18. Ohkubo, H., Vogeli, G., Mudryj, M., Avvedimento, V. E., Sullivan, M., Pastan, I., and de Crombrugghe, B. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 7059-7063
19. Vogeli, G., Ohkubo, H., Sobel, M. E., Yamada, Y., Pastan, I., and de Crombrugghe, B. (1981) Proc. Natl. Acad. Sci. U. S. A., in press
20. Manley, J. L., Fire, A., Cano, A., Sharp, P. A., and Gefter, M. L. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 3855-3859
21. Yamamoto, T., Sobel, M. E., Adams, S. L., Avvedimento, V. E., DiLauro, R., Pastan, I., de Crombrugghe, B., Showalter, A., Pesciotta, D., Fietzek, P., and Olsen, B. (1980) J. Biol. Chem. 255, 2612-2615
22. Lederberg, E. M., and Cohen, S. N. (1974) J. Bacteriol. 119, 1072-1074
23. Yamamoto, T., de Crombrugghe, B., and Pastan, I. (1980) Cell 22, 787-797
24. Weil, P. A., Luse, D. S., Segall, J., and Roeder, R. G. (1979) Cell 18, 469-484
25. Merlino, G. T., Water, R. D., Moore, G. P., and Kleinsmith, L. J. (1981) Dev. Biol. 85, 505-508
26. Brownlee, G. G., and Sanger, F. (1969) Eur. J. Biochem. 11, 395-399
27. Brownlee, G. G. (1972) in Laboratory Techniques in Biochemistry and Molecular Biology (Work, T. S., and Work, E., eds) pp. 1-260, North-Holland Publishing Co., London
28. Mirzabekov, A. D., and Griffin, B. E. (1972) J. Mol. Biol. 72, 633-643
29. Yamamoto, T., Jay, G., and Pastan, I. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 176-180
30. Roeder, R. G. (1976) in RNA Polymerase (Losick, R., and Chamberlin, M., eds) pp. 285-329, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
31. Corden, J., Wasylyk, B., Buchwalder, A., Sassone-Corsi, P., Kedinger, C., and Chambon, P. (1980) Science 209, 1406-1414
32. Palmiter, R. D., Davidson, J. M., Gagnon, J., Rowe, D. W., and Bornstein, P. (1979) J. Biol. Chem. 254, 1433-1436
33. Benoist, C., O’Hare, K., Breathnach, R., and Chambon, P. (1980) Nucleic Acids Res. 8, 127-142
34. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
35. Sutcliff, J. G. (1978) Nucleic Acids Res. 5, 2721-2728