Vol. 262, No. 27, Issue of September 25, pp. 13323-13332, 1987

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1987 by The American Society for Biochemistry and Molecular Biology, Inc.

Printed in U.S.A.

Unusual DNA Sequences Located within the Promoter Region and the First Intron of the Chicken Pro-α1(I) Collagen Gene*

(Received for publication, March 10, 1987)

Mitchell H. Finer‡, Sirpa Aho§, Louis C. Gerstenfeld¶, Helga Boedtker‖, and Paul Doty

From the Department of Biochemistry and Molecular Biology, Harvard University, Cambridge, Massachusetts 02138, §Research Laboratories Alko, Ltd., P. O. Box 350, SF-00101, Helsinki, Finland, and the ¶Department of Orthopedic Research, The Children's Hospital, Harvard Medical School, Boston, Massachusetts 02115

Genomic clones corresponding to the amino-terminal propeptide and 5′-flanking sequences of the chicken pro-α1(I) collagen gene were isolated as a first step in the identification of DNA sequences important for transcriptional regulation of the pro-α1(I) collagen gene. Due to the failure to identify positive clones in either primary or amplified genomic libraries, a 5.1-kilobase pair StuI genomic fragment identified by Southern blotting was enriched by sucrose gradient fractionation of genomic DNA and cloned into λgt11. Comparison of the DNA sequence of the 5.1-kilobase pair StuI fragment to the DNA sequence of a cDNA clone encoding the amino-terminal propeptide, signal peptide, and the 5′-untranslated region identified the first four exons and most of the fifth. Exon size and intron position have been largely conserved between human and chicken α1(I) genes. DNA sequence analysis of the region 5′ to the transcription initiation site identified the canonical TATA and CAAT boxes. However, the 40-nucleotide pyrimidine stretch centered between -150 and -180 nucleotides, found in all previously isolated type I procollagen genes from chicken, mouse, and human, was absent in the chicken pro-α1(I) collagen gene. This sequence corresponds to the in vivo DNase I hypersensitive site in the chicken pro-α2(I) and mouse pro-α1(I) collagen genes, as well as the in vitro S1 nuclease hypersensitive site in both chicken and mouse pro-α2(I) collagen genes. Two unusual DNA sequences were identified within the chicken pro-α1(I) collagen gene. Fifteen tandem repeats of the sequence GGGGAGA were identified within the first intron, 300 nucleotides 3′ to the first exon. This sequence was identified due to its hypersensitivity to S1 nuclease in vitro in supercoiled plasmids. The second sequence, located 5′ to -180, contained at least 25 copies of a polymorphic, 23-base pair tandemly repeated sequence not identified in other type I procollagen genes. Both of these tandem repeat sequences were identified at other locations in the chicken genome by Southern blot hybridization.

* This research was supported by United States Public Health Service Grants HD 01229 and HD 20705 and by Arthritis Investigator Award and Orthopedic Research Education Foundation grants (to L. C. G.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J02821.

‡ Present address: The Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142.

‖ To whom correspondence should be addressed: Dept. of Biochemistry and Molecular Biology, Harvard University, 7 Divinity Ave., Cambridge, MA 02138.

The collagens are a class of related, extracellular structural proteins that play a primary role in establishing and maintaining tissue architecture in vertebrates. At least 10 collagen types composed of more than 20 genetically distinct polypeptide chains have been identified (Martin et al., 1985). Heterotrimeric type I collagen is the most abundant collagen and is composed of two α1 chains and one α2 chain. It is expressed at high levels in tendons, bone, skin, and ligaments (Bornstein and Sage, 1980). Steady state levels of type I collagen have been shown to be regulated in diametrically opposed fashion in different tissues in response to the same stimulus. Following transformation with Rous sarcoma virus (Adams et al., 1979) or growth in the presence of phorbol myristate acetate,¹ chick embryo fibroblasts cease synthesis of both α1 and α2 type I procollagen chains, controlled primarily at the level of transcription of these genes (Sandmeyer et al., 1981). In contrast, chondrocytes transformed with Rous sarcoma virus (Adams et al., 1982; Gionti et al., 1983) or chondrocytes grown in the presence of phorbol myristate acetate (Finer et al., 1985) initiate synthesis of type I procollagen or of the α1(I) trimer. Dramatic increases in the steady state levels of type I procollagen mRNAs suggest transcriptional control, although this has not been directly documented. In phorbol ester-treated chondrocytes, procollagen synthesis is also regulated at the level of RNA processing and the translational efficiency of procollagen mRNAs (Finer et al., 1985). Type I procollagen expression is also under complex developmental control. Both chondrocytes and myoblasts are derived from limb bud mesoderm (Dienstman et al., 1974). Type I procollagen synthesis is turned off during chondrogenesis (Sasse et al., 1983) and is either not detectable or detectable only at low levels in differentiated chondrocytes² (Focht and Adams, 1984; Alema et al., 1985; Finer et al., 1985).
In contrast, type I procollagen increases dramatically during myoblast differentiation in vitro, and the steady state levels of type I procollagen mRNAs undergo a biphasic 15-fold increase (Gerstenfeld et al., 1984). In order to understand the complex molecular mechanisms responsible for controlling type I procollagen levels, as well as to better understand the evolutionary relationship between the pro-α1(I) and pro-α2(I) collagen genes, we have undertaken the molecular cloning of the chicken type I procollagen genes. We have previously reported the detailed structure of the chicken α2(I) procollagen gene. This gene contains 52 exons, encoded over 38 kb³ of DNA (Wozney et al., 1981; Tate et al., 1983; Boedtker et al., 1985). The highly interrupted structure and large size of the chicken pro-α2(I) gene have been conserved between chicken and human (Myers et al., 1983; Dickson et al., 1985). Recently the human pro-α1(I) collagen gene has been isolated (Chu et al., 1984). In contrast to the pro-α2(I) gene, the pro-α1(I) gene is encoded by only 20 kb of DNA. If the size of the pro-α1(I) genes is conserved as well, isolation of the 5′ end of the chicken gene would still require the isolation of a series of overlapping genomic clones, using oligo(dT)-primed cDNA as a probe to identify the initial genomic clone. Therefore, a single genomic clone encoding the amino-terminal prepropeptide and 5′-flanking region was isolated using a cDNA clone complementary to the 5′ end of the pro-α1(I) collagen mRNA (Finer et al., 1987) as a probe.

In this communication, we report the structure of the first five exons and the 5′-flanking sequence of the chicken pro-α1(I) collagen gene. In addition to identifying the canonical TATA and CAAT promoter sequences, DNA sequence analysis identified 25 copies of a 23-bp tandemly repeated element beginning 181 bp 5′ to the transcription start site. Similar DNA sequences have not been identified in the 5′-flanking gene region of either the mouse or the human pro-α1(I) collagen genes. We have also identified an in vitro S1 nuclease hypersensitive site within the first intron, composed of a "satellite-like" repeat sequence. Based on comparison to the 5′-flanking sequences in mouse and human type I procollagen genes, we speculate on the function of these sequences as controlling elements for the chicken pro-α1(I) collagen gene.

¹ L. C. Gerstenfeld, unpublished data.

² L. C. Gerstenfeld, M. H. Finer, and H. Boedtker, manuscript in preparation.

³ The abbreviations used are: kb, kilobase pairs; bp, base pairs; SDS, sodium dodecyl sulfate; DTT, 1,4-dithiothreitol.

MATERIALS AND METHODS

Southern Blotting - Chicken genomic DNA, isolated according to Blin and Stafford (1976), was digested twice with the appropriate restriction endonuclease (New England Biolabs) according to the manufacturer's specification. 5 µg of digested DNA were electrophoresed on 1% agarose gels followed by Southern transfer (Southern, 1975) to nitrocellulose (Sartorius). Antisense RNA probes corresponding to the first two exons of the pro-α1(I) collagen gene were prepared by subcloning the BamHI/SalI fragment of pMFlA8, a cDNA clone encoding the 5′ 13% of the pro-α1(I) collagen mRNA (Finer et al., 1987), into the polylinker of SP65 (Melton et al., 1984). This plasmid was linearized with BamHI, and 1 µg was transcribed in vitro with SP6 RNA polymerase (Promega) in the presence of 100 µCi of [α-³²P]UTP, according to Melton et al. (1984). Southern blots were hybridized to 2 × 10⁶ cpm of probe for 12 h at 53 °C in 50% formamide, 5 × SSC (1 × SSC, 0.15 M NaCl, 0.015 M trisodium citrate, pH 7.0), 1 × Denhardt's (1 × Denhardt's reagent, 0.02% (w/v) each of polyvinylpyrrolidone, Ficoll, and bovine serum albumin), 50 mM Na₂HPO₄/NaH₂PO₄, pH 7.0, and 250 µg/ml denatured salmon sperm DNA. Southern blots were washed at 53 °C four times for 30 min in 2 × SSC, 0.1% SDS, followed by two times for 30 min in 0.1 × SSC, 0.1% SDS. Southern blots of purified DNA from λSA/S51 were carried out identically. For each digest, only 0.5 µg of DNA restricted with the appropriate enzyme was electrophoresed.

For the analysis of repetitive sequences, probes were labeled by nick translation (Rigby et al., 1977) using 0.1 µg of DNA per reaction. Restriction fragments were isolated by electroelution into dialysis bags (Maniatis et al., 1982). Genomic DNA was isolated, digested, and electrophoresed as described above. Each hybridization used 2.5 × 10⁶ cpm of probes nick-translated to a specific activity of 10' cpm/µg. Southern blots were hybridized for 12 h at 42 °C in the buffer described above and washed at 42 °C four times for 30 min in 2 × SSC, 0.1% SDS, followed by two times for 30 min in 0.1 × SSC, 0.1% SDS. For "reverse hybridizations," 0.1 µg of chicken genomic DNA was nick-translated as described above. λSA/S51 DNA was digested with EcoRI, AccI, and SalI. The plasmid p71-68 (Wozney et al., 1981) was digested with EcoRI and HindIII, according to the manufacturer's specifications. 0.5 µg of DNA was electrophoresed on 1.0% agarose gels and transferred to nitrocellulose, hybridized, and washed as described above, except the hybridization was carried out at 37 °C and the washing at 42 °C.

Isolation of Genomic Clones Encoding the α1(I) Procollagen Gene - 100 µg of chicken genomic DNA were digested with StuI, as described above. DNA was fractionated on 10-40% neutral sucrose gradients in 100 mM NaCl, 10 mM Tris-HCl, and 1 mM EDTA, pH 7.9. Sucrose gradients were centrifuged in a Beckman SW40 rotor for 24 h at 40,000 rpm, 4 °C. Three hundred-µl fractions were collected. A control gradient containing a labeled HindIII digest of λ DNA was centrifuged, fractionated in parallel, and electrophoresed on a 1% agarose gel to estimate the region of the gradient containing the size fractions between 1.0 and 9.0 kilobases. Fractions of genomic DNA from this region of the gradient were electrophoresed on a 1% agarose gel, hybridized to pMFlA8, and washed as described above to locate the fraction enriched for collagen gene fragments (data not shown). These fractions were pooled, ethanol-precipitated, and resuspended at 1 µg/µl. One µg of λgt11 was treated with alkaline phosphatase and then ligated to 0.3 µg of size-fractionated genomic DNA for 12 h at 16 °C in 66 mM Tris, pH 7.5, 10 mM MgCl₂, 15 mM DTT, 1 mM spermidine, 1 mM ATP, 200 µg/ml gelatin using 1 unit (Weiss) of DNA ligase (Bethesda Research Laboratories). The DNA was packaged using packaging extracts (Promega Biotec) and Escherichia coli NM539 transfected according to Young and Davis (1983). 270,000 clear plaques were transferred to nitrocellulose filters in duplicate (Grunstein and Hogness, 1975) and screened by hybridization to labeled pMFlA8, a cDNA clone encoding the 5′ end of the chicken pro-α1(I) collagen mRNA (Finer et al., 1987), as described previously. A single strongly hybridizing plaque was purified to homogeneity and characterized by restriction digestion and Southern blotting, as described previously. This clone was designated λSA/S51.

Fine Structure Analysis of λSA/S51 - The 4.6-kb and 0.5-kb EcoRI/SalI fragments corresponding to the insert were subcloned into the polylinker of SP65 and designated pRS4.6 and pRS500, respectively. These plasmids were used for restriction mapping and DNA sequence determination, carried out by chemical cleavage (Maxam and Gilbert, 1980). DNA to be sequenced was labeled at the 5′ termini by incubation of 5 pmol of plasmid DNA digested with the appropriate restriction enzyme, first with 10 milliunits of calf intestine alkaline phosphatase (Boehringer Mannheim) in 10 mM Tris-HCl, 1 mM ZnCl₂, 0.1 mM MgCl₂ for 30 min at 37 °C, followed by incubation with T4 polynucleotide kinase (Boehringer Mannheim) and 42.5 µCi of [γ-³²P]ATP (Amersham) in 50 mM Tris, pH 7.9, 7 mM MgCl₂, 1 mM DTT for 30 min at 37 °C. DNA was also labeled by end-filling the 3′ termini by incubation of 5 pmol of DNA with a 2-fold molar excess of the appropriate α-³²P-labeled deoxynucleoside triphosphate (Amersham Corp.), 5 units of the large fragment of DNA polymerase I (Bethesda Research Laboratories) in 60 mM Tris-HCl, 50 mM NaCl, 1 mM DTT for 100 min at 16 °C, followed by addition of cold deoxynucleoside triphosphates to a concentration of 2.5 mM each for 15 min at 16 °C. All labeled DNA was purified by chromatography on G-50 (Pharmacia LKB Biotechnology, Inc.) followed by ethanol precipitation. The DNA sequence was determined on both strands for each fragment.

Determination of the in Vitro S1 Nuclease Hypersensitive Site - Twenty µg of supercoiled pRS4.6 were digested with S1 nuclease, neutralized, and redigested with restriction enzymes, as described by Finer et al. (1984). The products were run on 1 or 1.7% agarose gels and stained with ethidium bromide. For DNA sequence determination of the S1-sensitive site, the 3.?-kb XmnI fragment of pRS4.6 was treated with alkaline phosphatase and kinased with [γ-³²P]ATP, as described above, redigested with AccI, and the 0.51-kb fragment isolated from a 6% acrylamide gel, as described above. This fragment was sequenced by chemical cleavage (Maxam and Gilbert, 1980).

RESULTS

Identification of a Genomic Clone Encoding the Pro-α1(I) Collagen Amino-terminal Propeptide - In order to isolate genomic clones corresponding to the 5′ end of the chicken α1(I) procollagen gene, we screened a primary MboI-partial genomic library with pMFlA8, a cDNA clone encoding the 5′-untranslated region, the signal peptide, the amino-terminal propeptide, and the amino-terminal telopeptide of pro-α1(I) collagen (Finer et al., 1987). We were unable to identify positive clones following two successive library screenings. Therefore, we utilized an alternative cloning strategy. Genomic Southern blotting would be used to identify a 4-7-kb genomic fragment that hybridized to pMFlA8. A fraction enriched for this fragment would be cloned into λgt11. For ease of cloning, restriction enzymes which generated blunt ends, to which EcoRI linkers could be directly ligated, were selected. Of the initial 12 enzymes used to digest chicken genomic DNA, the enzymes AhaIII, StuI, and XmnI produced fragments of 4.2, 5.1, and 5.7 kb, respectively, which hybridized to pMFlA8. Genomic DNA digested with each of these enzymes was fractionated on a 10-40% sucrose gradient, and the fractions enriched for the AhaIII, StuI, or XmnI fragment were cloned into λgt11, as described under "Materials and Methods." Only the StuI digests produced positive results. A single plaque, hybridizing strongly to pMFlA8, was isolated and purified to homogeneity. The identity of this clone, designated λSA/S51, was confirmed by Southern blotting and hybridization to pMFlA8. Digestion with EcoRI, and double digests with EcoRI and AccI, and EcoRI and BglII, yielded fragments of 5.1 kb, 2.0 and 2.6 kb, and 4.6 and 0.6 kb, respectively. These fragments were identical to those found in genomic DNA (data not shown). When the intact phage was nick-translated and hybridized to calvaria mRNA, the hybridization profile obtained was identical to that obtained with pMFlA8 (data not shown). Based upon these data, λSA/S51 was tentatively identified as a pro-α1(I) collagen genomic clone.

Fine Structure Analysis of the 5′ End of the α1(I) Procollagen Gene - A detailed restriction map of λSA/S51, together with the DNA sequencing strategy, is shown in Fig. 1. Two kilobases of DNA were sequenced: 1 kb corresponding to exon 1 and the 5′-flanking region and 1 kb corresponding to the region surrounding exons 2-5.
The DNA sequence of both strands was determined to reduce the possibility of DNA sequence errors. The first four exons and most of the fifth were identified by comparison of the genomic DNA sequence to the DNA sequence of pMFlA8 (Finer et al., 1987). The DNA sequence and the predicted translation products of exons 1-5 are shown in Fig. 2. Prior to a further characterization of the gene, the initiation site was determined. Comparison of the 5′ end of pMFlA8 to the genomic sequence led to a tentative identification of the transcription initiation site. The transcription initiation site was precisely determined by primer extension (data not shown). The initiation site corresponds to a GC doublet with bands of equal intensity. Many eukaryotic genes have been shown to initiate with a purine (Breathnach and Chambon, 1980); therefore, the G of the doublet has been identified as the first transcribed nucleotide (Fig. 2A). Based upon this analysis, pMFlA8 initiates at nucleotide +6 on the genomic map, not at nucleotide +1. These cDNA clones were made double-stranded by self priming, followed by S1 nuclease treatment to open up the hairpin at the 5′ end of the cDNA (Finer et al., 1987). Therefore, loss of the first five nucleotides is not unexpected.

Exon 1 encodes the 114-bp 5′-untranslated region, the signal peptide, and the first 5 amino acids of the propeptide (Fig. 2A). Three AUG codons have been located at nucleotides +59, +98, and +115. The first two are followed by in-phase termination codons located at +71 and +110. The positions of these short peptides as well as their amino acid sequence have been conserved between all the type I procollagen genes identified to date (Yamada et al., 1983; Harbers et al., 1984; Chu et al., 1985; Dickson et al., 1985). The 22-amino acid signal peptide follows the 5′-untranslated region. The signal peptidase cleavage site has been previously located between Gly-22 and Glu-23 (Pesciotta et al., 1980).

The amino-terminal propeptide is composed of a globular cysteine-rich domain and a triple-helical domain (Pesciotta et al., 1980; Chu et al., 1984) located almost entirely in exons 2-5. The DNA sequence of this region, including the introns, is shown in Fig. 2B. Exon 2 is 195 bp and encodes 90% of the globular cysteine-rich domain. This domain is absent in the pro-α2(I) propeptide, and as a result, exon 2 in the pro-α2(I) gene is only 11 bp (Tate et al., 1983). It has 64% homology with the sequence extending from +49 to +59 in exon 2, shown in Fig. 2B. For exons 2-4, exon size and intron position have been conserved between the chicken and human pro-α1(I) genes (Chu et al., 1984). Exon 5 in the chicken gene is 9 bp shorter than in the human gene because of the deletion of a Gly-Pro-Pro triplet (Finer et al., 1987).
These exons are preceded by the canonical AG dinucleotide and followed by the canonical GT dinucleotide (Breathnach and Chambon, 1981). This has been extended to (C/T)AG-exon-GTAAGT in the chicken α2(I) procollagen gene (Wozney et al., 1981). In the chicken α1(I) procollagen gene, the consensus acceptor splice site can be enlarged to (C/T)AG as well. In contrast, the consensus donor splice sequence can only be extended by one nucleotide to GT(A/G).

FIG. 1. Restriction map of λSA/S51. The exons are identified as raised boxes. Arrows indicate the direction of DNA sequence determination. The DNA sequence of both strands was determined for each fragment by 3′ end-labeling with the Klenow fragment of DNA polymerase I and 5′ end-labeling with T4 polynucleotide kinase. RI(S), genomic StuI site replaced with EcoRI linkers during cloning; A, AhaIII; N, NcoI; D, DdeI; H, HinfI; M, MboI; B, BamHI; C, AccI; X, XmnI; G, BglII; L, SalI. S1 corresponds to the in vitro S1 nuclease hypersensitive site found only in supercoiled plasmids. [Map not reproduced; it shows exons 1-5 with a 0.5-kb scale bar.]

FIG. 2. DNA sequence determination of λSA/S51. The DNA sequence of the regions marked by arrows was determined on both strands by chemical cleavage as described under "Materials and Methods." Exons are identified by upper case letters and 5′-flanking sequence and introns by lower case letters. A, DNA sequence of the 5′-flanking region, exon 1, and 43 bp of the 1.8-kb first intron. The TATA box located at -25 and the inverted CAAT box located at -100 are underlined. The three AUG codons within exon 1 are overlined, and the in-phase termination codons are underlined. The stem of the conserved hairpin structure surrounding the latter two AUG codons extends from 96 to 105 and from 110 to 118. The signal peptidase cleavage site is found between Gly-22 and Glu-23. B, DNA sequence of exons 2-5, including intron sequences. [Sequence panels not reproduced.]

DNA sequence analysis of the 5′-flanking gene region identified a TATA box and a CAAT box at -25 and -100 base pairs, respectively (Fig. 2A). These canonical sequences have been shown to be important for correct initiation (Grosschedl and Birnstiel, 1980; Benoist and Chambon, 1981) and efficient transcription (Mellon et al., 1981; McKnight et al., 1981; Grosveld et al., 1982; Charnay et al., 1985) of eukaryotic RNA polymerase II genes. The CAAT box in this gene is actually inverted and complementary to the sequence ATTGG on the sense strand, as was found in the Herpes simplex thymidine kinase gene (Jones et al., 1985). Inspection of the promoter region of the chicken pro-α2(I) gene (Vogeli et al., 1981; Tate et al., 1983), as well as of the mouse (Harbers et al., 1984; Schmidt et al., 1984) and human (Chu et al., 1985; Dickson et al., 1985) pro-α1(I) and pro-α2(I) collagen genes suggests that all of these genes have a similar inverted CAAT box, not previously identified, located at -80 in the α2(I) and at -100 in the α1(I) genes. The CAAT boxes previously identified either had only one A or more than two As. While the functional role of the inverted CAAT box has not yet been shown for any collagen gene, only the inverted CAAT box is essential for efficient transcription in the thymidine kinase gene, even though the latter has a perfect CAAT sequence on the sense strand (Jones et al., 1985).

The chicken pro-α1(I) collagen gene does not have the 40-bp pyrimidine stretch centered at -150 in both the human pro-α1(I) (Chu et al., 1985) and mouse pro-α1(I) genes (Harbers et al., 1984). Instead, a tandem repeat of the sequence G₅APy was found between -120 and -137. This repeat is contained within an 18-bp purine-rich sequence which has at least 80% homology with two sequences located between -168 and -186, and between -200 and -214 in the mouse pro-α1(I) collagen promoter. There are no other sequences of comparable length and homology in this region of the chicken and mouse α1(I) genes.

The most striking difference between previously identified type I collagen genes and the chicken pro-α1(I) collagen gene was found in the region extending 5′ from -181.
Beginning at this position and extending 570 bp in the 5′ direction, 25 copies of a highly conserved, tandemly repeated element were identified, shown in Fig. 3. The individual repeat elements vary in size between 21 and 25 bp and are 78% homologous to the consensus sequence (Fig. 3). The exact 5′ end of these repeat units was not determined. Additional repeat elements were not identified following DNA sequence analysis surrounding the NcoI site, 350 bp 5′ to the 5′-most repeat. At most, an additional 15 repeat elements could be present. The 23-bp repeat sequence is not found between -180 and -270 in the human α1(I) (Chu et al., 1985) or between -180 and -220 in the mouse α1(I) (Harbers et al., 1984) procollagen gene.

FIG. 3. 23-base pair polymorphic tandem repeat sequence. The DNA sequence of the 23-base pair tandem repeats has been aligned to generate maximal homology. The first nucleotide of the first repeat and the last nucleotide of the last repeat are indicated. The consensus sequence for these repeated elements has also been derived, below which is given the number of times each nucleotide occurs within the 25 copies of the repeat. N corresponds to any of the four nucleotides, P to pyrimidines, and Q to purines. [Alignment of the individual repeats, extending from -754 to -181, not reproduced. Consensus: CTTTGGGGTCCCCATCCCAANPQ.]

In Vitro S1 Nuclease Hypersensitivity - As a first step to identifying regions that might be important for regulation of the chicken pro-α1(I) collagen gene, we determined whether the pro-α1(I) collagen gene contained a site sensitive to S1 nuclease digestion in vitro in supercoiled plasmids. Such sites have been shown to correspond to the tissue-specific DNase I hypersensitive sites in a variety of genes (Mace et al., 1983; Schon et al., 1983; Finer et al., 1984; McKeon, 1984). Plasmid DNA was digested with S1 nuclease to convert 50% of the DNA to linear and 50% to relaxed circles (Fig. 4A, lane f). In these experiments, the 4.6-kb EcoRI/SalI fragment of λSA/S51 (containing 2.6 kb of 5′-flanking sequences, the first and second exons, and the first intron) was subcloned into the vector SP65 and designated pRS4.6. Following digestion with S1 nuclease, this plasmid was digested with BglII (Fig. 4A, lane c), NcoI (Fig. 4A, lane d), and EcoRI (Fig. 4A, lane e). The DNA fragments were electrophoresed on a 1% agarose gel. The sub-bands obtained from each of the three digests were found to map to a unique site within the plasmid, 1.7 kb 5′ from the SalI site, shown in Fig. 1. This corresponds to a site within the first intron, about 0.3 kb 3′ to the first exon. As a control, plasmid DNA was linearized first with BglII and then digested with S1 nuclease (Fig. 4A, lane b).
No sub-bands were produced in addition to the linear plasmid, demonstrating that the in vitro S1 hypersensitive site is dependent upon a supercoiled DNA substrate.

To map the S1 site at higher resolution, S1-digested DNA was digested with either BamHI or AccI and the products analyzed on a 1.7% agarose gel. Digestion of plasmid DNA with BamHI alone yields four fragments: 5.0, 2.0, 0.43, and 0.10 kb (Fig. 4B, lane g). Digestion with S1 nuclease followed by BamHI digestion produces sub-bands of 1.2 and of 0.8 and 0.7 kb, with a concomitant reduction in intensity of the 2.0-kb band (Fig. 4B, lanes b-e). No variation within the S1 site was seen over the range of 0.5-4.0 units S1/µg plasmid DNA, suggesting that this site is a stable, recognizable feature within the supercoiled plasmid. A map showing the precise location of the S1 site based upon these data is shown (Fig. 4B). The location was confirmed by digestion of plasmid DNA with AccI following S1 digestion. The intensity of the 2.1-kb AccI band decreased as a result of the generation of new sub-bands of 1.6 and of 0.45 and 0.35 kb. Each sub-band is a doublet, approximately 100 base pairs apart. This same doublet is also identifiable in the NcoI, BglII, and EcoRI digests, shown in Fig. 4A. Therefore, two in vitro S1 hypersensitive sites have been identified about 250 and 350 nucleotides 3′ to exon 1.

To determine the DNA sequence of the region containing the in vitro S1 hypersensitive site, the distance between the S1 hypersensitive sites and an XmnI site was determined to be 185 and 95 bp, ±5 bp (data not shown). An XmnI digest of pRS4.6 was labeled with [γ-³²P]ATP and recut with AccI. The 0.51-kb fragment was isolated and sequenced by chemical cleavage (Maxam and Gilbert, 1980). The DNA sequence is shown in Fig. 4C. It contains seven tandem repeats of the sequence GGGGAGA, followed by GCGGAGA and seven additional tandem repeats of GGGGAGA. Based upon the distance between the XmnI site and the S1 hypersensitive sites, the S1 hypersensitive sites correspond to the 5′ and 3′ junctions of the repeated heptamer and the unique DNA.

[FIG. 4 panels not reproduced. Recoverable labels: A, lanes a-g, with lanes c, d, and e corresponding to the BglII, NcoI, and EcoRI digests; B, AccI and BamHI digests, lanes a-m, with a map (scale bar, 0.5 kb) showing exon 1, the AccI site, the two S1 sites, the XmnI site, and exon 2.]

The 5′-Flanking Repeat Sequence and S1 Hypersensitive Site Repeat Are Located at Other Positions within the Chicken Genome - To determine whether the 23-base pair tandem repeat located 5′ to the transcription initiation site and the purine-rich in vitro S1 nuclease hypersensitive site were located at other positions within the chicken genome, a reverse hybridization experiment was carried out. Total chicken genomic DNA was labeled with α-³²P-labeled nucleoside triphosphates by nick translation and hybridized to an EcoRI, SalI, AccI triple digest of cloned λSA/S51 DNA. Repetitive sequences are present within the labeled genomic DNA at sufficient concentration to hybridize to cloned DNA, whereas single copy DNA sequences are not. This approach has been used previously to identify repetitive sequences in the chicken ovalbumin and U1 snRNA genes (Stumph et al., 1981) and the chicken heat shock gene (Dybvig et al., 1983). Labeled genomic DNA hybridized to an EcoRI/AccI/SalI digest of the plasmid pRS4.6 yielded positive hybridization to the 2.1- and 2.6-kb fragments (Fig. 5, lane a). Densitometric scanning of the Southern blots showed that the 2.1-kb AccI/AccI fragment, containing the in vitro S1 nuclease hypersensitive site, hybridized six times more strongly than the 2.6-kb EcoRI/AccI fragment containing the 23-bp 5′-flanking sequence repeat. When corrected for the 3.3-fold difference in the size of the two elements, the repetition frequency for the S1 hypersensitive sequence is approximately 20-fold greater than that of the 23-bp tandem repeat element. As a positive control for the identification of repetitive sequences using this technique, labeled genomic DNA was also hybridized to an EcoRI/HindIII digest of the plasmid 71-68.
This plasmid, which contains exons 6-15 (formerly identified as 37-45) of the chicken α2(I) procollagen gene (Wozney et al., 1981; Boedtker et al., 1985), also contains a "CR1-like" repetitive DNA sequence⁴ within the 3.6-kb EcoRI/HindIII fragment. This fragment hybridizes to labeled genomic DNA 50 times more strongly than the S1 hypersensitive sequence (Fig. 5, lane c). In addition to serving as a positive control, the hybridization of labeled genomic DNA to this sequence can be used as an internal standard to determine the approximate copy number of the S1-sensitive repeat and the 23-bp tandem repeat. Solution hybridization studies have shown that CR-1 elements are present at 6800 copies per haploid genome in chicken (Stumph et al., 1981). Correction for the 22-fold greater size of the CR-1 element suggests that the repetition frequency of the S1-sensitive sequence is approximately 3000 copies per haploid genome, and the 23-bp tandem repeat sequence is present at approximately 150 copies per haploid genome.

⁴ B. de Crombrugghe, personal communication.

Genomic Southern blotting was used to confirm the results of the reverse hybridization experiment. If the S1 hypersensitive sequence and the 23-bp tandem repeat sequence are repeated at other locations within the genome, new restriction fragments, in addition to restriction fragments corresponding to the pro-α1(I) collagen gene, will hybridize to cloned probes containing these sequences. A 2.1-kb AccI/AccI fragment containing the in vitro S1 nuclease hypersensitive site and a 1.4-kb NcoI/AccI fragment containing the 23-bp polymorphic tandem repeat sequence were purified by elution from agarose gels and labeled by nick translation. These labeled fragments were hybridized to identical Southern blots containing BamHI, DraI (an isoschizomer of AhaIII, which is no longer available), KpnI, and NcoI digests of chicken genomic DNA fractionated on 1% agarose gels. The pro-α1(I) collagen genomic fragments that should hybridize to each probe have been indicated with arrows for each digest. Fig. 6 (right) shows that the 2.1-kb AccI/AccI fragment hybridized to multiple additional fragments of variable intensity, confirming that the S1 hypersensitive sequence is present at other locations within the chicken genome. These fragments could not be ascribed to partial digestion products because they are both larger and smaller than authentic pro-α1(I) collagen bands. In contrast, the 1.4-kb NcoI/AccI fragment hybridizes to only a few additional bands other than the authentic pro-α1(I) procollagen bands (Fig. 6, left). Therefore, both DNA sequences are present at other loci within the chicken genome, in addition to the pro-α1(I) collagen gene. The variable intensity of the additional bands reflects the copy number of the repeat sequence at each location.
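The copy-number estimates from the reverse hybridization experiment can be retraced as simple arithmetic. The sketch below is not part of the original analysis: it assumes that each size correction acts as a multiplicative factor on the measured hybridization ratio (an interpretation chosen because it reproduces the quoted values), and all variable and function names are illustrative.

```python
# Retrace the copy-number estimates quoted in the text.
# Input values are taken from the text; the multiplicative use of the
# size corrections is an assumption, and all names are illustrative.

CR1_COPIES = 6800          # CR-1 copies per haploid genome (Stumph et al., 1981)
CR1_VS_S1_SIGNAL = 50      # CR-1 fragment signal relative to the S1-sensitive fragment
CR1_VS_S1_SIZE = 22        # size of the CR-1 element relative to the S1-sensitive element
S1_VS_23BP_SIGNAL = 6      # 2.1-kb fragment signal relative to the 2.6-kb fragment
S1_VS_23BP_SIZE = 3.3      # size difference between the two repeat elements


def s1_vs_23bp_fold() -> float:
    """Repetition frequency of the S1-sensitive repeat relative to the 23-bp repeat."""
    return S1_VS_23BP_SIGNAL * S1_VS_23BP_SIZE


def s1_copies() -> float:
    """Estimated copies of the S1-sensitive repeat per haploid genome."""
    return CR1_COPIES / CR1_VS_S1_SIGNAL * CR1_VS_S1_SIZE


def repeat_23bp_copies() -> float:
    """Estimated copies of the 23-bp tandem repeat per haploid genome."""
    return s1_copies() / s1_vs_23bp_fold()


if __name__ == "__main__":
    print(f"S1 repeat vs 23-bp repeat: {s1_vs_23bp_fold():.1f}-fold")
    print(f"S1-sensitive repeat: about {s1_copies():.0f} copies")
    print(f"23-bp tandem repeat: about {repeat_23bp_copies():.0f} copies")
```

Under these assumptions the three results land close to the rounded figures given in the text (about 20-fold, 3000 copies, and 150 copies).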
The far greater number of bands that hybridize to the S1 hypersensitive sequence is consistent with the reverse hybridization experiment described above, which indicated that the S1-sensitive repeat is present at about a 20-fold greater copy number than the 23-bp tandem repeat.

FIG. 4, continued: shown in lane f, followed by digestion of 1-μg samples with BglII (lane c), NcoI (lane d), and EcoRI (lane e). The digestion products were electrophoresed on a 1% agarose gel and stained with ethidium bromide. Lane a, HinfI digest of pBR322; lane g, HindIII digest of λ for size markers. Lane b, pRS4.6 DNA linearized with BglII followed by digestion with S1 nuclease. Maps identify the location of the in vitro S1 hypersensitive site with respect to the restriction sites. B, high resolution mapping of the in vitro S1 hypersensitive site by digestion first with S1 nuclease at concentrations of 0.5, 1.0, 2.0, and 4.0 units/μg of DNA followed by digestion with BamHI (lanes b-e) and AccI (lanes h-k). Lanes g and l, pRS4.6 digested with BamHI or AccI only, respectively. Lane f, BglII-linearized pRS4.6 DNA digested with S1 nuclease, followed by BamHI. Restriction digests were electrophoresed on a 1.7% agarose gel. Lane m, HinfI digest of pBR322; lane a, HindIII digest of λ DNA for size markers. Arrows on the left identify the 1.2-kb and 0.8- and 0.7-kb sub-bands derived from S1 digestion of the 2.0-kb BamHI fragment. Arrows on the right identify the 1.6-kb and 0.45- and 0.35-kb fragments derived from S1 digestion of the 2.1-kb AccI fragment. The map on the right identifies the location of the in vitro S1 hypersensitive site with respect to exon 1. C, the 3.7-kb XmnI fragment of pRS4.6 was end-labeled with [γ-32P]ATP and digested with AccI. The 0.51-kb fragment was isolated, sequenced by chemical cleavage, and electrophoresed on a 6% acrylamide, 8.3 M urea sequencing gel. The distance from the XmnI site to the S1 hypersensitive site was determined to be 185 and 95 bp ± 5 bp (data not shown) and is indicated on the map in Fig. 4B. This size determination was used to position the S1 cleavage sites on the DNA sequence. The arrows indicate the sites cut by S1 nuclease.

FIG. 5. Determination of the repetition frequency of the 23-bp tandem repeat and the in vitro S1 nuclease hypersensitive sequence within the chicken genome. Reverse hybridization of labeled genomic DNA to cloned procollagen DNA immobilized on nitrocellulose filters was carried out in order to estimate the copy number of the 23-bp tandem repeat sequence and the in vitro S1 nuclease hypersensitive tandem repeat sequence. The intensity of the hybridization of genomic DNA to the latter two sequences was compared to the intensity of hybridization to a cloned chicken CR-1 element, with a repetition frequency of 6800 copies in the chicken genome. In this fashion, the repetition frequency of the 23-bp tandem repeat and the in vitro S1 nuclease hypersensitive sequence could be determined. Lane a, EcoRI/AccI/SalI digest of λSA/S51; lane b, labeled HindIII digest of λ DNA; lane c, an EcoRI/HindIII digest of the plasmid 71-68 containing exons 6-15 (formerly identified as 37-45) of the chicken pro-α2(I) collagen gene. Digests were electrophoresed on a 1% agarose gel, transferred to nitrocellulose, and hybridized to nick-translated genomic DNA.

FIG. 6. The 23-bp tandem repeat sequence and the in vitro S1 nuclease hypersensitive sequence are located at other positions within the avian genome. Cloned DNA fragments containing the 23-bp tandem repeat or the in vitro S1 nuclease hypersensitive sequence were nick-translated and hybridized to genomic DNA digested with BamHI (lane b), DraI (AhaIII, lane c), KpnI (lane d), and NcoI (lane e), electrophoresed on 1% agarose gels, and transferred to nitrocellulose. Nick-translated 2.1-kb AccI/AccI (right) and 1.4-kb NcoI/AccI (left) fragments were hybridized to duplicate blots. Arrows indicate the predicted positively hybridizing fragments for each digest based upon extensive genomic mapping. Lane a contains a labeled HindIII digest of λ DNA.

DISCUSSION

In this communication, we describe the isolation and characterization of a 5.1-kb StuI fragment which contains 2.5 kb of DNA sequence encoding the first five exons of the chicken pro-α1(I) collagen gene, as well as an additional 2.6 kb of 5'-flanking gene sequence. This region was isolated in order to identify DNA sequences important for transcriptional and translational control. The 5.1-kb StuI fragment was identified by genomic Southern blotting using a pro-α1(I) amino-terminal propeptide cDNA clone as a probe, and an enriched fraction containing this fragment was cloned into λgt11. This strategy was pursued due to the failure of initial attempts to isolate the 5' end of the chicken pro-α1(I) collagen gene by screening either primary or amplified genomic libraries. Restriction mapping studies showed that this clone is rich in MboI sites. This could partially account for its under-representation within a genomic library constructed by MboI partial digestion of genomic DNA. However, the difficulty in propagating this phage and its derived plasmid subclones, in particular those containing DNA sequences from within the first intron,⁵ suggested that the clone contains sequences difficult to replicate in plasmids or bacteriophage vectors. Therefore, the alternative approach described above is a useful method for cloning DNA fragments that grow or amplify poorly in these vectors. This approach may also prove useful in examining alleles from diseased patients, when the isolation of only a small region of a gene is required. A similar approach using a plasmid vector has been used to isolate a clone encoding a human α2(I) procollagen gene fragment from a patient with osteogenesis imperfecta (Pihlajaniemi et al., 1984). The 10-100-fold increase in cloning efficiency per μg of insert that can be obtained with λ (compare Hanahan, 1983, and Young and Davis, 1983) makes this method preferable when starting material is limited.

⁵ M. H. Finer, unpublished data.

Two unusual DNA sequences were identified within the chicken pro-α1(I) collagen gene. The first sequence was identified as a result of DNA sequence analysis. Starting 181 nucleotides 5' to the transcription initiation site, 25 copies of a polymorphic 23-base pair, tandemly repeated element were identified. Approximately 150 copies of this sequence are present in the chicken genome, estimated by reverse Southern blotting. Twenty-five to forty of these are present on the 1.4-kb NcoI/AccI fragment, the remainder present on 4 to 6 other DNA fragments at loci other than the pro-α1(I) collagen gene. A similar 15-bp, tandemly repeated element has been identified in the human insulin gene. The insulin element, like the collagen element, is made up of tandem repeats of a polymorphic sequence repeat. However, unlike the collagen element, the insulin element is only present at the insulin locus in the human genome (Bell et al., 1982). Transient expression studies have shown that this sequence is not part of the tissue-specific insulin enhancer (Walker et al., 1983; Edlund et al., 1985). This suggests that the 23-bp tandem repeat sequence within the chicken pro-α1(I) collagen gene may also lack the expression properties of a transcriptional enhancer. Transient experiments are being carried out to test this hypothesis.

The second unusual sequence was identified based upon its in vitro hypersensitivity to S1 nuclease in supercoiled plasmids. Two distinct hypersensitive sites were located about 250 and 350 nucleotides 3' to the first exon. The sequence (GGGGAGA),(GCGGAGA)(GGGGAGA), was found within the hypersensitive region. The sites of S1 nuclease digestion correspond to the borders of unique DNA and the repeated purine-rich heptamer. In contrast, the in vitro S1 nuclease hypersensitive site in the chicken and mouse α2(I) procollagen genes has been located 180 and 145 base pairs, respectively, 5' to the transcription initiation site (Finer et al., 1984; McKeon et al., 1984). The S1 hypersensitive site in the chicken gene corresponds to two sites 10 bp apart, located in a 40-bp pyrimidine stretch (Finer et al., 1984). Although both the human pro-α1(I) and the pro-α2(I) collagen genes contain analogous pyrimidine stretches, it is not known whether they are sensitive to S1 nuclease in vitro (Chu et al., 1985; Dickson et al., 1985). This pyrimidine stretch is absent from the 5'-flanking sequence of the chicken pro-α1(I) collagen gene.

The in vitro S1 nuclease hypersensitive site also corresponds to the tissue-specific in vivo DNase I hypersensitive site in the chicken pro-α2(I) and mouse pro-α1(I) collagen genes (McKeon et al., 1984; Breindl et al., 1984). Insertion of a single copy of the Moloney murine leukemia virus into the first intron of the mouse pro-α1(I) collagen gene results in the loss of the DNase I hypersensitive site in the mutated allele (Breindl et al., 1984) and in a 20- to 100-fold reduction in the transcription of the gene (Hartung et al., 1986). In contrast, the human pro-α1(I) gene has no in vivo DNase hypersensitive site in this region. While several hypersensitive sites were identified, the only site near the transcription start site is located within the first intron, 1.0 kb 3' to the transcription initiation site (Barsh et al., 1984). The in vitro S1 nuclease hypersensitivity of this sequence has not been determined. DNase I hypersensitive sites have been correlated with enhancer elements in other cellular genes as well as DNA viruses (Edlund et al., 1985; Bergman et al., 1984; Cremesi, 1981). This suggests that the in vitro S1 hypersensitive site in the chicken pro-α1(I) gene might correspond to the in vivo DNase I hypersensitive site and be a potential enhancer element in this gene. The existence of transcriptional controlling elements, both 5' to the transcription initiation site and within the first intron, has been observed both in the human β-major globin gene (Charnay et al., 1984; Wright et al., 1984) and in the rearranged immunoglobulin heavy and light chains (Bergman et al., 1984; Banerji et al., 1983). Therefore, it would not be unusual to find an enhancer element within the first intron of the chicken pro-α1(I) collagen gene.

There is one caveat to this argument. Reverse hybridization experiments have demonstrated that the chicken pro-α1(I) S1 hypersensitive sequence is present at 3200 copies per haploid genome. Southern blotting experiments confirmed that these copies are distributed at many loci within the genome. A similar sequence, AGAGG, has been identified 5' to a chicken heat shock gene and is repeated 32 times (Dybvig et al., 1983). This sequence shares the properties of in vitro S1 nuclease hypersensitivity as well as location at other positions within the chicken genome with a comparable repetition frequency of 2000 copies per haploid genome. These sequences resemble highly repeated satellite sequences found in heterochromatic satellite DNA (Brutlag, 1980). Repetition of S1 hypersensitive sequences at other locations within the genome could rule out the possibility that these sequences bind tissue-specific factors. However, a protein that is unique to B cells has been shown to bind to a sequence within the immunoglobulin κ chain enhancer which is identical to a sequence in the SV40 enhancer (Sen and Baltimore, 1986). Hence specificity can be defined by the protein and need not be a function of the DNA sequence.

Acknowledgment. We want to thank Elizabeth Twombly for her excellent technical assistance.

REFERENCES

Adams, S. L., Alwine, J. C., de Crombrugghe, B., and Pastan, I. (1979) J. Biol. Chem. 254, 4935-4938
Adams, S., Boettiger, D., Focht, R. J., Holtzer, H., and Pacifici, M. (1982) Cell 30, 373-384
Alema, S., Tato, F., and Boettiger, D. (1985) Mol. Cell. Biol. 5, 538-544
Banerji, J., Olson, L., and Schaffner, W. (1983) Cell 33, 729-740
Barsh, G. S., Roush, C. L., and Gelinas, R. E. (1984) J. Biol. Chem. 259, 14906-14913
Bell, G. I., Selby, M. J., and Rutter, W. J. (1982) Nature 295, 31-35
Benoist, C., and Chambon, P. (1981) Nature 290, 304-310
Bergman, Y., Rice, D., Grosschedl, R., and Baltimore, D. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 7041-7045
Blin, N., and Stafford, D. W. (1976) Nucleic Acids Res. 3, 2303-2314


Boedtker, H., Finer, M., and Aho, S. (1985) Ann. N. Y. Acad. Sci. 460, 85-116
Bornstein, P., and Sage, H. (1980) Annu. Rev. Biochem. 49, 957-1003
Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
Breindl, M., Harbers, K., and Jaenisch, R. (1984) Cell 38, 9-16
Brutlag, D. (1980) Annu. Rev. Genet. 14, 121-141
Charnay, P., Treisman, R., Mellon, P., Chao, M., Axel, R., and Maniatis, T. (1984) Cell 38, 251-263
Charnay, P., Mellon, P., and Maniatis, T. (1985) Mol. Cell. Biol. 5, 1498-1511
Chu, M.-L., de Wet, W., Bernard, M., Ding, J.-F., Morabito, M., Myers, J., Williams, C., and Ramirez, F. (1984) Nature 310, 337-340
Chu, M.-L., de Wet, W., Bernard, M., and Ramirez, F. (1985) J. Biol. Chem. 260, 2315-2320
Cremesi, C. (1981) Nucleic Acids Res. 9, 5949-5964
Dickson, L. A., de Wet, W., Di Liberto, M., Weil, D., and Ramirez, F. (1985) Nucleic Acids Res. 13, 3427-3438
Dienstman, S., Biehl, J., Holtzer, H., and Holtzer, S. (1974) Dev. Biol. 39, 83-95
Dybvig, K., Clark, C. D., Aliperti, G., and Schlesinger, M. J. (1983) Nucleic Acids Res. 11, 8495-8508
Edlund, T., Walker, M. D., Barr, P. J., and Rutter, W. J. (1985) Science 230, 912-916
Finer, M. H., Fodor, E. J. R., Boedtker, H., and Doty, P. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 1659-1663
Finer, M. H., Gerstenfeld, L. C., Young, D., Doty, P., and Boedtker, H. (1985) Mol. Cell. Biol. 5, 1415-1424
Finer, M. H., Boedtker, H., and Doty, P. (1987) Gene 56, 71-78
Focht, R. J., and Adams, S. L. (1984) Mol. Cell. Biol. 4, 1843-1852
Gerstenfeld, L. C., Crawford, D. R., Boedtker, H., and Doty, P. (1984) Mol. Cell. Biol. 4, 1483-1492
Gionti, E., Capasso, O., and Cancedda, R. (1983) J. Biol. Chem. 258, 7190-7194
Grosschedl, R., and Birnstiel, M. L. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 1432-1436
Grosveld, G. C., Rosenthal, A., and Flavell, R. A. (1982) Nucleic Acids Res. 10, 4951-4971
Grunstein, M., and Hogness, D. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 3961-3965
Hanahan, D. (1983) J. Mol. Biol. 166, 557-580
Harbers, K., Kuehn, M., Delius, H., and Jaenisch, R. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 1504-1508
Hartung, S., Jaenisch, R., and Breindl, M. (1986) Nature 320, 365-367
Jones, K. A., Yamamoto, K. R., and Tjian, R. (1985) Cell 42, 559-572
Mace, H. A. F., Pelham, H. R. B., and Travers, A. A. (1983) Nature 304, 555-557
Maniatis, T., Fritsch, E., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Martin, G., Timpl, R., Muller, P., and Kuhn, K. (1985) Trends Biochem. Sci. 10, 285-287
Maxam, A., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
McKeon, C., Schmidt, A., and de Crombrugghe, B. (1984) J. Biol. Chem. 259, 6636-6640
McKnight, S., Gavis, E. R., and Kingsbury, R. (1981) Cell 25, 385-398
Mellon, P., Parker, V., Gluzman, Y., and Maniatis, T. (1981) Cell 27, 279-288
Melton, D. A., Krieg, P. A., Rebagliati, M. R., Maniatis, T., Zinn, K., and Green, M. R. (1984) Nucleic Acids Res. 12, 7035-7056
Myers, J., Dickson, L., de Wet, W., Bernard, M., Chu, M., Di Liberto, M., Pepe, G., Sangiorgi, F., and Ramirez, F. (1983) J. Biol. Chem. 258, 10128-10135
Pesciotta, D., Silkowitz, M., Fietzek, P., Graves, P., Berg, R., and Olsen, B. (1980) Biochemistry 19, 2447-2453
Pihlajaniemi, T., Dickson, L. A., Pope, F. M., Korhonen, V. R., Nicholls, A., Prockop, D. J., and Myers, J. C. (1984) J. Biol. Chem. 259, 12941-12944
Rigby, P., Dieckmann, M., Rhodes, C., and Berg, P. (1977) J. Mol. Biol. 113, 237-248
Sandmeyer, S., Gallis, B., and Bornstein, P. (1981) J. Biol. Chem. 256, 5022-5028
Sasse, J., von der Mark, K., Pacifici, M., and Holtzer, H. (1983) Prog. Clin. Biol. Res. 110B, 159-166
Schmidt, A., Yamada, Y., and de Crombrugghe, B. (1984) J. Biol. Chem. 259, 7411-7415
Schon, E., Evans, T., Welsh, J., and Efstratiadis, A. (1983) Cell 35, 837-848
Sen, R., and Baltimore, D. (1986) Cell 46, 705-71
Stumph, W. E., Kristo, P., Tsai, M.-J., and O'Malley, B. W. (1981) Nucleic Acids Res. 9, 5383-5397
Southern, E. (1975) J. Mol. Biol. 98, 503-517
Tate, V., Finer, M., Boedtker, H., and Doty, P. (1983) Cold Spring Harbor Symp. Quant. Biol. XLVII, 1039-1049
Vogeli, G., Ohkubo, H., Sobel, M. E., Yamada, Y., Pastan, I., and de Crombrugghe, B. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 5334-5338
Walker, M. D., Edlund, T., Boulet, A. M., and Rutter, W. J. (1983) Nature 306, 557-561
Wozney, J., Hanahan, D., Tate, V., Boedtker, H., and Doty, P. (1981) Nature 294, 129-135
Wright, S., Rosenthal, A., Flavell, R., and Grosveld, F. (1984) Cell 38, 265-273
Yamada, Y., Mudryj, M., and de Crombrugghe, B. (1983) J. Biol. Chem. 258, 14914-14919
Young, R., and Davis, R. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 1194-1198