Human α2(VI) Collagen Gene - The Journal of Biological Chemistry

Vol. 267, No. 9, Issue of March 25, pp. 6188-6196, 1992
Printed in U.S.A.

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1992 by The American Society for Biochemistry and Molecular Biology, Inc.

Human α2(VI) Collagen Gene
HETEROGENEITY AT THE 5'-UNTRANSLATED REGION GENERATED BY AN ALTERNATE EXON*

(Received for publication, September 4, 1991)

Biagio Saitta‡, Rupert Timpl§, and Mon-Li Chu¶∥
From the Department of Biochemistry and Molecular Biology and the ¶Department of Dermatology, Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, Pennsylvania 19107 and the §Max-Planck-Institut für Biochemie, D-8033 Martinsried, Federal Republic of Germany

Cosmid clones containing the 5' region of the human α2(VI) collagen gene have been isolated and characterized. DNA sequencing indicates that the signal peptide and the amino-globular domain are encoded by four exons of 142, 596, 21, and 66 base pairs (bp). However, S1 nuclease and primer extension analyses show that the transcription start site is not present in the 142-bp exon. Two different 5' cDNA clones are generated by the anchored polymerase chain reaction. Using the 5' cDNA clones as probes, two untranslated exons (1, 1A) are found 12 kilobase pairs upstream of the first coding exon. These two exons are used alternatively in human fibroblasts, and most transcripts contain exon 1 sequence. Primer extension and S1 nuclease protection assays show two major and several minor transcription start sites in exon 1. The promoter region contains a canonical TATA box, seven GGGCGG sequences, two possible CAAT boxes, and two sequences resembling AP2 binding sites. Exon 1A contains three alternative splice donor sites and is located 650 bp downstream of exon 1. The most 3' splice donor site of exon 1A is found within an Alu repeat sequence. Exon 1A is preceded by five GGGCGG sequences and one sequence resembling the AP2 binding site, although neither TATA nor CAAT boxes are found. Two additional GGGCGG sequences are located at the beginning of exon 1A. This study establishes that the human α2(VI) collagen gene is 36 kilobase pairs long and contains 30 exons. The 5'-untranslated and promoter regions are significantly different from the corresponding segments of the chicken gene. By alternative processing, the human gene produces multiple mRNAs differing in the 5'-untranslated region as well as in the 3'-coding and noncoding sequences.
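The promoter descriptions above rest on counting short consensus motifs such as GGGCGG (marked as potential Sp1 sites in Fig. 5). The sketch below shows one way such a count could be made on both strands; it is not the authors' procedure, it matches GGGCGG only as an exact string, and the demo sequence is made up for illustration rather than taken from the α2(VI) promoter.

```python
"""Scan a DNA sequence for a short motif on both strands.

Sketch only: GGGCGG is matched as an exact string, and the demo
sequence is hypothetical, not taken from Fig. 5.
"""
from typing import List, Tuple

# Translation table for complementing bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact, possibly overlapping hit.

    A '-' hit means the reverse complement of the motif (CCGCCC for
    GGGCGG) occurs at that position of the given strand. Palindromic
    motifs are reported once, as '+'.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    demo = "ttGGGCGGaaTATAAAacCCGCCCtt"  # hypothetical sequence
    print(find_motif(demo, "GGGCGG"))    # [(2, '+'), (18, '-')]
```

Scanning both strands matters because a GC box can sit in either orientation relative to the transcription start; exact matching ignores the degenerate variants a real binding-site search would also consider.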

Type VI collagen forms a major class of microfibrils found in most tissues of vertebrates (for a review see Ref. 1). In situ, these microfibrils localize close to cells, nerves, blood vessels, and large collagen fibrils and are considered to have an anchoring function (2, 3). Consistent with such a function are the biochemical findings that type VI collagen binds cells (4, 5) and that its fusion protein binds type I collagen (6). The binding activity also implies that, in addition to its structural role, type VI collagen may be involved in cell migration, differentiation, and embryonic development (7). Type VI collagen is unusual among collagens in that the noncollagenous domains comprise 80% of its total mass. It consists of three chains, α1(VI), α2(VI), and α3(VI), with molecular masses of 140, 140, and 300-340 kDa, respectively (1, 8, 9). Full-length cDNAs encoding all three chains from the human (10-12) and the chicken (6, 13-16) have been cloned and sequenced. Analysis of their primary structure indicates that each chain contains a central collagenous domain of 335-336 amino acid residues, which is flanked by a large globular domain at both the amino and carboxyl termini. The globular domains consist of repetitive motifs of approximately 200 residues which share significant identity with similar domains in von Willebrand factor, certain integrins and complement components, and the cartilage matrix protein. The α1(VI) and α2(VI) chains have one such repeat in the amino-globular domain (N1) and two in the carboxyl-globular domain (C1 and C2). The α3(VI) chain has a total of 11 repetitive motifs, with 9 found in the amino-globular domain. Collectively, the domain structures suggest that the three type VI collagen genes may have evolved by exon shuffling and duplication of two primordial genes, one coding for the collagenous domain and the other for the 200-residue repetitive motif. The human gene encoding the α2(VI) collagen is present as a single copy on human chromosome 21q22.3 in proximity to the α1(VI) collagen gene (17).

* This work was supported in part by National Institutes of Health Grants AR 38912, AR 38923, AR 19616, and AR 39740 and by Deutsche Forschungsgemeinschaft Grant SFB 266. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M81834, M81835, and M81836.
‡ On leave of absence from the Istituto di Biologia dello Sviluppo, Consiglio Nazionale delle Ricerche, Palermo, Italy.
∥ To whom correspondence should be sent: Dept. of Biochemistry and Molecular Biology, Thomas Jefferson University, 233 S. 10th St., Philadelphia, PA 19107.
Recently we have isolated and characterized the genomic region coding for the carboxyl-globular domain and the triple-helical domain of the human α2(VI) chain (18, 19), and other investigators have reported the complete structure of the chicken α2(VI) collagen gene (20-22). In this study we have completed the isolation and characterization of the entire human α2(VI) collagen gene. We demonstrate here that even though the exon structure of the human and chicken α2(VI) genes is highly conserved, the 5' end of the human gene is significantly different from that of the chicken gene. Unlike the chicken gene, the human gene possesses a canonical TATA box in the promoter region, and transcripts predominantly initiate 43 and 47 bp¹ downstream of the TATA box. Perhaps a more striking feature has been elucidated by the cloning of the 5' end of the human α2(VI) cDNA, which has revealed multiple mRNA species that differ in their 5'-untranslated regions. This heterogeneity is generated by the use of an alternative 5' exon that contains three splice donor sites.

¹ The abbreviations used are: bp, base pair(s); kb, kilobase pair(s); PCR, polymerase chain reaction.

MATERIALS AND METHODS

Isolation of Genomic DNA Clones. A human leukocyte genomic library constructed in pHV4 cosmid vector (18) and a human cosmid library in the vector pWE15 were obtained from Stratagene (La Jolla, CA) and screened by standard procedures (23). The probe used was a 1.8-kb EcoRI-BamHI fragment isolated from the 5' end of the α2(VI) cDNA clone designated F225, which contained 21 bp of the 5'-untranslated region, the coding regions for the amino-globular domain, and most of the triple-helical domain (10). Positive clones were purified by three or four sequential rounds of screening, and the cosmid DNA was prepared by the alkaline lysis method (23).

Sequence Analysis. Genomic DNA fragments were subcloned into Bluescript vectors (Stratagene). Plasmid DNA was prepared by a small scale boiling procedure (23) and sequenced by the dideoxy chain termination method (24) using [35S]dATP and Sequenase (U. S. Biochemical Corp.). Nucleotide sequencing was first performed with primers derived from the vector and the cDNA sequences. Further sequences were obtained with primers synthesized on the basis of the newly determined DNA sequences.

RNA Isolation. Established human diploid fibroblast cell lines 3349 and 1520 were obtained from the Coriell Institute for Medical Research (Camden, NJ). Osteosarcoma cells (Saos-2) were obtained from the American Type Culture Collection. Cells were grown in Dulbecco's modified Eagle's medium containing 10% fetal calf serum. Total RNA and poly(A)+ RNA were prepared as described previously (18). Total RNA of osteosarcoma cells was a generous gift of David Stokes, Thomas Jefferson University.

S1 Nuclease and Primer Extension Analyses. Nuclease S1 analysis was performed as described previously (18, 25). Primer extension analysis generally followed the protocol described in Sambrook et al. (23). Briefly, oligonucleotide primers were 5' end labeled with [γ-32P]ATP (3,000 Ci/mmol) using T4 polynucleotide kinase. The primers were annealed to 1 μg of poly(A)+ RNA at 50 °C for 1 h in 20 μl of 10 mM Tris, pH 8.0, 1 mM EDTA, and 75 mM KCl containing 1 unit of RNasin. After ethanol precipitation, the primers were extended for 1 h at 37 °C with 200 units of murine Moloney virus reverse transcriptase in 20 μl of a mixture containing 50 mM Tris, pH 8.3; 75 mM KCl; 3 mM MgCl2; 10 mM dithiothreitol; 0.5 mM each of dATP, dCTP, dGTP, and dTTP; 1 unit of RNasin; and 50 ng of actinomycin D. The reaction mixtures were ethanol precipitated and separated on a 6% polyacrylamide sequencing gel.

Anchored PCR Cloning of the 5' cDNA. Anchored polymerase chain reaction (PCR) was performed as described by Loh et al. (26). Briefly, single-stranded cDNA was prepared from 1 μg of fibroblast poly(A)+ RNA using 1 ng of primer f (see Fig. 5) and 200 units of murine Moloney virus reverse transcriptase under conditions suggested by the manufacturer (Bethesda Research Laboratories). A poly(G) tail was added to the cDNA with terminal deoxynucleotidyl transferase in 2 mM CoCl2, 1 mM dGTP for 1 h at 37 °C. After phenol-chloroform extraction and ethanol precipitation, the cDNA was used for PCR amplification with 4.0 units of Taq polymerase (Promega, Madison, WI) and 70 ng of each primer in 100 μl of the standard buffer (Perkin-Elmer Cetus Instruments). The primers included one specific for the 5' end of the previously isolated cDNA (e in Fig. 5) and a mixture of the 33-mer EPSC primer (CGGAATTCTGCAGTCAGCT(C)14) and the 19-mer EPS primer (CGGAATTCTGCAGTCAGCT) at a ratio of 1:9. The PCR was carried out in a thermal cycler (Coy Laboratory, Ann Arbor, MI) for 25 cycles of 2 min each at 94, 52, and 72 °C. Five μl of the final reaction mixture was ligated with SmaI-digested pUC19 plasmid, transformed into Escherichia coli (DH5α, Bethesda Research Laboratories), and screened with primer e (see Fig. 5). DNAs from positive clones were isolated and sequenced as described above.

RESULTS

Isolation and Mapping of the Genomic Region Coding for the Amino-globular Domain. Two cosmid clones, 7a and B10, were isolated by screening approximately 4 × 10⁵ clones from each of the two cosmid libraries. Restriction enzyme mapping indicated that B10 (35 kb in pHV4 vector) and 7a (35 kb in pWE15 vector) overlapped by 20 kb (Fig. 1). The 4.2-kb HindIII fragment from the B10 cosmid clone that hybridized to the 5' end of the cDNA was subcloned into Bluescript. The nucleotide sequences corresponding to the entire coding region, parts of introns, and the noncoding flanking regions were determined. As shown in Figs. 1 and 2, the amino-globular domain is encoded by four exons (exons 2-5), the first of which (exon 2) codes for the signal peptide and amino acid residues 1-18. This exon also includes 21 bp of noncoding DNA found at the beginning of the F225 cDNA clone. In these experiments, the

[Figure 1: cosmid clones B10 (35 kb) and 7a (35 kb); restriction map of B10; exon/intron organization of exons 1, 1A, and 2-5]

FIG. 1. Genomic organization of the 5' end of the human α2(VI) collagen gene. The top panel shows two overlapping cosmid clones, B10 and 7a. The middle panel shows the restriction map of clone B10, locations of the sequenced Alu repetitive elements, and positions of the 2.4-kb X-H and 4.2-kb H-H fragments. The Alu sequences are either in the same or the opposite orientation as the α2(VI) collagen gene, depicted as Alu (An) and Alu (Tn), respectively. Restriction sites shown are: E, EcoRI; H, HindIII; B, BglI; P, PstI; X, XhoI. In the bottom panel a more detailed map of the subcloned region with a schematic representation of the exon/intron organization is presented. Exons 1 and 1A code for the 5'-untranslated region, and exons 2-5 code for the amino-globular domain. Exon 5 is a fusion exon also encoding the beginning of the triple-helical domain. Exons are numbered from the 5' end (in parentheses below the exon), and the sizes of exons (in bp) are indicated above each exon.
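The exon sizes in Fig. 1 can be checked against the residue ranges stated in the Results with simple codon arithmetic. The sketch below assumes that exon 2 carries 27 bp of 5'-noncoding sequence before the ATG and that the split codon marked at its donor site (Table I) leaves one nucleotide in exon 2 and two in exon 3; the helper name `codon_layout` is hypothetical.

```python
"""Codon bookkeeping for exons 2, 3, and 5 of the alpha2(VI) gene.

Sketch under stated assumptions: exon 2 has 27 bp of 5'-noncoding
sequence, and its donor site splits a codon (1 nt in exon 2, 2 nt in
exon 3). codon_layout is a hypothetical helper, not from the paper.
"""
from typing import Tuple


def codon_layout(coding_bp: int, leading_nt: int) -> Tuple[int, int]:
    """Return (residues with >=1 nt in this exon, nt left for the next exon).

    coding_bp  -- coding nucleotides in the exon
    leading_nt -- nt that complete a codon begun in the previous exon (0-2)
    """
    if not 0 <= leading_nt <= 2 or leading_nt > coding_bp:
        raise ValueError("leading_nt must be 0, 1 or 2 and fit in the exon")
    full, trailing = divmod(coding_bp - leading_nt, 3)
    residues = full + int(leading_nt > 0) + int(trailing > 0)
    return residues, trailing


exon2 = codon_layout(142 - 27, 0)  # 115 nt: 38 full codons + 1 nt overhang
exon3 = codon_layout(596, 2)       # 2 nt completing the split codon + 198 codons
first_res = 19
last_res = first_res + exon3[0] - 1
exon5_nc, exon5_helix = 30 // 3, 36 // 3  # 66 bp split as stated in the text

print(exon2, exon3, (first_res, last_res), (exon5_nc, exon5_helix))
```

Under these assumptions the one-nucleotide overhang of exon 2 matches the split-codon mark, exon 3 spans residues 19-217, and the 66-bp exon 5 divides into 10 noncollagenous and 12 triple-helical residues.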


[Figure 2: nucleotide sequence of the 4.2-kb HindIII fragment, with exons 2-5 in uppercase and the deduced amino acid sequence below]

FIG. 2. Nucleotide sequence of the 4.2-kb HindIII fragment (Fig. 1, middle panel) coding for the amino-globular domain. Underlined sequences represent the HindIII restriction sites. Arrows denote the direction of the primers e and f at the beginning and end of exon 2, respectively (see also Fig. 5). An Alu repetitive sequence is present in the last intron between positions 3866 and 4158 (dashed lines).

TABLE I
Splice junction sites in the 5'-untranslated region and in the amino-globular domain of the human α2(VI) collagen gene

Exon no.   Exon size (bp)   Splice acceptor site      Splice donor site
1          41-63            Start sites               ...CCAGgtgagcgc...
1A         115, 117         Start sites               ...GCAGgtgtgcgg...(a)
1A         305, 307         Start sites               ...TCTGgtgattat...(a)
1A         395, 397         Start sites               ...TGAGgtcaagga...(b)
2          142              ccacccctgttgcagGACT...    ...CCAG'gtgccagg...(c)
3          596              ggtcctgtgccccagAGAA...    ...CATGgtgagccg...
4          21               ctgtcttttctgcagAAAC...    ...AGAGgtgagtgg...
5          66               gcttcctcgtttcagTGCT...    ...GAAGgtaagatg...

(a) Internal splice donor sites in exon 1A (Fig. 5).
(b) Splice donor site within an Alu repeat sequence (Fig. 5).
(c) Indicates the presence of a split codon.
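The text notes that these junctions conform to the eukaryotic splice consensus. A minimal sketch of the most basic part of that check, the GT-AG rule, applied to the Table I junction strings: it tests only the invariant intron-end dinucleotides, not the longer consensus sequences cited (27, 28), and it covers the acceptors of exons 2-4 only. Intron bases are lowercase and exon bases uppercase, as in the table; the function names are hypothetical.

```python
"""Check Table I splice junctions for the GT-AG dinucleotides.

Sketch only: tests gt... at donors and ...ag at acceptors, not the
fuller consensus sequences. Lowercase = intron, uppercase = exon.
"""
from typing import Dict

ACCEPTORS = {  # intron ...ag | EXON
    "2": "ccacccctgttgcagGACT",
    "3": "ggtcctgtgccccagAGAA",
    "4": "ctgtcttttctgcagAAAC",
}
DONORS = {  # EXON | gt... intron
    "1": "CCAGgtgagcgc",
    "1A-i": "GCAGgtgtgcgg",
    "1A-ii": "TCTGgtgattat",
    "1A-iii (Alu)": "TGAGgtcaagga",
    "2": "CCAGgtgccagg",
    "3": "CATGgtgagccg",
    "4": "AGAGgtgagtgg",
    "5": "GAAGgtaagatg",
}


def intron_part(junction: str) -> str:
    """Keep only the lowercase (intronic) characters of a junction string."""
    return "".join(ch for ch in junction if ch.islower())


def gt_ag_check(donors: Dict[str, str], acceptors: Dict[str, str]) -> Dict[str, bool]:
    """Map each junction to True if its intron side starts with gt / ends with ag."""
    result = {f"donor {k}": intron_part(v).startswith("gt") for k, v in donors.items()}
    result.update(
        {f"acceptor {k}": intron_part(v).endswith("ag") for k, v in acceptors.items()}
    )
    return result


if __name__ == "__main__":
    for label, ok in gt_ag_check(DONORS, ACCEPTORS).items():
        print(f"{label:20s} {'GT-AG' if ok else 'non-canonical'}")
```

Every listed junction passes, which is the minimal property behind the statement that the junctions conform to the consensus; a fuller check would score the flanking positions as well.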



[Figures 3 and 4: S1 nuclease gel for the 5' end of exon 2 showing the 84-bp protected fragment; sequences of the 5' cDNA clones AS7, AS5, AS5a, and AS5b with primer positions; schematic of exons 1, 1A, and 2 in each cDNA]

FIG. 3. S1 nuclease analysis of the 5' end of exon 2. The probe was a 0.5-kb PstI-BglI genomic fragment as depicted in the schematic diagram. The probe was 5' 32P-labeled and annealed with poly(A)+ RNA from human skin fibroblasts (3349). Lane 1, probe alone; lane 2, probe plus S1 nuclease; lanes 3 and 4, 1 and 3 μg of poly(A)+ RNA, respectively, with probe and S1 nuclease. The probe (509 bp) and the protected fragment (84 bp) are indicated. Lane M, DNA size markers; the sizes in bp are indicated on the left. The samples were run on a 6% polyacrylamide gel, and autoradiography was performed overnight at -70 °C.

FIG. 4. Nucleotide sequence and schematic representation of the heterogeneous 5' cDNA clones for the human α2(VI) collagen. A, the two cDNA clones obtained by anchored PCR, AS5 and AS7, and the two longer species of AS5 cDNA isolated by PCR, AS5a and AS5b, are shown. Underlined sequences are previously known common cDNA; double underlined are new sequences common to the AS7 and AS5 species. Arrows show the nucleotide sequences and the orientation of the primers. Oligonucleotides EPSC/EPS are described under "Materials and Methods." B, schematic representation of exons 1, 1A, and 2 for the four different kinds of cDNAs, which differ in their 5'-untranslated regions.

5' end of exon 2 could not be determined by comparison of the cDNA and genomic sequences because this cDNA is not a full-length clone. Exon 3 (596 bp) and exon 4 (21 bp) code for amino acid residues 19-217 and 218-225, respectively. Exon 5 (66 bp) contains 30 bp coding for the last 10 amino acid residues of the amino-globular domain and 36 bp coding for the beginning of the triple helix. As shown in Table I, the sequences of the intron/exon junctions conform well to the consensus sequences of eukaryotic splice junctions (27, 28).

Identification of the 5' End of Exon 2. To define the 5' end of exon 2, the S1 nuclease protection assay was performed using a 0.5-kb PstI-BglI genomic fragment that contained exon 2 (Fig. 3). The fragment was 5' end labeled with 32P at the BglI site, which is located in exon 2, 78 bp from the 5' end of the cDNA F225. This probe protected a band of 84 bp. Inspection of the genomic sequence revealed a splice acceptor site that coincided with the S1 nuclease cleavage site (Fig. 2). Primer extension experiments using an

oligonucleotide in this exon yielded several bands, whose lengths mapped the transcription start site approximately 40-60 bp upstream of the S1 protected site (data not shown). The S1 and primer extension analyses therefore suggested the presence of exon(s) upstream of exon 2.

Cloning of 5' cDNA Extensions. To locate the first exon, the anchored PCR method was used to extend the 5' end of the cDNA. The amplified cDNA fragments were cloned, and nine independent clones (insert size, 80-100 bp) were characterized by DNA sequencing. All clones contained the 3' PCR primer e (Fig. 4), and the sequence upstream of the primer matched the genomic sequence up to the putative splice acceptor site identified by the S1 nuclease assay. Beyond that site, the sequences diverged from the genomic DNA. As shown in Figs. 4 and 5, most clones contained the same sequence upstream of the splice acceptor site, although the length of these clones differed by 5-10 bp. The longest clone, AS7, possessed 45 bp of sequence upstream of the splice junction (Fig. 4A). However, a single clone (AS5) possessed a different upstream sequence. Analysis of the 5' cDNA clones established that exon 2 is 142 bp long, which includes 27 bp of 5'-noncoding sequence.

Identification of Exon 1 and Exon 1A. The AS7 cDNA- and AS5 cDNA-specific oligonucleotides were found to hybridize to a common 2.4-kb XhoI-HindIII fragment approximately 12 kb upstream of exon 2 (Fig. 1). The DNA sequence of 1.5 kb of this genomic region was determined (Fig. 5). The results confirm that this region contains both AS7 and AS5 sequences and that the two sequences are 650 bp apart (Fig. 5). In addition, consensus splice donor sites are found at the junctions at which the sequences of the cDNA and genomic DNA diverged. Collectively, these data indicate that exon 2 can be spliced to two different upstream regions, exon 1 and exon 1A, generating the AS7 and AS5 cDNAs, respectively.

[Figure 5: sequence of the α2(VI) promoter region from position -744, exon 1, and exon 1A with the overlapping Alu element, followed, after a 12-kb gap, by the 5' end of exon 2]

FIG. 5. Nucleotide sequence of the promoter and 5'-untranslated regions of the human α2(VI) collagen gene. Two major transcriptional start sites of exon 1 are indicated by hollow arrows, and weak initiation sites are indicated by solid arrows. The most 5' start site is labeled +1. Exon sequences are depicted by uppercase letters. The TATA box at -25 is shown. Underlined sequences represent potential Sp1 binding sites. PstI and XhoI are restriction sites. Oligonucleotides used in the primer extension, PCR, and first-strand cDNA synthesis (a-f), similarities with CAAT sequences (double-dashed lines), and AP2 binding sites (dashed lines) are indicated. An Alu repetitive sequence is represented between the brackets. The internal splice donor sites of exon 1A are indicated by solid dots. Note that the Alu element overlaps with exon 1A. Methionine encoded by the ATG codon within exon 2 is indicated above the codon.

Identification of Three Splice Donor Sites in Exon 1A. The existence of the AS5 and AS7 mRNAs was verified further by PCR amplification of total RNA from fibroblast 3349 (Fig. 6A). Primers from exons 1 and 2 yielded an 89-bp product, as predicted from amplification of the AS7 mRNA. Primers from exons 1A and 2 produced a major band at 94 bp, as expected for the AS5 mRNA, as well as a 284-bp product. Cloning of the latter two PCR products, however, generated three different kinds of clones with insert sizes of 89, 284, and 374 bp, respectively. The longest product was not visible on the agarose gel shown in Fig. 6A but appeared in the reamplification of the PCR products (data not shown). The sequence of the shortest cDNA was identical to that of the AS5 cDNA

(Fig. 4A). The 284-bp product, designated AS5a, contained a 195-bp insertion whose sequence was colinear with the genomic sequence downstream of exon 1A, whereas the longest clone, designated AS5b, had an additional 90 bp of genomic sequence identified further downstream (Figs. 4A and 5). Interestingly, an Alu repetitive sequence is found in this region of the gene, and the longest clone (AS5b) included part of the Alu sequence. Thus, the data from PCR cloning indicate that there are two other 5' splice donor sites in exon 1A downstream from the one identified by anchored PCR cloning, and these three donor sites can be alternatively used in normal skin fibroblasts.

Primer Extension and S1 Nuclease Analyses of Exon 1. Nuclease S1 analysis of fibroblast mRNA using a 198-bp probe from exon 1 yielded two major protected fragments of 41 and 45 bp (Fig. 7). Longer exposure of the film revealed six larger fragments of lower intensity (data not shown). Protected fragments of a similar size were obtained with total RNA from Saos-2 osteosarcoma cells, except that the intensity of an approximately 63-bp doublet was much higher than that observed in fibroblasts. Primer extension with an oligonucleotide from exon 1 yielded two major extension products of 41 and 43 bp and several minor products ranging in size from 45 to 63 bp. The sizes of the primer extension products agreed with those of the S1 nuclease analyses, although the relative intensity of the bands in these two experiments differed. Comparison of the S1 protected band sizes with the sequencing ladder of the S1 probe allowed the assignment of the

[Figures 6 and 7: agarose gel of the PCR products with a primer and probe schematic; S1 nuclease and primer extension gel for exon 1]

FIG. 6. PCR analysis of the 5'-untranslated region of the human α2(VI) collagen mRNA. PCR was performed with specific primers after first-strand cDNA synthesis using mRNA isolated from human skin fibroblast 1520 and oligonucleotide f (Fig. 5). The products were separated on a 4% agarose gel (A). Lane 1 shows PCR products of 94 and 284 bp specific for exons 1A and 2 by using primers c and e as depicted in the schematic diagram in B. Lane 2 shows a PCR product of 89 bp specific for exons 1 and 2 by using primers a and e (B). Lane M represents DNA size markers. B, schematic diagram showing the primers for the PCR and the 198- and 172-bp single-stranded DNA probes used in the S1 nuclease analyses in Figs. 7 and 8.

FIG. 7. Analysis of exon 1 by S1 nuclease protection and primer extension. The probe for S1 analysis was a 198-bp single-stranded DNA generated by DNA polymerase I (Klenow fragment) using oligonucleotide b and the XhoI restriction site (Fig. 6B). Lane M represents a 1-kb DNA size ladder. Lane 1 contains probe alone. Lane 2 contains probe plus S1 nuclease. Lanes 3 and 4 contain probe, S1 nuclease, and 1 and 3 μg, respectively, of fibroblast 3349 poly(A)+ RNA. Lane 5 contains probe, S1 nuclease, and 10 μg of total RNA of Saos-2 cells. Lane 6 represents primer extension analysis using 1 μg of fibroblast 3349 mRNA and oligonucleotide b (Fig. 6B). DNA fragments were run on a 6% polyacrylamide sequencing gel, and autoradiography was performed overnight at -70 °C.
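The clone sizes reported for the exon 1A-exon 2 products (89, 284, and 374 bp) follow from simple bookkeeping: each donor site further downstream in exon 1A retains the intervening genomic stretch in the spliced product. The sketch below reproduces that arithmetic from the insert lengths given in the text (195 bp for AS5a, a further 90 bp for AS5b); the helper `product_sizes` is hypothetical and not part of the original analysis.

```python
"""Product lengths expected from successive alternative 5' donor sites.

Sketch: each donor further downstream in exon 1A keeps the intervening
genomic stretch in the spliced product. Numbers are the cloned insert
sizes and insert lengths reported in the text; the helper is hypothetical.
"""
from itertools import accumulate
from typing import List


def product_sizes(base_bp: int, extra_bp: List[int]) -> List[int]:
    """Return lengths for the most 5' donor and each further downstream donor.

    base_bp  -- product length when the most 5' donor is used
    extra_bp -- additional exonic sequence retained by each next donor
    """
    return [base_bp] + [base_bp + e for e in accumulate(extra_bp)]


names = ["AS5", "AS5a", "AS5b"]
sizes = dict(zip(names, product_sizes(89, [195, 90])))
print(sizes)  # {'AS5': 89, 'AS5a': 284, 'AS5b': 374}
```

Using cumulative sums keeps the model tied to donor order along the exon: a product from the third donor must contain both retained stretches, not just the last one.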


FIG. 8. S1 nuclease analysis of exon 1A. The probe was a 172-bp single-stranded DNA synthesized by DNA polymerase I (Klenow fragment) using oligonucleotide d and the PstI restriction site (Fig. 6B). Lane M represents DNA size markers. Lane 1 contains probe alone. Lane 2 contains probe plus S1 nuclease. Lanes 3 and 4 contain 2 and 10 μg, respectively, of fibroblast 3349 poly(A)+ RNA with probe and S1 nuclease. Protected fragments were run on a 6% polyacrylamide sequencing gel, and autoradiography was performed overnight at -70 °C.

unlikely that the promoter upstream of exon 1 transcribes both AS5 and AS7 mRNAs. Further inspection of the DNA sequence upstream of exon 1A revealed five GGGCGG sequences, and two additional GGGCGG sequences are located at the beginning of exon 1A. Canonical TATA and CAAT sequences were not found, nor was a splice acceptor site present at the 5' end of exon 1A. These data strongly suggest that another promoter is present upstream of exon 1A.

DISCUSSION

The studies described in this report, in conjunction with our previous structural analyses of the 3' portion of the gene (18, 19), provide a complete characterization of the human α2(VI) collagen gene. Together, we have isolated and characterized three overlapping cosmid clones, B10, 7a, and D1 (Fig. 9), spanning 80 kb of genomic DNA. Structural mapping indicates that the entire α2(VI) gene consists of 30 exons spanning 36 kb of DNA. Two of the exons, 1A and 28A, are alternatively used to produce multiple mRNAs that differ in the 5'- and 3'-untranslated regions, as well as in a segment coding for the carboxyl-globular domain (18). Recently, physical mapping using pulsed field gel electrophoresis placed the α2(VI) collagen gene within 700 kb of the telomere of the long arm of chromosome 21 (31).

The amino terminus of the α2(VI) chain consists of a signal peptide, a 200-amino acid residue repetitive motif (N1), and a short cysteine-rich segment connected to the triple-helical domain. The exons coding for this region correspond well with the protein subdomains. Specifically, the signal peptide is encoded by exon 2, the entire N1 domain is encoded by a single exon of 596 bp (exon 3), and the short connecting segment is encoded by exon 4 and the first half of exon 5. As in several other collagen genes, a junction exon (exon 5) encodes the transition between the noncollagenous and the collagenous domains. This arrangement is in sharp contrast to that at the 3' end of the gene, where an intron separates the regions coding for the triple-helical domain and the carboxyl-globular domain. Comparison with the chicken α2(VI) gene indicates that the exons coding for the amino-globular domain are strictly conserved between human and chicken (18-22). The exon structures of the 200-residue repeats of von Willebrand factor, cartilage matrix protein, integrin receptor p150,95, and complement factor B have been reported (32-35). These studies demonstrate that each repeat can be encoded by 1-5 exons. We have shown previously that two repeats (C1 and C2) in the carboxyl-globular domain of the α2(VI) chain are each encoded by either one or two exons. More recently, we showed that each of the eight consecutive repeats in the amino-globular domain of the α3(VI) chain is encoded by a single exon (25). Interestingly, the boundaries of these repeats are always delineated by introns, although the number of exons coding for each repeat varies. These observations suggest that the primordial gene for these repeats had no introns and that introns were acquired after the primordial gene had been duplicated and shuffled to separate genomic locations.

Analysis of the extreme 5' region of the gene presented here reveals little sequence homology between the promoters of the human and chicken α2(VI) collagen genes (22). The human gene contains a canonical TATA box and two possible CAAT boxes, whereas the chicken gene lacks both elements. A TATA box is thought to specify the precise position of transcription initiation (36), and thus genes without a TATA box often start transcription at multiple sites. Our primer extension and S1 nuclease analyses identified two major start sites of the human gene at 43 and 47 bp downstream of the TATA element. However, we also found six additional weak start sites within a 30-bp segment surrounding the major start sites. Because the TATA box is flanked by two strong potential SP1 binding sites (Fig. 5), it is conceivable that steric hindrance prevents simultaneous binding of transcription factors to the closely spaced TATA and SP1 binding elements. Initiation from the additional weak sites may therefore be a consequence of transcription in a TATA-independent mode. In this regard, it is of interest that both TATA-dependent and TATA-independent modes of transcription are used by the mouse metallothionein gene (37). The chicken α2(VI) collagen gene, on the other hand, which lacks a TATA box, initiates transcription at multiple sites spread over a larger region of 60 bp (22). As in the chicken gene, multiple CpG sequences are found in the promoter region of the human gene. CpG sequences are present at the 5' end of all housekeeping genes as well as many tissue-specific genes (38). Furthermore, a large intron (12.5 kb) is found in the 5'-untranslated region of the human gene, whereas in most other collagen genes the translation start site lies in the first exon.

Most strikingly, the human α2(VI) gene produces at least four mature mRNAs that differ in the sequence of the 5'-untranslated region. The sequence divergence begins 27 bp upstream of the ATG codon. The most abundant mRNA species, AS7, is transcribed from exon 1, whereas three low-abundance mRNAs, AS5, AS5a, and AS5b, use the alternate exon 1A, which contains three splice donor sites. It is of interest that the AS5b mRNA uses a splice donor site located within an Alu repetitive sequence. To our knowledge, this is the first demonstration of such an element being included in an exon. S1 nuclease analysis indicates that exon 1A starts at 720 (Figs. 5 and 8). The three minor RNA species could initiate transcription either from exon 1A itself or, like AS7 RNA, from exon 1, followed by alternative splicing into exon 1A. If the latter were the case, mRNAs containing both exon 1 and exon 1A sequences should exist. The failure of anchored PCR cloning, and of PCR amplification with primers for exons 1 and 1A, to detect such mRNA species seems to favor the idea that exon 1A has its own promoter. In addition, seven SP1 binding sequences are clustered in a region surrounding exon 1A, suggesting that this region could contain a second promoter. Further functional assays are necessary to delineate the precise mechanism by which these heterogeneous 5' transcripts are generated.

FIG. 9. Schematic representation of the human α2(VI) collagen gene. The B10, 7a, and D1 overlapping genomic cosmid clones are indicated (top). The exons that comprise the 5'-untranslated region (5' UT), amino-globular domain (N), triple-helical domain (TH), and carboxyl-globular domain (C) are depicted between the arrows (middle). Alternative splicing at the 5' end and at the 3' end of the gene is shown (bottom). [Figure labels: Cos B10 (35 kb), Cos D1 (33 kb); scale bar, 3 kb.]

Alternative processing of RNA is a common mechanism used by eukaryotic cells to generate multiple transcripts from a single gene (for a review, see Ref. 39). In most cases, the expression of these multiple transcripts is regulated in a tissue-specific or temporal manner. There are at least two examples of a collagen gene using alternative promoters to modulate its tissue-specific expression. The α1(IX) collagen gene initiates transcription in the cornea at a site 20 kb downstream from the one used in cartilage (40). As a result, the corneal mRNA encodes an α1(IX) chain that lacks a significant portion of the amino-terminal globular domain. Similarly, a chondrocyte-specific promoter is found in the second intron of the chick α1(I) collagen gene (41). Transcription from this α1(I) promoter generates an mRNA with a different 5' end, which encodes short polypeptides unrelated to the α1(I) collagen chain. Consequently, synthesis of the α1(I) chain is turned off in chondrocytes. The α2(VI) collagen gene described in this report represents another example of a collagen gene that uses alternative processing to produce mRNAs with different 5' ends. In this case, however, alternative processing does not alter the protein-coding region.
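Whether alternative 5' exons leave translation unchanged can be screened at the sequence level: under the ribosome scanning model, an AUG in the 5'-untranslated region upstream of the authentic start codon could divert initiation. The Python sketch below is a minimal version of such a screen. It assumes each 5'-untranslated variant is supplied as a string ending immediately before the authentic ATG; the function names and the two toy sequences are hypothetical and are not α2(VI) sequences. It also reports GC content, since GC-rich leaders are the ones most prone to secondary structure.

```python
"""Sketch: screen 5'-UTR variants for upstream ATG codons and report GC content.

Assumption (hypothetical): each UTR string ends immediately before the
authentic ATG start codon. Because the coding sequence itself begins with
ATG, no ATG triplet can straddle the UTR/CDS junction, so scanning the
UTR alone is sufficient.
"""


def upstream_atgs(utr: str) -> list[int]:
    """Return 0-based positions of every ATG (AUG) triplet in the UTR, any frame."""
    seq = utr.upper().replace("U", "T")  # accept RNA or DNA input
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G + C in a sequence (RNA or DNA)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def lacks_upstream_atg(utrs: dict[str, str]) -> dict[str, bool]:
    """Map each variant name to True when its UTR contains no upstream ATG."""
    return {name: not upstream_atgs(u) for name, u in utrs.items()}


if __name__ == "__main__":
    # Made-up toy leaders, not real alpha2(VI) sequences.
    toy = {"variant_a": "GGCGCCGGCC", "variant_b": "GGCGCCAUGGC"}
    print(lacks_upstream_atg(toy))
    print({name: round(gc_fraction(u), 2) for name, u in toy.items()})
```

This checks only for AUG triplets in any frame; it ignores non-AUG initiation and reinitiation, and GC content is at best a rough proxy for hairpin predictions, which require an RNA-folding program.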

There are several examples of genes that produce multiple mRNA transcripts differing only in their 5'-untranslated sequences, and different mechanisms are involved. For example, the genes for α-amylase (42), insulin-like growth factors I and II (43-45), actin 5C (46), and aldolase A (47) transcribe from multiple promoters, whereas the gene for hydroxymethylglutaryl-CoA reductase uses a combination of multiple transcription initiation sites and multiple 5' splice donor sites of an intron (48). It should be noted that all of these genes contain an intron before the translation start codon; therefore, alternative processing of the 5' exons does not affect the coding sequence. Previous studies have suggested that structural elements such as hairpin loops and open reading frames in the 5'-untranslated region affect the translation efficiency of the mRNA (49, 50). Indeed, tissue-specific use of the transcription start sites of the complement protein C2 gene has been shown to produce multiple mRNAs with different translation efficiencies (51). The gene for insulin-like growth factor II transcribes two mRNAs with different 5'-untranslated regions: one species is found exclusively on membrane-bound polysomes, whereas the other is present in cytoplasmic particles and is not directly engaged in protein synthesis (52). The 5'-untranslated regions of all four α2(VI) collagen mRNA species lack an AUG codon upstream of the authentic translation start site, and therefore the protein products of these four mRNAs are the same. However, because all four 5'-untranslated regions are GC rich, computer analyses predict that they can form stable hairpin structures. This suggests that the translation efficiency or the stability of the four mRNAs may differ. We have shown previously that the 3' end of the α2(VI) gene undergoes alternative splicing.
This process gives rise to three transcripts that differ in the carboxyl-terminal coding sequence and the 3'-untranslated region. It is now interesting to consider the newly identified heterogeneity at the 5' end in the context of what is known of the diversity at the 3' end. For example, it is formally possible that the choices of the 5' and the 3' exons are intimately linked. Even though significant changes in the expression of these multiple α2(VI) collagen mRNAs have not yet been detected in different tissues and cell lines, we cannot exclude the possibility that the minor mRNA species are preferentially used in some restricted tissues or during development. Further work is required to understand the biological significance of the alternative processing. The isolation and characterization of the entire gene represent an important step toward elucidating the mechanisms involved in the regulation of the α2(VI) collagen gene.

Acknowledgments: We thank Loretta Renkart for her excellent technical assistance and Dr. George Dodge and Sulagna Chakraborty for their critical reading of this manuscript.

REFERENCES

1. Timpl, R., and Engel, J. (1987) in Structure and Function of Collagen Types (Mayne, R., and Burgeson, R. E., eds) pp. 105-143, Academic Press, Orlando, FL
2. Bruns, R., Press, W., Engvall, E., Timpl, R., and Gross, J. (1986) J. Cell Biol. 103, 394-404
3. Keene, D. R., Engvall, E., and Glanville, R. W. (1988) J. Cell Biol. 107, 1995-2006
4. Wayner, E. A., and Carter, W. G. (1987) J. Cell Biol. 105, 1873-1884
5. Aumailley, M., Mann, K., von der Mark, H., and Timpl, R. (1989) Exp. Cell Res. 181, 463-474
6. Bonaldo, P., Russo, V., Bucciotti, F., Doliana, R., and Colombatti, A. (1990) Biochemistry 29, 1245-1254
7. Otte, A. P., Roy, D., Siemerink, M., Koster, C. H., Hochstenbach, F., Timmermans, A., and Durston, A. (1990) J. Cell Biol. 111, 271-278
8. Trueb, B., and Winterhalter, K. H. (1986) EMBO J. 5, 2815-2819
9. Colombatti, A., Bonaldo, P., Ainger, K., Bressan, G. M., and Volpin, D. (1987) J. Biol. Chem. 262, 14454-14460
10. Chu, M.-L., Conway, D., Pan, T., Baldwin, C., Mann, K., Deutzmann, R., and Timpl, R. (1988) J. Biol. Chem. 263, 18601-18606
11. Chu, M.-L., Pan, T.-C., Conway, D., Kuo, H.-J., Glanville, R. W., Timpl, R., Mann, K., and Deutzmann, R. (1989) EMBO J. 8, 1939-1946
12. Chu, M.-L., Zhang, R.-Z., Pan, T.-C., Stokes, D., Conway, D., Kuo, H.-J., Glanville, R., Mayer, U., Mann, K., Deutzmann, R., and Timpl, R. (1990) EMBO J. 9, 385-393
13. Trueb, B., Schaeren-Wiemers, N., Schreier, T., and Winterhalter, K. H. (1989) J. Biol. Chem. 264, 136-140
14. Koller, E., Winterhalter, K. H., and Trueb, B. (1989) EMBO J. 8, 1073-1077
15. Bonaldo, P., Russo, V., Bucciotti, F., Bressan, G. M., and Colombatti, A. (1989) J. Biol. Chem. 264, 5575-5580
16. Bonaldo, P., and Colombatti, A. (1989) J. Biol. Chem. 264, 20235-20239
17. Weil, D., Mattei, M.-G., Passage, E., Nguyen, V. C., Pribula-Conway, D., Mann, K., Deutzmann, R., Timpl, R., and Chu, M.-L. (1988) Am. J. Hum. Genet. 42, 435-445
18. Saitta, B., Stokes, D. G., Vissing, H., Timpl, R., and Chu, M.-L. (1990) J. Biol. Chem. 265, 6473-6480
19. Saitta, B., Wang, Y.-M., Renkart, L., Zhang, R.-Z., Pan, T.-C., Timpl, R., and Chu, M.-L. (1991) Genomics 11, 145-153
20. Hayman, A. R., Koppel, J., Winterhalter, K. H., and Trueb, B. (1990) J. Biol. Chem. 265, 9864-9868
21. Hayman, A. R., Koppel, J., and Trueb, B. (1991) Eur. J. Biochem. 197, 177-184
22. Koller, E., Hayman, A. R., and Trueb, B. (1991) Nucleic Acids Res. 19, 485-491
23. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, pp. 1.38-1.39, 7.79-7.83, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
24. Sanger, F., Nicklen, S., and Coulson, A. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
25. Stokes, D. G., Saitta, B., Timpl, R., and Chu, M.-L. (1991) J. Biol. Chem. 266, 8626-8633
26. Loh, E. Y., Elliott, J. F., Cwirla, S., Lanier, L. L., and Davis, M. M. (1989) Science 243, 217-220
27. Mount, S. M. (1982) Nucleic Acids Res. 10, 459-472
28. Shapiro, M. B., and Senapathy, P. (1987) Nucleic Acids Res. 15, 7155-7174
29. Dynan, W. S., and Tjian, R. (1983) Cell 35, 79-87
30. Mitchell, P. J., and Tjian, R. (1989) Science 245, 371-378
31. Burmeister, M., Kim, S., Price, R., de Lange, T., Tantravahi, V., Myers, R. M., and Cox, D. R. (1991) Genomics 9, 19-31
32. Mancuso, D. J., Tuley, E. A., Westfield, L. A., Worrall, N. K., Shelton-Inloes, B. B., Sorace, J. M., Alevy, Y. G., and Sadler, J. E. (1989) J. Biol. Chem. 264, 19514-19527
33. Kiss, I., Deák, F., Holloway, R. G., Jr., Delius, H., Mebust, K. A., Frimberger, E., Argraves, W. S., Tsonis, P. A., Winterbottom, N., and Goetinck, P. F. (1989) J. Biol. Chem. 264, 8126-8134
34. Corbi, A. L., Garcia-Aguilar, J., and Springer, T. A. (1990) J. Biol. Chem. 265, 2782-2788
35. Campbell, R. D., and Porter, R. R. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 4464-4468
36. Ghosh, P. K., Lebowitz, P., Frisque, R. J., and Gluzman, Y. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 100-104
37. Garrity, P. A., and Wold, B. J. (1990) Mol. Cell. Biol. 10, 5646-5654
38. Gardiner-Garden, M., and Frommer, M. (1987) J. Mol. Biol. 196, 261-282
39. Smith, C. W. J., Patton, J. G., and Nadal-Ginard, B. (1989) Annu. Rev. Genet. 23, 527-577
40. Nishimura, I., Muragaki, Y., and Olsen, B. R. (1989) J. Biol. Chem. 264, 20033-20041
41. Bennett, V. D., and Adams, S. L. (1990) J. Biol. Chem. 265, 2223-2230
42. Young, R. A., Hagenbüchle, O., and Schibler, U. (1981) Cell 23, 451-458
43. Roberts, C. T., Lasky, S. R., Lowe, W. L., and LeRoith, D. (1987) Biochem. Biophys. Res. Commun. 146, 1154-1159
44. Lowe, W. L., Roberts, C. T., Lasky, S. R., and LeRoith, D. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 8946-8950
45. Sussenbach, J. S. (1989) Prog. Growth Factor Res. 1, 33-48
46. Bond, B. J., and Davidson, N. (1986) Mol. Cell. Biol. 6, 2080-2088
47. Izzo, P., Costanzo, P., Lupo, A., Rippa, E., Paolella, G., and Salvatore, F. (1988) Eur. J. Biochem. 174, 569-578
48. Reynolds, G. A., Goldstein, J. L., and Brown, M. S. (1985) J. Biol. Chem. 260, 10369-10377
49. Kozak, M. (1986) Cell 44, 283-292
50. Kozak, M. (1989) Mol. Cell. Biol. 9, 5134-5142
51. Horiuchi, T., Macon, K. J., Kidd, V. J., and Volanakis, J. E. (1990) J. Biol. Chem. 265, 6521-6524
52. Nielsen, F. C., Gammeltoft, S., and Christiansen, J. (1990) J. Biol. Chem. 265, 13431-13434