The Type X Collagen Gene - The Journal of Biological Chemistry

THE JOURNAL OF BIOLOGICAL CHEMISTRY

Vol. 263, No. 34, Issue of December 5, pp. 18376-18385, 1988 Printed in U.S.A.

© 1988 by The American Society for Biochemistry and Molecular Biology, Inc.

The Type X Collagen Gene
INTRON SEQUENCES SPLIT THE 5'-UNTRANSLATED REGION AND SEPARATE THE CODING REGIONS FOR THE NON-COLLAGENOUS AMINO-TERMINAL AND TRIPLE-HELICAL DOMAINS*
(Received for publication, January 21, 1988, and in revised form, August 4, 1988)

Phyllis LuValle‡, Yoshifumi Ninomiya‡, Norman D. Rosenblum§, and Bjørn R. Olsen‡¶
From the ‡Department of Anatomy and Cellular Biology, Harvard Medical School, and the §Department of Anatomy and Cellular Biology and the Department of Medicine, The Children's Hospital, Harvard Medical School, Boston, Massachusetts 02115

Type X collagen, expressed by hypertrophic chondrocytes, consists of homotrimeric molecules with subunits that are only about one-half the size of the polypeptides of fibrillar collagens. In this report we describe for the first time the complete primary structure of type X collagen, based on cloning and sequencing of cDNA and genomic DNA. A comparison between the nucleotide sequences of the cDNA and genomic DNA clones has also allowed determination of the complete exon structure of the type X collagen gene.

Our results demonstrate that the primary translation product of the chicken type X collagen mRNA is 682 amino acid residues long with a calculated molecular mass of 67,317 Da for the nonhydroxylated form. This calculated molecular mass is in excellent agreement with the observed electrophoretic mobility of cell-free translation products with both poly(A)+ RNA isolated from chondrocytes as well as RNA transcribed in vitro from a full length cDNA construct. It is also in agreement with the observed size of type X collagen polypeptides isolated from the media of cultured hypertrophic chondrocytes. Thus, our data exclude the possibility of a high molecular weight precursor form of type X collagen.

Our results also confirm that the chicken type X gene has a most unusual exon structure when compared to other vertebrate collagen genes. The gene has only three exons. One exon (97 base pairs (bp)) codes for most of the 5'-untranslated region of the mRNA, a second exon (159 bp) codes for the signal peptide and a short non-triple-helical domain, while the third exon (2136 bp) contains the coding region for the entire triple-helix and a large non-triple-helical carboxyl domain.

* This work was supported by National Institutes of Health Grants AR36819 and AR07922, an Arthritis Foundation Postdoctoral Fellowship (to P. L.), and an Investigator Award (to Y. N.), and by Fellowships from the Medical Research Council of Canada and the Wolbach Research Fund (to N. D. R.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J04194.

¶ To whom correspondence should be addressed.

Type X collagen is distinctly different from fibrillar collagen types I, II, III, V, and XI. First, type X molecules are homotrimers of subunits that are only about one-half the size of the polypeptides of fibrillar collagens (1). Second, type X collagen expression is restricted to cartilaginous tissues undergoing endochondral ossification (2, 3). Third, sequencing of the type X collagen gene clone YN92 (4) has demonstrated that the triple-helical domain of 460 amino acid residues and the 170-residue-long non-collagenous carboxyl domain are encoded by one large exon. This is in stark contrast to the genes encoding fibrillar collagens, which contain multiple short exons (5, 6).

The amino acid sequence derived from the nucleotide sequence of the type X collagen gene shows that the polypeptide contains a large non-collagenous carboxyl-terminal domain (4). Although a large non-collagenous domain, the carboxyl propeptide, is also present at the carboxyl ends of pro-α chains in fibrillar collagens (7), the domain of type X collagen is not homologous to that of fibrillar collagens. During biosynthesis of fibrillar collagens the carboxyl propeptide is removed by proteolysis; during synthesis of type X collagen there is no evidence for complete removal of the carboxyl domain from the triple-helical domain. There is, however, published evidence to support the notion that a limited amount of proteolytic processing occurs with type X collagen (8, 9). Some of this evidence is conflicting (10-12), in part because there is still some confusion about the size of the intact biosynthetic precursor product (8, 13).

Type X collagen as isolated from long term cultures of hypertrophic chondrocytes contains subunits of 59 kDa (collagen molecular mass standards) (1). Molecules of similar size have been isolated from sternal chondrocytes grown in collagen gels (14). Analyses of the chicken 59-kDa form by electrophoresis in polyacrylamide gels under reducing and nonreducing conditions show that this form of type X collagen does not contain interchain disulfide bonds. This is somewhat surprising since the amino acid sequence derived from the nucleotide sequence of the gene shows the presence of 3 cysteinyl residues in the carboxyl-terminal, non-collagenous domain. We have therefore considered the possibility that the 59-kDa form of type X collagen represents a proteolytic cleavage product from which the most carboxyl-terminal segment (about 40 residues) has been removed (4). This raises the question of whether the intact precursor of type X collagen as deduced from the nucleotide sequence of the gene is, in fact, larger than 59 kDa. The published data on the structure of the gene are unfortunately insufficient to answer this question, since the transcription start site in the gene has not yet been mapped and the precise translation start has not been defined. Therefore, although the gene contains more than 40 amino acid residues in an upstream continuation of the open reading frame that defines the triple-helical domain, it is not certain that these residues are part of the translation product of type X collagen (4).


To map the 5' region of the gene in detail and determine the starts of translation and transcription, we have cloned and sequenced a primer-extended cDNA that covers the 5' region of type X collagen mRNA. In addition, we have mapped the 5' region of the gene by S1 nuclease protection experiments, and we have isolated and characterized two genomic clones that encode the 5' region of the gene. We have also compared the nucleotide-derived sequence with the amino-terminal amino acid sequence of type X collagen isolated from hypertrophic cartilage (15). Our data demonstrate that the type X translation product contains a signal peptide of 18 amino acid residues, separated from the triple-helical sequence by 34 amino acid residues of non-triple-helical sequence. The coding region for this non-triple-helical sequence in the gene is split by an intron of about 2000 bp.¹ This intron separates the large exon encoding the triple-helical and carboxyl domains from the region that codes for the signal peptide and the major portion (29⅓ amino acid residues) of the amino-terminal, non-triple-helical sequence. In addition, the 5'-untranslated region of the type X mRNA is split by a 670-bp-long intron. Finally, we have constructed a cDNA containing the entire translated region of the type X mRNA in the vector SP65, and we have transcribed this cDNA using in vitro techniques. When the resulting RNA is translated in a cell-free system, a 60-kDa (collagen molecular mass standards) translation product can be demonstrated by gel electrophoresis.

¹ The abbreviations used are: bp, base pairs; PIPES, 1,4-piperazinediethanesulfonic acid.

MATERIALS AND METHODS

Cell Culture of Hypertrophic Chondrocytes and RNA Extraction - Tibiotarsus epiphyseal cartilage was dissected from 12-day-old chick embryos. The tissue was digested with collagenase (Sigma) and trypsin (GIBCO), and the released chondrocytes were cultured essentially as described (1, 4). Unattached cells were harvested every other day and replated at 5 × 10⁵ cells/25-cm² flask. Tertiary cultures were analyzed by indirect immunofluorescence (with monoclonal antibodies provided by Dr. T. Linsenmayer, Tufts University Medical School) and found to express type II and type X collagen but not type I collagen.

For metabolic labeling, [35S]methionine (5 µCi/ml) was added to the culture medium, and the cells were incubated for 24 h in the presence of 50 µg/ml ascorbate. At the end of the incubation, N-ethylmaleimide (10 mM final concentration), p-aminobenzamidine (1 mM final concentration), phenylmethylsulfonyl fluoride (1 mM final concentration), and 176 mg/ml ammonium sulfate were added to the medium. After stirring overnight at 4 °C, the medium was centrifuged at 20,000 × g for 20 min at 4 °C to recover precipitated proteins. The precipitate was resuspended by stirring overnight at 4 °C in a buffer containing 50 mM Tris-HCl, pH 7.6, 200 mM NaCl, 1 mM EDTA, and 0.5% Nonidet P-40. After spinning at 12,000 × g for 2 min at 4 °C, the supernate was used for gel electrophoresis, digestion with collagenase, or immunoprecipitation.

For isolation of RNA, confluent hypertrophic chondrocyte cultures were washed 3 times with phosphate-buffered saline, collected in RNA extraction buffer (100 mM Tris, pH 7.5, 6 M guanidine hydrochloride, 10 mM dithiothreitol, and 1% N-lauroylsarcosine), and stored at -80 °C until all cells were harvested. RNA was then isolated essentially by the method of Chirgwin et al. (16). Poly(A)+ RNA was isolated by oligo(dT)-cellulose chromatography.

cDNA Synthesis, Cloning, and Characterization - cDNA synthesis was performed by primer extension using a 19-base synthetic deoxynucleotide primer (Applied Biosystems Model 380A DNA synthesizer) with a sequence corresponding to bases 369-387 within the region coding for the triple-helical domain of the type X collagen genomic subclone pYN92E1 (4). RNA-directed DNA synthesis (first strand synthesis) was done essentially as described by Kohno et al. (17), using 1000 units of reverse transcriptase from Moloney murine leukemia virus (Bethesda Research Laboratories). The mRNA/cDNA hybrid was then used directly for second strand synthesis by employing RNase H (Pharmacia LKB Biotechnology Inc.), DNA polymerase I (Boehringer Mannheim), and Escherichia coli DNA ligase (Pharmacia) as described by Gubler and Hoffman (18). T4 DNA polymerase (Amersham Corp.) was used to remove 3' overhanging ends from the double-stranded cDNA prior to methylation and ligation to EcoRI linkers (New England Biolabs). The cDNA thus prepared was ligated into EcoRI-digested, dephosphorylated λgt10 arms (Promega Biotec). Recombinants were packaged using Packagene bacteriophage packaging extract (Promega Biotec) and plated on E. coli strain C600 (19).

The resulting cDNA library was screened with a 380-bp nick-translated DNA fragment from the 5' end of pYN92E1 (4) which included the nucleotide primer sequence. Duplicate filters were hybridized to the probe in the presence of N-lauroylsarcosine (20). 12,000 plaques were screened in duplicate and 113 positive plaques were identified. About 30 clones were analyzed for insert size, and the longest (280 bp) insert, MRF-1, was subcloned into both pBR322 for storage and M13mp18 for sequencing by the dideoxy termination method (21).

S1 Nuclease Protection Analysis and Primer Extension Mapping - Nuclease S1 protection was used to identify possible splice junctions in pYN92E1, as well as the start of transcription within PL10. For mapping of pYN92E1, poly(A)+ RNA was hybridized to restriction fragments of pYN92E1 which had been end-labeled with polynucleotide kinase (United States Biochemical Corp.) (see Fig. 3). For mapping of PL10, poly(A)+ RNA was hybridized to a 640-bp HindIII/SacI fragment (see Fig. 1) which had been end-labeled with polynucleotide kinase. Hybridization was carried out at 49 °C in a buffer containing 40 mM PIPES, pH 6.4, 1 mM EDTA, 0.4 M NaCl, and 80% formamide as described (22). S1 nuclease was added to a final concentration of 12 units/µl. The protected fragments were analyzed on 5 and 8% polyacrylamide sequencing gels (22).

Primer extension analysis was done as described (23) using two primers from within the 5' region of pYN92E1. The distance between the primers was such that their full length primer extension products along type X mRNA would differ in size by 141 nucleotides. To map precisely the transcription start site within PL10, a 17-mer oligonucleotide from within exon 1 corresponding to bases 65-81 in Fig. 6 and representing the 3'-most 17 bases of the 640-bp HindIII/SacI fragment, was used as a primer.

Screening of Genomic Libraries and Isolation of Genomic α1(X) Collagen Clones - We have previously (4) reported the isolation and characterization of a chicken type X genomic clone, YN92 (Fig. 1). An 1100-bp EcoRI-PvuII restriction fragment from the 5' end of this clone was used as a probe to screen a genomic library made with chicken genomic DNA cloned in bacteriophage λ Charon 30 (24). Filters were screened with the hybridization probe in the presence of N-lauroylsarcosine (20). Phage purification and recombinant DNA isolation was done using standard techniques (22). DNA fragments of the recombinant phage YN2141 were subcloned into pBR322, and one of the subclones was subjected to nucleotide sequence analysis using the method of Maxam and Gilbert (25).

To isolate a genomic fragment covering the transcription start site and 5'-upstream promoter elements, chicken genomic DNA was digested with EcoRI and size-fractionated by preparative agarose gel electrophoresis. Fragments of 6-9 kilobases were isolated and cloned in λgt10. About 90,000 plaques of this library were screened with a nick-translated 80-bp HinfI/EcoRI restriction fragment from the 5' end of the cDNA MRF-1 (see above), and four positive plaques were identified. All of these contained recombinant phage with 6.5-kilobase inserts, and one of them, PL10, was isolated and characterized.

Construction and Cell-free Translation of cDNA Covering the Entire Coding Portion of Type X Collagen mRNA - To construct a complete cDNA for chicken type X collagen we recombined several restriction fragments isolated from the cDNAs MRF-1 and pYN3116 (4) and the genomic clone YN92 (4). From the 5' to the 3' end of the final construct these fragments are: a 219-bp EcoRI/ApaI fragment from MRF-1; an 842-bp ApaI/ApaI fragment from YN92; an 1122-bp ApaI/TaqI fragment from YN92; a 241-bp TaqI/PvuII fragment from pYN3116. The fragments were ligated and cloned between the EcoRI and HincII sites of the polylinker region of SP65, generating the recombinant plasmid SPLX. The insert of the plasmid contains the 5'-untranslated region derived from MRF-1 and the 3'-untranslated region as well as the poly(A) tail derived from the cDNA pYN3116. RNA was transcribed from linearized SPLX, and the activity of the resulting mRNA was assayed by cell-free translation using a commercial rabbit reticulocyte lysate (Amersham Corp.). Translation products, labeled with [35S]methionine, were analyzed by polyacrylamide slab gel electrophoresis in the presence of sodium dodecyl sulfate. In some experiments, the in vitro-generated mRNA was capped at the 5' end prior to translation (26).

Immunoprecipitation and Digestion with Bacterial Collagenase - For immunoprecipitation of [35S]methionine-labeled type X collagen from chondrocyte culture media, 100 µl of Immunobeads (Bio-Rad) containing bound rabbit anti-mouse IgG were incubated at room temperature for 2 h with 800 µl of the monoclonal anti-type X antibody AC9 (provided by Dr. T. Linsenmayer). After washing with phosphate-buffered saline, 2 µl of [35S]methionine-labeled chondrocyte culture proteins were added to the beads, and they were incubated at room temperature for 2 h. The beads were washed once with 0.5 M NaCl in phosphate-buffered saline, resuspended in 30 µl of electrophoresis sample buffer, heated in a boiling water bath for 2 min, and electrophoresed. Samples were digested with bacterial collagenase (Advance Biofactures, Form III) in 0.1 M Tris-HCl, pH 7.6, 10 mM CaCl2, by incubation at 37 °C for 45 min with an enzyme concentration of 54 units/20 µl.
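The fragment sizes listed for the full length construct can be tallied and compared with the reading frame needed for the 682-residue translation product reported in this paper. The sketch below is only an illustrative consistency check: the variable and function names are hypothetical, and it assumes the four fragments join end to end with no bases gained or lost at the ligation sites.

```python
# Illustrative tally of the SPLX insert; names are hypothetical.
# Sizes (bp) are the four restriction fragments listed 5' -> 3' in the text.
FRAGMENTS = [
    ("EcoRI/ApaI", "MRF-1", 219),
    ("ApaI/ApaI", "YN92", 842),
    ("ApaI/TaqI", "YN92", 1122),
    ("TaqI/PvuII", "pYN3116", 241),
]

RESIDUES = 682        # primary translation product length from the abstract
STOP_CODON_NT = 3     # one stop codon closes the reading frame


def insert_length(fragments):
    """Sum fragment sizes, assuming they abut without overlap or loss."""
    return sum(size for _sites, _source, size in fragments)


def orf_length(residues):
    """Nucleotides needed to encode the residues plus one stop codon."""
    return residues * 3 + STOP_CODON_NT


total = insert_length(FRAGMENTS)
orf = orf_length(RESIDUES)
noncoding = total - orf  # linker, untranslated regions, poly(A) tail

print(f"insert: {total} bp, reading frame: {orf} nt, remainder: {noncoding} bp")
```

Under these assumptions the insert comes to 2424 bp, which leaves 375 bp beyond the 2049-nt reading frame for the linker, the untranslated regions, and the poly(A) tail.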

RESULTS

Characterization of cDNA Encoding the 5' Region of Type X Collagen mRNA - By primer extension cloning we have isolated a cDNA, MRF-1, that codes for the 5' region of type X collagen mRNA (Fig. 1). The nucleotide sequence of the 285-bp-long insert of MRF-1 (Fig. 2) clearly demonstrates that the cDNA is specific for type X collagen. The 3' end of the sequence corresponds to the oligonucleotide primer from within the triple-helical coding region of the mRNA, while the 5' end of the sequence contains the 5'-untranslated region of the mRNA. Although the open reading frame defined by the triple-helical sequence continues all the way to the 5' end of the cDNA insert, it is likely that the translation of the mRNA starts at the methionyl codon defined by nucleotides 44-46. The methionyl residue at this position is followed by a typical hydrophobic signal peptide sequence (Fig. 2). Between the signal peptide and the start of the triple helix at the glycyl codon at nucleotides 200-202 is a non-triple-helical sequence. Thus, the 5'-untranslated region is 43 nucleotides long, excluding the sequence of the EcoRI site of the synthetic linker used to clone the double-stranded cDNA into the bacteriophage vector.

FIG. 1. Diagram showing the exon structure of the chicken type X collagen gene (center) and the locations of the exon-intron splice junctions within the mRNA. The exons are numbered from the 5' to the 3' ends of the gene. The sizes (in nucleotides) of the exons are indicated below each exon, while the sizes of the introns are given above the introns. Above the diagram are indicated the relative locations and sizes (in kilobases (kb)) of the genomic DNA clones YN92, YN2141, and PL10. Below the gene diagram and the mRNA are indicated the two cDNAs MRF-1 and pYN3116, as well as the amino acid (aa) sequence domains of the type X primary translation product. The triple-helical domain (460 amino acid residues) is represented by a straight line, while the non-triple-helical domains (52 and 170 amino acid residues) are indicated by wavy lines. E, EcoRI site; H, HindIII site; M, MboI site; S, SacI site; open square at 3' end of mRNA, poly(A) tail.

FIG. 2. Nucleotide sequence of the type X collagen cDNA MRF-1 and the amino acid sequence of the conceptual translation product. The sequence does not include the synthetic EcoRI linkers at each end. Short horizontal arrows indicate amino acid residues that have been determined by direct amino acid sequencing of type X collagen (15). There is complete agreement between the cDNA-derived sequence and the peptide sequence, except for one residue (not underlined). This residue is reported as Gln in the protein sequence (15), while it is Glu in the DNA-derived sequence. The amino acid sequence corresponding to the beginning of the triple-helical domain is underlined. The portion of the nucleotide sequence that corresponds to the sequence of the oligonucleotide primer is indicated with a double underline. Note that while the primer was a 19-mer with a sequence that extended beyond the 3' end of MRF-1 by 1 nucleotide, this nucleotide was lost from MRF-1 during the cloning procedure. The positions of the two exon-intron splice junctions are indicated by vertical, open arrows at nucleotides 26-27 and 185-186.

Amino-terminal Amino Acid Sequence of Type X Collagen - Independent confirmation of a small portion of the cDNA sequence is provided by a recently published amino-terminal amino acid sequence of type X collagen purified from the media of chicken chondrocyte cultures (15). This sequence, as well as the sequence of a non-triple-helical peptide purified from a bacterial collagenase digest of type X collagen (15), is in excellent agreement with the sequence predicted from the nucleotide sequence of the cDNA (Fig. 2), with the exception of the residue encoded by the codon at nucleotides 116-118 in the cDNA. This residue is Gln in the protein sequence, but Glu in the DNA-derived sequence. The cDNA sequence is in complete agreement with the genomic sequence (Fig. 5). The comparison of the amino acid sequence with the DNA-derived sequence shows that signal peptidase must cleave between a glycyl and seryl residue during type X collagen biosynthesis (Fig. 2). This leaves 18 amino acid residues in the signal peptide and 34 amino acid residues in the amino-terminal, non-triple-helical domain.

Characterization of the 5' Region of Type X Collagen mRNA by S1 Nuclease and Primer Extension Analyses - The nucleotide sequence of the genomic clone YN92 (4) contains an open reading frame with 50 amino acid residues upstream of the triple-helical sequence of type X collagen. A comparison of the MRF-1 cDNA sequence (Fig. 2) with the sequence predicted from YN92 shows that out of these 50 residues only 5 residues, Gly-Val-Gln-Met-Arg, next to the triple helix, are found in the cDNA. This suggests the presence of an intron-exon splice junction in YN92 located five codons upstream of the triple-helical coding region. In fact, by comparing the nucleotide sequences of YN92 and MRF-1, an acceptor splice junction is seen to split the codon of the glycyl residue in the above pentapeptide sequence.

Additional evidence for the presence of an intron that splits the amino-terminal, non-triple-helical domain of type X collagen was obtained from S1 nuclease protection analyses. As illustrated in Fig. 3, we used two end-labeled probes for these analyses. Each probe was hybridized to poly(A)+ RNA isolated from cultured chondrocytes, and S1 nuclease digests of the samples were analyzed on polyacrylamide sequencing gels. Probe 1 gave rise to a protected fragment of 140 nucleotides, while probe 2 gave rise to a protected fragment of 75 nucleotides. Since the two probes were end-labeled at two restriction sites that are 69 bp apart in the gene, the 5' ends of the protected portions of the two probes are in the same location.

FIG. 3. A, diagram showing the locations and sizes of the two restriction fragments from the genomic clone YN92 which were end-labeled (filled circle) and used as probes for S1 nuclease protection experiments with mRNA from hypertrophic chondrocytes. Probe 1 was a HinfI (f) fragment, while probe 2 was an EcoRI (E)/PstI (P) fragment. Arrowheads indicate the extent of protection from degradation by S1 nuclease when the probes were hybridized to mRNA; only sequences on the right-hand side of the arrowheads were protected. B, gel electrophoretic analysis on a 5% gel of probe 1 (lane 2), and of probe 1 digested with S1 nuclease in the absence (lane 3) and presence (lane 4) of mRNA. Lane 1 contains size markers (HindIII fragments of λ DNA and HaeIII fragments of φX174 RF DNA). Note protected 140-bp fragment in lane 4. C, gel electrophoretic analysis on a 5% gel of probe 2 (lane 2), and of probe 2 digested with S1 nuclease in the absence (lane 3) and presence (lane 4) of mRNA. Lane 1 contains size markers as in B. Note protected 75-bp fragment in lane 4.

To estimate the coding distance between the 5' region of YN92 and the 5' end of the type X collagen mRNA, a primer extension analysis was performed. Two restriction endonuclease fragments, end-labeled at the 5' end of the noncoding strand, were denatured and annealed to poly(A)+ RNA. The labeled primers were extended with reverse transcriptase, and the products were electrophoresed on a polyacrylamide gel. As shown in Fig. 4, the sizes of the products, about 400 and 540 nucleotides, indicate that their 5' ends are located about 260 nucleotides upstream of the intron-exon splice junction described above. Since the 5' end of MRF-1 is 185 nucleotides upstream of this junction, the cDNA extends to within about 70 nucleotides of the 5' end of the type X collagen mRNA.

FIG. 4. A, diagram showing the locations of exons 1-3 within the genomic clones PL10 and YN92, and the locations of the two restriction fragments from YN92 used as primers for primer extension analyses of type X mRNA. As discussed in the text, primer 1 was a PstI (P)/HinfI (f) fragment, while primer 2 was a HinfI (f)/PstI (P) fragment. When extended with reverse transcriptase and type X mRNA as template the two primers (heavy lines) were extended to about the same 5' location along the RNA (arrows). E, EcoRI sites. B, gel electrophoretic analysis of primer-extended products. Arrowheads on the left-hand side of the figure indicate the positions of bands with primer 1 (lane 1) and primer 2 (lane 2). Lane 3 contains HaeIII fragments of φX174 DNA as size markers. The sizes of the markers (in nucleotides) are indicated on the right-hand side of the figure. The prominent bands below 72 nucleotides and above 118 nucleotides in lanes 1 and 2, respectively, represent excess primers.

Since the comparison of the nucleotide sequences of the cDNA MRF-1 and the genomic clone YN92 showed that the 5' portion of the cDNA was not encoded by YN92, we isolated two additional genomic clones, YN2141 and PL10. Nucleotide sequencing and Southern analyses showed that although YN2141 extended beyond the 5' end of YN92 by about 740 nucleotides, this extension contained only intron sequences. An 80-bp restriction fragment from the 5' end of MRF-1 was therefore used as probe to isolate the additional genomic clone PL10.

Extensive nucleotide sequence analysis of PL10 demonstrates that although it does not overlap with YN2141, it contains all the coding information for MRF-1 which is not already contained in YN92. This coding information is contained within two relatively small exons that are separated by an intron of 670 nucleotides (Fig. 1). The size of the most 3' of these two exons, 159 nucleotides, is precisely defined by a comparison with the cDNA sequence. To determine the size of the most 5' exon of the gene, we mapped the transcription start site by S1 nuclease and primer extension analyses. The experiments were designed such that the S1-protected fragment and the primer extension product would be the same size (see "Materials and Methods"). As shown in Fig. 5, the sizes of primer extension products (lane 6) predict that transcription starts at an A or adjacent T residue. The size of the major S1-protected fragment (lane 1) predicts the start to be at the A residue. There are several secondary S1-protected fragments but these are likely due to the "nibbling" activity of the S1 nuclease (27). Thus, the predominant transcription start site predicted by both analyses is an A residue located 71 nucleotides upstream of the 5' end of the cDNA sequence and 27 nucleotides downstream of the TATA box (Fig. 6). Exon 1 is therefore 97 nucleotides long.

FIG. 5. Definition of the transcription start site of the type X collagen gene by S1 nuclease protection and primer extension analyses. Lane 1, S1-protected portion of the HindIII/SacI fragment of PL10. The size of the major band is 81 nucleotides. Lanes 2-5, Maxam-Gilbert (25) sequencing reactions of the end-labeled anti-sense strand of the HindIII/SacI fragment; G, A + G, T + C, and C, respectively. Lane 6, primer extension products. The top band corresponds to the major (second) band in lane 1, although it migrates slightly faster in this 8% gel because the synthetic primer used for extension does not have a 5'-phosphate group. To the right of the figure is a portion of the sequence as read from the gel (anti-sense strand and its complement). The horizontal line indicates the deduced transcription start site.

As shown in Fig. 1, the intron separating the 159-bp-long exon 2 from the large 2136-bp-long exon 3 is about 2000 bp. The size of this intron was estimated from Southern analysis of chicken genomic DNA using as probe a 315-bp-long PstI/EcoRI restriction fragment from the intron-containing 5' portion of YN2141. This analysis showed hybridization of the probe to a single 1300-bp-long EcoRI fragment. Therefore, the EcoRI sites defining the 3' end of PL10 and the 5' end of YN92 (Fig. 1) are separated in the genomic DNA by 1300 bp. Since exon 2 is located about 450 bp upstream of the 3' end of PL10, and exon 3 starts 303 bp downstream of the 5' end of YN92, the intron between exons 2 and 3 must be about 2000 bp in length. This conclusion is supported by Southern analysis of genomic DNA showing that three different restriction fragments derived from PL10, YN2141, and YN92, respectively, and located between BglII sites in PL10 and YN92, all hybridize to the same 5500-bp-long fragment (data not shown). The size of this BglII fragment predicts that the intron between exons 2 and 3 is 2000 bp.

Construction, in Vitro Transcription, and Translation of a Full Length Type X Collagen cDNA - To address the question of whether the full length translation product of the type X collagen gene is larger than 60 kDa and thus compatible with the notion of a high molecular mass precursor, we have constructed a cDNA clone, SPLX, containing the entire protein coding sequence of type X collagen mRNA, and translated a sense RNA transcript of this clone in vitro. To construct the cDNA we utilized restriction fragments of MRF-1 and pYN3116 for the 5' and 3' ends of the construct. The central triple-helical part was obtained from YN92 (Fig. 7). By constructing SPLX in the vector SP65, RNA could be generated by in vitro transcription of linearized SPLX. The RNA was used for cell-free translation in a reticulocyte lysate, and the translation products were analyzed by polyacrylamide gel electrophoresis. As shown in Fig. 8, the SPLX-derived RNA directed the synthesis of a bacterial collagenase-sensitive polypeptide that co-migrates with a collagenous polypeptide synthesized with poly(A)+ RNA isolated from long term cultured hypertrophic chondrocytes. The two bands migrate at the position of the bovine serum albumin globular molecular weight marker. In contrast, type X collagen isolated from the medium of cultured hypertrophic chondrocytes migrates slightly above bovine serum albumin. Since it is well established that hydroxylation of prolyl and lysyl residues in collagen chains causes a reduction of their mobilities on polyacrylamide-sodium dodecyl sulfate gels, it is likely that the difference in mobility of the cell-free translation products and the medium-derived protein is due to the lack of this posttranslational modification in the cell-free products. The migration of the globular bovine serum albumin marker would

[Figure content not recoverable from the extracted text: nucleotide sequence panel of Fig. 6 (exon sequences with flanking intron sequence, through exon 3) and gel image of Fig. 8 (lanes 1-10, with 69K and 46K markers).]

correspond to a molecular mass of 60 kDa when compared with collagen molecular mass standards. Consequently the type X collagen gene cannot code for a precursor polypeptide that is larger than 60 kDa.

FIG. 6. Partial nucleotide sequence of the 5' region of the type X collagen gene. Exon sequences are shown in capital letters, while intron sequences are given in lowercase letters. The numbering starts at the A residue, which represents the transcription start site. A TATA-box at nucleotides -27 to -20 is indicated in the boxed area. The translational start codon at nucleotides 185-187 is underlined.

FIG. 7. A, diagram showing the restriction endonuclease fragments from MRF-1, YN92, and pYN3116, which are used to generate the cDNA construct SPLX. The fragments are shown as boxed areas. Also indicated is the presence of the translational start codon (ATG) within the MRF-1 fragment, the presence of the translational stop codon (TGA) within YN92, and the presence of the poly(A) tail ((A)n) within the pYN3116 fragment. B, diagram showing the structure of the recombinant plasmid SPLX and the restriction fragments that were used to generate its insert. E, EcoRI; A, ApaI; T, TaqI; P/H, combination of a PvuII site with a HincII site by blunt-end ligation.

FIG. 8. Gel electrophoretic analysis of type X collagen and cell-free translation products. Lane 1, molecular weight markers. Lane 2, [35S]methionine-labeled proteins from chondrocyte culture media digested with bacterial collagenase. Lane 3, [35S]methionine-labeled proteins from chondrocyte culture media. Note type X collagen band above the 69-kDa (K) marker. Lane 4, [35S]methionine-labeled proteins from chondrocyte culture media after immunoprecipitation with the monoclonal antibody against type X collagen. Lane 5, [35S]methionine-labeled cell-free translation products synthesized with poly(A)+ RNA isolated from hypertrophic chondrocytes. Lane 6, [35S]methionine-labeled cell-free translation products synthesized with poly(A)+ RNA and digested with bacterial collagenase. Note the presence of a collagenase-sensitive band co-migrating with the 69-kDa marker (compare lanes 5 and 6). Lane 7, [35S]methionine-labeled cell-free translation products synthesized with RNA made in vitro with SPLX. Note band co-migrating with the 69-kDa marker. The bands below this band probably represent incomplete translation products, since they are all (except for the band which is present also in the H2O control) degraded by bacterial collagenase (lane 8). Lane 8, cell-free translation products synthesized as for lane 7 and digested with bacterial collagenase. Lane 9, cell-free translation with no RNA (H2O control). Lane 10, molecular weight markers.

DISCUSSION

The Size and Primary Structure of Chicken Type X Collagen - The data presented here represent the first complete amino acid sequence of type X collagen. The sequence can be divided into four discrete domains. The signal peptide is 18 amino acid residues and has the characteristic features of such peptides (28). The amino-terminal, non-triple-helical domain is 34 amino acid residues long. This is considerably smaller than the non-triple-helical sequence within the amino propeptide domain of fibrillar α1(I) (29) and α1(III) (30) procollagen chains, but longer than the non-triple-helical sequence at the amino ends of pro-α2(I) (5) and pro-α1(II) chains (17). It is also different in size from the non-triple-helical sequences at the amino ends of α1(IX) (31) and α2(IX) collagen chains.* A search for homologies between the amino-terminal, non-triple-helical type X sequence and other published protein sequences, including those of other collagen types, was negative.

*P. LuValle, Y. Ninomiya, N. D. Rosenblum, and B. R. Olsen, unpublished data.

As published previously (4), the triple-helical and carboxyl-terminal, non-triple-helical domains of chicken type X collagen are 460 and 170 amino acid residues, respectively. We have not been able to identify any meaningful sequence similarities between these domains and those of other collagen types.

Our data demonstrate that the primary translation product of chicken type X collagen is 682 amino acid residues long with a calculated molecular mass of 67,317 Da for the nonhydroxylated form. This calculated molecular mass is in agreement with the observed mobility of the translation product synthesized with RNA transcribed in vitro from the full length cDNA construct. As shown in Fig. 8 the mobility of the peptide synthesized with the in vitro-transcribed RNA was identical to the mobility of the peptide synthesized with mRNA isolated from cartilage, and both peptides migrated


faster than type X collagen chains isolated from chondrocyte culture media. As discussed above, this difference in mobility is due to the lack of prolyl and lysyl hydroxylation of the cell-free translation products. While our data exclude the possibility of a high molecular weight precursor form of type X collagen (9, 13), they do not address the question of processing of type X collagen to a smaller molecule in hypertrophic cartilage (8). Therefore, we do not know whether the 664-amino acid residue-long polypeptide predicted from our data represents the subunit of the physiologically active and mature form of this collagen.

The Structure of the Type X Collagen Gene - The molecular cloning and sequencing of type X collagen genomic fragments reported here provide the first complete definition of the exon structure of this collagen gene. As shown in Fig. 1, the gene contains only 3 exons. This structure is dramatically different from the multi-exon structure of fibrillar collagen genes (6) and of the genes encoding α1(IX) and α2(IX) collagen chains (33). Particularly intriguing is the presence of one very large exon (2136 bp) in the type X gene that contains the coding region for the entire triple helix. This is in contrast to the very short triple-helical exons within fibrillar and type IX collagen genes. However, it is somewhat reminiscent of the structure of two genes isolated from Caenorhabditis elegans and believed to code for cuticle collagens (34). One of these genes codes for a polypeptide with a 179-amino acid residue-long triple-helical domain encoded by a single exon of 677 bp, while the second gene contains an exon which is at least 626 bp long encoding a triple helix of 178 amino acid residues. In both nematode genes the triple-helical sequences are interrupted by imperfections in the Gly-X-Y triplet structure, a feature which is also seen in the type X collagen triple helix.
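The residue and exon sizes quoted in this Discussion can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the numbers are copied from the text, and the reading of the 664-residue figure as "primary translation product minus signal peptide" is an assumption inferred from the arithmetic rather than a statement made by the authors.

```python
# Domain lengths (amino acid residues) as quoted in the Discussion.
DOMAINS = {
    "signal_peptide": 18,
    "amino_terminal_non_helical": 34,
    "triple_helix": 460,
    "carboxyl_terminal_non_helical": 170,
}

BP_PER_CODON = 3  # one amino acid residue is encoded by three base pairs


def primary_translation_length(domains):
    """Total residues in the primary translation product."""
    return sum(domains.values())


def length_without_signal(domains):
    """Residues remaining if the signal peptide is removed (assumed reading of '664')."""
    return primary_translation_length(domains) - domains["signal_peptide"]


def min_coding_bp(residues):
    """Minimum number of base pairs needed to encode the given residues."""
    return residues * BP_PER_CODON


def exon_can_encode(exon_bp, residues):
    """True if an exon of exon_bp is long enough to encode the residues."""
    return exon_bp >= min_coding_bp(residues)


if __name__ == "__main__":
    print(primary_translation_length(DOMAINS))  # 682
    print(length_without_signal(DOMAINS))       # 664
    # Type X: the 2136-bp exon versus the triple helix plus carboxyl domain.
    helix_plus_c = DOMAINS["triple_helix"] + DOMAINS["carboxyl_terminal_non_helical"]
    print(min_coding_bp(helix_plus_c), exon_can_encode(2136, helix_plus_c))
    # C. elegans cuticle collagen genes: 179 residues in 677 bp, 178 in >= 626 bp.
    print(exon_can_encode(677, 179), exon_can_encode(626, 178))
```

The four domain lengths sum to the 682 residues reported for the primary translation product, and in each case the single exon is longer than the minimum needed to encode the quoted triple helix.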
However, despite these structural similarities there is no obvious sequence similarity between the C. elegans collagens and type X. For example, the cuticle collagens contain several cysteinyl residues which are missing in type X collagen. Also, while the cuticle collagens have only short non-triple-helical sequences at their carboxyl termini, type X collagen contains a very large noncollagenous domain at this end.

Evolution and Regulation of the Type X Collagen Gene - The absence of introns from the region that codes for the triple-helical domain clearly sets the type X gene apart from other collagen genes in vertebrates. The fibrillar collagen genes (types I, II, III, V, and XI) have a highly conserved exon structure in which the sizes of the triple-helical exons are related to 54 bp. In the chicken α2(I) collagen gene, the first fibrillar collagen gene whose exons were completely sequenced, 23 out of 42 triple-helical exons are 54 bp, 8 are 108 bp, and 1 is 162 bp (5). Of the remaining 10 exons, 5 are 45 bp, and 5 are 99 bp. It has therefore been proposed that the fibrillar collagen precursor gene was assembled by multiple duplications of a primordial gene that contained a 54-bp coding unit (35). Clearly, the evolution of the type X collagen gene must have taken a different path.

The evolutionary assembly of the type X gene must also have been different from that of the types IV and IX collagen genes. Although these genes are not homologous with the fibrillar collagen genes, they contain multiple relatively small triple-helical exons (33, 36-38) and probably arose by multiple duplications of small coding units. How did a type X gene then evolve? Clearly, the question cannot be answered based on an analysis of the gene in a single species. However, we speculate that the gene may have been formed through an intermediate cDNA copy of a partially spliced RNA transcript. The presence of the two introns at the 5' end of the gene may simply be a consequence of the presence of cis-acting regulatory elements within the introns. It is interesting to note that for both the human α1(I) and the mouse α2(I) collagen genes, cis-acting regulatory sequences have indeed been identified within the first (5') intron (32, 39).

Acknowledgments - We thank Theresa A. Summers and Gary Balian for making their amino acid sequence data available to us prior to publication.
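The triple-helical exon sizes quoted in the Discussion for the chicken fibrillar collagen gene of ref. 5 can be tallied to show how each size class relates to the 54-bp unit. This is a minimal sketch: the names are hypothetical, the counts are copied from the text, and describing the 45- and 99-bp exons as "a multiple of 54 bp minus one 9-bp Gly-X-Y triplet" is an arithmetic observation used here for illustration, not a claim taken from the paper.

```python
# Exon size (bp) -> number of triple-helical exons, as quoted in the Discussion.
EXON_COUNTS = {54: 23, 108: 8, 162: 1, 45: 5, 99: 5}

UNIT_BP = 54     # proposed primordial coding unit
TRIPLET_BP = 9   # one Gly-X-Y triplet = 3 codons = 9 bp


def classify(size_bp):
    """Describe an exon size relative to the 54-bp unit."""
    if size_bp % UNIT_BP == 0:
        return f"{size_bp // UNIT_BP} x 54"
    if (size_bp + TRIPLET_BP) % UNIT_BP == 0:
        return f"{(size_bp + TRIPLET_BP) // UNIT_BP} x 54 - 9"
    return "unrelated"


def triplets_per_exon(size_bp):
    """Number of whole Gly-X-Y triplets an exon of this size can encode."""
    return size_bp // TRIPLET_BP


if __name__ == "__main__":
    print("total exons:", sum(EXON_COUNTS.values()))
    for size, count in sorted(EXON_COUNTS.items()):
        print(size, "bp:", count, "exons,", classify(size),
              "-", triplets_per_exon(size), "triplets")
```

The counts add up to the 42 triple-helical exons mentioned in the text, and every size class is a whole number of 9-bp triplets.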

REFERENCES

1. Schmid, T. M., and Linsenmayer, T. F. (1983) J. Biol. Chem. 258, 9504-9509
2. Gibson, G. J., Beaumont, B. W., and Flint, M. H. (1984) J. Cell Biol. 99, 208-216
3. Schmid, T., and Linsenmayer, T. (1985) J. Cell Biol. 100, 598-605
4. Ninomiya, Y., Gordon, M., van der Rest, M., Schmid, T., Linsenmayer, T., and Olsen, B. R. (1986) J. Biol. Chem. 261, 5041-5050
5. Boedtker, H., Finer, M., and Aho, S. (1985) Ann. N. Y. Acad. Sci. 460, 85-116
6. Ramirez, F., Bernard, M., Chu, M.-L., Dickson, L., Sangiorgi, F., Weil, D., de Wet, W., Junien, C., and Sobel, M. (1985) Ann. N. Y. Acad. Sci. 460, 117-129
7. Miller, E. J. (1985) Ann. N. Y. Acad. Sci. 460, 1-13
8. Kwan, A. P. L., Sear, C. H. J., and Grant, M. E. (1986) FEBS Lett. 206, 267-272
9. Capasso, O., Quarto, N., Descalzi-Cancedda, F., and Cancedda, R. (1984) EMBO J. 3, 823-827
10. Schmid, T. M., and Conrad, H. E. (1982) J. Biol. Chem. 257, 12451-12457
11. Grant, W. T., Sussman, M. D., and Balian, G. (1985) J. Biol. Chem. 260, 3798-3803
12. Grant, W. T., Wang, G.-J., and Balian, G. (1987) J. Biol. Chem. 262, 9844-9849
13. Jimenez, S. A., Rao, V. H., Reginato, A. M., and Yankowski, R. (1986) Biochem. Biophys. Res. Commun. 138, 835-841
14. Gibson, G. J., Kielty, C. M., Garner, C., Schor, S. L., and Grant, M. E. (1983) Biochem. J. 211, 417-426
15. Summers, T. A., Irwin, M. H., Mayne, R., and Balian, G. (1988) J. Biol. Chem. 263, 581-587
16. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979) Biochemistry 18, 5294-5299
17. Kohno, K., Martin, G. R., and Yamada, Y. (1984) J. Biol. Chem. 259, 13668-13673
18. Gubler, U., and Hoffman, B. J. (1983) Gene (Amst.) 26, 263-269
19. Huynh, T. V., Young, R. A., and Davis, R. W. (1985) in DNA Cloning: A Practical Approach (Glover, D. M., ed) Vol. 1, pp. 49-78, IRL Press, Oxford
20. Overbeek, P. A., Merlino, G. T., Peters, N. K., Cohn, V. H., Moore, G. P., and Kleinsmith, L. J. (1981) Biochim. Biophys. Acta 656, 195-205
21. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
22. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
23. Kohno, K., Sullivan, M., and Yamada, Y. (1985) J. Biol. Chem. 260, 4441-4447
24. Vasios, G. (1986) Isolation and characterization of the α1(IX) collagen gene promoter and 5' coding region. Ph.D. dissertation, Rutgers, The State University, New Brunswick, NJ
25. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 66, 497-559
26. Krieg, P., and Melton, D. A. (1984) Nucleic Acids Res. 12, 7057-7070
27. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Smith, J. A., Seidman, J. G., and Struhl, K. (1987) Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, New York
28. Watson, M. E. E. (1984) Nucleic Acids Res. 12, 5145-5164
29. Hörlein, D., Fietzek, P. P., Wachter, E., Lapière, C. M., and Kühn, K. (1979) Eur. J. Biochem. 99, 31-38
30. Brandt, A., Glanville, R. W., Hörlein, D., Bruckner, P., Timpl, R., Fietzek, P. P., and Kühn, K. (1984) Biochem. J. 219, 625-634
31. Vasios, G., Nishimura, I., Konomi, H., van der Rest, M., Ninomiya, Y., and Olsen, B. R. (1988) J. Biol. Chem. 263, 2324-2329
32. Rossi, P., and de Crombrugghe, B. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 5590-5594
33. Lozano, G., Ninomiya, Y., Thompson, H., and Olsen, B. R. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 4050-4054
34. Kramer, J. M., Cox, G. N., and Hirsch, D. (1982) Cell 30, 599-606
35. Yamada, Y., Avvedimento, V. E., Mudryj, M., Ohkubo, H., Vogeli, G., Irani, M., Pastan, I., and de Crombrugghe, B. (1980) Cell 22, 887-892
36. Soininen, R., Tikka, L., Chow, L., Pihlajaniemi, T., Kurkinen, M., Prockop, D. J., Boyd, C. D., and Tryggvason, K. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 1568-1572
37. Sakurai, Y., Sullivan, M., and Yamada, Y. (1986) J. Biol. Chem. 261, 6654-6657
38. Kurkinen, M., Bernard, M. D., Barlow, D. P., and Chow, L. T. (1985) Nature 317, 177-179
39. Rossouw, C. M. S., Vergeer, W. P., du Plooy, S. J., Bernard, M. P., Ramirez, F., and de Wet, W. J. (1987) J. Biol. Chem. 262, 15151-15157