Proc. Natl. Acad. Sci. USA Vol. 82, pp. 4050-4054, June 1985 Biochemistry

A distinct class of vertebrate collagen genes encodes chicken type IX collagen polypeptides (recombinant DNA/nucleotide sequence analysis)

GUILLERMINA LOZANO, YOSHIFUMI NINOMIYA, HILLARY THOMPSON, AND BJORN REINO OLSEN Department of Biochemistry, University of Medicine and Dentistry of New Jersey-Rutgers Medical School, Piscataway, NJ 08854

Communicated by Elizabeth D. Hay, February 28, 1985

ABSTRACT Type IX collagen is a disulfide-bonded protein first isolated from hyaline cartilage. The structure of this collagen is unusual in that the molecules contain three triple-helical domains interspersed with noncollagenous regions. The molecules are heterotrimers composed of three genetically distinct polypeptide chains. In our laboratory, cDNAs specific for two of these polypeptide chains have recently been isolated. Here we report on the isolation of genomic clones by use of these cDNAs as probes for screening a chicken genomic library. Nucleotide sequence analysis of these clones shows that the exon structure of type IX collagen genes is fundamentally different from the exon structure of the genes for the fibrillar collagen types I-III. Whereas the sizes of exons in fibrillar collagen genes are related to a basic 54-base-pair coding unit, the exons of type IX collagen genes show a large variation in size and do not appear to be related to a 54-base-pair unit. We propose, therefore, that type IX collagen genes belong to a class of vertebrate collagen genes distinct from that of fibrillar collagens.

Collagens, the major proteins of extracellular matrices, comprise a family of proteins consisting of at least 10 different types with perhaps 20 genetically distinct polypeptide subunits. The different types of collagens are all composed of molecules containing three polypeptides, called α chains, arranged in a triple-helical conformation. The collagenous amino acid sequence of the α chains is a repeating structure with glycine in every third position and proline or 4-hydroxyproline frequently preceding the glycyl residues. Different collagen types are found in various proportions in tissues as a result of the modulated expression of different collagen genes during tissue development. As a basis for studies on this differential expression of collagen genes, several laboratories have analyzed the structure of collagen genes. Portions of genes encoding collagens types I, II, and III from a variety of species have been isolated (1-13), and they show a remarkably high degree of similarity in their organization. The chicken proα2(I) gene is by far the best characterized, and it has served as a prototype for a collagen gene (for review, see ref. 11). The coding sequence of this gene is interrupted by 51 introns, resulting in a gene that spans almost 39 kilobases (kb) of DNA. Most of the exons (42 out of 52) code for the collagenous domain of the chick proα2(I) polypeptide; these exons range in size from 45 base pairs (bp) to 162 bp and are related to a basic 54-bp unit (most are 54 or 108 bp long). This finding has led to the hypothesis that the triple-helical domain of the fibrillar collagen genes evolved by tandem duplication of an ancestral gene containing a 54-bp coding unit (14). The distribution of exon sizes is highly conserved among collagen type I, II, and III genes, suggesting that, following the initial evolutionary assembly of the multiexon gene, the exon structure of the gene became frozen as it was duplicated to give rise to the genes encoding different types of collagen (13).

The chicken proα2(I) gene cannot, however, serve as a model for all collagen genes. For example, collagen-related genes isolated from two invertebrates, Caenorhabditis elegans and Drosophila melanogaster, are quite different from the proα2(I) gene (15, 16). Analysis of the nucleotide sequence of two collagen genes in C. elegans indicates that both genes are small (1 kb) and contain only 1 or 2 introns (15). There is no evidence for a 54-bp coding unit or for units of any multiple of 54 bp. Although the analysis of the Drosophila collagen gene is not yet complete, it is also clear that it lacks 54-bp coding units (16).

Two cDNA clones encoding two [designated α1(IX) and α2(IX)] of the three genetically distinct polypeptides of a chicken cartilage collagenous protein have been isolated recently in our laboratory (17, 18). This collagen, designated type IX (19), is highly unusual in that it contains three triple-helical domains separated by noncollagenous domains. The presence of noncollagenous regions interrupting a long triple helix makes type IX collagen distinctly different from the fibrillar collagens types I, II, and III. A second major difference is that fibrillar collagen proα chains carry large noncollagenous domains at their carboxyl and amino ends, whereas type IX α chains lack homologous peptides. To examine whether the differences between type IX and types I-III collagens reflect differences in gene structure, we have isolated genomic clones specific for α1(IX) and α2(IX) chains by using the two cDNAs as hybridization probes for screening a genomic library. Here we report on the characterization of two such genomic clones, one containing sequences encoding the α1(IX) chain and the second containing sequences encoding the α2(IX) chain.

Nucleotide sequence analysis of the two clones has allowed identification of two exons in the α1(IX)-chain gene and nine exons in the α2(IX)-chain gene. The sizes of these exons vary from 33 bp to 1100 bp, and they all differ from the 54-bp basic exon unit of the types I-III collagen genes. On the basis of the exon structure of the α1(IX) and α2(IX) collagen genes, we therefore propose that these two genes belong to a distinct class of vertebrate collagen genes, quite different from the genes that encode the fibrillar collagens types I-III.
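The exon-size argument can be made concrete with a small check. The sketch below is an illustrative heuristic, not the authors' analysis: it assumes that "related to a 54-bp unit" means an exon length that is a multiple of 54 bp or 9 bp short of one (45 = 54 - 9, 99 = 108 - 9), which covers the fibrillar exon sizes cited in the text (45, 54, 99, 108, and 162 bp). The function name `related_to_54bp_unit` is hypothetical, and the type IX sizes are those quoted in the text.

```python
# Illustrative heuristic (an assumption, not the authors' method):
# an exon counts as "related to the 54-bp unit" if its length is a multiple
# of 54 bp, or 9 bp short of a multiple of 54 bp (45 = 54 - 9, 99 = 108 - 9).
UNIT_BP = 54
SHORTFALL_BP = 9


def related_to_54bp_unit(length_bp: int) -> bool:
    """Return True if an exon length fits the assumed 54-bp-unit pattern."""
    return length_bp % UNIT_BP == 0 or (length_bp + SHORTFALL_BP) % UNIT_BP == 0


# Exon sizes quoted in the text for fibrillar (types I-III) collagen genes.
fibrillar_sizes = [45, 54, 99, 108, 162]
# Exon sizes quoted in the text for the type IX collagen genes.
type_ix_sizes = [33, 55, 78, 147, 189]

if __name__ == "__main__":
    for size in fibrillar_sizes + type_ix_sizes:
        print(size, related_to_54bp_unit(size))
```

Under this rule every fibrillar size passes and none of the quoted type IX sizes does. The rule is deliberately crude: it would also accept lengths the text never mentions, such as 207 bp.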

MATERIALS AND METHODS

Hybridization Probes. We used various restriction fragments prepared from the inserts of the recombinant cDNAs pYN1738 (17) and pYN1731 (18) as probes. As reported elsewhere (17), the pYN1738 cDNA encodes most of the α1(IX) collagen chain, whereas the pYN1731 cDNA contains coding sequences for the carboxyl half of the α2(IX) collagen chain (unpublished data). The probes were labeled by nick-translation, using standard procedures.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: bp, base pair(s); kb, kilobase(s).


Screening of the Chicken Genomic DNA Library. The labeled probes were used to screen a library of chicken genomic DNA fragments generated by partial EcoRI digestion and cloned in the bacteriophage λ Charon 4A vector (unpublished work). Filters were screened with hybridization probes in the presence of N-lauroylsarcosine (20) or formamide (21). Phage purification and recombinant DNA isolation were performed as described (21).

Construction of Plasmid Subclones and DNA Sequencing. The DNA of the recombinant phage YN623, containing part of the α1(IX) collagen gene, was digested with EcoRI and HindIII. A 6-kb EcoRI-HindIII fragment was inserted between the EcoRI and HindIII sites of pBR322, and the resulting recombinant plasmid was characterized. DNA from the recombinant phage GL858, containing α2(IX) collagen gene sequences, was digested with BamHI. The BamHI fragments were inserted into the BamHI site of pBR322, and recombinant plasmids containing the different BamHI fragments of the original gene clone were isolated and characterized. Nucleotide sequence analysis of recombinant plasmids was performed by the chemical-cleavage method of Maxam and Gilbert (22) as well as by the dideoxy chain-termination technique (23). For Maxam-Gilbert sequencing, restriction fragments were labeled at their 5' ends with calf intestinal alkaline phosphatase and T4 polynucleotide kinase, or at their 3' ends with the Klenow fragment of DNA polymerase I. For dideoxy sequencing, restriction fragments were inserted in both orientations into M13mp18 and M13mp19 by standard techniques (24) and then sequenced with a 17-nucleotide primer complementary to M13 sequences.

DNA Transfer Blot Analysis. Restriction endonuclease-digested DNA was electrophoresed in agarose gels and transferred to nitrocellulose filters, and the filters were hybridized as described above for library screening.
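The subcloning steps above depend on where EcoRI, HindIII, and BamHI cut. As a rough in-silico analogue, the sketch below locates recognition sites in a sequence string and reports the fragment lengths of a linear molecule after complete digestion. The recognition sequences and top-strand cut positions (G^AATTC, A^AGCTT, G^GATCC) are the standard ones for these enzymes; the function names and the toy sequence are hypothetical, and the sketch ignores partial digestion (as used to build the library), methylation sensitivity, and star activity.

```python
import re

# Recognition sequence (5'->3') and cut offset on the top strand for each enzyme.
# All three sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
}


def cut_positions(seq: str, enzyme_names: list[str]) -> list[int]:
    """Return sorted top-strand cut coordinates for a complete digest."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # A zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str, enzyme_names: list[str]) -> list[int]:
    """Lengths of the fragments of a linear molecule, in 5'->3' order."""
    bounds = [0] + cut_positions(seq, enzyme_names) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    toy = "AAAGAATTCAAAAAGCTTAAA"  # hypothetical sequence: one EcoRI and one HindIII site
    print(fragment_lengths(toy, ["EcoRI", "HindIII"]))  # [4, 9, 8]
```

A double digest simply pools the cut positions of both enzymes, which is why the EcoRI-HindIII digest of a clone yields fragments that neither enzyme gives alone.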
RESULTS AND DISCUSSION

Using the insert of the α1(IX) cDNA pYN1738 as a probe for screening the genomic library, we identified and isolated the genomic clone YN623 (Fig. 1). With the α2(IX) cDNA pYN1731 as a probe, the genomic clone GL858 was isolated (Fig. 1). The two clones have inserts of 14 kb and 16 kb, respectively, corresponding to EcoRI fragments of chicken genomic DNA as seen by blot analysis with the two cDNAs as probes (Fig. 2). In the blot analysis, a 16-kb EcoRI fragment is the only band that hybridizes to the α2(IX) cDNA. The clone GL858 therefore contains all the coding sequences represented by pYN1731. In contrast, with the insert of the α1(IX) cDNA as probe, several EcoRI fragments hybridize in the blot analysis of chicken DNA (Fig. 2). With a restriction endonuclease-generated fragment from the middle portion of the insert of pYN1738 (Fig. 2, probe 2), multiple bands of varying intensities were seen. Based on preliminary characterization of additional genomic clones specific for α1(IX) collagen (data not shown), we believe that these bands reflect the complex structure of the α1(IX) gene, with the strong bands representing multiple hybridizing fragments of identical electrophoretic mobilities. With a restriction fragment from the 3' end of the insert of pYN1738 as hybridization probe, a 14-kb EcoRI fragment is seen by blot analysis (Fig. 2). This suggests that the genomic clone YN623, containing a 14-kb insert, represents the 3' end of the α1(IX) gene.
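Fragment sizes such as the 14-kb and 16-kb EcoRI bands are read off a blot by comparison with size markers run in the same gel. A common approximation, assumed here, is that log10(fragment size) falls roughly linearly with migration distance over the resolving range of the gel. The sketch fits that line by ordinary least squares to marker sizes typical of a λ DNA HindIII digest (23.1 to 2.0 kb) and interpolates an unknown band. The migration distances are invented for illustration, and the helper names are hypothetical.

```python
import math

# Hypothetical marker lane: (migration distance in mm, size in kb).
# Sizes are typical lambda/HindIII marker values; the distances are invented.
MARKERS = [(12.3, 23.1), (25.9, 9.4), (31.3, 6.6), (37.5, 4.4), (47.3, 2.3), (49.5, 2.0)]


def fit_standard_curve(markers):
    """Least-squares fit of log10(size_kb) = slope * distance_mm + intercept."""
    xs = [distance for distance, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_kb(distance_mm, slope, intercept):
    """Interpolate a band's size (kb) from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    slope, intercept = fit_standard_curve(MARKERS)
    # A band at a hypothetical 17.9 mm comes out close to 16 kb.
    print(round(estimate_size_kb(17.9, slope, intercept)))  # 16
```

Interpolation within the marker range is reasonably reliable; bands larger than the biggest marker, or in the compressed high-molecular-weight region, should not be sized by extrapolating this line.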

FIG. 1. Diagram of chicken genomic inserts of the recombinant phages YN623 and GL858 and the corresponding cDNAs pYN1738 and pYN1731. The cDNAs encode polypeptides with collagenous sequence domains (COL 1-3) interspersed with noncollagenous sequence domains. Note that the cDNAs are drawn to a different scale than the genomic inserts. Regions of the genomic fragments that contain nucleotide sequences identical to those of the cDNAs are indicated by lines between the genomic inserts and the cDNAs. Thus, sequences from the COL 1 region of pYN1738 are found in the 5' region of YN623. Restriction endonuclease sites: E, EcoRI; H3, HindIII; Cl, ClaI.

FIG. 2. Blot-transfer analysis of chicken genomic DNA digested with EcoRI; type IX collagen cDNAs were used as probes. (Upper) The location of the probe fragments within the cDNA inserts and their relation to the polypeptide sequence domains encoded by the cDNAs. Restriction endonuclease sites: H, HindIII; P, PvuII. As explained in the text, the α1(IX) chain, as encoded by pYN1738, contains three collagenous domains (COL 1-3), whereas the α2(IX) chain, as it is encoded by pYN1731, contains two such domains (COL 1-2). Noncollagenous (NC) domains are represented by zigzag lines. (Lower) For blot analysis, 10 μg of chicken DNA, digested with EcoRI, was loaded into each of several wells of a 0.8% agarose gel. After electrophoresis, the DNA was transferred to nitrocellulose by blotting, and the nitrocellulose filter was probed with nick-translated fragments of the cDNAs pYN1738 (lanes A-C: probes 1-3, respectively) and pYN1731 (lane D: probe 4). The positions of molecular size markers (in kb) are indicated at left. The electrophoresis origin is indicated by an arrow.

FIG. 3. Restriction-endonuclease and transfer-blot analysis of DNA from the recombinant phages YN623 and GL858. Purified DNA was digested with EcoRI and HindIII (lanes A and B) or EcoRI and BamHI (lanes C and D), electrophoresed in a 0.8% agarose gel, and then either stained with ethidium bromide (lanes A and C) or blotted onto nitrocellulose and probed with the nick-translated cDNAs pYN1738 (lane B) and pYN1731 (lane D). The positions of molecular size markers (in kb) are indicated at left for lanes A and B and at right for lanes C and D. The arrow indicates the electrophoresis origin.

Further analysis of YN623 confirmed that it contained sequences encoding part of the α1(IX) collagen chain. Using the pYN1738 insert as a hybridization probe for Southern blot analysis of restriction fragments from YN623, we found that α1(IX) coding sequences were confined to a 6-kb EcoRI-HindIII fragment in YN623 (Fig. 3). This fragment was therefore subcloned and characterized further. As shown in Fig. 4, nucleotide sequence analysis of this subclone allowed identification of the two 3'-most exons in the α1(IX) gene. Exon 1 (as counted from the 3' end of the gene) is about 1100 bp long. It contains the 3' nontranslated sequence of the mRNA, 63 nucleotides coding for the noncollagenous domain NC 1 (see Fig. 2 Upper) in α1(IX) chains, and 122 nucleotides encoding 40⅔ amino acid residues of the COL 1 domain in α1(IX) collagen chains. Exon 2 contains 78 nucleotides encoding 26 amino acid residues of the COL 1 domain (Fig. 4).

Sequence analysis of GL858 confirmed that it codes for the α2(IX) collagen chain. We have sequenced the portion of GL858 that corresponds to the cDNA pYN1731. This portion corresponds to a 2-kb BamHI fragment and a 0.9-kb BamHI fragment located in the middle of the insert of GL858, as well as a few bases extending into the 3' region of a 1.5-kb BamHI fragment located on the 5' side of the 2-kb fragment (Figs. 3 and 4). Within this 3-kb stretch of DNA, the gene contains nine exons with the coding information for about 50% of the α2(IX) chain (Fig. 4). Exon 1, as counted from the 3' end of the gene, is more than 400 nucleotides long. It contains the 3' nontranslated sequence of the α2(IX) mRNA, 45 nucleotides coding for the 15 amino acid residues of the domain NC 1, and 122 nucleotides encoding 40⅔ amino acid residues of the COL 1 domain. Exon 2 contains 78 nucleotides encoding 26 amino acid residues of the COL 1 domain of the α2(IX) collagen chain. Exons 3 and 4 contain sequences corresponding to the collagenous domains COL 1 and COL 2 as well as the noncollagenous domain NC 2 of the α2(IX) chain. Exon 3 is 189 nucleotides long, encoding 48⅓ amino acid residues of COL 1 and 14⅔ amino acid residues of NC 2. Exon 4 is 55 nucleotides long and codes for the last 3 amino acids of COL 2 and 15⅓ amino acid residues of NC 2. Exons 5-9 all code for triple-helical sequences; they range in size from 33 to 147 nucleotides (Fig. 4).

A comparison of the nucleotide sequences from the translated portions of exons 1 from YN623 and GL858 clearly demonstrates that they encode homologous proteins (Fig. 4). In both genes, 122 nucleotides at the 5' end of exon 1 code for 40⅔ amino acid residues with a collagenous Gly-Xaa-Yaa triplet structure corresponding to the carboxyl portion of the COL 1 domain of α1(IX) and α2(IX) collagen chains. Seventeen nucleotides downstream from the 5' end of exon 1, both genes contain a stretch of 15 nucleotides encoding a pentapeptide imperfection with a deleted glycine codon in the Gly-Xaa-Yaa triplet structure (Fig. 4). A comparison of the translated portion of exon 1 of the α2(IX) gene with the homologous region of the α1(IX) gene shows that the homology is 52.7% at the nucleotide level (167 nucleotides compared) and 49.7% at the amino acid level (55⅔ amino acid residues compared).
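Because an exon boundary can fall inside a codon, the number of residues encoded by a coding stretch need not be a whole number: 122 nucleotides correspond to 122/3 = 40⅔ residues. The sketch below reproduces such counts with exact fractions. The split of α2(IX) exons 3 and 4 between domains (44 + 145 and 9 + 46 nucleotides) is not given in the text; it is back-calculated from the stated residue counts and should be read as an assumption. The function names are hypothetical.

```python
from fractions import Fraction


def residues(nucleotides: int) -> Fraction:
    """Amino acid residues encoded by a coding stretch; fractional when the
    stretch begins or ends inside a codon."""
    return Fraction(nucleotides, 3)


def as_mixed(value: Fraction) -> str:
    """Format a non-negative fraction as a mixed number, e.g. '40 2/3'."""
    whole, remainder = divmod(value.numerator, value.denominator)
    return str(whole) if remainder == 0 else f"{whole} {remainder}/{value.denominator}"


# Coding stretches quoted in the text (nucleotides).
col1_exon1 = 122      # COL 1 portion at the 5' end of exon 1 in both genes
exon2 = 78            # exon 2 in both genes
compared_exon1 = 167  # translated exon 1 sequence used in the homology comparison

# Assumed domain split of alpha2(IX) exons 3 and 4, back-calculated from the text.
exon3_nc2, exon3_col1 = 44, 145   # 189 nt in total
exon4_col2, exon4_nc2 = 9, 46     # 55 nt in total

if __name__ == "__main__":
    print(as_mixed(residues(col1_exon1)))      # 40 2/3
    print(as_mixed(residues(exon2)))           # 26
    print(as_mixed(residues(compared_exon1)))  # 55 2/3
    print(as_mixed(residues(exon3_col1)), as_mixed(residues(exon3_nc2)))  # 48 1/3 14 2/3
    # NC 2 spans exons 4 and 3; its pieces add up to a whole number of residues.
    print(as_mixed(residues(exon4_nc2 + exon3_nc2)))  # 30
```

The last line is a useful consistency check: the two partial NC 2 segments (15⅓ and 14⅔ residues) sum to an intact 30-residue domain, as they must if the reading frame is continuous across the intron.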

The homology between the α1(IX) and α2(IX) collagen genes is also clearly demonstrated by a comparison of exons 2 in the two genes. Both exons are 78 nucleotides long, both encode the same stretch of Gly-Xaa-Yaa triplets as counted from the amino end of the COL 1 domain of α1(IX) and α2(IX) chains, and in the middle of each exon, 6 nucleotides encode a Gly-Arg dipeptide imperfection in the Gly-Xaa-Yaa triplet structure (Fig. 4). The homology between this exon in the two genes is 56.4% at the nucleotide level and 61.5% at the amino acid level.

The two gene clones YN623 and GL858 contain sequences that are specific for two (α1 and α2) of the three genetically distinct polypeptide subunits of type IX collagen. Extensive structural information about the third polypeptide subunit is still lacking, but the existence of triple-helical molecules with three different chains (25) requires that the α3(IX) chain contain the same collagenous and noncollagenous sequence domains as the α1(IX) and α2(IX) chains. Also, the COL 1 domain of the α3(IX) chain probably contains two imperfections in the same locations as in the α1(IX) and α2(IX) chains, since this domain in type IX collagen is resistant to pepsin and gives rise to the triple-helical fragment LMW during pepsin extraction of cartilage (19). It is likely, therefore, that the exon structure at the 3' end of the α3(IX) collagen gene is homologous to that of the α1(IX) and α2(IX) genes described here.

FIG. 4. (A) Restriction endonuclease cleavage maps of portions of YN623 [α1(IX)] and GL858 [α2(IX)] analyzed by nucleotide sequencing. The direction of the coding strands of the genes is indicated by the 5' and 3' notations. Exons are indicated by the black boxes, numbered from right to left as discussed in the text. The 3' end of exon 1 of the α1(IX) gene has not been completely sequenced, and it is therefore indicated by a broken line. The strategy of nucleotide sequencing is indicated below each set of maps, with dots representing the positions of 5'-end labeling and the arrows showing the directions and extents of the sequence analysis. Thick vertical bars indicate the positions of 3'-end labeling by the Klenow fragment of DNA polymerase I, and the attached arrows show the directions and extents of sequence analysis. The thin vertical lines and corresponding arrows indicate regions that were sequenced by the dideoxy chain-termination technique. (B) Nucleotide and corresponding deduced amino acid sequence of exon 1 and exon 2 (as counted from the 3' end of the gene) of the α1(IX) collagen gene. As indicated in the text, both exons contain imperfections in the Gly-Xaa-Yaa triplet coding structure. These imperfections are underlined by broken double lines. Asterisks indicate the termination codon in exon 1. Lower-case letters in the nucleotide sequence represent intron sequences. (C) Nucleotide and corresponding deduced amino acid sequence of exons 1-9 of the α2(IX) collagen gene. Imperfections in the Gly-Xaa-Yaa triplet structure are underlined by broken double lines. The canonical polyadenylylation signal -A-A-T-A-A-A- is underlined. Asterisks indicate the stop codon.

Although we have analyzed only about half the coding sequences of the α2(IX) gene and the two 3'-most exons of the α1(IX) gene, the data presented here clearly demonstrate that the exon structure of type IX collagen genes is different from that of the fibrillar collagens types I-III. Among the 11 exons we have sequenced, 7 code exclusively for triple-helical sequences and 4 encode triple-helical sequences as well as noncollagenous sequences. With the exception of exon 7, none of these exons has the characteristic size of triple-helical exons (45, 54, 99, or 108 bp) of fibrillar collagen genes. The differences between fibrillar collagen genes and those of type IX collagen provide a basis for a genetic classification of collagens. Class A comprises the multiexon genes of fibrillar collagens such as types I-III, whereas type IX genes belong to a separate class, B.

How did the type IX collagen genes evolve, and what is their relationship to the class A collagen genes? We propose that the genes for the different α chains of type IX collagen arose by duplication of an ancestral multiexon gene that coded for the polypeptide chain of homotrimeric type IX molecules. Duplication of transcriptional control sequences along with the coding regions, or sharing of a common control sequence because of clustering of the genes, allowed continued expression of the three gene copies. Subsequent drift among the three genes may have favored the formation of heterotrimeric molecules because of some selective advantage they would confer as compared with homotrimeric molecules. Such advantages could include, but are of course not limited to, an increased rate of triple-helix formation, improved thermal stability, or an increased secretion rate. How the multiexon ancestral type IX collagen gene was assembled during evolution is not clear. Also unknown is its possible relationship with fibrillar collagen genes, whose exons appear related to a basic 54-bp coding unit.
Answers to these questions will have to await studies of type IX collagen genes from other animal species, including invertebrates. It will also be important to establish whether the exon structure of type IX collagen genes is shared by genes encoding other types of short-chain collagen, such as type X collagen.

This work was supported in part by research Grants AM21471 and AM34059 from the National Institutes of Health.

1. Ohkubo, H., Vogeli, G., Mudryj, M., Avvedimento, V. E., Sullivan, M., Pastan, I. & deCrombrugghe, B. (1980) Proc. Natl. Acad. Sci. USA 77, 7059-7063.
2. Boyd, C. D., Tolstoshev, P., Schafer, M. P., Trapnell, B. C., Coon, H. C., Kretschmer, P. J., Nienhuis, A. W. & Crystal, R. G. (1980) J. Biol. Chem. 255, 3212-3220.
3. Wozney, J., Hanahan, D., Morimoto, R., Boedtker, H. & Doty, P. (1981) Proc. Natl. Acad. Sci. USA 78, 712-716.
4. Monson, J., Friedman, J. & McCarthy, B. J. (1982) Mol. Cell. Biol. 2, 1362-1371.
5. Weiss, E. H., Cheah, K. S. E., Grosveld, F. G., Dahl, H. H. M., Solomon, E. & Flavell, R. A. (1982) Nucleic Acids Res. 10, 1981-1994.
6. Myers, J. C., Dickson, L. A., deWet, W., Bernard, M. P., Chu, M.-L., DeLiberto, M., Pepe, G., Sangiorgi, F. O. & Ramirez, F. (1983) J. Biol. Chem. 258, 10128-10135.
7. Sandell, L. J., Yamada, Y., Dorfman, A. & Upholt, W. B. (1983) J. Biol. Chem. 258, 11617-11621.
8. Dickson, L. A., Ninomiya, Y., Bernard, M. P., Pesciotta, D. M., Parsons, J., Green, G., Eikenberry, E. F., deCrombrugghe, B., Vogeli, G., Pastan, I., Fietzek, P. P. & Olsen, B. R. (1981) J. Biol. Chem. 256, 8407-8415.
9. Yamada, Y., Mudryj, M., Sullivan, M. & deCrombrugghe, B. (1983) J. Biol. Chem. 258, 2758-2761.
10. Sandell, L. J., Prentice, H. L., Kravis, D. & Upholt, W. B. (1984) J. Biol. Chem. 259, 7826-7834.
11. Tate, V., Finer, M., Boedtker, H. & Doty, P. (1982) Cold Spring Harbor Symp. Quant. Biol. 47, 1039-1049.
12. Chu, M.-L., deWet, W., Bernard, M., Ding, J. F., Morabito, M., Myers, J., Williams, C. & Ramirez, F. (1984) Nature (London) 310, 337-340.
13. Yamada, Y., Liau, G., Mudryj, M., Obici, S. & deCrombrugghe, B. (1984) Nature (London) 310, 333-337.
14. Yamada, Y., Avvedimento, V. E., Mudryj, M., Ohkubo, H., Vogeli, G., Irani, M., Pastan, I. & deCrombrugghe, B. (1980) Cell 22, 887-892.
15. Kramer, J. M., Cox, G. N. & Hirsch, D. (1982) Cell 30, 599-606.
16. Monson, J. M., Natzle, J., Friedman, J. & McCarthy, B. J. (1982) Proc. Natl. Acad. Sci. USA 79, 1761-1765.
17. Ninomiya, Y. & Olsen, B. R. (1984) Proc. Natl. Acad. Sci. USA 81, 3014-3018.
18. Ninomiya, Y., van der Rest, M., Mayne, R., Lozano, G. & Olsen, B. R. (1985) Biochemistry, in press.
19. van der Rest, M., Mayne, R., Ninomiya, Y., Seidah, N. G., Chretien, M. & Olsen, B. R. (1985) J. Biol. Chem. 260, 220-225.
20. Overbeek, P. A., Merlino, G. T., Peters, N. K., Cohn, V. H., Moore, G. P. & Kleinsmith, L. J. (1981) Biochim. Biophys. Acta 656, 195-205.
21. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
22. Maxam, A. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
23. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
24. Norrander, J., Kempe, T. & Messing, J. (1983) Gene 26, 101-105.
25. Mayne, R., van der Rest, M., Weaver, D. C. & Butler, W. T. (1985) J. Cell. Biochem. 27, 133-141.