THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1994 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 269, No. 32, Issue of August 12, pp. 20256-20262, 1994 Printed in U.S.A.

Cloning of Human Type VII Collagen
COMPLETE PRIMARY SEQUENCE OF THE α1(VII) CHAIN AND IDENTIFICATION OF INTRAGENIC POLYMORPHISMS*
(Received for publication, April 15, 1994, and in revised form, May 19, 1994)

Angela M. Christiano‡§¶, Daniel S. Greenspan§‖, Seungbok Lee‖, and Jouni Uitto‡**‡‡
From the Departments of ‡Dermatology, and **Biochemistry and Molecular Biology, Jefferson Medical College, and Section of Molecular Dermatology, Jefferson Institute of Molecular Medicine, and Jefferson Cancer Institute, Thomas Jefferson University, Philadelphia, Pennsylvania 19107 and the ‖Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, Wisconsin 53706

Type VII collagen is the major, if not the exclusive, component of the anchoring fibrils, attachment structures at the dermal-epidermal basement membrane zone. In this study, we have isolated overlapping cDNA clones which correspond to full-length human type VII collagen mRNA of ~9.2 kilobases. The full-length cDNA sequence contains an 8,833 nucleotide open reading frame encoding 2,944 amino acids. The deduced amino acid sequence revealed the presence of a central collagenous domain flanked by a large NH2-terminal non-collagenous (NC-1) domain which consists of submodules with homology to known adhesive proteins, including nine fibronectin type III-like segments (FNIII), and a smaller COOH-terminal non-collagenous (NC-2) domain. In addition, we report six intragenic polymorphisms in the type VII collagen gene (COL7A1) which can be detected by restriction enzyme digestion of polymerase chain reaction-amplified segments. The complete cDNA sequence and polymorphisms in COL7A1 will facilitate mutational analysis and prenatal diagnosis for patients with the dystrophic forms of epidermolysis bullosa, in which mutations in COL7A1 have been demonstrated.

Collagen is a family of closely related, yet genetically distinct, proteins, and currently, at least 19 different collagens have been characterized to the extent that they have been assigned a Roman numeral (types I-XIX) (1). A characteristic feature of these molecules is that they contain a triple-helical collagenous domain consisting of a repeating Gly-X-Y sequence. In some collagens, however, the triple-helical domain is interrupted by imperfections in the Gly-X-Y sequence or by insertions of non-collagenous segments (1, 2). Furthermore, the collagenous domain is flanked by globular NH2- and COOH-terminal non-collagenous sequences. While some of the collagens, such as type I collagen, have a ubiquitous tissue distribution, many of them have relatively restricted, often tissue-specific topography (1). Among the latter collagens is type VII collagen, which is expressed primarily in stratified squamous epithelia, such as the skin, the mucous membranes, and the cornea of the eye (3, 4).

Immunofluorescence and electron microscopic studies with monoclonal antibodies have localized type VII collagen in the skin to the anchoring fibrils, ultrastructurally recognizable attachment structures below the dermal-epidermal basement membrane zone (4, 5). The anchoring fibrils extend from the lamina densa of the basement membrane to the papillary dermis where they associate with basement membrane-like structures, known as anchoring plaques. However, some anchoring fibrils adopt a U-shaped structure, so that both ends are inserted into the lamina densa. These arrangements of the anchoring fibrils appear to secure the association of the basement membrane to the underlying dermis (4, 5).

The importance of the anchoring fibrils in stabilizing the cutaneous basement membrane zone is attested to by the observations that alterations in these fibrils result in heritable diseases, collectively known as the dystrophic forms of epidermolysis bullosa (EB)1 (6-8). Specifically, the anchoring fibrils are abnormal in morphology, reduced in number, or entirely absent, and the patients with the dystrophic forms of EB have extremely fragile skin with easy blistering as a result of minor trauma. Toward a better understanding of the underlying genetic defects in the dystrophic forms of EB, we recently initiated cloning of human type VII collagen (9). The first type VII collagen cDNA clone was isolated by immunoscreening of a human keratinocyte λgt11 expression library with the serum IgG fraction from a patient with an acquired form of EB, an autoimmune disease characterized by circulating antibodies recognizing type VII collagen epitopes (9). This screening resulted in the isolation of a 1.9-kb cDNA (K-131), which was unequivocally identified as a type VII collagen cDNA on the basis of homology with a previously published 18-amino-acid peptide sequence in human placental type VII collagen (9-11). Subsequent cloning has resulted in elucidation of sequences in the large amino-terminal non-collagenous (NC-1) domain, as well as the carboxyl-terminal (NC-2) non-collagenous segment (11-13). In this study, we have completed the cloning of the entire α1(VII) chain of human type VII collagen, including the central collagenous segment and the ultimate 5' end. The full-length cDNA encompasses a total of ~9.2 kb of nucleotide sequences. We have also identified six intragenic polymorphisms which may be useful in additional genetic linkage analyses and prenatal diagnosis in families with dystrophic forms of EB (14-16).

* This work was supported in part by United States Public Health Service, National Institutes of Health Grants R01-GM46846, PO1-AR38923, and T32-AR07561. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) L02870.
§ The first two authors contributed equally to this work.
¶ Recipient of The Society for Investigative Dermatology Research Career Development Award from the Dermatology Foundation.
‡‡ To whom correspondence should be addressed: Dept. of Dermatology, Jefferson Medical College, 233 S. 10th St., Rm. 450, Philadelphia, PA 19107. Tel.: 215-955-5785; Fax: 215-955-5788.
1 The abbreviations used are: EB, epidermolysis bullosa; kb, kilobase(s); PCR, polymerase chain reaction; RT, reverse transcriptase; bp, base pair(s).


Human Type VII Collagen cDNA


EXPERIMENTAL PROCEDURES

The initially isolated cDNA clone corresponding to human type VII collagen (K-131) consisted of ~1.3 kb of amino-terminal non-collagenous sequences and ~0.6 kb of carboxyl-terminal sequences encoding a collagenous segment characterized by Gly-X-Y repeats (9). This clone was subsequently used to screen human fibroblast and human amniotic epithelial (WISH) cell cDNA libraries under high stringency, as previously described (11). The screening resulted in identification of two additional clones which extended the sequences ~2 kb toward the 5' end of the mRNA (K7 and W21) (11) (Fig. 1a).

The 5' end was further extended by PCR amplification of sequences utilizing degenerate oligomer primers synthesized on the basis of partial amino acid sequences identified within the NC-1 domain (11). In this study, the ultimate 5' end of the cDNA was cloned by RT-PCR amplification of cDNA sequences predicted from genomic clones (17). The primers used for amplification of the 5' end were: left 5'-CTCGGACCTGCCAAGGCCAC-3'; right 5'-GGAGCCATCCAGTAAGAACA…A-3'. The left hand primer was made from sequences in exon 1, and the right hand primer was synthesized on the basis of sequences in exon 2. The primers generated a product of 212 bp after RT-PCR amplification using human oral epidermoid carcinoma (KB) cell cDNA as template. The cycling conditions were: 94 °C for 7 min, followed by 40 cycles of 95 °C, 45 s; 60 °C, 45 s; 72 °C, 45 s; and a 7-min extension at 72 °C. The PCR product was subcloned into a PCR-compatible vector (TA, Invitrogen) and sequenced according to standard techniques (18). The 5' end sequences demonstrated the presence of a methionine residue, followed by a sequence with characteristics of a signal peptide (see "Results"). To verify that the methionine codon ATG in position +1 represents the translation initiation codon, primer extension studies were performed to identify the 5' end of the cDNA (17). Nucleotide sequences -113 to -1 represent part of the 5'-untranslated region, and the ATG codon in position +1, which is followed by the open reading frame, represents the translation initiation site.

To extend sequences in the 3' direction, a BamHI/EcoRI restriction fragment corresponding to the 3' end of the cDNA K-131 (9) was used to screen a λgt11 keratinocyte cDNA library (Clontech). This screening resulted in isolation of two additional clones (K4 and K10) which demonstrated overlapping sequences with K-131, as well as with each other (Fig. 1a). Clones were ligated into the EcoRI site of pBluescript SK (Stratagene) and characterized by restriction enzyme mapping and dideoxynucleotide sequencing (18). The clone K4 was found to contain a deletion of 162 bp upon comparison with corresponding genomic clones (17), probably the result of a cloning artifact. Therefore, this region was RT-PCR amplified from KB cell RNA, followed by cloning into the PCR-compatible vector (TA, Invitrogen), and sequenced to confirm the nucleotide segments identified in exons of the genomic clone. The composite of these two clones extended the sequences within the collagenous domain ~2.8 kb downstream from the 3' end of K-131 (Fig. 1a).

At the same time, a human type VII collagen cDNA was independently isolated by screening a human placenta cDNA library with a hamster type VII collagen cDNA clone (CHL87-3) (13). This hamster clone was isolated incidentally during attempts to screen for type V collagen cDNAs. Screening of a human placenta library resulted in the isolation of one cDNA clone (CW2-1) which had a 5' collagenous sequence that overlapped with the previously isolated type VII collagen cDNA (K10) by ~130 bp (Fig. 1a). The human type VII collagen cDNA CW2-1 also contained a 3' segment encoding non-collagenous domain (NC-2) as well as a putative polyadenylation signal (ATTAAA), followed by a poly(A) tail (A)n (13). Thus, we have cloned overlapping COL7A1 cDNAs with a contiguous nucleotide sequence of 9,174 bp, which contained an open reading frame encoding 2,944 amino acids (Fig. 2).

All clones were sequenced by the dideoxy nucleotide chain termination method in both directions according to standard techniques (18), and the sequences have been compared with the corresponding genomic sequences (17). The translation of nucleotide sequences into amino acids, alignment with genomic clones, and restriction enzyme site predictions were performed using the pcGene program. Homology searches with GenBank sequences were performed using the Fasta program (19).

FIG. 1. Cloning of type VII collagen cDNAs, and domain organization of the human α1(VII) collagen polypeptide deduced from the nucleotide sequences. a, overlapping cDNAs (-) corresponding to type VII collagen mRNA sequences were isolated from different cDNA libraries, as detailed under "Experimental Procedures." The 5' end of the cDNA and a central segment (A, B, and G) were cloned by RT-PCR. The cloned cDNAs correspond to the full-length type VII collagen mRNA, consisting of 9,174 bp. The isolated cDNAs were as follows: A, RT-PCR product corresponding to the 5' end; B, LR-200 RT-PCR product; C, W21; D, K7; E, K-131; F, K4; G, RT-PCR product; H, K10; I, CW2-1. A partial restriction map is shown on the top of the scale (0-9 kb): B, BamHI; L, BclI; X, XhoI; G, BglII. b, deduced amino acid sequence of the open reading frame from overlapping cDNAs predicts a polypeptide of 2,944 amino acids. The deduced α1(VII) collagen polypeptide consists of a central collagenous region which contains a non-collagenous 39-amino-acid interruption. The triple-helical collagenous domain is flanked by an amino-terminal non-collagenous NC-1 domain, and by a carboxyl-terminal non-collagenous NC-2 domain. Note the submodular structure of the NC-1 segment with homology to known adhesive proteins. CMP, cartilage matrix protein; FNIII 1-9, nine consecutive fibronectin type III domains; VWF-A, the A domain of von Willebrand factor; C/P, a cysteine and proline rich region. The NC-2 domain has a module with homology to the Kunitz protease inhibitor (KM).

RESULTS

The Primary Structure of the α1(VII) Collagen Polypeptide

The cloned cDNAs were analyzed by deducing the primary structure of the α1(VII) polypeptide. As suggested by our previous studies, the type VII collagen pro-α chain consists of a central collagenous segment characterized by Gly-X-Y sequence, which is flanked by a larger amino-terminal (NC-1) and a smaller carboxyl-terminal (NC-2) non-collagenous domains (9, 11, 13) (Fig. 1b). Following the methionine at the site of initiation of translation, there is a 16-amino-acid putative signal peptide, as predicted by the computer analysis (19). The cleavage site for signal peptidase is predicted to be at the Ala16-Glu17 bond (Fig. 2).

Sequence analysis of the NC-1 domain revealed the presence of multiple submodules with homology to adhesive proteins. These may play a role in the attachment of the NC-1 domains to the basement membrane zone structures in the lamina densa and in the anchoring plaques, or provide adhesive properties during supramolecular assembly of the protein into anchoring fibrils (Fig. 1b). The mature polypeptide contains a 1,253-amino acid NC-1 domain which is characterized by the presence of a segment with ~40% homology with cartilage matrix protein at the amino acid level. Next, nine consecutive domains with homology to the type III domains of fibronectin were identified (Fig. 1b). Each of these domains has been shown to be encoded by two separate exons with characteristic repeat sequences (11). At the carboxyl-terminal side of the fibronectin repeat domains, there is a segment with homology to the A domain of von Willebrand factor. This segment also contains an RGD sequence (20) at amino acid position 1170-1172 (Fig. 2). This domain is followed on the carboxyl-terminal side by an amino acid segment rich in cysteine and proline (Fig. 2). The calculated size of NC-1 is 133,802 Da, which is slightly smaller than the size of ~145 kDa predicted from electrophoretic analyses (4). This size difference may be attributable to post-translational modifications, including putative N-glycosylation sites at amino acid sequences 337-339, 786-788, and 1,109-1,111 (Fig. 2), which contain the consensus sequence NXS/T for N-glycosylation (21).
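The NXS/T consensus mentioned above can be located mechanically in any deduced protein sequence. The following is a minimal Python sketch rather than the authors' procedure: the function name `find_sequons` and the short test peptide are illustrative assumptions, and the pattern implements only the NXS/T rule as stated in the text (it does not exclude proline at X, as some definitions of the sequon do).

```python
import re

def find_sequons(protein: str) -> list[tuple[int, int, str]]:
    """Return (start, end, tripeptide) for every N-X-S/T match, 1-based inclusive."""
    hits = []
    # A lookahead is used so that overlapping sequons (e.g. NNST) are all reported.
    for m in re.finditer(r"(?=(N[A-Z][ST]))", protein.upper()):
        start = m.start() + 1
        hits.append((start, start + 2, m.group(1)))
    return hits

# Hypothetical peptide, used only to exercise the function (not from COL7A1).
example = "MKNGSAPNPTQNNST"
for start, end, tri in find_sequons(example):
    print(f"{start}-{end}: {tri}")
```

Applied to the full deduced α1(VII) sequence (GenBank L02870), a scan of this kind would list candidate sites in the same position format used in the text.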

FIG. 2. Nucleotide sequences of the full-length cDNA derived from clones, as shown in Fig. 1a, and the deduced amino acid sequence of the α1(VII) polypeptide. The overlapping cDNAs encode a 9174-bp mRNA, which contains 113-bp 5'- and 342-bp 3'-untranslated regions. The shaded areas correspond to the amino-terminal non-collagenous NC-1 and the carboxyl-terminal non-collagenous NC-2 domains. Within the NC-1 domain, the borders between the peptide segments with homology with cartilage matrix protein (CMP), fibronectin type III-like domains (FN1-9), and von Willebrand factor A-domain (VWF-A), as well as the region rich in cysteine and proline (CYS/PRO) are indicated by brackets above the nucleotide sequences. The putative N-glycosylation consensus sequences, NXT/S, are double underlined. Cysteine residues are circled, and RGD tripeptide sequences are outlined by ovals. Within the central collagenous domain, consisting of a characteristic Gly-X-Y repeat sequence, the non-collagenous interruptions and imperfections are underlined. Note the presence of a 39-amino-acid non-collagenous segment in amino acid positions 1940-1978.
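Imperfections in a Gly-X-Y repeat of the kind marked in Fig. 2 can be flagged with a simple scan. The sketch below is an illustration under stated assumptions, not the method used in the paper: `glyxy_breaks` is a hypothetical helper that walks the sequence greedily in triplets, so a glycine occurring in the X or Y position next to an imperfection can shift the frame and change which residues are reported.

```python
def glyxy_breaks(segment: str) -> list[int]:
    """Return 1-based positions of residues that fall outside a Gly-X-Y frame.

    Greedy left-to-right scan: whenever the current residue is Gly, the next
    two residues are taken as X and Y; any other residue found where a Gly is
    expected is reported as a break.
    """
    breaks = []
    seq = segment.upper()
    i = 0
    while i < len(seq):
        if seq[i] == "G":
            i += 3              # consume one Gly-X-Y triplet
        else:
            breaks.append(i + 1)
            i += 1              # skip the offending residue and try again
    return breaks

perfect = "GPRGLKGERGVKGACGLDGE"   # cDNA-deduced segment quoted in Table I
broken = "GPRGLKAGERGVK"           # hypothetical: one extra Ala inserted
print(glyxy_breaks(perfect))
print(glyxy_breaks(broken))
```

The first call reports no breaks and the second reports the inserted residue at position 7.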

Examination of the amino acid sequences within the collagenous segment indicated that it consisted of 1,530 amino acids of characteristic Gly-X-Y repeat sequences (Fig. 2). We were able to identify deduced amino acid segments which correspond to four cyanogen bromide peptide sequences derived from the triple-helical domain of type VII collagen, thus confirming the identity of the cDNAs as representative of type VII collagen (Table I). The Gly-X-Y repeat sequence was interrupted by small, 1 or 2 amino acid imperfections on 11 occasions (Fig. 2). In addition, there were insertions consisting of 3-10 amino acids in the primary structure on seven occasions. Finally, a large, 39-amino-acid interruption consisting of a non-collagenous sequence was identified in the central portion of the collagenous domain (amino acid positions 1,940-1,978; Fig. 2). The presence of the latter interruption has been previously predicted at the protein level by the susceptibility of the collagenous domain of type VII collagen to pepsin proteolysis under non-denaturing conditions (10). Thus, the collagenous domain contains a total of 19 imperfections or interruptions within the Gly-X-Y sequence.

One cysteine residue was identified within the triple-helical domain (amino acid 2634 in Fig. 2). This cysteine has been proposed to form intermolecular disulfide bonds by pairing with the first or second cysteine residue (amino acids 2802 and 2804 in Fig. 2) within the NC-2 domain in another type VII collagen molecule (4, 10). This association was predicted by the length of the carboxyl-terminal overlap region, ~60 nm, to correspond to ~200 amino acids, as determined by rotary shadowing electron microscopy of antiparallel type VII collagen dimers (22). The actual distance between these cysteines within the primary structure is 168 and 170 residues, respectively (Fig. 2). The estimated size of the Gly-X-Y region is 143,697 Da, which is in good agreement with the estimated size of the triple-helical region, ~145 kDa, comprised of two pepsin-resistant fragments (10). Some of the difference could be explained by post-translational hydroxylation of proline and lysine residues and subsequent O-glycosylation of hydroxylysine residues (1). The collagenous domain contains no sites for N-glycosylation, but three RGD tripeptide sequences could be identified (positions 1334-1336, 2008-2010, and 2553-2555) (Fig. 2).

The carboxyl-terminal NC-2 domain was shown to consist of 161 amino acids. As reported previously (13), this domain contains a segment with homology to the Kunitz type protease inhibitor. It is unclear, however, whether this segment is a functionally active inhibitor of proteases.

TABLE I
Comparison of four segments of amino acid sequences found in cyanogen bromide peptides of human type VII collagen with sequences deduced from cDNAs located in the collagenous region of COL7A1

Comparison of four sets of sequences corresponding to amino acid positions, as indicated in Fig. 2, is shown. The upper line in each case represents the amino-terminal sequence of a cyanogen bromide peptide derived from type VII collagen isolated from human amniotic membranes; thus, each sequence is presumed to start with a methionine (M). The lower line represents sequences deduced from the cDNA. The question marks refer to amino acids which could not be determined. Z indicates hydroxyproline, which is synthesized post-translationally by hydroxylation of prolyl residues in the second position. The difference in the 4th peptide pair (Q versus E) may reflect amino acid sequencing difficulty.

Amino acids 2161-2189
  (M) AGPEGKZGL~PRGPZGPVGGHGDZGPZ
   M  AGPEGKPGL~PRGPPGPVGGHGDPGPP

Amino acids 2540-2558
  (M) GERGPRGLDGDKGP?GD?G
   M  GERGPRGLDGDKGPRGDNG

Amino acids 2621-2640
  (M) GPRGL?GERGV?GA?GLDGE
   M  GPRGLKGERGVKGACGLDGE

Amino acids 2663-2682
  (M) GQZGVZGQSGAZGKEGLIGP
   M  GEPGVPGQSGAPGKEGLIGP
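The peptide-versus-cDNA comparison summarized in Table I can be expressed as a short check. This is a sketch under the conventions given in the table legend, not the authors' software: `compare_peptide` is a hypothetical helper that treats "?" as an undetermined residue and "Z" (hydroxyproline) as agreeing with a cDNA-encoded proline. Three of the four pairs from Table I are used as input.

```python
def compare_peptide(observed: str, deduced: str) -> list[tuple[int, str, str]]:
    """List (1-based index, observed, deduced) for residues that genuinely differ."""
    if len(observed) != len(deduced):
        raise ValueError("sequences must be the same length")
    mismatches = []
    for i, (obs, ded) in enumerate(zip(observed, deduced), start=1):
        if obs == "?":                  # residue not determined by sequencing
            continue
        if obs == "Z" and ded == "P":   # hydroxyproline is derived from proline
            continue
        if obs != ded:
            mismatches.append((i, obs, ded))
    return mismatches

# Peptide (upper line) and cDNA-deduced (lower line) sequences from Table I,
# without the leading methionine.
pairs = {
    "2540-2558": ("GERGPRGLDGDKGP?GD?G", "GERGPRGLDGDKGPRGDNG"),
    "2621-2640": ("GPRGL?GERGV?GA?GLDGE", "GPRGLKGERGVKGACGLDGE"),
    "2663-2682": ("GQZGVZGQSGAZGKEGLIGP", "GEPGVPGQSGAPGKEGLIGP"),
}
for label, (obs, ded) in pairs.items():
    print(label, compare_peptide(obs, ded))
```

Under these conventions the only residue reported as different is the Q versus E in the last pair, in line with the table legend.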
The NC-2 domain also contains 8 cysteine residues which are conserved between the hamster and human sequences (Fig. 2, and Ref. 13). As discussed above, these cysteines have been suggested to participate in the formation of disulfide bonds which stabilize the anti-parallel association of two type VII collagen molecules during the extracellular assembly of anchoring fibrils (4). The estimated molecular size of the NC-2 domain, without post-translational modifications, is 17,755 Da. This size is considerably smaller than the previous estimates of ~30 kDa from SDS-polyacrylamide gel electrophoretic analysis of the purified NC-2 domain (4). The reasons for this size discrepancy are currently not understood, but it should be noted that the NC-2 domain contains potential sites for phosphorylation by casein kinase I and II. There are no potential sites for N-glycosylation (Fig. 2).

Identification of Intragenic Polymorphisms in COL7A1

As indicated above, we have initiated the search for mutations in dystrophic forms of EB (5). The strategy consists of PCR amplification of mRNA and genomic DNA sequences of type VII collagen, with subsequent scanning of the amplimers by a variety of electrophoretic techniques, including single strand conformation polymorphism and heteroduplex analysis. In addition to mutations implicated in the pathogenesis of dystrophic forms of EB (8, 23-26), these techniques have identified several polymorphisms within the coding region of COL7A1. Specifically, six single-base polymorphisms, all recognizable by restriction enzymes, have been delineated (Table II). One of these polymorphisms, detectable by PvuII restriction enzyme digestion, was first identified by Southern analysis of genomic DNA (14). It was subsequently shown (27) that this polymorphism results from a single base change in the third position of a proline codon (CCC→CCT) (nucleotide 2820 in Fig. 2). Subsequently, comparison between cDNA and genomic DNA sequences revealed the presence of an AluI polymorphism. Four additional polymorphisms recognizable with the enzymes StyI, MspI, MnlI, or EcoO109I, respectively, have been identified (Table II).
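Whether a single-base change creates or destroys a restriction site can be checked directly from the allele sequences. The sketch below is an illustration only: the recognition sequences are the standard ones for each enzyme and are not stated in the paper, the helper names are hypothetical, and the allele strings are the short major/minor sequences listed in Table II for four of the six polymorphisms.

```python
import re

# IUPAC nucleotide codes needed for the recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Standard recognition sequences (an assumption; not given in the paper).
SITES = {"AluI": "AGCT", "MspI": "CCGG", "EcoO109I": "RGGNCCY", "MnlI": "CCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]

def has_site(enzyme: str, dna: str) -> bool:
    """True if the enzyme's recognition sequence occurs on either strand."""
    pattern = "".join(IUPAC[base] for base in SITES[enzyme])
    strands = (dna.upper(), reverse_complement(dna))
    return any(re.search(pattern, strand) for strand in strands)

# (major allele, minor allele) as listed in Table II.
ALLELES = {
    "AluI": ("AGCC", "AGCT"),
    "MspI": ("CCGG", "CTGG"),
    "EcoO109I": ("AGGGGCT", "AGGGCCT"),
    "MnlI": ("CCCC", "CCTC"),
}
for enzyme, (major, minor) in ALLELES.items():
    print(enzyme, "major:", has_site(enzyme, major), "minor:", has_site(enzyme, minor))
```

For each of these four polymorphisms exactly one allele carries the site, which is what makes digestion of a PCR product informative.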
We have developed rapid PCR-based detection methods for all of these six polymorphisms to aid in genetic linkage studies between the type VII collagen locus at 3p21 (9, 14) and the dystrophic forms of EB, as well as in prenatal diagnosis of families at risk for recurrence of the dystrophic forms of EB (28). The optimized primers, expected sizes of the restriction fragments, allelic frequencies, and the polymorphism information content of these polymorphisms are indicated in Table II.

DISCUSSION

In general, the collagens have been dividedinto fibrillar and non-~brillarpolypeptidess (1,2). The fibrillar collagens (types 1-111, V, and XI) contain a triple-he~calcollagenous domain which consists of uninterrupted Gly-X-Y repeat sequence. Within the non-fibrillar collagens, several subclasses have been identified, including the FACIT collagens (fibril-associatedcollagens with interrupted triple-helices), consisting of types IX, X I , and XIV (2). In addition to FACITS, the other non-fibrillar collagens, such as type IV collagen, contain interruptions or irnperfections in thetriple-helical collagenous domainof the molecule. The precise classificationof type VI1 collagen has not been determined, but it is clear that itbears resemblance to the nonfibrillar collagens with extended non-collagenous regions (29). Type VI1 collagen is the major, if not the exclusive, component of anchoring fibrils, structures which demonstrate apparent ~exibility in theirshape, as determined by transmission electron microscopy of the d e ~ ~ ~ p i d e r junction mal of the skin, or by rotary shadowing electron microscopy of isolated type VI1 collagen molecules (4). It should be noted that comparison of partial sequences within the collagenous domain between mouse, hamster, and human type VI1 collagens demonstrates a relatively low degree of overall homology, as compared with other collagen genes (13,30, 31). Furthermore, the Unit of Evolutionary Period of partial type VI1 collagen sequences between the mouse and human has been estimated to be 2.9 million years in the NC-1 domains and 5.2 MY in the triple-helical region (30), reflecting a relatively rapid rate of evolution within the amino acid sequences (31).Interestingly, however, the locations, but notthe content, of the imperfections

TABLE II
Single base polymorphisms in exons of COL7A1

| Position | Enzyme site (a) | Primers (b) | Tm | Size (bp) | Allelic frequency | PIC (c) |
|---|---|---|---|---|---|---|
| Exon 3, nucleotide 425 | StyI; major CCAAGG (Lys), minor CCAGGG (Arg) | 5'-GGCCAGAAGAGATCCTGAGT-3', 5'-CTGACCTGTCACTCCTGCTC-3' | 55 °C | 723 (421/302) | 0.23/0.77 (>100 alleles) | 0.2915 |
| Exon 14, nucleotide 1784 | AluI; major AGCC (Pro), minor AGCT (Leu) | 5'-ATACTGCCATGTCCCACATC-3', 5'-GGTCTGAAAGAGCAATGGAG-3' | 60 °C | | 0.14/0.86 (36 alleles) | 0.2408 |
| Exon 21, nucleotide 2817 | PvuII; major C S C T G (Pro), minor CAGCTG (Pro) | 5'-CCTCCCTGATTCCTGAGCTT-3', 5'-GGAGGAGTCACTCAGAGTCG-3' | 55 °C | 697 (592/105) | 0.40/0.60 (>100 alleles) | 0.3648 |
| Exon 30, nucleotide 3830 | MspI; major CCGG (Pro), minor CTGG (Leu) | 5'-GGGACTGGGTGGTAGAATAT-3', 5'-GAGACAGCTTTGAGGAGTGC-3' | 55 °C | 550 (363/187) | 0.18/0.82 (40 alleles) | 0.2471 |
| Exon 84, nucleotide 6653 | EcoO109I; major AGGGGCT (Gly), minor AGGGCCT (Gly) | 5'-TAGTGTGCGCCAACCTCCTG-3', 5'-CTGCCTGTCGACCCTTGACC-3' | 55 °C | 485 (d) (203/111 and 92) | 0.90/0.10 (70 alleles) | 0.1638 |
| Exon 118, nucleotide 8997 | MnlI; major CCCC, minor CCTC | 5'-TTGTCTATGGTGGCTGTGGA-3', 5'-CAGCACCATGTCATCACAGG-3' | 55 °C | 770 (634/136) | 0.06/0.94 (36 alleles) | 0.1064 |

(a) The sequences shown in Fig. 2 are representative of the major alleles of each of these polymorphisms.
(b) The PCR amplification conditions using these primer pairs were as follows: 95 °C for 7 min followed by 40 cycles of 95 °C, 45 s; Tm, 45 s; 72 °C, 95 s in an OmniGene Thermal Cycler (Marsh Scientific, Rochester, NY).
(c) PIC, polymorphism information content, based on the number of alleles tested (under allelic frequency).
(d) There are constant bands of 160 and 122 bp, since there are two additional EcoO109I sites in this fragment. The 203-bp band results from the loss of the polymorphic EcoO109I site which creates the 111- and 92-bp fragments.
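The PIC column of Table II can be related to the allelic frequencies with a short calculation. The sketch below assumes the standard definition of polymorphism information content (one minus the sum of squared allele frequencies, minus the pairwise correction term); the source does not state which formula was used, and the function name `pic` is chosen here for illustration only.

```python
def pic(freqs):
    """Polymorphism information content for a list of allele frequencies.

    Assumes the standard definition:
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings that are uninformative despite heterozygosity.
    correction = sum(
        2 * (freqs[i] ** 2) * (freqs[j] ** 2)
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - correction


# Two-allele markers with frequencies taken from Table II.
print(round(pic([0.40, 0.60]), 4))  # 0.3648
print(round(pic([0.90, 0.10]), 4))  # 0.1638
```

For a two-allele marker the expression reduces to 2pq(1 - pq), so the value cannot exceed 0.375, which is reached at p = q = 0.5.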

and interruptions within the collagenous domain are largely conserved, suggesting that they play an important role in providing flexibility to the molecule (30). This interpretation is consistent with the known morphologic variation in the anchoring fibrils at the dermal-epidermal junction (32).

Cloning of the full-length type VII collagen cDNA has allowed determination of the primary structure of the α1(VII) polypeptide. These data confirm previous suggestions that the large non-collagenous NC-1 domain is amino-terminal, while the smaller NC-2 is at the carboxyl-terminal end of the molecule (11-13). Further analysis of the NC-1 domain sequences revealed multiple submodules with homology to known adhesive proteins. These included a segment with homology to cartilage matrix protein, nine consecutive fibronectin type III-like domains, and a segment with homology to the A domain of von Willebrand factor. Since these segments reside at the ends of the anchoring fibrils and are thought to interact with basement membrane structures either within the lamina densa of the dermal-epidermal basement membrane or within the anchoring plaques, it is conceivable that the adhesive properties of these domains play a critical role in providing integrity to the lamina densa/papillary dermis interaction. In this context, it is of interest to note that the NC-1 domain has been suggested to interact with type IV collagen (4). It should also be noted that the deduced primary structure of the human α1(VII) chain contains four RGD sequences, which have been shown to serve as the integrin-mediated attachment sites of cells to extracellular matrix components, such as fibronectin (20). In the case of type VII collagen, the functionality of these RGD sequences is currently unknown, and some of these sequences are not conserved between the human and mouse (30).
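Locating RGD tripeptides in a deduced amino acid sequence is a simple string search. The following is a minimal sketch, not the method used in this study; the function name `find_motif` and the short test peptide are hypothetical, and the peptide is not part of the α1(VII) sequence.

```python
def find_motif(protein, motif="RGD"):
    """Return the 1-based start position of every occurrence of `motif`."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy peptide in one-letter code, used only for illustration.
toy_peptide = "MGPRGDKGERGDAGP"
print(find_motif(toy_peptide))  # [4, 10]
```

Applied to two aligned orthologous sequences, the same search would show which RGD sites are shared and which are species-specific.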
Recent studies on the recessive dystrophic forms of EB have disclosed several mutations in the type VII collagen gene which underlie the fragility of the skin and mucous membranes in this group of skin diseases (23-26). Several of these mutations have been highly instructive toward understanding the structure-function relationships of various type VII collagen domains. First, most of the mutations identified thus far consist of premature termination codons which predict the synthesis of truncated α1(VII) polypeptides (26). In the homozygous or compound heterozygous state, these mutations result in the absence of anchoring fibrils, thus explaining the fragility of the skin. However, the heterozygous carriers, who have been shown in some cases to contain only about half the normal amount of anchoring fibrils in their skin (26), are clinically normal. The latter observation has implications for future prospects of gene therapy, suggesting that replacement of 50%, or perhaps even less, of anchoring fibrils in the skin of RDEB patients would be sufficient to restore the functional integrity of the dermal-epidermal junction (5). One particularly instructive patient is a compound heterozygote for two premature termination codons, one predicting a truncation within fibronectin type III repeat domain 8A encoded by exon 20, while the second mutation in the other allele predicts premature termination of translation within the collagenous domain, encoded by exon 32 (26). The heterozygous carrier for the upstream mutation demonstrated the presence of morphologically normal anchoring fibrils, yet their density was reduced to about half of that detected on the site-matched area of the skin of a clinically and genetically unaffected older sister. In contrast, a younger sibling who was heterozygous for the downstream mutation also demonstrated reduction in the anchoring fibril density along the cutaneous basement membrane zone, but the morphology of the anchoring fibrils was perturbed. This observation suggests that the polypeptide segment between the upstream and downstream termination codons, consisting of the A domain of von Willebrand factor and the cysteine-proline-rich region, interacts with other α1(VII) collagen chains, perturbing the normal aggregation of the anchoring fibrils.
Thus, this region defined by the COOH-terminal ends of the two truncated polypeptides in this family (amino acids 866-1323) appears to contain a critical region for fiber assembly.
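The effect of a premature termination codon on the predicted polypeptide length can be illustrated by reading a coding sequence in frame until the first stop codon. This is a sketch under stated assumptions: the function name `codons_before_stop` and both short sequences are hypothetical, and neither sequence is taken from COL7A1.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(cds):
    """Count the in-frame codons translated before the first stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return usable // 3  # no stop codon found in frame


# Hypothetical toy sequences, used only for illustration.
wild_type = "ATGGGACCTCGAGGAGACTAA"  # ATG GGA CCT CGA GGA GAC TAA
mutant = "ATGGGACCTTGAGGAGACTAA"     # C-to-T change turns CGA (Arg) into TGA

print(codons_before_stop(wild_type))  # 6
print(codons_before_stop(mutant))     # 3
```

Comparing the two counts gives the number of residues lost from the carboxyl-terminal end, which is how a segment such as amino acids 866-1323 can be delimited by two different truncations.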


The deduced amino acid sequence of the collagenous domain demonstrated the characteristic Gly-X-Y repeat structure interrupted by imperfections or insertions on 19 separate occasions. Nevertheless, there were several segments, up to 235 amino acids in size, which demonstrated a perfect Gly-X-Y repeat structure. The importance of the intact triple-helical conformation in some of these segments is emphasized by recent demonstration of a glycine-to-serine substitution in the triple-helical domain resulting in dominant dystrophic EB (33). Previously, over 100 missense mutations which result in the substitution of a glycine residue by another amino acid have been described in the genes encoding collagens type I, II, III, and IV, manifesting clinically as osteogenesis imperfecta, chondrodystrophies, Ehlers-Danlos syndrome type IV, and Alport syndrome, respectively (1). The mutation caused by the glycine-to-serine substitution resided within a collagenous region consisting of 24 contiguous Gly-X-Y repeats beginning 62 amino acids downstream from the major 39-amino-acid non-collagenous interruption within the collagenous domain. Thus, the stability of this triple-helical segment, although near the non-helical interruption, appears to be critical for the function of type VII collagen. As mentioned above, a specific domain organization, dictated by the positions of the imperfections and not necessarily by their amino acid content, appears to be necessary for the function of type VII collagen as the major component of anchoring fibrils.

The overall conservation between the hamster and human NC-2 domain sequences is 88% at the amino acid level (13). The relatively high degree of conservation of this segment is highlighted by the presence of a 67-amino-acid residue segment spanning the junction of the triple-helical and the NC-2 domain, which is 100% conserved between the hamster and human sequences (13). The importance of this conserved sequence is attested to by the fact that we have recently demonstrated a missense mutation within this segment in two siblings with recessively inherited dystrophic EB with relatively mild (mitis) phenotype (23). Specifically, the T-to-A transversion resulted in substitution of a methionine residue in position 2,798 by a lysine in both alleles of the affected individuals. The mother and a clinically unaffected half-brother of the affected siblings were shown to be heterozygous carriers of the mutation, without an apparent clinical phenotype (23). Thus, it appears that mutations that affect the highly conserved NC-2 domain in a homozygous state cause a mild (mitis) form of recessive dystrophic EB (23). Although the precise mechanism for the pathogenesis of such mutations is not clear, we have proposed that the introduction of a charged lysine residue in this segment may change the conformation of the NC-2 domain in a manner such that the formation of intermolecular disulfide bonds is disturbed and the integrity of functional anchoring fibrils is compromised (23).

During this study, several intragenic polymorphisms in the coding region of the COL7A1 gene were identified. The combination of these polymorphisms is useful not only for genetic linkage analyses, but they have also been used successfully for prenatal diagnosis of the severe, recessively inherited dystrophic form of EB in families at risk for recurrence of the disease (28). In fact, these polymorphisms can be used for DNA-based prenatal diagnosis from chorionic villus biopsy specimens as early as the tenth week of the first trimester of gestation. This approach will largely replace the method used currently for prenatal diagnosis of recessive dystrophic EB, which consists of an invasive fetal skin biopsy at 18-20 weeks of gestation.

In conclusion, we have cloned overlapping cDNAs which correspond to the full-length human α1(VII) mRNA sequences, predicting a deduced polypeptide of 2,944 amino acids. The collagenous domain is interrupted 19 times by imperfections or insertions in the Gly-X-Y repeat sequence. The molecular mass of the α1(VII) polypeptide, without post-translational modifications, is predicted to be 295,217 Da. The information on the primary structure and domain organization of human type VII collagen will facilitate the continued identification of mutations in the corresponding gene, COL7A1, the candidate gene for both the autosomal dominant and autosomal recessive forms of dystrophic epidermolysis bullosa. It will also provide the basis for computational modeling of peptides which might be useful in blocking dominant-negative interference of mutated chains with wild-type polypeptides in dominantly inherited forms of dystrophic EB, and in gene replacement therapy in the recessive forms of DEB in the future.

Acknowledgments-We are grateful to Drs. Robert E. Burgeson and Louise M. Rosenbaum for providing the type VII collagen peptide sequences. We appreciate the expert technical assistance of Xin Zhang, Yoshiko Tamai, Guy Hoffman, and Wen Cheng. We thank Debra Pawlicki, Eileen O'Shaughnessy, and Tamara Alexander for preparation of the manuscript.

REFERENCES

1. Kivirikko, K. (1993) Ann. Med. 26, 113-116
2. van der Rest, M., and Garrone, R. (1991) FASEB J. 5, 2814-2823
3. Uitto, J., Chung-Honet, L. C., and Christiano, A. M. (1992) Exp. Dermatol. 1, 2-11
4. Burgeson, R. E. (1993) J. Invest. Dermatol. 101, 252-255
5. Uitto, J., and Christiano, A. M. (1992) J. Clin. Invest. 90, 687-692
6. Fine, J. D., Bauer, E. A., Briggaman, R. A., Carter, D. M., Eady, R. A., Esterly, N. B., Holbrook, K. A., Hurwitz, S., Johnson, L., Lin, A., Pearson, R., and Sybert, V. P. (1991) J. Am. Acad. Dermatol. 24, 119-135
7. Lin, A. N., and Carter, D. M. (eds.) (1992) Epidermolysis Bullosa: Basic and Clinical Aspects, Springer-Verlag, New York
8. Uitto, J., and Christiano, A. M. (1993) Semin. Dermatol. 12, 191-201
9. Parente, M. G., Chung, L. C., Ryynanen, J., Woodley, D. T., Wynn, K. C., Bauer, E. A., Mattei, M. G., Chu, M.-L., and Uitto, J. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 6931-6935
10. Seltzer, J. L., Eisen, A. Z., Bauer, E. A., Morris, N. P., Glanville, R. W., and Burgeson, R. E. (1989) J. Biol. Chem. 264, 3822-3826
11. Christiano, A. M., Rosenbaum, L. R., Chung-Honet, L. C., Parente, M. G., Woodley, D. T., Pan, T. C., Zhang, R. Z., Chu, M.-L., Burgeson, R. E., and Uitto, J. (1992) Hum. Mol. Genet. 1, 475-481
12. Gammon, W. R., Abernethy, M. L., Padilla, K. M., Prisayanh, P. S., Cook, M. E., Wright, J., Briggaman, R. A., and Hunt, S. W., 3rd. (1992) J. Invest. Dermatol. 99, 691-696
13. Greenspan, D. (1993) Hum. Mol. Genet. 2, 273-278
14. Ryynanen, M., Knowlton, R. G., Parente, M. G., Chung, L. C., Chu, M.-L., and Uitto, J. (1991) Am. J. Hum. Genet. 49, 797-803
15. Ryynanen, M., Ryynanen, J., Sollberg, S., Iozzo, R. V., Knowlton, R. G., and Uitto, J. (1992) J. Clin. Invest. 89, 974-980
16. Hovnanian, A., Duquesnoy, P., Blanchet-Bardon, C., Knowlton, R. G., Amselem, S., Lathrop, M., Dubertret, L., Uitto, J., and Goossens, M. (1992) J. Clin. Invest. 90, 1032-1036
17. Christiano, A. M., Hoffman, G. G., Chung-Honet, L. C., Lee, S., Cheng, W., Uitto, J., and Greenspan, D. S. (1994) Genomics 21, 169-179
18. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
19. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448
20. Ruoslahti, E., and Pierschbacher, M. D. (1987) Science 238, 491-497
21. Hart, G. W., Brew, K., Grant, G. A., Bradshaw, R. A., and Lennarz, W. J. (1979) J. Biol. Chem. 254, 9747-9753
22. Morris, N. P., Keene, D. R., Glanville, R. W., Bentz, H., and Burgeson, R. E. (1986) J. Biol. Chem. 261, 5638-5644
23. Christiano, A. M., Greenspan, D. S., Hoffman, G. G., Zhang, X., Tamai, Y., Lin, A. N., Dietz, H. C., Hovnanian, A., and Uitto, J. (1993) Nature Genet. 4, 62-66
24. Hilal, L., Rochat, A., Duquesnoy, P., Blanchet-Bardon, C., Wechsler, J., Martin, N., Christiano, A., Barrandon, Y., Uitto, J., Goossens, M., and Hovnanian, A. (1993) Nature Genet. 5, 287-293
25. Christiano, A. M., and Uitto, J. (1994) Chron. Dermatol. 4, 1-12
26. Christiano, A. M., Anhalt, G., Gibbons, S., Bauer, E. A., and Uitto, J. (1994) Genomics 21, 160-168
27. Christiano, A. M., Chung-Honet, L. C., Hovnanian, A., and Uitto, J. (1992) Genomics 14, 827-828
28. Christiano, A. M., and Uitto, J. (1993) Arch. Dermatol. 129, 1455-1459
29. Mayne, R., and Brewton, R. G. (1993) Curr. Opin. Cell Biol. 5, 883-890
30. Li, K., Christiano, A. M., Copeland, N. G., Gilbert, D. J., Chu, M.-L., Jenkins, N. A., and Uitto, J. (1993) Genomics 16, 733-739
31. Blumberg, B., and Kurkinen, M. (1990) Extracellular Matrix Genes, pp. 115-135, Academic Press Inc., New York
32. McGrath, J. A., Ishida-Yamamoto, A., O'Grady, A., Leigh, I. M., and Eady, R. A. J. (1993) J. Invest. Dermatol. 100, 366-372
33. Christiano, A. M., Ryynanen, M., and Uitto, J. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 3549-3553