Cloning, Genomic Organization, and Chromosomal Localization of ...

THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 268, No. 2, Issue of January 15, pp. 1039-1045, 1993
Printed in U.S.A.

Cloning, Genomic Organization, and Chromosomal Localization of Human Cathepsin L*

(Received for publication, June 22, 1992)

Shyam S. Chauhan‡, Nicholas C. Popescu§, David Ray‡, Robert Fleischmann‡, Michael M. Gottesman‡, and Bruce R. Troen¶‖

From the ‡Laboratory of Cell Biology and the §Laboratory of Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892 and the ¶Department of Internal Medicine, Division of Geriatric Medicine, Institute of Gerontology, Geriatric Research, Education and Clinical Center, Veterans Administration Medical Center, University of Michigan, Ann Arbor, Michigan 48109

Cathepsin L is a lysosomal cysteine protease whose expression and secretion are induced by malignant transformation, growth factors, and tumor promoters. Many human tumors express high levels of cathepsin L, which is a broad-spectrum protease with potent elastase and collagenase activities. Two published human cathepsin L cDNA sequences differ only in their 5′-untranslated regions. In this study, we demonstrate the concurrent expression of two distinct human cathepsin L mRNAs (hCATL-A and hCATL-B) in adenocarcinoma, hepatoma, and renal cancer cell lines. Cloning of the human cathepsin L gene by polymerase chain reaction amplification of genomic DNA and subsequent sequencing reveals that hCATL-A and hCATL-B mRNAs are encoded by a single gene. The 3′ end of the first intron contains the 5′ portion of hCATL-B and is contiguous to the second exon of the gene. These data suggest either the possibility of alternative splicing or the presence of a second promoter within the first intron of the hCATL gene. We mapped the hCATL gene to chromosome 9q21-22. Sequencing of both the mouse and human cathepsin L genes demonstrates almost complete conservation of exon and intron position, but significant divergence in intron structure, possibly reflecting differences in regulation of expression of the mouse and human cathepsin L genes.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) L06426.
‖ To whom correspondence and reprint requests should be addressed: 5510 MSRB I, 1150 West Medical Center Dr., Ann Arbor, MI 48109-0680.

Cathepsin L is a lysosomal acid cysteine protease that is distributed in many cells and tissues (1-6) and is primarily responsible for intracellular protein catabolism and turnover (7, 8). Cathepsin L cleaves a wide range of substrates including extracellular proteins (fibronectin, collagen, elastin, and laminin), serum proteins, cytoplasmic proteins, and nuclear proteins (9-11). Cathepsin L may play a role in tumor invasion and metastasis (12-14) and has been implicated in bone resorption (15), sperm maturation (16, 17), and the pathobiology of macrophages (3, 4, 18). Cathepsin L is expressed in large quantities by transformed cells in culture (19, 20), and its synthesis and secretion are also stimulated by tumor promoters, growth factors, and second messengers (20-22). We have also found high levels of cathepsin L expression in many human tumors including cancers of the kidney, testicle, lung, colon, breast, adrenal gland, and ovary (23).

It was perplexing that the sequence of the human cathepsin L cDNA (hCATL-B) cloned by Joseph et al. (24) from a human kidney cDNA library differs in the 5′-noncoding region from a human cathepsin L cDNA (hCATL-A) cloned in our laboratory from SV40-transformed human fibroblasts (25) (Fig. 1). However, the coding regions of both cDNAs are identical. Furthermore, preliminary studies on the human CATL gene revealed the presence in the genome of at least two genes closely related to hCATL. Therefore, we decided to isolate and characterize the structure and expression of the hCATL gene or genes to determine if the two species of hCATL mRNA are encoded by two different genes or are transcribed from a single gene. Since regions downstream of the transcription initiation site of the mouse cathepsin L (mcatL) gene mediate the response to phorbol ester regulation of expression (22), we also determined and compared the intron-exon structures of both the human and mouse genes.

Our studies reveal that hCATL-A and hCATL-B mRNAs are expressed concurrently in different cell lines and that both mRNAs are encoded by a single gene. The hCATL gene maps to chromosome 9q21-22. The genomic structures of the mouse and human cathepsin L genes are similar, with preservation of intron-exon splice junctions that result in identically sized exons within the coding regions, but markedly different introns.

EXPERIMENTAL PROCEDURES

Materials-The cloning vectors pGEM 3 and pGEM 4Z were purchased from Promega Corp. Escherichia coli strain HB 101 was used for subcloning and purification of plasmid DNA. All restriction endonucleases were purchased from Bethesda Research Laboratories and used according to the manufacturer's guidelines. Oligonucleotide primers used for the polymerase chain reaction (PCR)¹ or DNA sequencing were synthesized in the laboratory using an Applied Biosystems 380B DNA synthesizer.

Cell Culture-The human cell line KB-3-1 is a subclone of the KB cervical adenocarcinoma cell line obtained from the American Type Culture Collection (26). KB-3-1 cells were grown as monolayers in Dulbecco's modified Eagle's medium (Quality Biologicals, Inc.) containing penicillin (50 units/ml) and streptomycin (50 μg/ml), glutamine (2 mM), and 10% fetal calf serum. HTB 44 (human kidney adenocarcinoma) cells were also obtained from ATCC and grown in Dulbecco's modified Eagle's medium containing 10% fetal bovine serum, 1 mM sodium pyruvate, glutamine, penicillin and streptomycin, and nonessential amino acids. BL 7404 cells (human hepatoma) were obtained from Dr. D. W. Shen, Laboratory of Cell Biology, NCI, NIH and grown as described earlier (27).

Ribonuclease Protection Assay-A 415-bp (SacII-BglII) fragment of hCATL-A cDNA was cloned into the SmaI and BamHI sites of

¹ The abbreviations used are: PCR, polymerase chain reaction; bp, base pair(s).


FIG. 1. Comparison of hCATL-A and hCATL-B cDNAs. The hCATL-A cDNA (25) and the hCATL-B cDNA (24) are not homologous in the 5′-noncoding regions. The hCATL-A cDNA contains 276 bp and the hCATL-B cDNA contains 181 bp in this nonhomologous region. Beginning 12 nucleotides upstream of the translational start codon (bold ATGs), the sequences are identical until 4 nucleotides downstream of the translational termination codon (bold TGAs). Regions not shown are identical. The published hCATL-B cDNA sequence contains an additional 29 bp at the 5′ end; however, these nucleotides are not shown here because they are part of the multiple cloning site of M13, the vector used for the subcloning and sequencing of hCATL-B.

[Fig. 1 image: aligned nucleotide sequences of the hCATL-A and hCATL-B cDNAs.]

pGEM 4Z after digestion of the 3′-overhanging SacII end with T4 DNA polymerase (utilizing the 3′ to 5′ exonuclease activity). This construct was digested with PvuII, and antisense radiolabeled riboprobe was prepared by Bioserve Biotechnologies (College Park, MD) using T7 RNA polymerase. Inclusion of vector sequences in the transcribed probe resulted in a size of approximately 600 bp. The ribonuclease protection assays were performed using whole cell lysates according to the instructions in the Ribonuclease Protection Kit from RNA Laboratories Inc. (Eagle, PA). 2 × 10⁶ cells were lysed in 100 μl of nuclear lysis buffer. 5 × 10⁵ cpm of riboprobe (in 5 μl) was added to 20 μl of the cell lysate, and the solution was heated to 85 °C for 3 min and then incubated overnight at 37 °C. After RNase treatment and subsequent phenol:chloroform extraction and ethanol precipitation, the samples were electrophoresed as described in Sambrook et al. (28).

Preparation of Genomic DNA-Total genomic DNA was isolated according to the procedure described by Miller et al. (29). KB-3-1 cells (70-80% confluent) in a 100-mm dish were washed with ice-cold phosphate-buffered saline three times, harvested in 3 ml of nuclear lysis buffer (10 mM Tris-HCl, 400 mM NaCl, and 2 mM Na₂EDTA), and transferred to a 15-ml polypropylene tube. 0.2 ml of 10% SDS and 0.5 ml of 10 mg/ml proteinase K were added, and the solution was incubated overnight at 37 °C. After adding 1 ml of saturated sodium chloride, the solution was vortexed vigorously and then centrifuged at 2500 × g at 4 °C for 15 min. The clear supernatant was transferred to another tube, and 2 volumes of room temperature absolute ethanol were added and mixed gently by inverting several times. The precipitated DNA was spooled on a sealed glass Pasteur pipette, washed in 70% ethanol, and resuspended in H₂O.
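As a worked check of the genomic DNA lysis step, the final SDS and proteinase K concentrations can be estimated from the stated volumes. This is a minimal sketch that assumes the volumes are simply additive and ignores the volume contributed by the cells; the function and variable names are illustrative and are not part of the published protocol.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Dilution formula: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ml / total_volume_ml


# Volumes stated in the text (ml).
lysis_buffer_ml = 3.0
sds_ml = 0.2            # 10% SDS stock
proteinase_k_ml = 0.5   # 10 mg/ml proteinase K stock

# Assumption: volumes are additive and the cell volume is negligible.
total_ml = lysis_buffer_ml + sds_ml + proteinase_k_ml

sds_percent = final_concentration(10.0, sds_ml, total_ml)
proteinase_k_mg_per_ml = final_concentration(10.0, proteinase_k_ml, total_ml)

print(f"total volume: {total_ml:.1f} ml")
print(f"SDS: {sds_percent:.2f}%")
print(f"proteinase K: {proteinase_k_mg_per_ml:.2f} mg/ml")
```

Under these assumptions the digestion runs at roughly 0.5% SDS and a little over 1 mg/ml proteinase K.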

Screening of Human Genomic Library-A genomic cosmid library prepared from human KB-V1 cells (30) was screened with a radiolabeled probe prepared by random primer synthesis from a 501-bp (PstI-EcoRI) fragment of the hCATL-A cDNA clone pHU-16 (25) using the method described by Troen (31). A positive colony was identified and designated pcosHMEP-3. BamHI- and XbaI-digested fragments of the entire ~40-kilobase cosmid were subcloned into pGEM 3 for further analysis.

Amplification and Cloning of Human Cathepsin L Genomic Fragments-Genomic fragments of human cathepsin L were amplified by the polymerase chain reaction (PCR) using primers homologous to both intron and exon regions (Table I). The PCR was performed in a final volume of 100 μl, containing 1 μg of KB-3-1 genomic DNA, 50 mM KCl, 10 mM Tris-HCl (pH 8.8), 1.5 mM MgCl₂, 0.1% Triton X-100, 0.2 mM concentration of each dNTP, 2.0 units of Taq DNA polymerase, and 100 pmol of each primer. Denaturation, annealing, and amplification were performed in a Perkin-Elmer Cetus Instruments DNA thermal cycler initially at 94 °C for 1 min, 42 °C for 2 min, and 72 °C for 3 min, respectively. The temperature of the annealing was varied empirically to maximize yield of the product. Aliquots of the PCR mixtures were used to identify fragments containing hCATL regions by performing a Southern hybridization using radiolabeled hCATL-A cDNA probe. The fragments thus identified were purified from the rest of the reaction mixture on low melting point agarose, blunt-ended by T4 DNA polymerase, and cloned into the SmaI site of pGEM 4Z for subsequent sequencing and restriction endonuclease analysis (the plasmids were named pGHCL1-5). Further PCR analysis of DNA from human-hamster somatic cell hybrids (BIOS Laboratories) was performed under similar conditions according to the manufacturer's instructions.

DNA Sequencing-Human and mouse cathepsin L gene fragments were sequenced by the method of Sanger et al. (32) using synthetic oligonucleotide primers, fluorescent dideoxynucleotides, and a Genesis™ 2000 automated Sequencer (Du Pont-New England Nuclear). All sequences were obtained in both directions, except pcosHMEP3a and pcosHMEP3b (see "Results"). Sequence analysis was performed with the GCG package (University of Wisconsin) and the Gene Construction Kit (Textco).

In Situ Hybridization-Human metaphase and prometaphase preparations from methotrexate-synchronized normal peripheral lymphocyte cultures were pretreated with ribonuclease and denatured in 70% formamide at 70 °C. Both a 235-bp (PstI-DraI) fragment of hCATL-A cDNA (from the 5′-noncoding region) and the entire hCATL-A cDNA were nick-translated with all four ³H-nucleotides to a specific activity of 2-3.8 × 10⁷ cpm/μg of DNA and used as probes. A hybridization mixture containing 50% formamide, 5% dextran sulfate, 2 × Denhardt's solution, 300 mM sodium phosphate (pH 6.4), and 5 × 10' cpm of radiolabeled probe was layered on each slide (33). The slides were covered and incubated in a moist chamber at 42 °C for 20 h, washed for 20 min in 50% formamide, 2 × SSC at room temperature, and then four times for 5 min each in 2 × SSC, and subsequently dehydrated in an ethanol series. The slides were then coated with nuclear track emulsion (NTB-2, Kodak, diluted 1:1 with water) and stored desiccated at 4 °C. After 14 days, autoradiographs were developed and stained, and spreads exhibiting silver grains on non-overlapped chromosomes were photographed. G-banding was obtained after destaining with alcohol, treatment with a solution of 0.03% trypsin, 0.012% EDTA, and restaining with 0.25% Wright stain in 0.06 M phosphate buffer (1:3, pH 6.8) (34). Previously photographed chromosome spreads were relocated, and a second photomicrograph exhibiting G-bands was used for grain localization at specific chromosome sites.

RESULTS

Expression of Two Forms of Cathepsin L mRNA-We employed a ribonuclease protection assay to identify the two species of hCATL mRNA in KB-3-1, human renal adenocarcinoma HTB 44, and human hepatoma BL 7404 cells (Fig. 2). The riboprobe was transcribed from an hCATL-A cDNA fragment and contained 415 bp identical with the region from bp 222 to 638 of the hCATL-A mRNA, spanning from exon 1 to exon 4 of the hCATL gene (Fig. 2A). Since bp 222-277 of the hCATL-A cDNA are not found in the hCATL-B cDNA (Fig. 1), the riboprobe contained only 362 bp identical with bp 182-543 of the hCATL-B cDNA. In cell lysates from all three human cell lines, probe fragments of 415 bp (representative of hCATL-A mRNA) and 362 bp (representative of hCATL-B mRNA) were protected from the RNase digestion (Fig. 2B, lanes 5, 6, and 7). In contrast, no protected fragment was seen in the hybridization with murine NIH-3T3 fibroblast lysates (lane 3). Both A and B forms of cathepsin L mRNA are expressed concurrently in the cell lines studied. However, the expression of hCATL-B mRNA was consistently severalfold higher than hCATL-A mRNA. Similar results were obtained for RAJI and osteosarcoma cells (data not shown). Hybridization of the riboprobe with lysates from NIH-3T3 cells that were stably transfected with an expression vector containing the hCATL-A cDNA resulted in only the expected 415-bp protected fragment (Fig. 2B, lane 4).

[Fig. 2 image: A, riboprobe and cDNA map; B, gel with lanes 1-7 and bands labeled 600 bp, 415 bp, and 362 bp.]

FIG. 2. Ribonuclease protection assay of hCATL-A and hCATL-B mRNA. A, the riboprobe is ~600 bp and contains a 415-bp region of the hCATL-A cDNA, of which 362 bp are identical with the hCATL-B cDNA (see "Experimental Procedures" and Fig. 1). Homologous regions are represented by similar shading. Numbers embedded within the cDNAs represent the corresponding exons in the hCATL gene. B: lane 1, radiolabeled MspI digest of pBR322; lane 2, no cell lysate, riboprobe undigested; lane 3, NIH-3T3 cell lysate (murine fibroblast); lane 4, NIH-3T3 cells stably transfected with a eukaryotic expression vector containing hCATL-A cDNA; lane 5, KB-3-1 cell lysate (human adenocarcinoma); lane 6, HTB 44 cell lysate (human kidney adenocarcinoma); lane 7, BL 7404 cell lysate (human hepatoma).

Isolation of a Gene Closely Related to Cathepsin L-In an attempt to isolate the human CATL gene or genes, we screened a cosmid library prepared from human KB-V1 genomic DNA and obtained a positive clone that we designated pcosHMEP3. We sequenced subclones of pcosHMEP3 using oligonucleotide primers from the hCATL-A cDNA in order to determine whether the clone was identical with the human cathepsin L gene. Two regions contained putative exons (pcosHMEP3a and pcosHMEP3b) and were homologous to the hCATL-A cDNA, but they exhibited nucleotide similarities of only 89% and 87%, and one region contained a translational termination codon (Fig. 3). Therefore, pcosHMEP3 does not contain the functional cathepsin L gene.

[Fig. 3 image: nucleotide alignments in three panels, pcosHMEP3a x hCATL cDNA, pcosHMEP3b x hCATL cDNA, and Ex8b x hCATL cDNA.]

FIG. 3. Comparison of hCATL cDNA with pcosHMEP3a, pcosHMEP3b, and Ex8b. Nucleotide alignments of pcosHMEP3a, pcosHMEP3b, and Ex8b with hCATL cDNA. Numbers are positions within hCATL-A cDNA sequence. The termination codon within pcosHMEP3a is highlighted by the bold TGA.

PCR Amplification of the Human Cathepsin L Gene-Because repeated attempts to clone the human cathepsin L gene by screening genomic libraries were unsuccessful, we attempted to clone this gene by utilizing PCR technology. We initially amplified a 2000-bp fragment (HCL1) using an oligonucleotide primer pair from bp 18-37 and 426-443 of the hCATL-A cDNA (Table I and Fig. 4). The next genomic fragment (HCL2) was amplified by a primer pair consisting of a sense oligonucleotide from within intron 2 (170 bp upstream of the beginning of exon 3, Table I and Fig. 4) and an antisense oligo from bp 767-784 of the hCATL-A cDNA. In a similar fashion, HCL3 and HCL4 were amplified, resulting in four overlapping fragments that covered the introns of the hCATL gene. A portion (HCL5) of the last exon was amplified by using a primer pair from bp 1380-1400 and 1538-1566 of the hCATL-A cDNA. This last amplification actually resulted in the simultaneous cloning of another fragment which exhibited 91% nucleotide similarity to hCATL-A cDNA (Ex8b, Fig. 3), further suggesting the presence of another gene closely related to cathepsin L.

Genomic Organization of the Human and Mouse Cathepsin L Genes-We mapped the intron-exon boundaries of both genes by completely sequencing the exons and portions of the adjacent introns using primers complementary to hCATL-A cDNA and mcatL cDNA (Fig. 4). Intron sizes were determined by PCR amplification using pairs of primers complementary to adjacent exons. Both the human and mouse cathepsin L genes contain 8 exons and 7 introns; the hCATL gene spans approximately 5100 bp, whereas the mcatL gene spans approximately 7400 bp. Exons 2 through 7 are the same size in both species, although there is considerable variation in the size of the comparable introns. All of the introns follow the GT/AG consensus rule (Table II) (35). Among the introns within the coding region, introns 2-5 interrupt the open reading frame between codons (type 0 intron), intron 6 interrupts the codon after the first nucleotide (type 1 intron), and

TABLE I
Sequences of the oligomers used for the amplification of the human cathepsin L gene

Sequences are 5′ to 3′. The first oligonucleotide of each pair is a sense primer and the second is an antisense primer. Numbers in parentheses refer to location within hCATL-A cDNA for exon primers and the number of nucleotides upstream of an exon for the intron primers.

Fragment amplified   Amplimers used                                 Source (intron/exon)
HCL1                 AACCTTGAGCGGCATCCGTG (18:37)                   Exon 1
                     CACACTGCTCTCCTCCAT (443:426)                   Exon 3
HCL2                 ACAAGTGACATTCATCCTTG (-170:-151)               Intron 2
                     GATTCTGCTCACTCAGTG (784:767)                   Exon 5
HCL3                 CAGATGTGTGAGCTGTTG (-203:-186)                 Intron 4
                     ACAGTTGCAACTGCCTTC (1013:996)                  Exon 6
HCL4                 AGCTGCACACTGCTGAGTG (-95:-77)                  Intron 5
                     TTCGAGTGTGTATCCGAC (1394:1377)                 Exon 8
HCL5                 ACTCGAATCATTGAAGATCCG (1388:1408)              Exon 8
                     CATTTGAAATTAAATTTTATTTAAACAGG (1574:1546)      Exon 8
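The primer coordinates in Table I can be sanity-checked against the primer sequences themselves. The sketch below is illustrative only: it reads the fused entries "(1837)" and "(13881408)" as 18:37 and 1388:1408, takes the first 60 nucleotides of exon 1 from Fig. 5, and uses hypothetical helper names (`span_length`, `reverse_complement`) that are not from the paper.

```python
# Primer sequences and coordinates as listed in Table I: (sequence, start, end).
PRIMERS = {
    "HCL1": [("AACCTTGAGCGGCATCCGTG", 18, 37), ("CACACTGCTCTCCTCCAT", 443, 426)],
    "HCL2": [("ACAAGTGACATTCATCCTTG", -170, -151), ("GATTCTGCTCACTCAGTG", 784, 767)],
    "HCL3": [("CAGATGTGTGAGCTGTTG", -203, -186), ("ACAGTTGCAACTGCCTTC", 1013, 996)],
    "HCL4": [("AGCTGCACACTGCTGAGTG", -95, -77), ("TTCGAGTGTGTATCCGAC", 1394, 1377)],
    "HCL5": [("ACTCGAATCATTGAAGATCCG", 1388, 1408),
             ("CATTTGAAATTAAATTTTATTTAAACAGG", 1574, 1546)],
}

# Nucleotides 1-60 of exon 1 as printed in Fig. 5.
EXON1_START = "AGAACCGCGACCTCCGCAACCTTGAGCGGCATCCGTGGAGTGCGCCTGCAGCTACGACCG"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def span_length(start, end):
    """Number of positions covered by an inclusive coordinate pair."""
    return abs(start - end) + 1


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Each primer should be exactly as long as the span it is annotated with.
length_ok = {
    name: all(len(seq) == span_length(a, b) for seq, a, b in pair)
    for name, pair in PRIMERS.items()
}

# The HCL1 sense primer should equal bp 18-37 of the hCATL-A cDNA (1-based).
hcl1_sense_matches = EXON1_START[18 - 1:37] == PRIMERS["HCL1"][0][0]

# Sense-strand sequence targeted by the HCL1 antisense primer (bp 426-443).
hcl1_antisense_target = reverse_complement(PRIMERS["HCL1"][1][0])

print(length_ok)
print(hcl1_sense_matches, hcl1_antisense_target)
```

All ten primer lengths agree with their annotated spans, and the HCL1 sense primer matches the exon 1 sequence at the stated position.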

[Fig. 4 image: exon-intron maps of the human and mouse cathepsin L genes with exon and intron sizes, and arrows marking the amplified fragments HCL1-HCL5.]

FIG. 4. Human and mouse cathepsin L genomic structures. Cathepsin L gene structures were determined by a combination of sequencing and PCR analysis as described under "Experimental Procedures." Black rectangles with white numbers represent exons. Unshaded rectangles represent introns. Breaks are present in those introns not drawn to scale. The 181-bp hatched area represents the 3′ end of the first human intron that is congruent with the 5′-untranslated region of hCATL-B cDNA. The intron and exon sizes are shown above and below the unshaded and black rectangles, respectively. The fragments of the hCATL gene amplified from genomic DNA (HCL1-HCL5) are shown with the arrows representing the approximate locations of the primers used for the PCR.

TABLE II
Splice junctions of the mouse and human cathepsin L genes

The consensus sequence for splice donor and acceptor sites is shown on the top line. Py = pyrimidine, IVS = intron. The residues surrounding the sites for introns 2-7 are shown. Nonidentical hCATL amino acids are shown beneath the corresponding underlined mcatL amino acids. The splice phases for the introns interrupting the open reading frame are shown.

Intron  Gene       5′ donor               3′ acceptor                Residues (donor / acceptor)   Splice phase
        Consensus  AG*GTNNNN ... IVS ...  (Py)12AG*
IVS1    mcatL      GCCGCCTCAG*GTGAGTGACC  TTTTCCTTCCCTAG*GTGTTTGAAC
        hCATL      GTGGACACAG*GTACCGCAGC  TGTTCCCTTCCTAG*GTTTTAAAAC
IVS2    mcatL      GTATGGCACG*GTTTGTAGTG  TGTGCTTGCTCTAG*AATGAGGAAG  uTyrGlyThr / AsnGluGluG       0
        hCATL      ATACGGCATG*GTTAGTGAAA  ATGTTTGCCTCTAG*AATGAAGAAG  Met (for Thr)
IVS3    mcatL      TGGTGACATG*GTCAGTGGGG  TCCCTTCCCGGAAG*ACCAATGAGG  eGlyAspMet / ThrAsnGluG       0
        hCATL      TGGAGACATG*GTAAGTGTGC  CCTTTTCCTTGAAG*ACCAGTGAAG  Ser (for Asn)
IVS4    mcatL      GAAGAACCAG*GTATGAGGGC  TTTTCTCTCTTCAG*GGCCAGTGCG  lLysAsnGln / GlyGlnCysG       0
        hCATL      GAAGAATCAG*GTGAGACAGT  AATTCCCTTTTCAG*GGTCAGTGTG
IVS5    mcatL      TGAAGCAAAG*GTAATTGGTT  CTTAACTTTCAAAG*GACGGATCTT  rGluAlaLys / AspGlySerC       0
        hCATL      TGAGGCAACA*GTAAGTGGAG  TTTACCTTTGAAAG*GAAGAATCCT  Thr (for Lys) / GluGlu (for AspGly)
IVS6    mcatL      TATAGTTCAG*GTAATGGTCA  TTTTTCTCTGCCAG*GCATCTACTA  TyrSerSerG / lyIleTyrTy       1
        hCATL      TATAAAGAAG*GTAAGCATAT  TTTTACCATCCCAG*GCATTTATTT  LysGlu (for SerSer) / Ph (for Ty)
IVS7    mcatL      TCAAGAACAG*GTAAGAGACT  TGGATCTCTTTCAG*CTGGGGAAGT  alLysAsnSe / rTrpGlyGlu       2
        hCATL      TGAAGAACAG*GTATAAATTG  TGCTCTTTTTTCAG*CTGGGGTGAA
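The GT/AG rule and the splice phases reported for the human gene can be checked directly from the junction sequences in Table II. The following is a sketch under stated assumptions: the hCATL donor and acceptor strings are as read from the flattened table, the phases are those given in the text (introns 2-5 phase 0, intron 6 phase 1, intron 7 phase 2), and the function names are hypothetical.

```python
# hCATL junctions as read from Table II; '*' marks the exon/intron boundary.
# Each entry is (donor: exon*intron, acceptor: intron*exon).
HUMAN_JUNCTIONS = {
    1: ("GTGGACACAG*GTACCGCAGC", "TGTTCCCTTCCTAG*GTTTTAAAAC"),
    2: ("ATACGGCATG*GTTAGTGAAA", "ATGTTTGCCTCTAG*AATGAAGAAG"),
    3: ("TGGAGACATG*GTAAGTGTGC", "CCTTTTCCTTGAAG*ACCAGTGAAG"),
    4: ("GAAGAATCAG*GTGAGACAGT", "AATTCCCTTTTCAG*GGTCAGTGTG"),
    5: ("TGAGGCAACA*GTAAGTGGAG", "TTTACCTTTGAAAG*GAAGAATCCT"),
    6: ("TATAAAGAAG*GTAAGCATAT", "TTTTACCATCCCAG*GCATTTATTT"),
    7: ("TGAAGAACAG*GTATAAATTG", "TGCTCTTTTTTCAG*CTGGGGTGAA"),
}

# Splice phases stated in the text for the introns within the coding region.
PHASES = {2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 2}


def follows_gt_ag(donor, acceptor):
    """True if the intron starts with GT and ends with AG."""
    intron_start = donor.split("*")[1]
    intron_end = acceptor.split("*")[0]
    return intron_start.startswith("GT") and intron_end.endswith("AG")


def downstream_codons(acceptor, phase):
    """Complete codons in the exon sequence that follows an intron.

    For phase p, the first (3 - p) % 3 exon nucleotides finish the codon
    that the intron interrupted, so they are skipped before splitting.
    """
    exon = acceptor.split("*")[1]
    body = exon[(3 - phase) % 3:]
    usable = len(body) - len(body) % 3
    return [body[i:i + 3] for i in range(0, usable, 3)]


gt_ag_ok = {n: follows_gt_ag(d, a) for n, (d, a) in HUMAN_JUNCTIONS.items()}
codons = {n: downstream_codons(HUMAN_JUNCTIONS[n][1], p) for n, p in PHASES.items()}

print(gt_ag_ok)
print(codons)
```

With the stated phases, the exon sequence after intron 7 splits into TGG GGT GAA, which corresponds to the TrpGlyGlu residues listed in the table, and the sequence after intron 2 splits into AAT GAA GAA (AsnGluGlu).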

intron 7 interrupts the codon after the second nucleotide (type 2 intron) (36). We have shown that sequences downstream of the transcription initiation site regulate transcription of the mcatL

[Fig. 5 image: restriction map of exon 1 and intron 1 with the hCATL-A and hCATL-B 5′ regions marked.]

Exon 1 >>>
   1 AGAACCGCGACCTCCGCAACCTTGAGCGGCATCCGTGGAGTGCGCCTGCAGCTACGACCG
  61 CAGCAGGAAAGCGCCGCCGGCCAGGCCCAGCTGTGGCCGGACAGGGACTGGAAGAGAGGA
 121 CGCGGTCGAGTAGGTGTGCACCAGCCCTGGCAACGAGAGCGTCTACCCCGAACTCTGCTG
 181 GCCTTGAGGTGGGGAAGCCGGGGAGGGCAGTTGAGGACCCCGCGGAGGCGCGTGACTGGT
 241 TGAGCGGGCAGGCCAGCCTCCGAGCCGGGTGGACACAG
Intron 1 >>>
 279 GTACCGCAGCCAGGCCGGCGCCAACGACTCAGGGCCTGGCCCGGCCAGACAGGGAAGCTC
 339 AGTCCCCGCACGCCAGACAGCGGTACTCCTGCTGGCGTCACCGCAAACATCCTCTGACCG
 399 CTACAGCCAGTGTGTGGGCAGGCGTCATGTCCCCGGCCCTGCCACGCCTGGAGCCCTGGA
 459 AGCTGGCTGCAGGGCTCTGGCTTCCCGCGTGCGCCCATATGACCCCGTCCCTGATTTAGG
 519 GGAGCAGTTTGGGGTGTCGGCAGCACAGGCCCAAGTGAATGAAGGAGGGAAGCAGTGCGT
 579 GCTCTCCTTCCCAGTTTTTCCTGGGAAAGCATTTCAGAAAGGTTTCATTTAAGAAGAGGT
 639 TGGGGCGGCCAGGTGGCTCACTCCTGTAATCCCAGCACTTTGGGAGGCTGAGGTGGGCGG
 699 ATCACCTGAGGTCAGTAGTTCAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACT
 759 GAAAATACAAAATTAGACGGGCGAGGCGGCGCACGCCTGTAGTTCCAGCTATTCAAGAGG
 819 CTGAGGAAGAATGGCTTGAACCCGGGAGGCAGAGGTTGCTGTGAGTCGATATCGCGCCGT
 879 TGAACTCCAGCCTGGGCCACAGAGCAAGACTCCATCTCAAAAAATAAATAAATAAATAAA
 939 TAAATAAATAAATAGGAGAGATTGGAAAACTTATCTCAGCTTTTGGTGTTTGTTAGTCAG
 999 GAAGATGTGTGAAGGCCTCCTAACTCTTGGGGATCTCTTTGTCCCCTACTTGGGAATCCC
1059 ACCTTATCATTAGTGAGGTTTTGCCTGGGCACGAAACCTGGATTTTTTGCGATTGGTACA
1119 AAACCTGGATCAACCGTTTCCCGGTTTCCTAGTTGTTGCCTTAAGCTTCTCACACACAAG
1179 GTAGTTTCATACGGTTCTCATAACCTAAATTGTCATCGCATAAACTGTTTCAGCTCCTAC
hCATL-B >>>
1239 AGCTCTGGACAGGCTGCTTTTCATTTTGGTAAGTCCATCCAGTRCCTCCACGTGCCCTGT
1299 TTTTCTCCAGGCACATCCTTGGCCTCTTCCACAGTCCTTGGGTAAATGCTTGGGAC.AATA
1359 ...
1419 ...ATCCTCATT...
Exon 2 >>>
1465 GTTTTAAAACATGAATCCTA

FIG. 5. Nucleotide sequence of the 5′ end of the human cathepsin L gene. Sequence of first exon, entire first intron, and 20 bp of the second exon. Underlined nucleotides at the 3′ end of the intron are those found in hCATL-B cDNA. The translational start codon in exon 2 is highlighted by the bold ATG. The two overlined nucleotides in the intron differ from the sequence in the hCATL-B cDNA (CC instead of GG).
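The position labels in Fig. 5 fix the coordinates of exon 1, intron 1, and the hCATL-B 5′ region. The sketch below derives them from the figure labels and the 181-bp length given in the Fig. 4 legend; it assumes the hCATL-B region runs to the last nucleotide of intron 1, and the variable names are illustrative only.

```python
# Position labels and the final exon 1 line as printed in Fig. 5.
EXON1_LAST_LINE_START = 241
EXON1_LAST_LINE = "TGAGCGGGCAGGCCAGCCTCCGAGCCGGGTGGACACAG"
INTRON1_START = 279
EXON2_START = 1465

# Length of the hatched hCATL-B 5' region (Fig. 4 legend).
HCATL_B_5P_LENGTH = 181

# Exon 1 ends where its last printed line ends (1-based, inclusive).
exon1_end = EXON1_LAST_LINE_START + len(EXON1_LAST_LINE) - 1

# Intron 1 runs from its first labelled position to just before exon 2.
intron1_end = EXON2_START - 1
intron1_length = intron1_end - INTRON1_START + 1

# Assumption: the hCATL-B 5' region occupies the last 181 bp of intron 1.
hcatl_b_region = (intron1_end - HCATL_B_5P_LENGTH + 1, intron1_end)

print(f"exon 1: 1-{exon1_end}")
print(f"intron 1: {INTRON1_START}-{intron1_end} ({intron1_length} bp)")
print(f"hCATL-B 5' region: {hcatl_b_region[0]}-{hcatl_b_region[1]}")
```

On these labels exon 1 is 278 bp and intron 1 is 1186 bp, which places the start of the hCATL-B 5′ region about 1 kb downstream of the exon 1 donor site.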

FIG. 6. In situ hybridization of the human cathepsin L gene. a-d, metaphase chromosomes from normal human lymphocytes after hybridization with a radiolabeled cathepsin L probe (a and c). Chromosomes exhibiting label are indicated by arrows and are identified by G-banding as chromosomes 9 (b) and 10 (d).


gene (22). Therefore, we sequenced the entire first intron of the hCATL gene (Fig. 5). To our surprise, the 5′-untranslated region of hCATL-B cDNA is found at the 3′ end of the first intron and is contiguous with the second exon of hCATL-A (Figs. 4 and 5). This confirms that both species of hCATL mRNA are encoded by a single gene.

Mapping of the Human Cathepsin L Gene to Chromosome 9-The chromosomal mapping of the hCATL gene was performed by in situ hybridization using probes generated both from the full-length hCATL-A cDNA and from the 5′-untranslated region of hCATL-A cDNA. Similar results were obtained with both probes. When probe radiolabeled from full-length hCATL-A cDNA was used, 246 grains from 110 metaphase spreads were counted with clustering of grains at two sites (Fig. 7A). 54 (22%) of the total number of grains scored clustered at region 9q21-22. A second hybridization site was identified on chromosome 10q23-24, consisting of 18 grains. The grains observed at these sites jointly represented 29% of the total grains scored. The remaining grains were randomly distributed over the rest of the chromosomes. Representative metaphases exhibiting label on chromosomes 9 and 10 are shown in Fig. 6. Similarly, after hybridization with the probe made from the 5′-noncoding region of hCATL-A cDNA, 68 grains from a total of 267 counted on 110 chromosome spreads were localized at 9q21-22 (Fig. 7B), and 12 grains were identified at 10q23-24. The localization of the hCATL gene to chromosome 9 was confirmed by PCR amplification of human-hamster somatic cell hybrid DNA, using the primer pair that generated the 800-bp HCL2 (Fig. 8). Amplification of an appropriately sized fragment was achieved using pGHCL2, total human genomic DNA, and hybrid DNA containing human chromosomes 5 and 9 (Fig. 8, lanes 2, 3, and 4). In contrast, no fragment was amplified from the hybrid DNA containing human chromosomes 5 and 10, pcosHMEP3, or total Chinese hamster ovary genomic DNA (Fig. 8, lanes 5, 6, and 7). The absence of an amplified fragment from chromosome 10 and the clustering of grains at


FIG. 7. Chromosomal localization of the human cathepsin L gene. A, histogram representing grain distribution in 110 cells hybridized to full-length cathepsin L probe. The number of grains is plotted on a 400-band human chromosome ideogram. B, distribution of grains on chromosome 9 after hybridization with 5′-untranslated region of hCATL-A probe. Region 9q21-22 had the largest number of grains. A similar grain distribution was observed after hybridization with full-length cathepsin L cDNA probe.
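The grain percentages quoted for the in situ hybridization follow directly from the reported counts. This is a small illustrative check; the dictionary layout and function name are assumptions, and only the counts come from the text.

```python
def percent(part, total):
    """Share of grains at a site, as a percentage of all grains scored."""
    return 100.0 * part / total


# Grain counts reported in the text for the two probes.
full_length_probe = {"total": 246, "9q21-22": 54, "10q23-24": 18}
utr_probe = {"total": 267, "9q21-22": 68, "10q23-24": 12}

full_9q = percent(full_length_probe["9q21-22"], full_length_probe["total"])
full_joint = percent(
    full_length_probe["9q21-22"] + full_length_probe["10q23-24"],
    full_length_probe["total"],
)
utr_9q = percent(utr_probe["9q21-22"], utr_probe["total"])

print(f"full-length probe, 9q21-22: {full_9q:.1f}%")
print(f"full-length probe, both sites: {full_joint:.1f}%")
print(f"5'-untranslated probe, 9q21-22: {utr_9q:.1f}%")
```

The full-length probe values round to the 22% and 29% stated in the text; the 5′-untranslated region probe places about a quarter of its grains at 9q21-22.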


FIG. 8. Localization of cathepsin L to chromosome 9 by PCR amplification. A variety of DNA samples were subjected to PCR analysis with the oligonucleotide pair used to amplify HCL2. Lane 1,