Developmentally Regulated Cytokeratin Gene in Xenopus laevis

Vol. 5, No. 10

MOLECULAR AND CELLULAR BIOLOGY, Oct. 1985, p. 2575-2581

0270-7306/85/102575-07$02.00/0

JEFFREY A. WINKLES,1* THOMAS D. SARGENT,1 DAVID A. D. PARRY,2† ERZSEBET JONAS,1 AND IGOR B. DAWID1

Laboratory of Molecular Genetics, National Institute of Child Health and Human Development,1 and Laboratory of Physical Biology, National Institute of Arthritis, Diabetes, and Digestive and Kidney Diseases,2 National Institutes of Health, Bethesda, Maryland 20205

Received 10 May 1985/Accepted 25 June 1985

We have determined the sequence of cloned cDNAs derived from a 1,665-nucleotide mRNA which transiently accumulates during Xenopus laevis embryogenesis. Computer analysis of the deduced amino acid sequence revealed that this mRNA encodes a 47-kilodalton type I intermediate filament subunit, i.e., a cytokeratin. As is common to all intermediate filament subunits so far examined, the predicted polypeptide, named XK70, contains N- and C-terminal domains flanking a central α-helical rod domain. The overall amino acid homology between XK70 and a human 50-kilodalton type I keratin is 47%; homology within the α-helical domain is 57%. The N-terminal domain, which is not completely contained in our cDNAs, is basic, contains 42% serine plus alanine, and includes five copies of a six-amino-acid repeating unit. The C-terminal domain has a high α-helical content and contains a region with sequence homology to the C-terminal domains of other type I and type III intermediate filament proteins. We suggest that different keratin filament subtypes may have different functional roles during amphibian oogenesis and embryogenesis.

The cytoskeleton of vertebrate cells consists largely of actin microfilaments, intermediate filaments (IFs), and microtubules. The 7- to 15-nm-diameter IFs are divided into five major classes based on their differing subunit composition and cell type specificity: (i) keratin (or cytokeratin) filaments, present in epithelial cells; (ii) desmin filaments, found predominantly in myogenic cells; (iii) vimentin filaments, present in mesenchymally derived cells; (iv) neurofilaments, found in neuronal cells; and (v) glial filaments, present in glial cells (26; P. M. Steinert and D. A. D. Parry, Annu. Rev. Cell Biol., in press). Analyses of amino acid sequence, X-ray diffraction, and electron microscopy data have shown that all IF subunits have partial sequence homology and a similar secondary structure. From sequence data comparisons, the IF subunit family has been classified into four types of related polypeptides: type I and II keratins, type III subunits such as desmin and vimentin, and type IV neurofilament subunits (Steinert and Parry, in press). Despite substantial sequence diversity, the basic structure of all IF proteins is similar and consists of a central α-helical coiled-coil rod domain flanked by terminal domains of variable length that are generally not α-helical (8, 15; Steinert and Parry, in press).

The 40- to 70-kilodalton (kDa) cytokeratins, the subunits of keratin filaments, constitute a complex family of related polypeptides which are differentially expressed in various epithelial tissues (23, 25, 33) and at different developmental stages (3, 34). The cytokeratin polypeptides have been grouped into two subfamilies; in comparison to type I cytokeratins, type II cytokeratins are generally larger, more basic, and show a lower degree of amino acid sequence homology to sheep wool type I microfibrillar keratin (19, 20, 50). Recently, two classes within the type I cytokeratin subfamily have been distinguished based on differing carboxy-terminal sequences (24).

We have prepared a cDNA library representing those Xenopus laevis polyadenylated [poly(A)+] RNAs which are not stored in the egg, but are initially transcribed during the midblastula to gastrula stages of embryogenesis (47). As an approach to elucidate the role of the proteins synthesized by these differentially expressed gastrula RNAs, a number of individual cDNA clones have been sequenced. We have previously reported that one of these clones, DG81, is derived from a 1,570-nucleotide (nt) mRNA encoding a 47-kDa type I cytokeratin (E. Jonas, T. Sargent, and I. B. Dawid, Proc. Natl. Acad. Sci. USA, in press). In this report, we present the nucleotide and deduced amino acid sequence of another cDNA clone, pC7005, which represents most of the sequence of the DG70 mRNA. This 1,665-nt mRNA also encodes a type I cytokeratin with a molecular weight similar to that of DG81; however, the overall nucleotide and amino acid homology between the DG81 and DG70 sequences is only 55 and 47%, respectively.

MATERIALS AND METHODS

Source of cDNA clones. The DG70 and DG26 cDNA clones, derived from a cDNA library prepared from differentially expressed gastrula RNAs, were found to cross-hybridize at high stringency (47). To isolate longer cDNA clones, DG70 was nick translated and used as a probe to screen another cDNA library. This second library, kindly provided by Susan Haynes, was prepared from gastrula poly(A)+ RNA by using the Okayama and Berg cloning strategy (38). The pC7005 isolate was the longest clone recovered from this screen.

Nomenclature. The cDNA clone derived from the original library was named DG70; this name is retained for the mRNA and the gene encoding it. Additional cDNA clones selected from other libraries by homology to DG70 are named pC70xx, i.e., by adding a two-digit isolate number. The protein encoded by the DG70 mRNA is named XK70, for Xenopus keratin 70.

DNA sequence analysis. The chemical modification procedure of Maxam and Gilbert (31) was used, except for the following changes. Reactions were for 2 min, the cytosine reaction contained 1.5 M NaCl, and the adenine-guanine

* Corresponding author.
† Permanent address: Department of Physics and Biophysics, Massey University, Palmerston North, New Zealand.


FIG. 1. Temporal accumulation pattern of DG70 mRNA. (A) A 1-μg sample of total RNA was applied to a nylon membrane and hybridized to nick-translated DG70. The RNA was prepared from (a) eggs, (b) gastrulae (stage [St.] 11), (c) late neurulae (St. 22), (d) 3- to 4-day tadpoles (St. 42), (e) 4-week tadpoles (St. 55), and (f) just-metamorphosed froglets (St. 65). The autoradiograph is overexposed to show the absence of DG70 RNA in eggs and froglets; therefore the peak levels (c, d) are well above the linear response of the film. (B) Autoradiographic signals from a more complete series of RNA dot blots were quantitated by densitometry. Developmental stages are given according to Nieuwkoop and Faber (37).

reaction was done in 70% (vol/vol) formic acid instead of 1 M piperidine formate.

Computer analysis of amino acid sequences. The initial detection of homology between XK70 and published IF sequences was accomplished by a search of the NBRF data base using the program DFASTP (28). More detailed comparisons were made with dot matrix (17) and Fourier (32, 39, 40) analyses. Secondary structure predictions were made with the Garnier et al. (12) and Chou and Fasman (5, 6) techniques.

RNA preparation and blot analyses. RNA was prepared as previously described (Jonas et al., in press). For gel blots, 1 μg of X. laevis stage 35 (37) total RNA and 1 μg of Escherichia coli rRNA marker were electrophoresed on a 1.2% agarose gel containing 5 mM methylmercury hydroxide (1). After electrophoresis, the gel was stained in 12 mM Tris hydrochloride-6 mM sodium acetate-0.6 mM EDTA (pH 7.8)-50 mM 2-mercaptoethanol-0.5 μg of ethidium bromide per ml. After destaining, the gel was photographed on a UV transilluminator. Electrophoretic transfer to a nylon membrane (Zetabind, AMF Cuno), hybridization to nick-translated DG70 DNA, and post-hybridization washing conditions were as described by Church and Gilbert (7). For dot blots, 1 μg of total RNA from eggs or developmental stages (37) was diluted into 0.5 ml of 25 mM NaH2PO4 (pH 7.0) and applied to a nylon membrane with a minifold device. Hybridization and washing conditions were as described above. Dot hybridization signals were quantitated by densitometry.

RESULTS

Isolation and sequencing of DG70 and pC7005 cDNA clones. The original DG70 cDNA clone was chosen from a cDNA library of poly(A)+ RNA sequences differentially expressed in X. laevis gastrula embryos (47). This particular clone was chosen for DNA sequencing for two reasons. First, the expression of the DG70 gene(s) is developmentally regulated. In the experiment of Fig. 1, total RNA was prepared from various developmental stages; equivalent amounts were then dotted and hybridized to nick-translated DG70. DG70 RNA is first detectable at 10 h postfertilization (early gastrula); its level peaks approximately 1 day later (early tailbud), and then during subsequent development the RNA declines in abundance at least 100-fold so that it becomes undetectable in the metamorphosed froglet (stage 65). Second, the activation of the DG70 gene(s) can be affected by cellular interactions occurring during the first 10 h of embryogenesis (T. Sargent and M. Jamrich, unpublished data).

Since the size of the DG70 cDNA insert was only 75% of the length of the mature poly(A)+ RNA, we used this insert as a probe to isolate a longer cDNA clone, pC7005, from a different cDNA library prepared from gastrula poly(A)+ RNA. Both strands of each of the two cDNA clones were then sequenced; the sequencing strategies and relevant restriction sites are shown in Fig. 2. In comparison to the DG70 cDNA insert, pC7005 contained 370 nt more 5' sequence and also a 417-nt insertion near its 3' end. As discussed below, we believe that the inserted DNA represents an intron present in a DG70 RNA-processing intermediate which had been copied into cDNA and cloned.

Deduced amino acid sequence of pC7005. The predicted amino acid sequence of pC7005 is shown in Fig. 3. We have translated from the extreme 5' end of the cDNA, instead of the first ATG, because the clone is probably close to but not entirely full length. The cDNA insert (minus intron, see below) is 1,498 nt long; the size of the mRNA [minus an assumed 80 nt of poly(A) (45)] is 1,585 nt, as shown below. In addition, if the first ATG were in fact the initiator codon, then this type I keratin (see below) would contain an unusually short N-terminal region. Thus, approximately 85 5'-terminal nt may be missing from pC7005.

Although the open reading frame continues to nt position 1206, we have not translated past nt 1147 because we believe that the sequence from nt 1148 to 1564 represents an intron. Our reasons for this assumption are as follows. First, as calculated from the RNA gel blot analysis shown in Fig. 4, the size of the predominant species of DG70 RNA is 1,665 nt. This is 330 nt less than the length of the pC7005 insert, taking into account a DG70 poly(A) tract of 80 residues. Second, the 417-nt region at position 1148 to 1564 is not present in two other cDNA clones, the original DG70 and another homologous independent isolate, DG26. DG70 has been completely sequenced; in comparison to pC7005, it has 6 nt differences in the coding region, all but one of which are silent substitutions. There are 7 nt differences in the 225-nt 3'

FIG. 2. Restriction maps and sequencing strategy for DG70 and pC7005 cDNA clones. Restriction sites: E, EcoRI; R, RsaI; P, PvuII; H, HindII; X, XhoI; Xb, XbaI; Ps, PstI; B, BamHI; Hp, HpaII; Bg, BglI; S, SacI; T, TaqI. The E, H, and Ps sites on the ends of the two cDNA inserts are in vector DNA. The closed and open circles denote the site of 5' or 3' labels, respectively. The arrows indicate the direction and length of the sequence read from each sequencing reaction. The thin line in pC7005 represents the intron DNA fragment within this clone.

TTT GGG GTC Ph. Gly Vol AGC TCA GCC Ser Ser Ala AGC AAC AAT GGC AAG GAG Ser Asn Asn Gly Lys Glu GM CTA GAG CTG AAG ATC Glu Lou Glu Lou Lys IIe CTG CGC TCC CAG ATC AMT Lou Arg Ser Gln Ile Asn ATC AAC TAT GAG TCG GAG Ile Lys Tyr Glu Ser Glu

TCG GCC CGC Ser Ala Arg AAT GTG ACC Asn Vol Thr

CTT Leu TTT Phe ACC Thr AGA Arg GAC Asp ATG Met

GCA Alo GGG Gly ATG Met GAA Glu GCA Ala GCC Ala ATT 11e GAC Asp

AA CGG GCC Asn Arg Ala TTT GCA GGG Phe Ala Gly GAT CGT CTG Gin Asn Lou Asn Asp Arg Lou TAC CTT GAC MG AAA GCA GOCA Tyr Lou Asp Lys Lys Ala Ala ACC ATT GAC AAT ACC AGG CTG Thr lie Asp Asn Thr Arg Lou ATT AGG ACT GGA OCA GM AGT 11 Arg Thr Gly Ala Glu Sor

GTG Vol GCC Ala CAG AAC CTG AMT

TCA CCC GGA Ser Pro Gly GGA TCC TCA Gly Ser Ser

CGC AGT GTG GCA GGA GGT GCC Arg Ser Vol Ala Gly Gly Alo TCC TCA GCC TTT GCA GGA TCC Ser Ser Ala Phe Ala Gly Ser GCC AAC TAC CTG GAC CGA GTC Ala Asn Tyr Leu Asp Arg Vol

TCA ACT GTC Ser Thr Vol CCA GCC TTC Pro Ala Phe CGT TCC CTG Arg Ser Lou

CGC ATG TCA TCG GCT Arg Met Ser Ser Ala AAT GTT AGT GTC ACC Asn Vol Ser Vol Thr

GAG CM Glu Gin GGT TAC TAT MT ACC Gly Tyr Tyr Asn Thr AAG CTG GCA GCC GAT Lys Leu Alo Ala Asp GTC CTT GAT GAG CTG

GCC AAC CAT


90 180 270

Ala Asn His ATC AAC TTA 360 AGT Ser Ile Asn Lou GAC TTC MA 450 GCC Ala Asp Ph* Lys ACC CTC AMT 540 AGA Arg Vol Lou Asp Glu Lou Thr Lou Asn CAT GAA GAG GAA CTT GCT GTT GTT CGT 630 His Glu Glu Glu Lou Ala Vol Vol Arg ATC ATG GCT GAT GTG AGG TCT CAG TAT 720 11e Met Ala Asp Vol Arg Ser Gin Tyr CTC AAC CAT GM GTT GCC ACC AAC ACA 810 Lou Asn His Glu Vol Ala Thr Asn Thr ATT GAA CTT CAG TCT CTC TTG AGC ATG 900 11e Glu Lou Gin Ser Lou Lou Ser Met CTG CAG GCG ATG ATC ACA CAA GTG GMA 990 Leu Gin Ala Met 11e Thr Gin Vol Glu GAT GCC AMG ACA AGA CTG GMA ATG GMA 1080 Asp Ala Lys Thr Arg Lou Glu Met Glu

GTT GGC TCC CTT GAT TAT Vol Gly Ser Lou Asp Tyr GTC CTC AGT ATA GAT MT Vol Leu Ser IIo Asp Asn GAC ATA GTT GGA CTG CGC Asp 11e Vol Gly Lou Arg GAA AGT CTG AMG GAA GAG CTG ATC TAC CtA MG MA MT AAA ACA GAC TTG GAG TTG GAG Glu Sotr Lou Lys Glu Glu Lou lie Tyr Lou Lys Lys Asn Lys Thr Asp Lou Glu Leu Glu GTA CAG GTA GAT TCT GCT CCT CCT GTG GAT CTG GCA CAG AGC ACT OCT AGA GGC MT GTG Vol Gln Vol Asp Ser Ala Pro Pro Vol Asp Lou Ala Gin Sor Ser Ala Arg Gly Asn Vol GM AGT ATG ATG GAG AAG MC CGC CAG GAG TTG GA GCC TGT TAC MA GGA CAG AGT GAA AAT Glu Sor Met Met Glu Lys Asn Arg Gin Glu Lou Glu Ala Cys Tyr Lys Gly Gin Ser Glu Asn OCT GCT CTC CAG ACA TCT AMG ACA GCA ATT ACA GAC CTG AA CGC ACA ATA CAA AGC TTG GAG Ala Ala Lou Gln Thr Ser Lys Thr Ala l1 Thr Asp Lou Lys Arg Thr 11e Gln Str Leu Glu AA C00 GCA CTG GM GGG ACT TTG GCA GMA ACC GAG GCC CAA TAC GGG GCT CAG CTC MC CAT Lys Gly Ala Lou Glu Gly Thr Lou Ala Glu Thr Glu Ala Gln Tyr Gly Ala Gln Lou Asn His ATG GAG CTT CM AAC TTA CGG TCA GAT GCC GAC CAC CAG TCT TTG GAG TAT AMG AGG TTG CTA Met Glu Lou Gin Asn Lou Arg Ser Asp Ala Asp His Gin Sor Leu Glu Tyr Lys Arg Lou Lou ATT GCC ACC TAC AGA COC CTA CTG GMA GGA GMA GAC ACC CGA TTT TCC CAG ACT GMA ACA CAG AAA G GTAGCCATTTGGTAATACTTCTTATTGT 1176 11 Ala Thr Tyr Arg Arg Lou Lou Glu Gly Glu Asp Thr Arg Phe Ser Gln Thr Glu Thr Gln Lys A AGGATGTGAGATGTATATTGTATCATAGATAMTAGTAGCATGCTCCCOCTGGTCACATAATGCTAAAAAAAAGTTCATACGAGAOGTGGGATGTGCTTTAGMATCTATGGTCAA 1295

TCTGGAAAATTGAATAGGTTTCATTTTTTCCAACTTGGACAAACTGCCTGAGATGCTTCAGGTGGCCCATTGAGAACAATAAATATGCAGCCAATMATGTTCTTCAATAMCCTGTACA

1414

AAGGTTMTAGTATTGTCTCTAGAGTGTTCCTATTTGGATGCGATCCGTTAAGGCATTGACCTCAGACCATCATTTCAGCTTATCCTTGACTTAGTTCACTCATGGTACATCTG

1533

CT GTT ACC ATC GTC AGT MA GAGCAG ACC TCG la Vol Thr 11 Vol Ser Lys Glu Gln Sotr Ser GAA GTC OTT GAC GGG A GTG GTC TCT TCT AGA GTT GAG GAA TTG ACT GM ACA TCC Glu Vol Vol Asp Gly Lys Vol Vol Ser S*r Arg Vol Glu Glu Lou Thr Glu Thr Sor

ATCMCACTGACTTTCCTTTGTTTGTTTCAG

G GTG MAG ACT GTG ATT GAA 1629 AGT tCC ATC MAA Sor Sor lie Lys Lys Vol Lys Thr Vol 11e Glu TM AGAGATCTTGAGATTGAGAGCGTGTACCTGAGGAGG 1728 *se

OCAATAGAAMTTGATCAACCAGAAGTTTGACAGTAATGTCGCCTTGTAGCTACGTTTATMATATGACAAGAGAGCGTAOGAAAAATMAAATGCAGTGGTGATGCCA

1847

MTCCGOCATATGTCTATGTTTGACTCTTCCTTAGOGATAACGCATCCGOTAAGATTTCAMTG(A)n

1915

FIG. 3. Nucleotide sequence of the pC7005 cDNA insert and its predicted amino acid sequence. The nucleotide sequence is numbered in the righthand margin. The polyadenylation signal sequence (9, 42) is underlined, and the translation termination codon TAA is denoted with three asterisks.
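The length bookkeeping that underlies the intron assignment in the Results can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text (the 1,915-nt numbered insert of Fig. 3, the presumed intron at nt 1148 to 1564, the 1,665-nt mRNA, and the assumed 80-nt poly(A) tract); the variable and function names are illustrative and not part of the original analysis.

```python
# Length bookkeeping for the pC7005 insert, using only values quoted in the text.
INSERT_TOTAL_NT = 1915                  # last numbered position in Fig. 3
INTRON_START, INTRON_END = 1148, 1564   # presumed intron, 1-based inclusive
MRNA_NT = 1665                          # DG70 mRNA size from the RNA gel blot
POLY_A_NT = 80                          # assumed poly(A) tract length


def summarize():
    """Return the derived lengths (in nt) discussed in the Results."""
    intron_len = INTRON_END - INTRON_START + 1
    spliced_insert = INSERT_TOTAL_NT - intron_len   # insert minus intron
    mrna_body = MRNA_NT - POLY_A_NT                 # mRNA minus poly(A)
    return {
        "intron_len": intron_len,
        "spliced_insert": spliced_insert,
        "mrna_body": mrna_body,
        # sequence presumed missing from the 5' end of pC7005
        "missing_5prime": mrna_body - spliced_insert,
        # amount by which the unspliced insert exceeds the mRNA body
        "insert_excess": INSERT_TOTAL_NT - mrna_body,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value} nt")
```

The derived values agree with the text: a 417-nt intron, a 1,498-nt spliced insert, a 1,585-nt mRNA body, a 330-nt excess of the unspliced insert, and a 5' shortfall of 87 nt, which the text rounds to approximately 85.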

untranslated region. These relatively minor differences are most likely due to polymorphisms, which are expected to occur in these cDNA libraries that have been generated from embryos derived from several outbred frogs. DG26 has been partially sequenced; like DG70, its sequence is not interrupted at the presumed exon-intron junctions. Third, the

FIG. 4. Size of DG70 mRNA. A 1-μg sample of X. laevis embryo total RNA and 1 μg of E. coli rRNA marker were electrophoresed, blotted, and hybridized with nick-translated DG70 DNA. Only a portion of the gel is shown. Lanes: A, photograph of gel region containing X. laevis 18S rRNA marker (1,825 nt [46]); B, photograph of gel region containing E. coli 16S rRNA (1,541 nt [4]); C, RNA gel blot of X. laevis embryo total RNA. The size of DG70 mRNA, 1,665 nt, was estimated from the relative positions of the hybridization signal and the two rRNA species.
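One common way to estimate a size from two flanking markers is to interpolate log10(size) linearly against migration distance. The text does not state which procedure was actually used, so the sketch below is only an illustration under that assumption. The marker sizes are those given in the legend; the migration distances and the function name are hypothetical.

```python
import math


def estimate_size_nt(d_sample, d_upper, size_upper, d_lower, size_lower):
    """Interpolate log10(size) linearly in migration distance between two markers.

    d_upper/size_upper: distance and size of the larger (slower) marker.
    d_lower/size_lower: distance and size of the smaller (faster) marker.
    """
    if d_upper == d_lower:
        raise ValueError("marker distances must differ")
    frac = (d_sample - d_upper) / (d_lower - d_upper)
    log_upper = math.log10(size_upper)
    log_lower = math.log10(size_lower)
    return 10 ** (log_upper + frac * (log_lower - log_upper))


if __name__ == "__main__":
    # Marker sizes from the legend; distances in mm are made-up illustration values.
    d_18s, d_16s = 40.0, 50.0
    size = estimate_size_nt(45.0, d_18s, 1825, d_16s, 1541)
    print(f"estimated size: {size:.0f} nt")
```

With real measurements, the sample distance would be read from the hybridization signal in lane C and the marker distances from the stained gel photographs in lanes A and B.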

insertion is not homologous to any vector sequences or other regions of the pC7005 insert; therefore there is no evidence for intramolecular rearrangement of plasmid DNA during bacterial growth. Fourth, the nucleotide sequences surrounding the inserted DNA conform to the exon-intron consensus sequence (35). Finally, the location of the inserted DNA correlates well with the known position of the most 3' proximal intron present in two human keratin genes (see below; 22, 30). After the intron sequence, the open reading frame continues for another 40 amino acids (aa). The 3' untranslated region contains the polyadenylation signal sequence, AATAAA (9, 42), 12 nt 5' from the poly(A) tract.

Structural analysis of the deduced amino acid sequence of XK70. The predicted 423-aa sequence of the DG70-encoded polypeptide XK70 was checked for homology to published protein sequences by using the DFASTP search program (28). The most statistically significant match was with a human epidermal keratin sequence. The sequence comparison between XK70, the X. laevis 47-kDa type I keratin XK81 (Jonas et al., in press), and the human 50-kDa type I keratin, H50K (30), is shown in Fig. 5. The overall amino acid sequence homology between XK70 and either XK81 or H50K is 47%; XK81 is 9% more homologous to H50K. To confirm that XK70 was indeed an intermediate filament protein, and to distinguish whether it was a type I or II subunit, its secondary structure was predicted and compared with that of other IF proteins (5, 6, 12). The major structural characteristics of XK70 are also shown in Fig. 5. Like all IF


A

XK81 MYTSYRSSSASYYSGSSSKGGFGSRS LAGSNSYGGSSFGAGFSSGVGSGFSSSGGNFAMAEAASSSFGGNEKHAMQN LNDR LASY LEKVRA LEAT NSDLEG

XK70 SARFGVLASPGVNRARSVAGGASTVRMSSANVTSSAFG-GSSAFAGS-SAFAG-SPA-FNVSVTSNNGKETMQNLNDRLANYLDRVRSLEQANHELEL

H50K