Vol. 265, No. 1, Issue of January 5, pp. 506-514, 1990
Printed in U.S.A.

Genomic Organization of the Human Multidrug Resistance (MDR1) Gene and Origin of P-glycoproteins*

(Received for publication, July 26, 1989)

Chang-jie Chen, Douglas Clark‡§, Kazumitsu Ueda‡, Ira Pastan‡, Michael M. Gottesman‡, and Igor B. Roninson¶

From the Department of Genetics, University of Illinois, Chicago, Illinois 60612 and the ‡Laboratory of Molecular Biology, National Cancer Institute, Bethesda, Maryland 20892
A major mechanism for protection of mammalian cells against lipophilic cytotoxic drugs, known as multidrug resistance, involves energy-dependent efflux of drugs through the action of membrane-associated pump proteins called P-glycoproteins (1, 2). P-glycoproteins are encoded by a small family of mdr (or pgp) genes, which includes two members in the human and three members in the rodent genome (3, 4). The multidrug transporter encoded by one of the human genes, MDR1, is responsible for the efflux of Vinca alkaloids, anthracyclines, colchicine, epipodophyllotoxins, actinomycin D, and several other drugs, some of which are widely used in cancer chemotherapy. Substrates transported by the product of the second gene, MDR2, have not yet been identified. mdr genes have been isolated not only from mammalian cells but also from Plasmodium falciparum, where their amplification and expression have been associated with resistance to antimalarial drugs such as chloroquine (5, 6).

Both mammalian and protozoan P-glycoproteins have similar structures (6-9). P-glycoproteins, approximately 1300 amino acids long, consist of two halves that share a high degree of sequence similarity. Each half of the protein includes a short highly hydrophilic N-terminal segment, a long hydrophobic region with six predicted transmembrane segments, and a relatively hydrophilic region which contains consensus sequences for a nucleotide-binding site. These nucleotide-binding regions are apparently responsible for the ATP binding and hydrolysis by P-glycoprotein (10, 11). The nucleotide-binding regions of P-glycoprotein share homology with a group of ATP-binding bacterial proteins, which includes energy-coupling subunits of multicomponent periplasmic transport systems for the uptake of various metabolites (12). The highest levels of homology to P-glycoprotein are found in bacterial proteins hlyB, cyaB, I/&B, and ndvA, which are associated with specific efflux processes (13-16). In their hydrophobicity profiles, the number of potential transmembrane segments, and the sequences of the nucleotide-binding regions, the bacterial efflux proteins strongly resemble one-half of P-glycoprotein.

Sequence homology between the N-terminal and C-terminal halves of P-glycoprotein suggested that this protein arose by duplication of a primordial gene (7, 8). This hypothesis predicts that introns are likely to be found at similar positions in the two halves of the protein-coding sequence, since intron positions have been found to be conserved in almost all known cases of internal duplication (17).

In the present study, we have determined the complete intron/exon structure of the human MDR1 gene and found very little conservation in intron positions between the two halves of the protein. On the basis of this result, we propose a new model for the evolutionary origin of P-glycoproteins.
* This work was supported by Grant CA40333 from the National Cancer Institute (to I. B. R.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J05168.
§ Howard Hughes Medical Institute/NIH Research Scholar.
¶ Recipient of a faculty research award from the American Cancer Society. To whom correspondence should be sent: Dept. of Genetics (m/c 669), University of Illinois at Chicago, 808 South Wood St., Chicago, IL 60612.
MATERIALS AND METHODS

Construction of a genomic library using a partial PstI digest of DNA from multidrug-resistant human KB-V1 cells in the cosmid vector pSV13 has been previously described (18). The library was screened by colony hybridization with previously isolated MDR1 cDNA clones (7). The hybridizing cosmids were characterized by digestion with PstI and Southern hybridization with MDR1 cDNA and then used directly for sequence analysis using different primers corresponding to specific MDR1 cDNA sequences. The resulting cosmids were found to contain exons 1-20, 25, 26, and 28. Cosmids containing all the remaining exons were subsequently isolated from the same library by screening with short cDNA probes, amplified by
Downloaded from http://www.jbc.org/ at CNRS on September 1, 2015
The MDR1 gene, responsible for multidrug resistance in human cells, encodes a broad specificity efflux pump (P-glycoprotein). P-glycoprotein consists of two similar halves, each half including a hydrophobic transmembrane region and a nucleotide-binding domain. On the basis of sequence homology between the N-terminal and C-terminal halves of P-glycoprotein, we have previously suggested that this gene arose by duplication of a primordial gene. We have now determined the complete intron/exon structure of the MDR1 gene by direct sequencing of cosmid clones and enzymatic amplification of genomic DNA segments. The MDR1 gene includes 28 introns, 26 of which interrupt the protein-coding sequence. Although both halves of the protein-coding sequence are composed of approximately the same number of exons, only two intron pairs, both within the nucleotide-binding domains, are located at conserved positions in the two halves of the protein. The other introns occur at different locations in the two halves of the protein and in most cases interrupt the coding sequence at different positions relative to the open reading frame. These results suggest that the P-glycoprotein arose by fusion of genes for two related but independently evolved proteins rather than by internal duplication.
Structure and Evolution of the Human MDR1 Gene

FIG. 1. Map of the human MDR1 gene. Exons are indicated by vertical lines. Open bars indicate the approximate length of the genomic segments contained in the corresponding clones. Plasmid clones pHDR5.1, pHDR3.25, and pHDR4.4 were obtained from size-fractionated libraries of HindIII-digested DNA, as previously described (4, 7). [Clone labels legible in the figure: pHDR4.4, pSVA4, pSVSH24, pSVSH13, pSV6A, pSVTH21, pSV17A.]
polymerase chain reaction (PCR)¹ and corresponding to the missing exon sequences. Sequence analysis by the dideoxy chain termination technique (19) was carried out using 3-10 µg of cosmid DNA (depending on the size of the insert), isolated by a rapid polyethylene glycol mini-scale procedure (20), 0.5 pmol of 19-22-nucleotide-long primers, and either reverse transcriptase (Bio-Rad) or Sequenase (United States Biochemical Corp.) under the conditions recommended by the enzyme manufacturer. Primers were synthesized by using a DNA synthesizer (model 380A, Applied Biosystems, Inc.). The sequences of all exon/intron junctions were determined on both strands. Nucleic acid and protein sequences were analyzed using the PC/Gene sequence analysis package (IntelliGenetics).

Amplification of intron sequences by PCR (21) was carried out using one PCR primer (amplimer) corresponding to the upstream exon sequence and one amplimer complementary to the downstream exon sequence. The amplimers were 20-26 nucleotides long and contained >50% G + C. Each PCR mixture included 1-10 pg of cosmid DNA or 0.5-1 pg of cellular DNA, 50 pmol of each amplimer, 200 µM of each of the four dNTPs, and 1.25 units of Taq polymerase in 50 µl of 10 mM Tris·HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.01% gelatin. Thirty cycles of PCR were carried out in the Perkin-Elmer Cetus DNA thermal cycler. Each cycle included denaturation for 1 min at 94 °C, 1 min annealing at 45-65 °C (the optimal annealing temperature varied for individual pairs of amplimers), and 4-min extension at 72 °C, with an additional 7-min extension at 72 °C added to the last cycle. A 6-µl sample of each PCR product was then analyzed by electrophoresis in 1% agarose followed by ethidium bromide staining and Southern hybridization with 32P-labeled full-length MDR1 cDNA.
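The amplimer criteria and cycling program stated above (20-26 nucleotides, more than 50% G + C; 30 cycles of 1 + 1 + 4 min plus a final 7-min extension) can be expressed as a short check. This is only an illustrative sketch: the function names and the two example sequences are hypothetical and are not primers used in this study.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_amplimer_criteria(seq: str, min_len: int = 20, max_len: int = 26,
                            min_gc: float = 0.5) -> bool:
    """True if the sequence is 20-26 nt long and contains >50% G + C."""
    # The length test runs first, so an empty string never reaches gc_fraction.
    return min_len <= len(seq) <= max_len and gc_fraction(seq) > min_gc


def program_minutes(cycles: int = 30, denature: int = 1, anneal: int = 1,
                    extend: int = 4, final_extension: int = 7) -> int:
    """Total hold time of the cycling program, ignoring ramp times."""
    return cycles * (denature + anneal + extend) + final_extension


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to exercise the checks.
    for primer in ("ATGCGGCCTAGCGTACCGGA", "ATATTTAAGCATTTAATTAA"):
        print(primer, len(primer), round(gc_fraction(primer), 2),
              meets_amplimer_criteria(primer))
    print("program hold time (min):", program_minutes())
```

The hold time ignores temperature ramping, so the real run time on the thermal cycler would be longer.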
RESULTS AND DISCUSSION

Sequencing and Mapping of the MDR1 Gene-A genomic library from multidrug-resistant human KB-V1 cells (18), containing 100-fold amplification of the MDR1 gene (22), was screened by colony hybridization with previously described MDR1 cDNA clones (7, 23). The isolated cosmids (Fig. 1) were used directly for sequence analysis by the dideoxy chain termination technique using specific oligonucleotide primers corresponding to different portions of MDR1 cDNA.

The positions of the splice junctions were identified by comparison of cDNA and genomic sequences. All the exon sequences of MDR1 as well as intron sequences adjacent to the splice junctions were determined from the cosmid clones. The genomic sequence of MDR1 exons agrees with the sequence determined from cDNA clones (7) except for the previously described differences due to mutations at codon 185 present in some multidrug-resistant cell lines (24) and cDNA cloning artifacts in the 5′-untranslated region (4). The partial genomic sequence of MDR1 based on the results of this and previous studies is presented in Fig. 2.

Since some of our cosmid clones were found to be rearranged relative to genomic DNA (data not shown), the sizes of the intervening sequences were determined by enzymatic amplification of genomic DNA by PCR. DNA from two independently isolated multidrug-resistant cell lines, KB-V1 and KB-C4 (the latter containing 30-fold amplification of MDR1) (22), as well as the appropriate cosmid clones, was used as a template for PCR amplification using oligonucleotides from adjacent exons of the gene as amplimers. The intron sizes were determined by gel electrophoresis and ethidium bromide staining of the resulting PCR products, each containing an intron and portions of two adjacent exons (Fig. 3). The primers used for PCR and the results of the PCR assays are summarized in Table I. By using this approach, we were able to amplify all but three introns of the MDR1 gene, with the largest PCR-amplified segment being 5.7 kb in size. No amplification was carried out for intron 6, since its size was known from previous sequence analysis (7). One of the remaining introns (intron 8) could not be amplified from genomic DNA, but it was possible to amplify this intron as a 7.5-kb band starting from a cosmid clone. The sizes of introns -1 and 4 were estimated to exceed 15-20 kb, judging from their lack of linkage in the cosmid clones and the results of previous Southern hybridization analysis of genomic DNA (4).

¹ The abbreviations used are: PCR, polymerase chain reaction; kb, kilobase(s); bp, base pair(s).

The specificity of the PCR products was confirmed by Southern hybridization with MDR1 cDNA (data not shown). It should be noted that such hybridization does not provide
[FIG. 2. Sequence panels: exon sequences in upper case with the deduced amino acids, intron sequences in lower case, and cDNA positions of the exon borders.]
FIG. 2. Partial genomic sequence of the human MDR1 gene. The sequences of exon -1 and the 5′ end of intron -1 are from Ref. 4; the 3′ end of exon -1, exon 1, and intron 1 are from Ref. 18 with minor corrections; exons 6 and 7 and intron 6 are from Ref. 7; the rest of the sequence was determined in the present study. The splice junctions were identified by alignment with the full-length cDNA sequence of MDR1 (7). Numbers of cDNA residues, corresponding to the exon borders, are indicated. Exon sequences are shown in upper case and intron sequences in lower case letters. Exon 1a corresponds to the portion of exon 1 located 5′ from the downstream transcription initiation site at -140; exon 1b is the portion of exon 1 located 3′ from this site.
[FIG. 2-continued. Sequence panels for the remaining exons and flanking intron sequences.]
definitive proof that the PCR products represent the length of the entire intron segment between two exons, since it is conceivable that some PCR products may result from specific priming at one end and nonspecific priming at the other end. For introns 1, 2, 10-16, 22, 23, 25, and 27, this possibility was ruled out by amplifying genomic DNA segments spanning several introns and exons and demonstrating that the size of these PCR products corresponded to the sum of the sizes of individually amplified introns and exons (Table I). The length of introns 6, 11, and 12 is known from their complete sequence determined in genomic clones. For the longest PCR-amplified intron (intron 8), the termini of the PCR product were shown by hybridization to map to those restriction fragments of the cosmid clone that contained the corresponding exons (data not shown). In the other cases, however, our determination of the intron sizes should presently be viewed as provisional.

Intron/Exon Structure of the MDR1 Gene-The map of the MDR1 gene, spanning >100 kb of DNA, is shown in Fig. 1. The sizes of all exons and introns, the positions of the intervening sequences, and the sequences of the splice junctions are summarized in Table II. The MDR1 gene includes 29 exons, numbered from -1 to 28; introns are numbered as the preceding exons. This numbering system reflects the fact that MDR1 mRNA can be transcribed from two different promoters, an upstream and a downstream promoter, with the downstream promoter preferentially expressed in most
cell types (18, 23). The upstream promoter is found at the beginning of exon -1,² and the downstream promoter is located within exon 1, with the major transcription initiation site at nucleotide -140 (23). The portion of exon 1 located 5′ from the downstream promoter is designated exon 1a, and the 3′ portion of this exon is called exon 1b (Fig. 2). The ATG translation initiation codon is located within exon 2. The MDR1 exon sizes range from 49 to 587 bp or from 16 to 69 codons in the protein-coding region. The average length of the internal protein-coding exons of MDR1 is 47.5 codons, in agreement with the internal exon sizes found in other genes (44.5 codons average length) (17). All the splice junctions follow the GT/AG rule and agree with consensus sequences for the donor and acceptor sites (25) (Table II).

Among the introns located within the open reading frame, 19 introns interrupt this frame between the codons (type 0 introns), one intron interrupts the frame after the first nucleotide of a codon (type 1 intron), and six introns occur after the second nucleotide of a codon (type 2 introns). Introns of different types show highly uneven distribution throughout the gene (Figs. 4 and 5). The part of the gene coding for the N-terminal and membrane-bound regions in the N-terminal (left) half of the protein includes eight introns, four of which belong to type 2, three to type 0, and one to type 1, in no apparent order. In contrast, the equivalent region in the C-terminal (right) half begins with six introns of type 0, followed by two introns of type 2. Both nucleotide-binding regions contain only type 0 introns. The protein-coding sequence of MDR1 comprises 27 exons,

FIG. 3. PCR amplification of MDR1 introns in genomic DNA. Ethidium bromide staining of a 1% agarose gel, containing PCR products, obtained using amplimers and DNA templates enumerated in Table I. Numbers on top indicate introns amplified in each PCR product. Arrows indicate bands that hybridized with a full-length MDR1 cDNA probe after Southern transfer (data not shown). The rightmost lane contains the 1-kb ladder (Bethesda Research Laboratories) used as size standards.
² A. V. Gudkov and I. B. Roninson, unpublished data.

TABLE I
Estimation of intron sizes by PCR
Entries are listed column by column in the order extracted; the row alignment is not preserved.

Amplimersᵃ: -24-l - -2’3/~50 - 69; -4-15/94-117; 69-90/l%-211; “89-310/394-416; 631-651/784-808; Z-747/862-882; 828-846/1094-1113; 1003-1024/1203-1222; 1114-1134/1277-1296; 1225-1244/1403-1422; 1429-1448/1705-1725; 1703-1725/1727-1747; 1727-1747/1949-1969; 1888-1908/2122-2142; 2065-2085/2269-2288; 2212-2233/2377-2397; 2.125-2346/2459-2478; 2399-2418/2548-2570; “Sl-2606/X67-2786; ‘X2-2534/2909-2928; 2821-2840/3064-3083; 3017-3036/3136-3154; 3206-3225/3341-3360; 3283-3301/3613-3635; 3573-359”/3823-3848; -113 - -92/94-117; 1003-1024/1705-1725; 1429-1448/1727-1747; 1703-1725/1949-1969; 1727-1747/2122-2142; 2712-2734/3064-3083; 3283-3301/3823-3848

Size of PCR product (bp): 800; 4400; 1200; 3200; 4900; 5500; 3100; 450; 350; 300; 600; 3200; 1100; 800; 2900; 2100; 5000; 1400; 1500; 2900; 1200; 5700; 3600; 1700; .iOOO; 1600; 3800; 4300; 1800; 4200; 5200

Intron(s) amplified: 1; 2; 3; 5; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 1 + 2; 10 + 11 + 12 + 13; 13 + 14; 14 + 15; 15 + 16; 22 + 23; 26 + 27

Genomic DNA template: KB-C4 and KB-V1 for most products; KB-V1, KB-C4, or KB-A1 alone for the others.

Cosmid DNA template: pSVB1, pSVC6, pSVA4, pSVSH13, pSV6A.

ᵃ Position in cDNA sequence.
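The consistency check described in the text, in which a PCR product spanning several introns should match the sum of the individually estimated introns plus the exon sequence between the amplimers, can be sketched as follows. This is a minimal illustration under stated assumptions: the pairing of amplimers, intron combinations, and product sizes is inferred from the flattened table, the intron sizes are those listed in Table II, and the 10% tolerance for gel-based sizing is a hypothetical choice, not a value from the paper.

```python
def cdna_span(first: int, last: int) -> int:
    """Exon nucleotides from cDNA position `first` to `last`, inclusive.

    Assumes both positions are positive; the cDNA numbering has no
    position 0, so spans crossing the ATG would need special handling.
    """
    return last - first + 1


def expected_product(first: int, last: int, intron_sizes: list[int]) -> int:
    """Expected PCR product size: exon sequence plus all enclosed introns."""
    return cdna_span(first, last) + sum(intron_sizes)


def is_consistent(observed: int, expected: int, tolerance: float = 0.10) -> bool:
    """True if the observed size is within `tolerance` of the expected size."""
    return abs(observed - expected) <= tolerance * expected


# (introns, outermost amplimer positions, intron sizes, observed product);
# the row pairing is an assumption inferred from Tables I and II.
CHECKS = [
    ("13 + 14", 1429, 1747, [300, 3200], 3800),
    ("14 + 15", 1703, 1969, [3200, 850], 4300),
    ("26 + 27", 3283, 3848, [3300, 1400], 5200),
]

if __name__ == "__main__":
    for label, first, last, introns, observed in CHECKS:
        expected = expected_product(first, last, introns)
        print(label, expected, observed, is_consistent(observed, expected))
```

Because the individual intron sizes were themselves estimated from gel bands, agreement within the tolerance supports, but does not prove, that each product spans the whole intron.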
TABLE
MDRl
Gene
511
II
Splice junctions in the MDRl gene Exon/intron number
EXOn
EXOIl
size
end”
Exon 3' junction
Intron
Intron
5’ junction
b -1
>95 323 74 49 169 52 192 172 125 172 114 111 126 204 171 162 177 147 108 78 84 204 101 141 157 198 207 147 587
Consensus Freauencv
3’ junction
Exon
1ntron WPe
5’ junction
b -330 -7 68 117 286 338 530 702 827
999 1,113 1,224 1,350 1,554 1,725 1,887 2,064 2,211 2,319 2,397 2,481 2,685 2,786
2,927 3,084 3,282
3,489 3,636 4,223
sequence ( 5%)
n Position in cDNA ’ NA, not applicable.
CCAGATAAAAG AGGAGCGCGAG AAT AAA AG TTT TCA ATG AAT AGA A ATG ACC AG CTT ACA GA TGG GCA AAG CTT GAA AG GTA CTC ACT ATT GAT AAT GAA GTT AAG GAG GGG ATG CTG CCT CAT CTG GAT AAG ACA ATG CAG GAG GCT CTG ATT ATA GGG TTC CTT CAG CTC AGA CAG GTT AAA GGG GCT GGG AAG CCA TAC AG GTT CTG TT CTA ATG CCG GGG AA4 GTG CTG CCT AAT ATG GAA AAG
A
G/g
64
75
100
gtaaggtacaaatac gtaggggcacgcaaa gtaactagcttgttt gtgagttttgaattt gtatgtattgtttgt gtaattagacattct gtaagtatttagttt gtaggtgaagcctgt gttgagtttcttttt gtaagtgtttacatt gtaagtctgagttgg gtacagtgataaatg gtgagatgacccatg gtaagttgtccttgc gtcagtgaggcttag gtatagtttaacttc gtatgaagggagatg gtaagtgtgatgccc gtaaatgtttccatt gtatgtctatcgagg gtacgtgcctccttt gtgagtcaaactaaa gtaataaccgctgaa gtaagtattgggcta gtgagtttgatgttt gtgagcacactttca gtaagtctctcttca gtaagaatttaaatt
>18,000 500 4,300 1,000 >20,000 3,100 541 4,700 7,500 2,800 250 170 200 300 3,200 850 600 2,700 1,900 2,700 4,800 1,100 1,200 2,500 1,100 5,400 3,300 1,400
acttgccctttctag ggcgtttctcttcag attgctgttttgcag tttctctctttttag tttgtggtggtctag ctccttctttttcag tatctgttctttcag atgtattttaaacag tgttctttttctcag cttcacattcctcag aaattgatctgttag tcacggtcctggtag tactttttattccag ggttttctgtggtag tttctctctctttag tcttatttattttag atttgtgttttctag aatgttttctcacag tgttcctgcccacag ttttctgtgccacag tgttttgttttgcag gctgtctgttatcag tgtcttcttttcgag gtttgtgctttccag ttcttctcattgcag aactcttgttttcag aaaccttatttacag gtgattatggaatag
. . . . . . . . ,Y
t
a
a
g
t.....
100
75
68
75
64.............82
AGAGGTGCAAC GTCGGG ATG T GAA AAA TTT CGC TAT GT GAT ATC G TAT GCC T GAT GTC ATA CTA TCT G TAC AAC GTA TTC TTT AAG CCA AGT ATC TTG AAG GTC AGT GTT AA4 TTT GAC GCC AGA AAA ACA GCA CGA GAT GAA ACT GTT TTT ACA GGT TTC ACA GAT GTG ACT GCT ATA GGT ATC GCT ACT A AAC TCT A GTA TTT AAC ACA TTG CTG CTT GAT AAA TAT AGC GTT GTC CAA
Y Y Y Y Y nc 68
86
75
89
75
w
61
a
g/N
100
100
NAb NA 2
0 1 2 2
0 2
0 0 0 0 0 0 0 0 0 0 0 0 0 2 2
0 0 0 0
sequence.
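The intron types listed in Table II can be cross-checked against the exon end positions. The Python sketch below assumes that cDNA position 1 is the A of the initiation codon (consistent with the exon 2 5' junction, GTCGGG ATG, and an exon 1 end at -7). Under that assumption an intron that follows coding nucleotide p lies between codons when p mod 3 = 0 (type 0), after the first nucleotide of a codon when p mod 3 = 1 (type 1), and after the second when p mod 3 = 2 (type 2). The function name `intron_type` and the dictionary are illustrative only; the positions are taken from Table II.

```python
from typing import Optional


def intron_type(exon_end: int) -> Optional[int]:
    """Intron type (0, 1 or 2) from the cDNA position of the last exon nucleotide.

    Assumption: position 1 is the A of the ATG initiation codon.
    Returns None for introns in the 5' untranslated region ("NA" in Table II).
    """
    if exon_end < 1:
        return None
    return exon_end % 3


# Exon end positions for a few exons, copied from Table II.
exon_ends = {-1: -330, 1: -7, 2: 68, 4: 286, 9: 999,
             12: 1350, 13: 1554, 23: 2927, 25: 3282, 26: 3489}

for intron, end in exon_ends.items():
    print(intron, intron_type(end))
```

With these values introns 12, 13, 25 and 26 all come out as type 0, whereas introns 9 and 23 come out as types 0 and 2, which matches the type column of the table.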
14 of which encode the left and 13 the right half of the protein (Figs. 4 and 5). For many genes, specific correlations have been demonstrated between individual exons and structural or functional domains of the protein (17, 26-28). Based on the hydrophobicity profiles, we have previously subdivided each half of P-glycoprotein into a highly hydrophilic N-terminal region, a hydrophobic membrane-bound region, and a relatively hydrophilic nucleotide-binding region (7). None of the exon borders matches these somewhat arbitrary demarcation lines precisely, but introns 3 and 10 in the left half and introns 16 and 24 in the right half are found reasonably close to these borders. Each of the nucleotide-binding folds in the hydrophilic portion is contained within a separate exon. In the hydrophobic region, 4 of the 12 predicted transmembrane segments of P-glycoprotein are interrupted by introns, although the boundaries of specific transmembrane segments, predicted on the basis of hydrophobicity analysis (29), are not necessarily precise. The entire lengths or the major parts of eight transmembrane segments are encoded by individual exons, but one pair of adjacent transmembrane segments in each half of the protein is encoded by the same exon. In the absence of additional information about the tertiary structure of P-glycoprotein, it does not seem possible at this time to determine whether the exons indeed correspond to structural domains of this protein.
Analysis of intron positions in the alignment of the left and the right halves of P-glycoprotein indicates little similarity (Figs. 4 and 5). Within the nucleotide-binding region, one pair of introns (introns 13 and 26) is matched precisely, and another pair (introns 12 and 25) is shifted by one codon, with both introns belonging to type 0. Such a shift can be readily explained by intron sliding (see below). The sizes of the matching introns are quite different (Table II), but variability in size among homologous introns has been frequently observed. Outside of the nucleotide-binding domain, only one pair of introns, 9 and 23, is found at corresponding codons, but these two introns belong to different types. None of the other introns are found at equivalent positions in this alignment. Preferential conservation of intron positions within the nucleotide-binding regions parallels a much higher degree of amino acid sequence homology for these regions (alignment score 43.0, as calculated using the PCOMPARE program, based on the method of Needleman and Wunsch (30), for amino acid residues 351-632 and 994-1280) than for the rest of the protein (alignment score 16.6).
Evolutionary Implications-On the basis of sequence similarity between the left and the right halves of P-glycoprotein, we and others have previously suggested that this protein arose by duplication of a primordial gene (7-9). We expected to find significant conservation of intron positions between the two halves of the MDR1 gene, since almost all other known genes with an internal duplication show strong conservation of the intron positions between the duplicated domains (16). (The only exception known to us involves the rabbit muscle phosphofructokinase gene, which shows no apparent conservation of intron positions between its two similar halves; no explanation for that discrepancy has been proposed (31).) We have found, however, that only two or three pairs of introns in the MDR1 gene are located at corresponding positions in both halves of P-glycoprotein, and