J. Biol. Chem

Vol. 265, No. 1, Issue of January 5, pp. 506-514, 1990
Printed in U.S.A.

Genomic Organization of the Human Multidrug Resistance (MDR1) Gene and Origin of P-glycoproteins*

(Received for publication, July 26, 1989)

Chang-jie Chen, Douglas Clark‡§, Kazumitsu Ueda‡, Ira Pastan‡, Michael M. Gottesman‡, and Igor B. Roninson¶

From the Department of Genetics, University of Illinois, Chicago, Illinois 60612 and the ‡Laboratory of Molecular Biology, National Cancer Institute, Bethesda, Maryland 20892

A major mechanism for protection of mammalian cells against lipophilic cytotoxic drugs, known as multidrug resistance, involves energy-dependent efflux of drugs through the action of membrane-associated pump proteins called P-glycoproteins (1, 2). P-glycoproteins are encoded by a small family of mdr (or pgp) genes, which includes two members in the human and three members in the rodent genome (3, 4). The multidrug transporter encoded by one of the human genes, MDR1, is responsible for the efflux of Vinca alkaloids, anthracyclines, colchicine, epipodophyllotoxins, actinomycin D, and several other drugs, some of which are widely used in cancer chemotherapy. Substrates transported by the product of the second gene, MDR2, have not yet been identified. mdr genes have been isolated not only from mammalian cells but also from Plasmodium falciparum, where their amplification and expression have been associated with resistance to antimalarial drugs such as chloroquine (5, 6).

Both mammalian and protozoan P-glycoproteins have similar structures (6-9). P-glycoproteins, approximately 1300 amino acids long, consist of two halves that share a high degree of sequence similarity. Each half of the protein includes a short highly hydrophilic N-terminal segment, a long hydrophobic region with six predicted transmembrane segments, and a relatively hydrophilic region which contains consensus sequences for a nucleotide-binding site. These nucleotide-binding regions are apparently responsible for the ATP binding and hydrolysis by P-glycoprotein (10, 11).

The nucleotide-binding regions of P-glycoprotein share homology with a group of ATP-binding bacterial proteins, which includes energy-coupling subunits of multicomponent periplasmic transport systems for the uptake of various metabolites (12). The highest levels of homology to P-glycoprotein are found in bacterial proteins hlyB, cyaB, lktB, and ndvA, which are associated with specific efflux processes (13-16). In their hydrophobicity profiles, the number of potential transmembrane segments, and the sequences of the nucleotide-binding regions, the bacterial efflux proteins strongly resemble one-half of P-glycoprotein.

Sequence homology between the N-terminal and C-terminal halves of P-glycoprotein suggested that this protein arose by duplication of a primordial gene (7, 8). This hypothesis predicts that introns are likely to be found at similar positions in the two halves of the protein-coding sequence, since intron positions have been found to be conserved in almost all known cases of internal duplication (17).

In the present study, we have determined the complete intron/exon structure of the human MDR1 gene and found very little conservation in intron positions between the two halves of the protein. On the basis of this result, we propose a new model for the evolutionary origin of P-glycoproteins.

* This work was supported by Grant CA40333 from the National Cancer Institute (to I. B. R.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J05168.

§ Howard Hughes Medical Institute/NIH Research Scholar.

¶ Recipient of a faculty research award from the American Cancer Society. To whom correspondence should be sent: Dept. of Genetics (m/c 669), University of Illinois at Chicago, 808 South Wood St., Chicago, IL 60612.

MATERIALS AND METHODS

Construction of a genomic library using a partial PstI digest of DNA from multidrug-resistant human KB-V1 cells in the cosmid vector pSV13 has been previously described (18). The library was screened by colony hybridization with previously isolated MDR1 cDNA clones (7). The hybridizing cosmids were characterized by digestion with PstI and Southern hybridization with MDR1 cDNA and then used directly for sequence analysis using different primers corresponding to specific MDR1 cDNA sequences. The resulting cosmids were found to contain exons 1-20, 25, 26, and 28. Cosmids containing all the remaining exons were subsequently isolated from the same library by screening with short cDNA probes, amplified by

Downloaded from http://www.jbc.org/ at CNRS on September 1, 2015

The MDR1 gene, responsible for multidrug resistance in human cells, encodes a broad specificity efflux pump (P-glycoprotein). P-glycoprotein consists of two similar halves, each half including a hydrophobic transmembrane region and a nucleotide-binding domain. On the basis of sequence homology between the N-terminal and C-terminal halves of P-glycoprotein, we have previously suggested that this gene arose by duplication of a primordial gene. We have now determined the complete intron/exon structure of the MDR1 gene by direct sequencing of cosmid clones and enzymatic amplification of genomic DNA segments. The MDR1 gene includes 28 introns, 26 of which interrupt the protein-coding sequence. Although both halves of the protein-coding sequence are composed of approximately the same number of exons, only two intron pairs, both within the nucleotide-binding domains, are located at conserved positions in the two halves of the protein. The other introns occur at different locations in the two halves of the protein and in most cases interrupt the coding sequence at different positions relative to the open reading frame. These results suggest that the P-glycoprotein arose by fusion of genes for two related but independently evolved proteins rather than by internal duplication.

Structure and Evolution of the Human MDR1 Gene

FIG. 1. Map of the human MDR1 gene. Exons are indicated by vertical lines. Open bars indicate the approximate length of the genomic segments contained in the corresponding clones. Plasmid clones pHDR5.1, pHDR3.25, and pHDR4.4 were obtained from size-fractionated libraries of HindIII-digested DNA, as previously described (4, 7). [Clone labels on the map: pSVC6, pHDR4.4, pSVA4, pSVSH24, pSVSH13, pSV6A, pSVTH21, pSV17A.]

polymerase chain reaction (PCR)¹ and corresponding to the missing exon sequences. Sequence analysis by the dideoxy chain termination technique (19) was carried out using 3-10 µg of cosmid DNA (depending on the size of the insert), isolated by a rapid polyethylene glycol mini-scale procedure (20), 0.5 pmol of 19-22-nucleotide-long primers, and either reverse transcriptase (Bio-Rad) or Sequenase (United States Biochemical Corp.) under the conditions recommended by the enzyme manufacturer. Primers were synthesized by using a DNA synthesizer (model 380A, Applied Biosystems, Inc.). The sequences of all exon/intron junctions were determined on both strands. Nucleic acid and protein sequences were analyzed using the PC/Gene sequence analysis package (IntelliGenetics).

Amplification of intron sequences by PCR (21) was carried out using one PCR primer (amplimer) corresponding to the upstream exon sequence and one amplimer complementary to the downstream exon sequence. The amplimers were 20-26 nucleotides long and contained >50% G + C. Each PCR mixture included 1-10 pg of cosmid DNA or 0.5-1 µg of cellular DNA, 50 pmol of each amplimer, 200 µM of each of the four dNTPs, and 1.25 units of Taq polymerase in 50 µl of 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.01% gelatin. Thirty cycles of PCR were carried out in the Perkin-Elmer Cetus DNA thermal cycler. Each cycle included denaturation for 1 min at 94 °C, 1 min annealing at 45-65 °C (the optimal annealing temperature varied for individual pairs of amplimers), and 4-min extension at 72 °C, with an additional 7-min extension at 72 °C added to the last cycle. A 6-µl sample of each PCR product was then analyzed by electrophoresis in 1% agarose followed by ethidium bromide staining and Southern hybridization with ³²P-labeled full-length MDR1 cDNA.
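The amplimer criteria and the cycling program described above lend themselves to a quick check. The following is a minimal sketch, not the authors' procedure: the function names and the example primer are hypothetical and were invented for illustration, and the run time ignores the temperature ramp times of the thermal cycler.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_amplimer_criteria(seq: str) -> bool:
    """Check the two stated criteria: 20-26 nucleotides long and >50% G + C."""
    return 20 <= len(seq) <= 26 and gc_fraction(seq) > 0.5


def program_minutes(cycles: int = 30, denature: int = 1, anneal: int = 1,
                    extend: int = 4, final_extension: int = 7) -> int:
    """Total hold time in minutes for the cycling program (ramp times ignored)."""
    return cycles * (denature + anneal + extend) + final_extension


# Hypothetical 22-nt primer, invented for illustration only.
example_primer = "GGCTGATTGGCTGGGCAGGAAC"

if __name__ == "__main__":
    print(meets_amplimer_criteria(example_primer))
    print(program_minutes())
```

With the stated parameters (30 cycles of 1 + 1 + 4 min plus a 7-min final extension), the hold times add up to 187 min.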

¹ The abbreviations used are: PCR, polymerase chain reaction; kb, kilobase(s); bp, base pair(s).

RESULTS AND DISCUSSION

Sequencing and Mapping of the MDR1 Gene-A genomic library from multidrug-resistant human KB-V1 cells (18), containing 100-fold amplification of the MDR1 gene (22), was screened by colony hybridization with previously described MDR1 cDNA clones (7, 23). The isolated cosmids (Fig. 1) were used directly for sequence analysis by the dideoxy chain termination technique using specific oligonucleotide primers corresponding to different portions of MDR1 cDNA.

The positions of the splice junctions were identified by comparison of cDNA and genomic sequences. All the exon sequences of MDR1 as well as intron sequences adjacent to the splice junctions were determined from the cosmid clones. The genomic sequence of MDR1 exons agrees with the sequence determined from cDNA clones (7) except for the previously described differences due to mutations at codon 185 present in some multidrug-resistant cell lines (24) and cDNA cloning artifacts in the 5′-untranslated region (4). The partial genomic sequence of MDR1 based on the results of this and previous studies is presented in Fig. 2.

Since some of our cosmid clones were found to be rearranged relative to genomic DNA (data not shown), the sizes of the intervening sequences were determined by enzymatic amplification of genomic DNA by PCR. DNA from two independently isolated multidrug-resistant cell lines, KB-V1 and KB-C4 (the latter containing 30-fold amplification of MDR1) (22), as well as the appropriate cosmid clones, was used as a template for PCR amplification using oligonucleotides from adjacent exons of the gene as amplimers. The intron sizes were determined by gel electrophoresis and ethidium bromide staining of the resulting PCR products, each containing an intron and portions of two adjacent exons (Fig. 3). The primers used for PCR and the results of the PCR assays are summarized in Table I. By using this approach, we were able to amplify all but three introns of the MDR1 gene, with the largest PCR-amplified segment being 5.7 kb in size. No amplification was carried out for intron 6, since its size was known from previous sequence analysis (7). One of the remaining introns (intron 8) could not be amplified from genomic DNA, but it was possible to amplify this intron as a 7.5-kb band starting from a cosmid clone. The sizes of introns -1 and 4 were estimated to exceed 15-20 kb, judging from their lack of linkage in the cosmid clones and the results of previous Southern hybridization analysis of genomic DNA (4).

The specificity of the PCR products was confirmed by Southern hybridization with MDR1 cDNA (data not shown). It should be noted that such hybridization does not provide

[FIG. 2 sequence panel: nucleotide sequence of MDR1 exons (upper case) and adjacent intron regions (lower case) with the predicted amino acid translation; the sequence layout is not recoverable here.]

FIG. 2. Partial genomic sequence of the human MDR1 gene. The sequences of exon -1 and the 5′ end of intron -1 are from Ref. 4; the 3′ end of exon -1, exon 1, and intron 1 are from Ref. 18 with minor corrections; exons 6 and 7 and intron 6 are from Ref. 7; the rest of the sequences were determined in the present study. The splice junctions were identified by alignment with the full-length cDNA sequence of MDR1 (7). Numbers of cDNA residues, corresponding to the exon borders, are indicated. Exon sequences are shown in upper case and intron sequences in lower case letters. Exon 1a corresponds to the portion of exon 1 located 5′ from the downstream transcription initiation site at -140; exon 1b is the portion of exon 1 located 3′ from this site.
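The upper/lower case convention in the legend makes exon/intron boundaries machine-readable. As a minimal sketch (the function name is hypothetical, and the example string simply joins the exon 2 and intron 2 junction sequences listed in Table II), a sequence written in this convention can be split into exon and intron runs as follows. Dots, which the figure uses for omitted intron sequence, are treated as intron characters here; that choice is an assumption of this sketch.

```python
from itertools import groupby


def split_by_case(seq: str):
    """Split a sequence into (kind, segment) runs.

    Upper case runs are labeled 'exon'; everything else (lower case
    letters and dots) is labeled 'intron'.
    """
    runs = []
    for is_upper, chars in groupby(seq, key=str.isupper):
        kind = "exon" if is_upper else "intron"
        runs.append((kind, "".join(chars)))
    return runs


# End of exon 2 followed by the start of intron 2 (junction sequences from Table II).
junction = "AATAAAAGgtaactagcttgttt"

if __name__ == "__main__":
    for kind, segment in split_by_case(junction):
        print(kind, segment)
```

The first two intron characters of the example are `gt`, in line with the GT/AG rule discussed in the text.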

FIG. 2-continued. [Remaining MDR1 exon and flanking intron sequence with the predicted amino acid translation; the sequence layout is not recoverable here.]

definitive proof that the PCR products represent the length of the entire intron segment between two exons, since it is conceivable that some PCR products may result from specific priming at one end and nonspecific priming at the other end. For introns 1, 2, 10-16, 22, 23, 25, and 27, this possibility was ruled out by amplifying genomic DNA segments spanning several introns and exons and demonstrating that the size of these PCR products corresponded to the sum of the sizes of individually amplified introns and exons (Table I). The length of introns 6, 11, and 12 is known from their complete sequence determined in genomic clones. For the longest PCR-amplified intron (intron 8), the termini of the PCR product were shown by hybridization to map to those restriction fragments of the cosmid clone that contained the corresponding exons (data not shown). In the other cases, however, our determination of the intron sizes should presently be viewed as provisional.

Intron/Exon Structure of the MDR1 Gene-The map of the MDR1 gene, spanning >100 kb of DNA, is shown in Fig. 1. The sizes of all exons and introns, the positions of the intervening sequences, and the sequences of the splice junctions are summarized in Table II. The MDR1 gene includes 29 exons, numbered from -1 to 28; introns are numbered as the preceding exons. This numbering system reflects the fact that MDR1 mRNA can be transcribed from two different promoters, an upstream and a downstream promoter, with the downstream promoter preferentially expressed in most cell types (18, 23). The upstream promoter is found at the beginning of exon -1,² and the downstream promoter is located within exon 1, with the major transcription initiation site at nucleotide -140 (23). The portion of exon 1 located 5′ from the downstream promoter is designated exon 1a, and the 3′ portion of this exon is called exon 1b (Fig. 2). The ATG translation initiation codon is located within exon 2.

The MDR1 exon sizes range from 49 to 587 bp or from 16 to 69 codons in the protein-coding region. The average length of the internal protein-coding exons of MDR1 is 47.5 codons, in agreement with the internal exon sizes found in other genes (44.5 codons average length) (17). All the splice junctions follow the GT/AG rule and agree with consensus sequences for the donor and acceptor sites (25) (Table II). Among the introns located within the open reading frame, 19 introns interrupt this frame between the codons (type 0 introns), one intron interrupts the frame after the first nucleotide of a codon (type 1 intron), and six introns occur after the second nucleotide of a codon (type 2 introns). Introns of different types show highly uneven distribution throughout the gene (Figs. 4 and 5). The part of the gene coding for the N-terminal and membrane-bound regions in the N-terminal (left) half of the protein includes eight introns, four of which belong to type 2, three to type 0, and one to type 1, in no apparent order. In contrast, the equivalent region in the C-terminal (right) half begins with six introns of type 0, followed by two introns of type 2. Both nucleotide-binding regions contain only type 0 introns. The protein-coding sequence of MDR1 comprises 27 exons,

FIG. 3. PCR amplification of MDR1 introns in genomic DNA. Ethidium bromide staining of a 1% agarose gel, containing PCR products, obtained using amplimers and DNA templates enumerated in Table I. Numbers on top indicate introns amplified in each PCR product. Arrows indicate bands that hybridized with a full-length MDR1 cDNA probe after Southern transfer (data not shown). The rightmost lane contains the 1-kb ladder (Bethesda Research Laboratories) used as size standards.

¹ A. V. Gudkov and I. B. Roninson, unpublished data.

TABLE I
Estimation of intron sizes by PCR

Intron(s) amplified: 1; 2; 3; 5; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 1 + 2; 10 + 11 + 12 + 13; 13 + 14; 14 + 15; 15 + 16; 22 + 23; 26 + 27

Amplimers (a): -24-l - -2'3/~50 - 69; -4 - 15/94 - 117; 69-90/l%-211; "89-310/394-416; 631-651/784-808; Z-747/862-882; 828-846/1094-1113; 1003-1024/1203-1222; 1114-1134/1277-1296; 1225-1244/1403-1422; 1429-1448/1705-1725; 1703-1725/1727-1747; 1727-1747/1949-1969; 1888-1908/2122-2142; 2065-2085/2269-2288; 2212-2233/2377-2397; 2.125-2346/2459-2478; 2399-2418/2548-2570; "Sl-2606/X67-2786; 'X2-2534/2909-2928; 2821-2840/3064-3083; 3017-3036/3136-3154; 3206-3225/3341-3360; 3283-3301/3613-3635; 3573-359"/3823-3848; -113 - -92/94 - 117; 1003-1024/1705-1725; 1429-1448/1727-1747; 1703-1725/1949-1969; 1727-1747/2122-2142; 2712-2734/3064-3083; 3283-3301/3823-3848

Genomic DNA templates: KB-C4, KB-V1, KB-A1

Cosmid DNA templates: pSVB1, pSVC6, pSVA4, pSVSH13, pSV6A

Size of PCR product (bp): 800; 4400; 1200; 3200; 4900; 5500; 3100; 450; 350; 300; 600; 3200; 1100; 800; 2900; 2100; 5000; 1400; 1500; 2900; 1200; 5700; 3600; 1700; 5000; 1600; 3800; 4300; 1800; 4200; 5200

(a) Position in cDNA sequence.
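The relation between a PCR product size in Table I and the corresponding intron length in Table II can be illustrated with a small sketch. It assumes, without confirmation from the methods, that the intron size was obtained by subtracting the cDNA span covered by the two amplimers from the product size, that both amplimers lie entirely within exons, and that cDNA numbering has no position 0 (-1 is followed by +1). The function names are hypothetical, and the pairing of the 3100-bp product with intron 9 is inferred from the order of the table entries.

```python
def cdna_span(start, end):
    """Count cDNA nucleotides from start to end inclusive.

    Assumes the numbering skips 0, so a span that crosses from
    negative to positive positions is one nucleotide shorter.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    span = end - start + 1
    if start < 0 < end:
        span -= 1  # there is no position 0 in this numbering
    return span


def estimate_intron_size(product_bp, upstream_start, downstream_end):
    """Estimate intron length as product size minus the exonic span amplified."""
    return product_bp - cdna_span(upstream_start, downstream_end)


# Intron 9 (assumed row pairing): amplimers 828-846 / 1094-1113, product about 3100 bp.
intron9 = estimate_intron_size(3100, 828, 1113)
print(intron9)
```

Under these assumptions the estimate for intron 9 is close to the 2,800 bp listed in Table II, which is consistent with gel-based sizes being rounded.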


TABLE II
Splice junctions in the MDR1 gene

No.  Exon   Exon    Exon 3'      Intron 5'        Intron    Intron 3'        Exon 5'      Intron
     size   end(a)  junction     junction         length    junction         junction     type
     (bp)                                         (bp)
-1   >95    -330    CCAGATAAAAG  gtaaggtacaaatac  >18,000   acttgccctttctag  AGAGGTGCAAC  NA(b)
1    323    -7      AGGAGCGCGAG  gtaggggcacgcaaa  500       ggcgtttctcttcag  GTCGGG ATG   NA
2    74     68      AAT AAA AG   gtaactagcttgttt  4,300     attgctgttttgcag  T GAA AAA    2
3    49     117     TTT TCA ATG  gtgagttttgaattt  1,000     tttctctctttttag  TTT CGC TAT  0
4    169    286     AAT AGA A    gtatgtattgtttgt  >20,000   tttgtggtggtctag  GT GAT ATC   1
5    52     338     ATG ACC AG   gtaattagacattct  3,100     ctccttctttttcag  G TAT GCC    2
6    192    530     CTT ACA GA   gtaagtatttagttt  541       tatctgttctttcag  T GAT GTC    2
7    172    702     TGG GCA AAG  gtaggtgaagcctgt  4,700     atgtattttaaacag  ATA CTA TCT  0
8    125    827     CTT GAA AG   gttgagtttcttttt  7,500     tgttctttttctcag  G TAC AAC    2
9    172    999     GTA CTC ACT  gtaagtgtttacatt  2,800     cttcacattcctcag  GTA TTC TTT  0
10   114    1,113   ATT GAT AAT  gtaagtctgagttgg  250       aaattgatctgttag  AAG CCA AGT  0
11   111    1,224   GAA GTT AAG  gtacagtgataaatg  170       tcacggtcctggtag  ATC TTG AAG  0
12   126    1,350   GAG GGG ATG  gtgagatgacccatg  200       tactttttattccag  GTC AGT GTT  0
13   204    1,554   CTG CCT CAT  gtaagttgtccttgc  300       ggttttctgtggtag  AAA TTT GAC  0
14   171    1,725   CTG GAT AAG  gtcagtgaggcttag  3,200     tttctctctctttag  GCC AGA AAA  0
15   162    1,887   ACA ATG CAG  gtatagtttaacttc  850       tcttatttattttag  ACA GCA CGA  0
16   177    2,064   GAG GCT CTG  gtatgaagggagatg  600       atttgtgttttctag  GAT GAA ACT  0
17   147    2,211   ATT ATA GGG  gtaagtgtgatgccc  2,700     aatgttttctcacag  GTT TTT ACA  0
18   108    2,319   TTC CTT CAG  gtaaatgtttccatt  1,900     tgttcctgcccacag  GGT TTC ACA  0
19   78     2,397   CTC AGA CAG  gtatgtctatcgagg  2,700     ttttctgtgccacag  GAT GTG ACT  0
20   84     2,481   GTT AAA GGG  gtacgtgcctccttt  4,800     tgttttgttttgcag  GCT ATA GGT  0
21   204    2,685   GCT GGG AAG  gtgagtcaaactaaa  1,100     gctgtctgttatcag  ATC GCT ACT  0
22   101    2,786   CCA TAC AG   gtaataaccgctgaa  1,200     tgtcttcttttcgag  A AAC TCT    2
23   141    2,927   GTT CTG TT   gtaagtattgggcta  2,500     gtttgtgctttccag  A GTA TTT    2
24   157    3,084   CTA ATG CCG  gtgagtttgatgttt  1,100     ttcttctcattgcag  AAC ACA TTG  0
25   198    3,282   GGG AAA GTG  gtgagcacactttca  5,400     aactcttgttttcag  CTG CTT GAT  0
26   207    3,489   CTG CCT AAT  gtaagtctctcttca  3,300     aaaccttatttacag  AAA TAT AGC  0
27   147    3,636   ATG GAA AAG  gtaagaatttaaatt  1,400     gtgattatggaatag  GTT GTC CAA  0
28   587    4,223

Consensus sequence (frequency, %): exon 3' junction, A (64) G (75); intron 5' junction, g (100) t (100) a (75) a (68) g (75) t (64); intron 3' junction, y (82) y (68) y (86) y (75) y (89) y (75) n c (61) a (100) g (100); exon 5' junction, N.

(a) Position in cDNA sequence.
(b) NA, not applicable.
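The intron types in Table II can be cross-checked against the exon end positions with a short sketch. It assumes that cDNA position +1 is the A of the ATG initiation codon (consistent with the exon 2 junction shown in the table) and that the reading frame runs uninterrupted through exons 2 to 27, so that the intron type equals the exon end position modulo 3. The names `EXON_ENDS` and `intron_type` are hypothetical and serve only this illustration.

```python
from collections import Counter

# Exon end positions in the cDNA sequence for exons 2-27, taken from Table II.
EXON_ENDS = {
    2: 68, 3: 117, 4: 286, 5: 338, 6: 530, 7: 702, 8: 827, 9: 999,
    10: 1113, 11: 1224, 12: 1350, 13: 1554, 14: 1725, 15: 1887,
    16: 2064, 17: 2211, 18: 2319, 19: 2397, 20: 2481, 21: 2685,
    22: 2786, 23: 2927, 24: 3084, 25: 3282, 26: 3489, 27: 3636,
}


def intron_type(exon_end):
    """Return 0, 1, or 2: the number of codon nucleotides preceding the intron.

    Assumes position +1 is the first nucleotide of the initiation codon.
    """
    return exon_end % 3


# Intron n follows exon n, so each intron is keyed by its upstream exon number.
types = {n: intron_type(end) for n, end in EXON_ENDS.items()}
counts = Counter(types.values())
type2_introns = sorted(n for n, t in types.items() if t == 2)

print(counts[0], counts[1], counts[2])
print(type2_introns)
```

Under these assumptions the counts reproduce the 19 type 0, one type 1, and six type 2 introns described in the text, with the type 2 introns being 2, 5, 6, 8, 22, and 23.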

14 of which encode the left and 13 the right half of the protein (Figs. 4 and 5). For many genes, specific correlations have been demonstrated between individual exons and structural or functional domains of the protein (17, 26-28). Based on the hydrophobicity profiles, we have previously subdivided each half of P-glycoprotein into a highly hydrophilic N-terminal region, a hydrophobic membrane-bound region, and a relatively hydrophilic nucleotide-binding region (7). None of the exon borders match precisely with these somewhat arbitrary demarcation lines, but introns 3 and 10 in the left half and introns 16 and 24 in the right half are found reasonably close to these borders. Each of the nucleotide-binding folds in the hydrophilic portion is contained within a separate exon. In the hydrophobic region, 4 out of 12 predicted transmembrane segments of P-glycoprotein are interrupted by introns, but specific transmembrane segments, predicted on the basis of hydrophobicity analysis (29), may not necessarily be precise. The entire lengths or the major parts of eight transmembrane segments are encoded by individual exons, but one pair of adjacent transmembrane segments in each half of the protein is encoded by the same exon. In the absence of additional information about the tertiary structure of P-glycoprotein, it does not seem possible at this time to determine whether the exons indeed correspond to structural domains of this protein.

Analysis of intron positions in the alignment of the left and the right halves of P-glycoprotein indicates little similarity (Figs. 4 and 5). Within the nucleotide-binding region, one pair of introns (introns 13 and 26) is matched precisely, and another pair (introns 12 and 25) is shifted by one codon, with both introns belonging to type 0. Such a shift can be readily explained by intron sliding (see below). The sizes of the matching introns are quite different (Table II), but variability of sizes among homologous introns has been frequently observed. Outside of the nucleotide-binding domain, only one pair of introns, 9 and 23, is found at corresponding codons, but these two introns belong to different types. None of the other introns are found at equivalent positions in this alignment. Preferential conservation of intron positions within the nucleotide-binding regions parallels a much higher degree of amino acid sequence homology for these regions (alignment score of 43.0 for amino acid residues 351-632 and 994-1280, as calculated using the PCOMPARE program, which is based on the method of Needleman and Wunsch (30)) than for the rest of the protein (alignment score 16.6).

Evolutionary Implications-On the basis of sequence similarity between the left and the right halves of P-glycoprotein, we and others have previously suggested that this protein arose by duplication of a primordial gene (7-9). We expected to find significant conservation of intron positions between the two halves of the MDR1 gene, since almost all other known genes with an internal duplication show strong conservation of the intron positions between the duplicated domains (17). (The only exception known to us involves the rabbit muscle phosphofructokinase gene, which shows no apparent conservation of intron positions between its two similar halves; no explanation for that discrepancy has been proposed (31).) We have found, however, that only two or three pairs of introns in the MDR1 gene are located at corresponding positions in both halves of P-glycoprotein, and




.:.:::::.::

.:::::

::

:. .::

: ::::

:: :: :. :

GKEIXRLNVQWLRAHLGIVSQEPILFDCSIAENIAYGDNSRWSQEEIVRARXEAN~HAPIES~PNKYST

l‘