A unique element resembling a processed pseudogene.

Communication

THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 261, No. 1, Issue of January 5, pp. 18-20, 1986 © 1986 by The American Society of Biological Chemists, Inc. Printed in U.S.A.

A Unique Element Resembling a Processed Pseudogene*
(Received for publication, August 7, 1985)
Allan J. Robins‡, Shen-Wu Wang§, Temple F. Smith‖, and Julian R. E. Wells‡
From the ‡Department of Biochemistry, University of Adelaide, Adelaide, South Australia, Australia, 5000, the §Institute of Basic Medical Sciences, Chinese Academy of Sciences, Beijing, People's Republic of China, and the ‖Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115

We describe a unique DNA element with structural features of a processed pseudogene but with important differences. It is located within an 8.4-kilobase pair region of chicken DNA containing five histone genes, but it is not related to these genes. The presence of terminal repeats, an open reading frame (and stop codon), a polyadenylation/processing signal, and a poly(A)-rich region about 20 bases 3' to this, together with a lack of 5' promoter motifs, all suggest a processed pseudogene. However, no parent gene can be detected in the genome by Southern blotting experiments and, in addition, codon boundary values and mid-base correlations are not consistent with a protein coding region of a eukaryotic gene. The element was detected in DNA from different chickens and in peafowl, but not in quail, pheasant, or turkey.

The availability of gene-specific probes has permitted characterization of a wide variety of gene sequences in the genomes of many organisms. One unexpected finding, first described for globin genes (1), but subsequently for a number of other eukaryotic genes (2-4), was the existence of processed pseudogenes. The key feature of this class of pseudogene is that it appears to have arisen by reverse transcription of an RNA template prior to integration into the genome, perhaps by mechanisms characteristic of the RNA tumor viruses. Since processed pseudogenes are derived from mRNA, they lack the 5' transcriptional control elements and intervening sequences typical of the parent gene. The elements often contain a short stretch of A/T base pairs about 20 bases downstream from the mRNA processing/polyadenylation signal AAUAAA (5). The boundaries of processed pseudogenes are usually flanked by short direct repeats, probably arising as a result of the transposition event.

RESULTS AND DISCUSSION

As detailed below, the element described here was not found by gene-specific probing, but by analysis of an 8.4-kilobase pair sequence containing five chicken histone genes. This element is found just 5' to and in the same orientation as the central H3 gene of this cluster (Fig. 1). It is bounded by short direct repeats and contains an open reading frame of 78 codons. The potential coding sequence is unrelated to histones. It is not interrupted by intervening sequences and the 3' non-coding region is terminated by a poly(A)-containing tract just downstream from the ubiquitous AAUAAA sequence found near the 3'-end of most polyadenylated transcripts. In common with the processed pseudogene this element does not contain 5' promoter motifs.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

In the immediate vicinity of the element there is a 9-base pair direct repeat upstream and an 8-base pair inverted repeat downstream (Fig. 2). This inverted repeat is intimately associated with one of the five CAAT boxes of the H3 gene promoter (6). It is worth comparing this organization and that of an inverted duplication in the same histone gene cluster in which a 2.1-kilobase pair region of DNA containing an H2A and an H4 gene has been duplicated and inverted around the central H3 gene (see Fig. 1). The boundaries of this duplication are also characterized by direct and inverted repeats which are closely associated with the promoters of the H4 and H2A genes. Thus, for both the element described here and the inverted duplication containing H2A and H4 genes (Fig. 1) there are repeats which are intimately associated with gene promoters. This suggests a common mechanism for the generation of these two elements in the chicken genome.

The sequence data summarized in Fig. 2 suggest that the element described is a processed pseudogene. If so, it should be possible to probe the chicken genome to detect the parental gene from which it was derived. However, we were not able to detect any other region homologous to this sequence in Southern blot experiments with EcoRI- or HindIII-digested chicken genomal DNA, even at very low stringency. The single bands detected in each case were those predicted from detailed mapping of the histone gene region containing the putative processed pseudogene (data not shown).
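Band sizes "predicted from detailed mapping" follow from locating each restriction site in a known sequence and measuring the distances between cuts. The following is a minimal Python sketch of that calculation, assuming a linear molecule and a complete digest. The function names and the toy sequence are hypothetical and are not taken from the paper; only the recognition sites (EcoRI G^AATTC, HindIII A^AGCTT) are the standard ones.

```python
def cut_positions(seq, site, offset):
    """Return the cut coordinate for every occurrence of `site` in `seq`.

    `offset` is the number of bases of the recognition site that lie
    to the left of the cut (1 for EcoRI G^AATTC and HindIII A^AGCTT).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def digest(seq, site="GAATTC", offset=1):
    """Fragment lengths from a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy 26-base sequence with two EcoRI sites (illustrative only).
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(digest(toy))                           # EcoRI fragments
print(digest("TTAAGCTTGG", "AAGCTT", 1))     # HindIII on a second toy sequence
```

A probe lying wholly inside one fragment would be expected to light up a single band of that fragment's length, which is the comparison the mapping data allow.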
High molecular weight DNA was prepared from four other bird species, namely Indian peafowl (Pavo cristatus), bush turkey (Alectura lathami), golden pheasant (Chrysolophus pictus), and brown quail (Coturnix australis). These samples, together with DNA from three separate chicken samples, were digested with EcoRI, electrophoresed, blotted in duplicate, and probed with the putative pseudogene probe or an H5 histone probe as a control. As shown in Fig. 3, the putative pseudogene probe detects a single band for the three chicken samples and a faint band in the peafowl track, but there is no detectable hybridization with DNA from other species. The H5 probe on the other hand detects homologous sequences in all samples. The putative pseudogene sequence presumably became integrated into the line of the bird evolutionary tree (order Galliformes) prior to divergence of chicken and peafowl.

There are several possibilities for the failure to find a parent gene. First, if the putative processed pseudogene was derived from a gene with multiple micro-exons, the probe may fail to cross-hybridize. (This seems unlikely even if exons were only about 50 base pairs in length as in collagen genes (7) or the microexons of Drosophila Ubx (cited in Ref. 8).) Second, the parent gene and the processed pseudogene may have diverged in sequence to such an extent that they no longer hybridize. Again, this seems unlikely as an open reading frame still exists. Third, the sequence may have derived from mitochondrial DNA which would not be present in preparations from bird erythrocytes. However, the sequence has none of the features of vertebrate mitochondrial genes or their polyadenylated transcripts (9).

The simplest interpretation of the Southern blot data is that neither the parent gene nor its relatives are present in the chicken genome. If so, the sequence may have come from an external source such as a virus. In an attempt to identify the putative processed pseudogene, a computer search of the GenBank data base (May 1985) was carried out, but there was no clear match of the open reading frame, amino acid sequence, or the total nucleotide sequence of this element with other recorded data.

Since classes of nucleic acid sequences such as coding, intervening sequences, structural RNA coding, and mitochondrial domains can be characterized by statistical treatment of sequence data (10), the reading frame sequences were analyzed for codon boundary values and mid-base correlations, and the values compared with data from several hemoglobin gene coding sequences in the data bank (Table I). The data in Table I show a clear indication of high values for both the codon boundary analysis and the mid-base correlation in known protein coding regions. In contrast, the values for the open reading frame of the chicken element are significantly lower. Low values such as these can be obtained by analysis of randomly generated sequences.

The nature of the chicken element presents a dilemma. As already discussed, the segment has all the hallmarks of a processed pseudogene, terminal repeats, an open reading

[Figure 1: map of the histone gene cluster. Labels recoverable from the figure: L, R, 2118 bp, 2131 bp.]

FIG. 1. Position of processed pseudogene. The element described in this paper is depicted by a solid arrow within this histone gene cluster. It is intimately associated with, and in the same orientation as, the central H3 gene. The boxed areas (L and R) show an inverted duplication within the cluster and the small arrows represent direct and inverted repeats delineating the borders of the inverted duplication.
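Direct and inverted repeats of the kind that delineate the borders in Fig. 1 can be searched for by comparing short words (k-mers) between two flanking regions: a direct repeat is a word shared by both flanks, and an inverted repeat is a word in one flank whose reverse complement occurs in the other. The sketch below illustrates the idea in Python. The function names and the toy flanks are hypothetical, and this is not the procedure used by the authors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_repeats(left_flank, right_flank, k):
    """Return (direct, inverted) k-mers shared by two flanking regions.

    direct:   words present in both flanks in the same orientation.
    inverted: words in the left flank whose reverse complement is
              present in the right flank.
    A palindromic word can appear in both lists.
    """
    left = kmers(left_flank.upper(), k)
    direct = left & kmers(right_flank.upper(), k)
    inverted = left & kmers(reverse_complement(right_flank), k)
    return sorted(direct), sorted(inverted)


# Toy flanks (illustrative only): the first pair shares a direct repeat,
# the second pair shares an inverted repeat.
print(shared_repeats("TTGACCTGAA", "CCGACCTGTT", 6))
print(shared_repeats("TTGACCTGAA", "AACAGGTCAA", 6))
```

Short words recur by chance in any sequence, so a hit from a search like this is only a candidate; the repeat lengths quoted in the text (8 and 9 base pairs) are close to the range where chance matches become common in long flanks.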

CCGGGTCTTTTTTTCCGTTTTTTCTTTCTTTCTTTTTCTTTCTTTTTCCTTGCCCTTCTACTGCA Dl

1-1

GAG ]A-I

Dl

CTGTAGC

02

G

IAGTT/ ATTTGTAAGG

ATG AGC GGC CTG
met ser gly leu

TGG GGA GAA CGG GAC TTG TGC GCA CGT GTG TCT CTG AGT GCA CGG AAT
trp gly glu arg asp leu cys ala arg val ser leu ser ala arg asn

TTC TGT CAT AAT GAG CTC CCC TCA CTG TGC CCC CGT AAG GCG TCC TGG
phe cys his asn glu leu pro ser leu cys pro arg lys ala ser trp

GAC TTT TCC CCA CAG GGG AAC TTT ACG CGG TTG CCA CAC GGT CAT GTT
asp phe ser pro gln gly asn phe thr arg phe pro his gly his val

ATA TGG GCA CCC ACT CTT CCA TGT ACC ATG AAG AGA CCC CCA CCC TCC
ile trp ala pro thr leu pro cys thr met lys arg pro pro pro ser

TTT CTA AAC AGC ACT TCT CAA AGA AAT CAA CAG TAG TTATGAAAAACCTAA
phe leu asn ser thr ser gln arg asn gln gln ***

TGATCCTCATTTCCTGAGGTAATGTATTCCTCGTAATTCAGTATAGGCACATAATCGGAACCTAT
TTTAGATATCTTTCCTTAACAAAACATATTTTTGCTCTTATTTTTCTCATTACTGACTTCAGCAT
GGAAGGCTTTGCAATCTACCGCTTATTGTCTCTCTACTGAATCTAGGCTACATGAGTGCTTTCCA
TTTCTACCTTAAAAGAAACCAAAAGGAT[AAAAAAGAAAACAAAGCA
GAAATGCGAG
AAAAAAAAAAAAA]
CGGTGGTTGCCGGTACGTTCCAAAAGGCGGTTATTTTAAACTTCGAATCGACCAATGAGAT

(Figure labels: poly(A) signal; poly(A) tail; 40 bp; H3 TATA; 80 bp; H3 ATG)

FIG. 2. Sequence of the proposed processed pseudogene. The element is bounded by short direct repeats, has an open reading frame of 78 codons, and is terminated by a poly(A) tail just 3' to the conserved polyadenylation/processing signal AATAAA. A 9-base pair (bp) direct repeat (D1) is found just upstream from the element while immediately downstream, an inverted repeat (IR) is intimately associated with the first CAAT box (CI) of the adjacent H3 gene. Also shown are the second and third CAAT boxes of the H3 gene (CII and CIII) relative to the start of this gene.
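The amino acid line printed beneath each row of codons in Fig. 2 can be checked by translating the DNA with the standard genetic code. The sketch below, in Python, does this for the final row of the open reading frame as it appears in the figure (TTT ... TAG). The table construction and the function name `translate` are illustrative choices, not part of the paper, and the output uses one-letter amino acid codes with `*` for a stop codon.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # first base T
         "LLLLPPPPHHQQRRRR"   # first base C
         "IIIMTTTTNNKKSSRR"   # first base A
         "VVVVAAAADDEEGGGG")  # first base G
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate DNA (or RNA) in frame 1, stopping at the first stop codon.

    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# Last row of the open reading frame in Fig. 2, through the TAG stop codon.
last_row = "TTTCTAAACAGCACTTCTCAAAGAAATCAACAGTAG"
print(translate(last_row))  # FLNSTSQRNQQ*
```

The result corresponds to the printed line phe leu asn ser thr ser gln arg asn gln gln followed by the stop marker.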

TABLE I
Computer analysis of nucleotide sequences

                     Codon boundary value             Mid-base correlation
Gene sequence        Chi square value   YR/RY ratio   Chi square value
Chicken element      13.9               0.73          24.2
Human α-globin       60.7               2.32          264.8
Mouse α-globin       40.0               3.67          81.1
Rabbit α-globin      59.2               3.80          157.4
Chicken α-globin     67.1               2.50          149.2
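The quantities in Table I are defined in Ref. 10 and are not restated in this paper. The Python sketch below is therefore only one plausible reading, under three labeled assumptions: that the codon boundary chi square tests independence between the third base of a codon and the first base of the next codon, that the YR/RY ratio compares pyrimidine-purine with purine-pyrimidine pairs at that boundary, and that the mid-base correlation tests independence between the middle bases of adjacent codons. All function names are hypothetical, and the sketch is not expected to reproduce the published values.

```python
from collections import Counter

BASES = "ACGT"
PURINES = set("AG")
PYRIMIDINES = set("CT")


def codons(seq):
    """Split a sequence into complete in-frame codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def boundary_pairs(seq):
    """(third base of codon n, first base of codon n+1). Assumed definition."""
    cs = codons(seq)
    return [(x[2], y[0]) for x, y in zip(cs, cs[1:])]


def mid_base_pairs(seq):
    """(middle base of codon n, middle base of codon n+1). Assumed definition."""
    cs = codons(seq)
    return [(x[1], y[1]) for x, y in zip(cs, cs[1:])]


def chi_square(pairs):
    """Pearson chi-square statistic for independence in a 4 x 4 base table."""
    pairs = [p for p in pairs if p[0] in BASES and p[1] in BASES]
    n = len(pairs)
    if n == 0:
        return 0.0
    observed = Counter(pairs)
    rows = Counter(p[0] for p in pairs)
    cols = Counter(p[1] for p in pairs)
    total = 0.0
    for a in BASES:
        for b in BASES:
            expected = rows[a] * cols[b] / n
            if expected > 0:
                total += (observed[(a, b)] - expected) ** 2 / expected
    return total


def yr_ry_ratio(seq):
    """Ratio of Y-R to R-Y pairs at codon boundaries; None if no R-Y pairs."""
    yr = ry = 0
    for third, first in boundary_pairs(seq):
        if third in PYRIMIDINES and first in PURINES:
            yr += 1
        elif third in PURINES and first in PYRIMIDINES:
            ry += 1
    return yr / ry if ry else None


toy = "AACGATATGCCA"  # toy codons AAC GAT ATG CCA (illustrative only)
print(boundary_pairs(toy), yr_ry_ratio(toy))
print(chi_square(mid_base_pairs(toy)))
```

Under these assumptions a sequence with no codon structure would tend to give a ratio near 1 and small chi square values, which is the direction of the contrast the table draws between the chicken element and the globin coding regions.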

frame, a processing/polyadenylation sequence, and a poly(A) region (in which 4 of 31 bases have mutated) (Fig. 2). However, the absence of an identifiable parent gene in the chicken genome as well as statistical analysis of the open reading frame suggest that the element is not a processed pseudogene, but by chance has features common to these elements. It is not possible with the data available to categorize the chicken element with certainty, but the association of its terminal repeats with H3 gene promoter regions in the histone gene cluster (Fig. 2) suggests that its position has been dictated by a precise recombination event. The inverted duplication of an H2A/H4 gene pair in the same cluster similarly involves repeats in close association with promoter elements. This unique motif may represent a new class of elements not previously described. With no parent gene and lacking a transcription promoter region, elements of this type would only be detected by analysis of sequence data.

REFERENCES

FIG. 3. Southern blot analysis. EcoRI-digested genomal DNA from three individual chickens and four other bird species was subjected to Southern blot analysis. Part I was probed with the proposed processed pseudogene sequence while part II was probed with the chicken H5 gene sequence. A, B, and C show the three individual chicken DNA samples, while D is brown quail; E is Indian peafowl; F is golden pheasant; G is bush turkey.

1. Nishioka, Y., Leder, A., and Leder, P. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 2806-2809
2. Wilde, C. D., Crowther, C. E., Cripe, T. P., Lee, M. G., and Cowan, N. J. (1982) Nature 297, 83-84
3. Karin, M., and Richards, R. I. (1982) Nature 299, 797-802
4. Hollis, G. F., Hieter, P. A., McBride, O. W., Swan, D., and Leder, P. (1982) Nature 296, 321-325
5. Proudfoot, N. J., and Brownlee, G. G. (1976) Nature 263, 211-214
6. Wang, S.-W., Robins, A. J., D'Andrea, R., and Wells, J. R. E. (1985) Nucleic Acids Res. 13, 1369-1387
7. Yamada, Y., Avvedimento, V. E., Mudryj, M., Ohkubo, H., Vogeli, G., Irani, M., Pastan, I., and de Crombrugghe, B. (1980) Cell 22, 887-892
8. North, G. (1983) Nature 303, 134-136
9. Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R., and Young, I. G. (1981) Nature 290, 457-465
10. Smith, T. F., Waterman, M. S., and Sadler, J. R. (1983) Nucleic Acids Res. 11, 2205-2220