Transposons in ciliated protozoa

TRANSPOSONS IN CILIATED PROTOZOA

by Thomas Graeme Doak

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in

Genetics

Department of Oncological Sciences University of Utah May 2001

Copyright © Thomas Graeme Doak 2001 All Rights Reserved

THE UNIVERSITY OF UTAH

GRADUATE SCHOOL

SUPERVISORY COMMITTEE APPROVAL of a dissertation submitted by

Thomas Graeme Doak

This dissertation has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory.

Chair:

Glenn Herrick

Tim Formosa

J. Stillman

David M. Virshup

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

FINAL READING APPROVAL

To the Graduate Council of the University of Utah:


I have read the dissertation of Thomas Graeme Doak in its final form and have found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the supervisory committee and is ready for submission to The Graduate School.

Date

Glenn Herrick
Chair: Supervisory Committee

Approved for the Major Department

Barbara J. Graves Chair

Approved for the Graduate Council

David S. Chapman
Dean of The Graduate School

ABSTRACT

Ciliated protozoa carry two types of nuclei in each cell. One is an undifferentiated "germline" micronucleus; the other is a nucleus that has undergone a complete terminal differentiation, resulting in a macronucleus specialized for high levels of transcription. Differentiation results in elimination of noncoding DNA and amplification of coding DNA. The evolution of a macronucleus has allowed many ciliates to maintain large cell sizes. My work has encompassed steps in the differentiation of the macronucleus from a normal diploid nucleus. Early work addressed nuclear events that lead to designating and differentiating a macronucleus. Later work dealt with a particular type of DNA elimination in macronuclear differentiation, namely the precise removal or excision of DNA sequences internal to macronucleus-destined sequences. This requires excision of the internal sequence and rejoining of the flanking sequences retained in the macronucleus. Many of these internal eliminated sequences are transposons. Thus, this work has included characterization of the excised transposon families, as well as the mechanism of their elimination. It has resulted in a better understanding of the structure and evolution of internal eliminated transposons and the mechanism of their elimination. My work has helped to invalidate a popular hypothesis explaining transposon excision and may help to bring us closer to a new model of how transposons (and internal eliminated sequences of ciliates in general) are excised from the differentiating macronucleus. Experimental results and my attempts to model excision have made it clear that a number of different recombination mechanisms could effect transposon excision as we now understand it. Identifying the likely mechanism will require further characterization of transposon excision intermediates and a more thorough exploration of possible excision mechanisms.


TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGMENTS

Chapter

1. INTRODUCTION AND BACKGROUND
    Introduction
    Biology of ciliated protozoa
    Structure and evolution of TBE transposons
    Mechanism of excision
    References

2. A PROPOSED SUPERFAMILY OF TRANSPOSASE GENES: TRANSPOSON-LIKE ELEMENTS IN CILIATED PROTOZOA AND A COMMON "D35E" MOTIF
    Abstract
    Materials and methods
    Results
    Discussion

3. CONSERVED FEATURES OF TBE1 TRANSPOSONS IN CILIATED PROTOZOA
    Abstract
    Introduction
    Materials and methods
    Results
    Discussion
    Acknowledgements
    References

4. SELECTION ON THE PROTEIN-CODING GENES OF THE TBE1 FAMILY OF TRANSPOSABLE ELEMENTS IN THE CILIATES OXYTRICHA FALLAX AND O. TRIFALLAX
    Introduction
    Materials and methods
    Results
    Discussion
    Acknowledgements
    Appendix
    Literature cited

5. SELECTION ON THE GENES OF TEC1 AND TEC2 TRANSPOSONS OF EUPLOTES CRASSUS: EVOLUTIONARY APPEARANCE OF A PROGRAMMED FRAME SHIFT IN A TEC2 GENE ENCODING A TYROSINE-TYPE, SITE-SPECIFIC RECOMBINASE
    Abstract
    Introduction
    Materials and methods
    Results
    Discussion
    References

6. INTERNAL ELIMINATED SEQUENCES INTERRUPTING THE OXYTRICHA 81 LOCUS: ALLELIC DIVERGENCE, CONSERVATION, CONVERSIONS AND POSSIBLE TRANSPOSON ORIGINS
    Introduction
    Materials and methods
    Results
    Discussion
    Acknowledgements
    Literature cited

7. DEVELOPMENTAL PRECISE EXCISION OF OXYTRICHA TRIFALLAX TELOMERE-BEARING ELEMENTS AND FORMATION OF CIRCLES CLOSED BY A COPY OF THE FLANKING TARGET DUPLICATION
    Introduction
    Results
    Discussion
    Materials and methods
    Acknowledgements
    References

8. TBE1 EXPRESSION: TESTING THE HYPOTHESIS THAT TBE1-ENCODED FUNCTIONS CATALYZE EXCISION
    The Problem: Is there TBE1 expression associated with TBE1 excision?
    TBE1 RNA expression
    TBE1 protein expression
    Conclusion
    References

9. CONCLUSION
    Introduction
    Reconsidering the transposase=excisase hypothesis
    Tests of the hypothesis
    Intermediates and products of excision
    A consideration of mechanisms
    Transposition via illegitimate site-specific recombination
    Conclusions
    References

ACKNOWLEDGMENTS

My graduate career has been a long one and I have accumulated a vast number of people to whom I owe credit for my graduation. They have helped in ways both great and small. I started graduate school in the laboratory of John Roth, and while there had the particular good fortune to work with Anca Segall, Tom Elliott, and Dan Andersson. When I left the Roth laboratory, Glenn Herrick was kind enough to first hire me as a technician and to then encourage me to reenter graduate school. Jerry Kaplan was also very encouraging. I am still not sure if reentering was a good idea, but it did lead to this dissertation. In Glenn's laboratory, I have had the great privilege of working on a daily basis with Kevin Williams and David Witherspoon. Large parts of this dissertation consist of collaborative work with Kevin and David. Other members of the lab have been Laura Storjohann, Susan Nowell, and Katina Lessard: they have contributed to a charming work environment. A small number of other colleagues in particular have also contributed to my work and mental health: Michelle Deegenaars, Marc and Norene Gillespie, Bert Ley, and Karen Yook. Various friends have provided me shelter from the stresses of graduate school, in particular Kerri Buxton and Brad Taylor, Beverly and Enzo Krensky, Laura Katz, Anna Maddock, Trisha Schmid, and Tammy Metz Star. Finally, let me thank my family, Mother Barbara Doak, Brother Dan Doak, and Sister Pat Doak, for fostering my love of biology, and offering me the support that only family can provide. Dan and Pat have both beaten me to this point, but what are siblings for?

CHAPTER 1

INTRODUCTION AND BACKGROUND

Introduction

My work can be divided into two general areas. The first area includes the general biology or natural history of the ciliate genus Oxytricha. Included are studies on Oxytricha culturing, the genetic basis of mating type determination, the cytogenetic events taking place during conjugation and exconjugant development, and the relationship of cytogenetics to observed inheritance patterns. These studies tend to address questions specific to ciliated protozoa and not immediately applicable to eukaryotes in general. I will not be addressing these studies in my dissertation.

The bulk of my dissertation work has focused on the evolution of TBE transposons in Oxytricha and the transposons' involvement in the differentiation of the host somatic nucleus. These studies address a specific ciliate transposon and the transposon's involvement in a ciliate-specific developmental process, but have general application to transposon evolution, transposon-host interactions, and the mechanisms of somatic chromosome rearrangements.

Biology of ciliated protozoa

My work involves recombination processes and transposon families seemingly unique to ciliated protozoa. Ciliated protozoa have a long history as experimental organisms (Prescott, 1994). Although generally only single-celled, these eukaryotic organisms have a complex set of cellular structures characteristic of eukaryotes, are sexual, and display a variety of developmentally and environmentally triggered differentiation processes. They are also large enough to be studied with the light microscope. These advantages made ciliates a popular research organism in the first part of the twentieth century and have contributed to their present usefulness. Notable discoveries made in ciliates include ribozymes (Cech, 1993), the molecular nature of telomeres (Prescott, 1994), telomerase (Collins, 1999), and the enzymes responsible for histone acetylation and deacetylation (Brownell et al., 1996). Recent strides in forward and reverse genetic techniques, and genome mapping and sequencing, particularly in Tetrahymena, will contribute to the continued usefulness of ciliates (Orias, 1998). The subjects of my dissertation have been the ciliates Oxytricha fallax and O. trifallax (recently renamed Sterkiella histriomuscorum; Berger and Foissner, 1997), sister species of ciliated protozoa.

In all ciliated protozoa each cell contains two types of nuclei (nuclear dimorphism; Herrick, 1994). The first is a small heterochromatic diploid micronucleus (MIC). The MIC acts as a "germ line" nucleus: its genetic material is used in conjugation (sex), and is passed on unaltered from one sexual generation to the next. In contrast, the second type of nucleus, the macronucleus (MAC), is a euchromatic, differentiated "somatic" nucleus, supplying all RNA transcripts needed for growth. The MAC is large by virtue of being highly polyploid. In Oxytricha, the MAC contains thousands of copies of each gene. Genes are carried on small chromosomes of only one or a few genes (called "gene-sized" chromosomes). The MAC nucleus is derived from a copy of the diploid MIC just after conjugation (see below).

Mating in ciliates (called conjugation) is initiated when two cells of different mating types partially fuse. Immediately after fusion, the MIC in each cell enters meiosis. The steps of meiosis are somewhat complicated and often include pre- or postmeiotic mitosis. This process in Oxytricha is indistinguishable from published steps for Euplotes (TGD, unpublished; Kuhlmann and Heckmann, 1991). The result is to produce a pair of haploid gametic nuclei. Each cell retains one of its own gametic nuclei (the stationary nucleus) and passes the other (the migratory nucleus) to its conjugation partner. The stationary and migratory gametic nuclei fuse, forming a diploid zygotic nucleus. The zygotic nucleus undergoes a series of mitoses (two in Oxytricha) giving rise to daughter nuclei that will become new MICs, a new MAC, or be destroyed. At this point conjugating cells separate and nuclear development takes place in exconjugant cells. The diploid copy destined to form the new MAC (the anlage) undergoes an extensive set of differentiation steps: chromosome polytenization, removal of internally eliminated sequences (IESs, see below) with coordinate joining of flanking sequences, fragmentation of polytene chromosomes with coordinate de novo addition of telomeres to yield small chromosomes, and amplification of these small chromosomes to the final MAC copy number.

MIC DNA can be divided into two classes: MAC-destined sequence, which will be represented in the MAC; and MIC-limited sequence, eliminated either as IESs or as inter-MAC regions (regions separating MAC-destined sequence that are lost during fragmentation). Many of the IESs in Oxytricha are transposons (see below), and the rest are "short IESs". Short IESs are eliminated segments ranging in size from a dozen to a few hundred bp. They are almost always flanked by short direct repeats (see Seegmiller, 1996 and references therein; putative target-site duplications, Klobutcher and Herrick, 1997), and the terminal few nucleotides form an inverted terminal repeat (Klobutcher and Herrick, 1997).
Sequences internal to the inverted terminal repeats are AT-rich, evolve under no apparent selection, and are thus probably noncoding.
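
The arrangement described above, an eliminated segment flanked by short direct repeats, can be illustrated with a minimal sketch. It assumes a simplified picture in which the MIC sequence reads left flank, repeat, IES body, repeat, right flank, and in which precise excision removes the body together with one repeat copy, so that a single repeat copy remains at the MAC junction. The function names and the toy sequence are hypothetical and are not taken from any of the loci studied here.

```python
def flanking_repeat_length(seq: str, start: int, end: int, max_len: int = 10) -> int:
    """Length of the direct repeat shared by seq[start:] and seq[end:].

    The eliminated segment is taken to be seq[start:end]; under the
    assumed layout it begins with one copy of the repeat, and the
    second copy begins at `end`.
    """
    k = 0
    while (k < max_len and start + k < end and end + k < len(seq)
           and seq[start + k] == seq[end + k]):
        k += 1
    return k


def excise(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end] and rejoin the flanking sequences."""
    return seq[:start] + seq[end:]


# Toy example (hypothetical sequence): a 3 bp direct repeat flanking an AT-rich body.
left, repeat, body, right = "GGCC", "TAT", "AAATTTAAAT", "CCGG"
mic = left + repeat + body + repeat + right
start = len(left)
end = start + len(repeat) + len(body)

print(flanking_repeat_length(mic, start, end))  # 3
print(excise(mic, start, end))                  # GGCCTATCCGG
```

The rejoined product keeps exactly one copy of the repeat, which is why the position of the breakpoints within the repeat cannot be inferred from the MIC and MAC sequences alone.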

It has been proposed that short IESs are very degenerate transposons, retaining only cis-acting sequences necessary for excision (Klobutcher and Herrick, 1995; Klobutcher and Herrick, 1997).

Structure and evolution of TBE transposons

Transposons are nuclear parasites, segments of DNA capable of increasing their number within a host genome (Hickey, 1982). There are two classes of transposons (Berg and Howe, 1989). Retroelements replicate via an RNA intermediate, by the action of reverse transcriptase, and sometimes an integrase (see below). Type II transposons replicate entirely at the DNA level, without using an RNA intermediate. Type II transposons are often referred to as "cut-and-paste" transposons, since many members simply cut themselves out of a locus and insert themselves somewhere else in the genome. Transposon replication is accomplished when host repair mechanisms fill in the vacated gap, using as template a sister chromosome that still has the insertion. However, some Type II elements move in a more obviously replicative fashion, copying the original insertion into a new site (replicative transposition; Derbyshire and Grindley, 1986).

TBE1s are Type II transposons present in the Oxytricha MIC but not in the MAC (Chapter 2; Hunter et al., 1989). There are three types of TBE, TBE1, TBE2, and TBE3, present in ~2000, ~200, and ~2000 copies per haploid genome, respectively (Chapter 4; Susan Nowell, Kevin Williams, unpublished). TBE1 sequences are at most 10% different at the nucleotide level, yet TBE1, TBE2, and TBE3 are each 40% different from each other. The three families do not cross-hybridize (Kevin Williams, unpublished). TBE1s are the best characterized, but to the extent that we have characterized TBE2 and TBE3, they seem to be identical to TBE1s in structure and excision mechanism.
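
The divergence figures quoted above (at most 10% within TBE1, about 40% between families) are pairwise nucleotide differences. As a minimal sketch of how such a figure can be computed, assume two sequences that have already been aligned to equal length, with "-" marking alignment gaps and gapped columns simply skipped. The function name and the toy sequences are hypothetical, and this is not necessarily the procedure used in the chapters that follow.

```python
def percent_difference(a: str, b: str) -> float:
    """Percent of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * differences / compared


# Toy aligned sequences (hypothetical): 2 mismatches over 10 compared positions.
seq_a = "ATGCATGCAT"
seq_b = "ATGCTTGCAA"
print(percent_difference(seq_a, seq_b))  # 20.0
```

An uncorrected proportion like this underestimates the number of substitutions as divergence grows, so values near 40% would normally be passed through a substitution model before being interpreted as evolutionary distance.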

TBEs are 4.1 kb long and include: ~70 bp inverted terminal repeats, of which the 17 terminal bp consist of the Oxytricha telomere repeat (G4T4; Herrick et al., 1985; Chapter 3); a ~300 bp region in the middle of the element that contains numerous small direct and inverted repeats (the "repeats region"; Chapter 3); and 3 genes (Chapters 3 and 4). The first gene codes for a 57 kD protein (the "ZF-kinase" protein) with two CCHH zinc fingers (ZFs) and a partial homology to protein kinase (Chapter 3). The second gene codes for a 42 kD transposase, a member of the D,D35E super-family of transposases (Chapter 2). The TBE transposase most closely resembles Tc1/Mariner transposases and the Euplotes Tec transposase (another ciliate transposon, see below). The third gene codes for a 22 kD protein that is highly charged and might play a role in structuring DNA during the transposition reaction (Chapter 3). TBE1 insertions are flanked by 3 bp target-site duplications of AnT (where n is a T or A), which represent the central 3 bp of the TBE1 5 bp target-site consensus of CAnTG (Chapter 7).

The three TBE1 genes have been under purifying selection for the function of their proteins while TBE1s have diverged in the Oxytricha genome (Chapters 3 and 4). We do not yet know if TBE2 and TBE3 have also diverged under comparable selection, but are close to doing those experiments. We used "family sequence" as a way to survey the population of TBE1s in a genome. This involves amplifying an internal TBE1 interval, using total cell DNA as template. This amplifies a large number of individual TBE1 insertions (Chapter 4), which all contribute to subsequent sequencing. The resulting sequence has positions where more than one nucleotide is present; these represent prominent polymorphisms in the TBE1 population. If these polymorphisms represent mostly nucleotide changes that are silent, having no effect on protein sequence, then purifying selection has been acting to eliminate TBE1 variants with missense and nonsense mutations.

TBEs are a component of MIC-limited DNA (Herrick et al., 1985), and we have characterized cases in which TBE1 insertions are precisely eliminated as IESs from the developing MAC (Hunter et al., 1989; Chapter 7). During exconjugant development, TBE1, TBE2, and TBE3 circles appear (Chapter 7) and are possible products of excision (see below).

TBEs are directly comparable to Tec elements in the ciliate Euplotes crassus (Klobutcher and Herrick, 1997). Tecs are 5.1 kb long, highly repetitive Type II elements (6000 per haploid genome; Jahn et al., 1989). There are two types of Tec, Tec1 and Tec2, comparable to TBE1, TBE2, and TBE3. Tecs have 690 bp ITRs and create a 2 bp target-site duplication, TA. The TA target-site duplication is common to many Mariner/Tc1 family members. Tecs contain a D,D35E transposase that falls into the Mariner/Tc1 family along with TBE1s (Chapter 2). However, the Tec and TBE transposases are very different proteins (Chapter 2), no more related to each other than to other Mariner/Tc1 members. Although TBEs and Tecs each have three genes (Jahn et al., 1993), only the transposases are homologous. Instead of TBEs' ZF-kinase gene, Tecs have a large protein homologous to tyrosine recombinases (Chapter 5). Each transposon has an ORF coding for a small highly charged protein of unknown function (Chapter 3) with no identified homology between the two. Like TBE1s, Tecs are eliminated during MAC development and many appear as circles (Jaraczewski et al., 1994). The mechanism of excision of Tecs, and of Euplotes short IESs, has been studied (see Chapter 9).
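
The family-sequence argument described earlier in this section rests on sorting each polymorphism by its effect on the encoded protein. The sketch below shows that classification for single-nucleotide variants in a coding sequence. Several things here are assumptions made for illustration: the function and variable names, the toy coding sequence, and the choice of genetic code. The standard code is the default only for simplicity; ciliates use variant nuclear codes (in Oxytricha, TAA and TAG encode glutamine rather than stop), so a second table with those two codons reassigned is included.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumed variant table for illustration: TAA and TAG read as glutamine.
TAA_TAG_GLN_CODE = dict(STANDARD_CODE, TAA="Q", TAG="Q")


def classify_variant(cds: str, pos: int, alt: str, table=STANDARD_CODE) -> str:
    """Classify a single-base change at 0-based `pos` of an in-frame coding sequence."""
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = table[ref_codon], table[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Toy coding sequence (hypothetical): ATG GCT AAA GGT = Met Ala Lys Gly.
cds = "ATGGCTAAAGGT"
polymorphisms = [(5, "C"), (11, "A"), (6, "G"), (6, "T")]
counts = Counter(classify_variant(cds, pos, alt) for pos, alt in polymorphisms)
print(counts["silent"], counts["missense"], counts["nonsense"])  # 2 1 1
```

Under the variant table the same AAA to TAA change is scored as missense rather than nonsense, which shows why the choice of code matters when polymorphisms in ciliate genes are tallied. A strong excess of silent changes over what random mutation would produce is the signature of purifying selection that the text describes.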

Mechanism of excision

We have taken four approaches in our attempts to determine the molecular mechanism of excision: characterization of TBE1 conservation, characterization of TBE1 expression, and characterization of TBE1 excision products and intermediates. Initial observations suggested that TBE1 genes were under selection, and characterization of excision products suggested that excision could be catalyzed by the TBE1 transposase (Chapter 7). These observations led to a hypothesis that the TBE1 transposase served to excise TBE1 elements during anlage development (Chapter 7). If the TBE1 transposase functioned in excision, as well as transposition, then its gene function would be selected for at the host level. The ZF-kinase and 22 kD genes are also under selection and would have to serve accessory roles in excision. We have labeled this as the "transposase=excisase" hypothesis. This hypothesis serves to explain both the selection seen on TBE1 functions and the mechanism of TBE1 excision. Implicit in formulating this hypothesis is the belief that eukaryotic Type II transposons are not under selection for transposition functions within a host population (Kaplan et al., 1985). In addition to suggesting that transposase served as excisase, we proposed a specific molecular model for the excision reaction (Chapter 7).

In Chapters 3 and 4, we confirmed the observation of selection on TBE1 genes. To serve as excisase TBE1 proteins must be expressed during anlage development, so TBE1 expression would be consistent with the hypothesis, and finding no expression would disprove the transposase=excisase hypothesis. In Chapter 8 we asked if TBE1s are expressed as RNA or proteins. So far we have not found significant TBE1 expression.

Characterization of the intermediates and products of excision serves both to test the explicit molecular model of excision proposed by Williams et al. (Chapter 7) by looking for the specific molecules predicted and to determine more broadly what the molecular species are. If we know what the intermediates and products are, we should be able to propose a molecular model consistent with these species. Of necessity, the model would include a particular type of enzymatic activity, presumably compatible with one of the known families of recombination enzymes. Our 1993 model (Chapter 7) used TBE1 transposase activities to perform excision, but other enzymatic activities can be invoked (Chapter 9). The characterization of TBE1 intermediates and products has been carried out entirely by Kevin Williams. The original model of excision was created by Glenn Herrick. Subsequent to this model I have been extensively involved in data interpretation and modeling.

The first indication of the nature of the excision product came from blotting exconjugant DNA with TBE1 probes. During anlage development, unit length TBE1 circles are observed (Chapter 7). We assume that TBE1 circles are the immediate products of TBE1 excision, and have tried to characterize them as thoroughly as possible (summarized in Chapter 9). Circles could be indirect products of excision, or unrelated to excision, but it seems most parsimonious to assume that they are direct products of excision, and that their structure is indicative of the excision mechanism. However, circles do not immediately account for all the TBE1s that need to be eliminated from the anlage. Perhaps this is because only TBE1s in MAC-destined sequences are removed by precise excision (as IESs), as suggested for Tecs (Frels and Jahn, 1995; Frels et al., 1996). We see a weak signal for linear TBE1s, but of much lower intensity than circles (Kevin Williams, unpublished). The linear TBE1s could be short-lived intermediates, but we believe they are breakdown products of TBE1 circles. Another indication that TBE1 circles are direct products of excision is their common circle junction structure (Chapter 9).

The characterization of TBE1 excision intermediates and products has resulted in disproving the transposase=excisase model (Chapter 9). The molecular ends predicted in Chapter 7 were not found, and we did not use the correct circle structure when building the model: the circle junction is more complex than originally understood. Although we now have a better understanding of TBE1 excision intermediates and products, this understanding has not yet led us to a new model consistent with the complete data set. Certainly, there is further characterization of excision intermediates to be done, but perhaps more important is a broader consideration of existing recombination systems that might be catalyzing excision.


References

Berg, D.E., and Howe, M.M., eds. (1989). Mobile DNA. (Washington D.C.: American Society for Microbiology).
Berger, H., and Foissner, W. (1997). Cladistic relationships and generic characterization of oxytrichid hypotrichs (Protozoa, Ciliophora). Arch. Protistenkd. 148, 125-155.
Brownell, J.E., Zhou, J., Ranalli, T., Kobayashi, R., Edmondson, D.G., Roth, S.Y., and Allis, C.D. (1996). Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell 84, 843-851.
Cech, T.R. (1993). The efficiency and versatility of catalytic RNA: implications for an RNA world. Gene 135, 33-36.
Collins, K. (1999). Ciliate telomerase biochemistry. Annu. Rev. Biochem. 68, 187-218.
Dawson, D., Buckley, B., Cartinhour, S., Myers, R., and Herrick, G. (1984). Elimination of germ line tandemly repeated sequences from the somatic genome of the ciliate Oxytricha fallax. Chromosoma 90, 289-294.
Derbyshire, K.M., and Grindley, N.D. (1986). Replicative and conservative transposition in bacteria. Cell 47, 325-327.
Frels, J.S., and Jahn, C.L. (1995). DNA rearrangements in Euplotes crassus coincide with discrete periods of DNA replication during the polytene chromosome stage of macronuclear development. Mol. Cell. Biol. 15, 6488-6495.
Frels, J.S., Tebeau, C.M., Doktor, S.Z., and Jahn, C.L. (1996). Differential replication and DNA elimination in the polytene chromosomes of Euplotes crassus. Mol. Biol. Cell 7, 755-68.
Herrick, G. (1994). Germline-soma relationships in ciliated protozoa: the inception and evolution of nuclear dimorphism in one-celled animals. Sem. Dev. Biol. 5, 3-12.
Herrick, G., Cartinhour, S., Dawson, D., Ang, D., Sheets, R., Lee, A., and Williams, K. (1985). Mobile elements bounded by C4A4 telomeric repeats in Oxytricha fallax. Cell 43, 759-768.
Hickey, D.A. (1982). Selfish DNA: a sexually-transmitted nuclear parasite. Genetics 101, 519-531.
Hunter, D.J., Williams, K., Cartinhour, S., and Herrick, G. (1989). Precise excision of telomere-bearing transposons during macronuclear development in Oxytricha fallax. Genes & Development 3, 2101-2112.
Jahn, C.L., Krikau, M.F., and Shyman, S. (1989). Developmentally coordinated en masse excision of a highly repetitive element in Euplotes crassus. Cell 59, 1009-1018.
Jahn, C.L., Doktor, S.Z., Frels, J.S., Jaraczewski, J.W., and Krikau, M.F. (1993). Structures of the Euplotes crassus Tec1 and Tec2 elements: identification of putative transposase coding regions. Gene 133, 71-78.
Jaraczewski, J.W., and Jahn, C.L. (1993). Elimination of Tec elements involves a novel excision process. Genes Dev. 7, 95-105.
Kaplan, N., Darden, T., and Langley, C.H. (1985). Evolution and extinction of transposable elements in Mendelian populations. Genetics 109, 459-80.
Klobutcher, L.A., and Herrick, G. (1997). Developmental genome reorganization in ciliated protozoa: the transposon link. Prog. Nucleic Acid Res. Mol. Biol. 56, 1-62.
Klobutcher, L.A., Turner, L.R., and LaPlante, J. (1993). Circular forms of developmentally excised DNA in Euplotes crassus have a heteroduplex junction. Genes Dev. 7, 84-94.
Krikau, M.F., and Jahn, C.L. (1991). Tec2, a second transposon-like element demonstrating developmentally programmed excision in Euplotes crassus. Mol. Cell. Biol. 11, 4751-4759.
Kuhlmann, H-W., and Heckmann, K. (1991). Nuclear processes in Euplotes octocarinatus during conjugation. Europ. J. Protistol. 26, 370-386.
Orias, E. (1998). Mapping the germ-line and somatic genomes of a ciliated protozoan, Tetrahymena thermophila. Genome Res. 8, 91-9.
Prescott, D.M. (1994). The DNA of ciliated protozoa. Microbiol. Rev. 58, 233-267.
Seegmiller, A., Williams, K.R., Hammersmith, R.L., Doak, T.G., Witherspoon, D., Messick, T., Storjohann, L.L., and Herrick, G. (1996). Internal eliminated sequences interrupting the Oxytricha 81 locus: allelic divergence, conservation, conversions, and possible transposon origins. Mol. Biol. Evol. 13, 1351-1362.
Yao, M-C. (1989). Site-specific chromosome breakage and DNA deletion in ciliates. In Mobile DNA. Berg, D.E., and Howe, M.M., eds. (Washington D.C.: American Society for Microbiology) 715-734.

CHAPTER 2

A PROPOSED SUPERFAMILY OF TRANSPOSASE GENES: TRANSPOSON-LIKE ELEMENTS IN CILIATED PROTOZOA AND A COMMON "D35E" MOTIF

Doak TG, Doerder FP, Jahn CL, Herrick G. 1994 Proc. Natl. Acad. Sci. USA. 91, 942-946.

Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 942-946, February 1994
Genetics

A proposed superfamily of transposase genes: Transposon-like elements in ciliated protozoa and a common "D35E" motif
(Tc1/IS630/.../retrovirus/...)

THOMAS G. DOAK*, F. PAUL DOERDER*†, CAROLYN L. JAHN‡, AND GLENN HERRICK*§

*Department of Cellular, Viral and Molecular Biology, University of Utah School of Medicine, Salt Lake City, UT 84132; and ‡Department of Cell, Molecular, and Structural Biology, Northwestern University Medical School, Chicago, IL 60611

Communicated by Elizabeth H. Blackburn, September 24, 1993 (received for review June 22, 1993)

ABSTRACT The transposon-like elements TBE1, Tec1, and Tec2 of hypotrichous ciliated protozoa appear to encode a protein that belongs to the IS630-Tc1 family of transposases. The Anabaena IS895 transposase also is placed in this family. We note that most family members transpose into the dinucleotide target, TA, and that members with eukaryotic hosts have a tendency for somatic excision that is carried to an extreme by the ciliate elements. Alignments including the additional members, and also mariner elements, show that transposases of this family share strongly conserved residues in a large C-terminal portion, including a fully conserved dipeptide, Asp-Glu (DE), and a block consisting of a fully conserved Asp and highly conserved Glu, separated by 34 or 35 residues (D35E). This D35E motif likely is homologous to the previously characterized D35E motif of the family of retroviral-retrotransposon integrases and IS3-like transposases. Because it is known that the IS3-retroposon D35E region is a critical portion of a domain capable of various in vitro transposition-related reactions, the results suggest that the two families share homologous catalytic transposase domains and that members of both families may share a common transposition mechanism.
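The D35E block described above (an Asp and a Glu separated by 34 or 35 residues) can be illustrated with a small scan over a protein sequence. This is only a sketch: the function name and the toy sequence are hypothetical, and a bare spacing rule will match many unrelated proteins by chance. The evidence in the paper comes from profile searches and multiple alignments, not from a pattern match of this kind.

```python
def find_d35e_candidates(protein, spacers=(34, 35)):
    """Return (asp_index, spacer_length) pairs where an Asp (D) is followed,
    after exactly spacer_length intervening residues, by a Glu (E).

    Hypothetical helper for illustration; indices are 0-based.
    """
    seq = protein.upper()
    hits = []
    for i, residue in enumerate(seq):
        if residue != "D":
            continue
        for n in spacers:
            j = i + n + 1  # position of the residue after n intervening ones
            if j < len(seq) and seq[j] == "E":
                hits.append((i, n))
    return hits


# Toy sequence (not a real transposase): D, then 34 alanines, then E.
toy = "MK" + "D" + "A" * 34 + "E" + "G"
candidates = find_d35e_candidates(toy)
```

Candidates found this way would still need to be checked against an alignment of family members before being called a D35E block.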

Transposons infest a wide variety of organisms and are structurally and functionally diverse (1). Transposon classifications have been based on shared hosts (prokaryotic or eukaryotic), shared structures (inverted terminal repeats or long direct terminal repeats; compound, composite, or complex), shared mechanisms of transposition (conservative or replicative, via RNA or "cut and paste"), and increasingly on shared homologous genes, usually transposases. The trend, as in the field of gene families in general, has been from an initial recognition of many small families to a progressive fusion of families into larger and fewer families. The elements involved in the present study provide a good example of this trend. Families of transposases related to Caenorhabditis Tc1, Drosophila mariner, and Shigella IS630 grew in isolation (2-5) and were joined later (6, 7). The prokaryotic IS3 transposase and eukaryotic retrotransposon-retrovirus integrase families were joined upon recognition of a common sequence motif referred to as the "D,D35E" motif. This motif includes a conserved Asp joined, by a variable-length less-conserved segment, to a "D35E" block consisting of invariant Asp and Glu residues separated by a moderately conserved segment usually 35 residues long (8, 9). Here we provide evidence that a related D35E block exists in the IS630-Tc1 family. We uncovered this interfamily connection while searching for sequences similar to open reading frames (ORFs) of transposon-like elements that reside in two hypotrichous ciliated protozoa. Although these elements have not been observed to transpose, they show a variety of features indicating they are transposons: they are repetitive and have inverted terminal repeats flanked by short direct repeats, and in some cases, empty alleles of element-interrupted loci are known (for review, see ref. 10). The sequences of the 5.3-kbp Euplotes crassus Tec1 and Tec2 elements were recently determined (11), and we report here the sequencing of a 4.1-kbp Oxytricha fallax TBE1 element.¶ Each element contains multiple ORFs, one of which encodes a moderate-sized protein (380, 383, and 354 codons, respectively) that we show belongs to the IS630-Tc1 transposase family. The TBE1 and Tec elements are not otherwise obviously related. The range of hosts for the aggregate IS630-Tc1-IS3-retron family is extremely broad, including prokaryotes, fungi, plants, invertebrate and vertebrate metazoa (2-9), and now ciliated protozoa.

MATERIALS AND METHODS

TBE1 fal-1 was subcloned from a clone of O. fallax micronuclear DNA (12). Inserts of nested sets of unidirectional deletion clones were constructed and sequenced as described (13). ORFs were identified, translating TAA and TAG as Gln (see ref. 13); the transposase ORF extends from nt 3993 to nt 2932 in the 4073-bp sequence. The Tec1 and Tec2 transposase ORFs extend from nt 1891 to nt 745 and nt 1908 to nt 760 on the respective element sequences (GenBank accession nos. L03359 and L03360, respectively). Other sequences were obtained from public data bases (see figure legends). Short names have been assigned, with an abbreviation of the host species name, where necessary to avoid ambiguity (e.g., Tc1 and CbTc1).

Sequence Analyses. Data base searches were performed with BLAZE (GenBank-IntelliGenetics, Swiss-Prot data base release 22) and BLAST (NCBI data base May 10, 1993; ref. 14). Alignments and construction and use of profiles (15) were performed with various GCG programs (version 7.2), including EXTRACTNAMES, which allows the intersection of two lists to be identified. All alignments were constructed by PILEUP and none was altered by hand.

RESULTS

Additional Members of the IS630-Tc1 Family. Database searches with the putative transposase sequences of the ciliate TBE1 and Tec elements indicated their similarities with members of the IS630-Tc1 family. For instance, in a BLAZE search with the TBE1 sequence, the Shigella IS630 transposase received the top score, and in a BLAST search the Drosophila bari-1 transposase (16) scored first, fifth, and ninth in searches with Tec1, TBE1, and Tec2, respectively.

Abbreviation: ORF, open reading frame.
†Present address: Department of Biology, Cleveland State University, Cleveland, OH 44115.
§To whom reprint requests should be addressed.
¶The sequence reported in this paper has been deposited in the GenBank data base (accession no. L23169).


To test the implications of these anecdotal results, we used the PROFILE suite of programs to generate a "profile" or position-specific scoring table (15). In essence, a profile is a set of sequence characteristics that typifies a family of aligned sequences. A database then can in effect be searched for candidates that "fit the profile": a score is calculated for each data base sequence; a score of +1 is 1 SD above the mean of all scores. The sequences of the 5 bacterial IS630 family transposases (5) were aligned, and a profile was generated. An augmented protein sequence database (supplemented with the TBE1 and Tec sequences) was searched to learn how the ciliate sequences fared relative to the 22 established members of the IS630-Tc1 family in the database. The 5 "profiled" sequences got inflated scores (>30) because they are represented in the profile (Fig. 1A). Although the Tec1 and Tec2 sequences did not receive impressive scores (2.26 and 2.14, respectively), the TBE1 sequence received the top score (8.71), strongly suggesting family membership (Fig. 1A). Similarly, the Anabaena IS895 sequence (17) received an impressive score (8.65), indicating that it too may be a family member. The Tec sequences received much more

FIG. 1. (A) Detection of TBE1 and IS895 transposases with an IS630-family profile. A profile of the protein sequences from IS630, StIS630, IS1066, RSa, and RIATL was used to search an augmented GenPept data base (see below). The frequency of entries with high Z scores (≥2.0) is shown as a curve (left axis). The "hit list" was searched for IS630-Tc1 family members and candidates (see list below); the scores of those found are shown with labeled bars (right axis); family candidate names are marked with a large dot; unlabeled bars represent Tec1, pogo, Tec2, CeMar, CpMar, CbTc1, and mariner. Scores >30 represent the profile constituents. (B) Detection of Tec1 and Tec2 transposases. A profile of the five IS630 family members, plus TBE1 and IS895, was used in a search. Of the 22 established IS630-Tc1 transposases in the database, only the 5 lacking all or parts of the DE and D35E region got scores 20
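The score scale used in the profile searches ("a score of +1 is 1 SD above the mean of all scores") is a Z score. A minimal sketch of that normalization follows. It assumes the population standard deviation over all database scores; whether the PROFILE programs use the population or the sample form is not stated in the text, and the function names and numbers here are illustrative only.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw search scores (name -> score) to Z scores.

    Assumption: the population standard deviation over all scores is used.
    """
    values = list(raw_scores.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all scores are identical; Z scores are undefined")
    return {name: (score - mu) / sd for name, score in raw_scores.items()}


def hits_at_or_above(z, threshold=2.0):
    """Names whose Z score meets the cutoff (2.0 mirrors the figure legend)."""
    return sorted(name for name, value in z.items() if value >= threshold)


# Made-up raw scores for six database entries.
raw = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "f": 10}
z = z_scores(raw)
hit_list = hits_at_or_above(z)
```

On this scale, a sequence that contributed to the profile is expected to score far above unrelated entries, which is why the five profiled sequences are set aside when the hit list is read.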

Klobutcher & Herrick, 1997). During MAC development most, apparently all, TBE1s are removed (Herrick et al., 1985; Williams, Doak & Herrick, 1993). Comparison of MIC loci bearing TBE1 insertions with the resulting MAC sequences shows that TBE1 excision from developing MAC DNA precisely removes the TBE1 and one target site repeat (Williams, Doak & Herrick, 1993). This reaction regenerates the original unmutated gene in the MAC, where the gene is expressed, making TBE1 insertions essentially phenotypically silent. This benefits both the TBE1 transposon and its ciliate host and has led us to propose that TBE1s contribute protein factors necessary for their own excision (Williams, Doak & Herrick, 1993; Witherspoon et al., 1997). We report here the complete sequences of four TBE1s, three from O. fallax and one from O. trifallax. Comparison of these four sequences allows the identification of shared sequence features that have been conserved during divergence of the elements. These conserved features (protein-coding genes and apparent cis-acting sites) represent aspects of the TBE1 transposon that have been maintained by selection during the evolution of TBE1 transposons in Oxytricha fallax, O. trifallax, and the immediate ancestor of the two sister species (Seegmiller et al., 1996).

Table 1. Nucleotide sequence identities between TBE1s

        fal2     fal4     tri1
fal1    97.7%    91.0%    90.8%
fal2             89.7%    90.2%
fal4                      89.6%

Materials and methods

Three TBE1s (fal1, fal2, and fal4) were isolated from Oxytricha fallax strain 9D1 (Cartinhour & Herrick, 1984). TBE1 fal1 and TBE1 fal2 were retrieved from a λL47 library of O. fallax micronuclear DNA (Herrick et al., 1985). The TBE1 fal1 sequence and analyses of its transposase gene were reported previously (Doak et al., 1994). A clone containing fal4 was also retrieved from that λL47 library (A. Lee, unpublished). TBE1 tri1 was amplified by polymerase chain reactions (PCR) from O. trifallax strain JRB310 DNA (Williams, Doak & Herrick, 1993). Sequencing of these PCR products or cloned DNA was performed either manually (Williams & Herrick, 1991) or with an ABI automated sequencer at the University of Utah Health Sciences Sequencing Facility. TBE1 sequences have been submitted to GenBank under the following accession numbers: fal1 and fal2, L39908; fal4, U85403; tri1, L39906.

The assembly and editing of sequence alignments, and most subsequent analyses, were performed with the GCG package of programs (Genetics Computer Group, 1994; GCG program names are indicated with all uppercase letters). Default settings were used unless otherwise noted. Analyses were performed either on the individual ungapped sequences or on the 'manually' aligned sequences that were gapped where necessary to align homologous blocks. The most divergent elements are ~90% identical and offer little problem in alignment. PROFILEMAKE and PROFILEGAP (Gribskov, Luthy & Eisenberg, 1990) were used to construct profiles that were used to search sequences for matches. Calculation of synonymous and nonsynonymous divergences (dS and dN, respectively) was by the method of Nei and Gojobori (1986), as implemented by Ina et al. (1994) and modified by Witherspoon et al. (1997) to use the Oxytricha genetic code (Williams & Herrick, 1991). Smith-Waterman searches were conducted with MpSrch 3.0D-3 (John F. Collins, Biocomputing Research Unit, 1995, University of Edinburgh, UK). The four TBE1 sequences were examined for evidence of gene conversion events using the method of Sawyer (1989), modified to use the Oxytricha genetic code (Williams & Herrick, 1991; Witherspoon et al., 1997).
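Because the codon-based analyses above depend on using the Oxytricha genetic code rather than the standard one, a short sketch of the difference may help. It assumes that "the Oxytricha genetic code" corresponds to the ciliate nuclear code (NCBI translation table 6), in which TAA and TAG encode Gln and only TGA terminates. The table construction and function name are illustrative and are not the programs used in the study.

```python
from itertools import product

BASES = "TCAG"
# Standard code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

STANDARD = dict(zip(CODONS, STANDARD_AA))
# Assumed ciliate nuclear code: TAA and TAG read as Gln (Q); TGA stays a stop.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")


def translate(dna, table):
    """Translate in frame 0; unknown codons become X, a partial codon is dropped."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


orf = "ATGTAAGGATAGTGA"
standard_reading = translate(orf, STANDARD)  # stops at the first TAA
ciliate_reading = translate(orf, CILIATE)    # reads through TAA and TAG
```

Under the standard table the toy ORF appears to be interrupted twice, while under the ciliate table it reads through to the single TGA stop. Synonymous and nonsynonymous site counts shift in the same way, which is why the divergence calculations had to be modified.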

Results

To identify conserved features of TBE1 transposons, we determined the nucleotide sequence of four complete TBE1s, three from O. fallax (fal1, fal2, and fal4; 4073 bp, 4072 bp, and 4076 bp, respectively) and one from O. trifallax (tri1, 4076 bp). TBE1 fal1 and fal2 are adjacent insertions, separated by 935 bp, in the vC allele of the CR-MSC gene of the 81-locus (Herrick et al., 1985; Seegmiller et al., 1996). TBE1 tri1 is an insertion in that same gene in O. trifallax strain JRB310 (Williams, Doak & Herrick, 1993). TBE1 fal4 was picked for complete sequencing from a larger set of cloned O. fallax TBE1s, because partial sequence showed it to be particularly divergent from fal1, fal2.


[Figure: map of a TBE1 element; recoverable labels: 78 bp inverted terminal repeat, 57 kD, G4T4]

Discussion

Analysis of four diverged TBE1s has allowed us to identify their conserved features. In addition to the already identified ITRs (Herrick et al., 1985; Williams, Doak & Herrick, 1993) and conserved portions of the 42 kDa and 57 kDa ORFs (Witherspoon et al., 1997), we have shown that the entire 22 kDa, 42 kDa, and 57 kDa ORFs have evolved under selection for protein function (Figure 4). Also, a 550 bp region between the divergent 22 kDa and 57 kDa ORFs is conserved for a specific nucleotide sequence (Figure 3). Analyses of large sets of genomic TBE1s have shown that most of the 4000 TBE1s in the genome share a common size and similarly sized internal segments and that their 42 kDa and 57 kDa ORFs are under selection for function (Williams, Doak & Herrick, 1993; Witherspoon et al., 1997); the four TBE1s analyzed here individually share these features, and we accept them as representative of most TBE1s. Before discussing the conserved units, it is useful to consider by what selection pressures they might have been maintained. We hav

is compromised if TBE1s are not efficiently excised from its developing MAC. Cis-acting sequences necessary for excision will be maintained, and we have suggested that TBE1-encoded proteins contribute to the removal of TBE1s from the developing MAC, and thus are selected for function at the level of the host (Williams, Doak & Herrick, 1993; Klobutch
van Gent, Mizuuchi & Gellert, 1996), but joining of the broken chromosome depends on the 'KU' or scid DNA-dependent protein kinase complex (reviewed by Weaver, 1995). In Drosophila, a homolog of the mammalian DNA-dependent protein kinase complex binds to the tip of P element ITRs and plays a role in the healing of breaks created during transposition (Beall, Admon & Rio, 1994; Beall & Rio, 1996). These examples suggest that the 57 kDa protein could activate adjacently-bound host DNA repair enzymes following TBE1 transpositional and/or developmental excision. Alternatively, the 57 kDa protein might convert transposase to excisase during MAC development and specifically act on transposase already bound to TBE1 sites. However, the analogy of 57 kDa protein to DNA-dependent protein kinases is weak for two reasons. First, unlike the 57 kDa protein, the

... & A.E. Leschziner, ... 425-437.
Herrick, G., 1994. Germline-soma relationships in ciliated protozoa: the inception and evolution of nuclear dimorphism in one-celled animals. Sem. Dev. Biol. 5: 3-12.
Herrick, G., S. Cartinhour, D. Dawson, D. Ang, R. Sheets, A. Lee & K. Williams, 1985. Mobile elements bounded by C4A4 telomeric repeats in Oxytricha fallax. Cell 43: 759-768.
Klobutcher, L.A. & G. Herrick, 1997. Developmental genome reorganization in ciliated protozoa: the transposon link. Progress in Nucleic Acid Research and Mol. Biol. 56: 1-62.
Kulkosky, J., K.S. Jones, R.A. Katz, J.P.G. Mack & A.M. Skalka, 1992. Residues critical for retroviral integrative recombination in a region that is highly conserved among retroviral/retrotransposon integrases and bacterial insertion sequence transposases. Mol. Cell. Biol. 12: 2331-2338.
Lundblad, V. & W.E. Wright, 1996. Telomeres and telomerase: a simple picture becomes complex. Cell 87: 369-375.
MacRae, A.F. & M.T. Clegg, 1992. Evolution of Ac and Ds1 elements in select grasses (Poaceae). Genetica 86: ...

... 95% similarity in the transposase gene and >75% similarity in the kinase/ZnF gene) and contain no indels in the regions analyzed below and so were aligned by hand. Start and stop codons, codons containing ambiguities (presumed sequencing errors) in any element, sequences outside the transposase or kinase/ZnF ORFs, and sequences that were obtained for some but not all elements were excluded from analysis; 184 aligned codons in the transposase ORF and 94 codons in the kinase/ZnF ORF (from 10 TBE1s) remained for analysis. The TBE1 gene sequences and CR-MSC sequences are GC-rich (35%-40% GC; data not shown) in comparison with noncoding sequences of the CR-MSC locus (21% GC; Seegmiller et al. 1996), as is expected of protein-coding genes in Oxytricha (Hoffman et al. 1995). The sequences have been submitted to GenBank under the following accession numbers: fal1 and fal2 (complete), L39908; fal3, U89031 and U89032; fal4 (complete), U85403; fal5, U89033 and U89034; fal6, U89035 and U89036; fal8, U89039 and U89040; fal9, U89041 and U89042; fal10, U89043 and U89044; tri1 (complete), L39906.
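The GC comparison above (35%-40% GC for the genes against 21% GC for noncoding sequence) rests on a simple base count. The sketch below shows one way such a value could be computed. It is not the procedure used in the study; the function name is hypothetical and the choice to skip ambiguity codes is an assumption.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguity codes such as N are skipped rather than counted.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


# Toy sequences, not taken from the CR-MSC locus.
coding_like = "ATGGCTGAAGCTTTAGATCA"   # 8 of 20 bases are G or C
noncoding_like = "ATTTAAATATTTAACATTAG"  # 2 of 20 bases are G or C
```

For real data the same count would be taken over each annotated region (ORFs, IESs, introns) separately, so that coding and noncoding segments can be compared directly.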

Phylogenetic Analyses

All characters were weighted equally in all analyses. All most parsimonious (MP) bifurcating trees were recovered (PAUP, exhaustive search; Swofford 1991) and compared with the most likely tree found by maximum-likelihood (ML) analysis (PHYLIP DNAML v. 3.54, 100 replicates with random taxon input order and global rearrangements; Felsenstein 1993). Bootstrapping (PAUP, 1,000 replicates, branch-and-bound search, sampling only from polymorphic sites) was used to identify strongly supported relationships.

Results

Genome-wide Survey of TBE1 Sequence Diversity

PCR using a pair of opposing TBE1 primers and whole-cell Oxytricha DNA yields a mixed product, since many different TBE1s provide template for the reaction. Direct sequencing of this product produces legible sequence that is ambiguous at some positions (i.e., bands are clearly visible in more than one lane at some positions in sequencing gel autoradiograms; see fig. 2). This is the "family sequence" of a given TBE1 region and Oxytricha strain. Others have directly sequenced PCR products derived from mixed templates (for example, Capy et al. 1991; Williams, Doak, and Herri

[Figure caption fragment] ... were generated using primers ... and ... This portion of an autoradiogram of a sequencing gel shows dideoxy-terminated chains polymerized from primer ... (see fig. 1). The family sequences, despite representing the superimposed sequences of many different TBE1s, are as legible as the sequence of cloned TBE1 fal1. The kinase/ZnF ORF reads downward, with codons delimited by tick marks. Arrowheads indicate ...
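The "family sequence" obtained by directly sequencing a mixed PCR product can be mimicked in silico by collapsing a set of aligned element sequences into a single sequence with ambiguity codes. The sketch below is an analogy only. The visibility threshold (`min_fraction`) is a hypothetical stand-in for "a band clearly visible on the gel", and the toy alignment is not real TBE1 data.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def family_sequence(aligned, min_fraction=0.2):
    """Collapse equal-length aligned sequences into one IUPAC sequence.

    A base is treated as visible at a position when it is carried by at
    least min_fraction of the templates (a hypothetical detection threshold).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        visible = frozenset(
            base for base, n in counts.items() if n / len(column) >= min_fraction
        )
        out.append(IUPAC.get(visible, "N"))  # gaps or odd symbols fall back to N
    return "".join(out)


templates = ["ACGT", "ACGA", "ACTA", "ACGA"]
family = family_sequence(templates)
```

Positions where the templates agree come out as plain bases, and polymorphic positions come out as ambiguity codes, which corresponds to the positions that show bands in more than one lane.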

us prior to its expression of the genes these IESs oth

[Figure 1: maps of the alleles; recoverable labels: 200 bp scale bar]

FIG. 1. Alleles of the 81 locus and their IESs. The top map shows shared features of the alleles. IESs are indicated by the four tall black boxes. Three introns are indicated by black boxes below the line. Arrows to the left and right of IES-R represent the stop codon of the CR-MSC gene and the stop codon of the right arm gene, respectively. Maps of characterized DNA segments of the micronuclear (MIC) and macronuclear (MAC) representations of eight alleles of the locus are shown below. Each segment is marked by a circled number, referring to table 1, which describes how each segment was obtained. The first three alleles are from O. fallax; the bottom five are from O. trifallax. For each allele the MIC sequence segments are indicated by black lines, and the MAC sequence segments are indicated by cross-hatched lines below the MIC line. Heavily cross-hatched lines indicate MAC segments that have been sequenced. Lines that extend fully to the left or ...

[Table 1 footnotes, fragmentary]
... published sequence (Herrick et al. 1985, 1987a) extended by the current work.
Published sequence (Cartinhour and Herrick, 1984) extended by the current work. Details of right arm to be published separately (unpublished data).
Published sequence (Herrick et al. 1987a, 1987b) extended by the current work.
Published sequence (Herrick et al. 1985, 1987a) extended in the current work. Includes the TBE1 fal-... sequence; details of fal-... sequencing will be published separately (unpublished data).
Previously published sequence (Hunter et al. 1989) extended by the current work.
vC specificity was imposed by ClaI digestion of mixed PCR products (vA + vB + vC), as described previously (Hunter et al. 1989): the product of a five-cycle PCR run was cut with ClaI; after further PCR, ClaI-resistant product (vC) was gel-purified and amplified further.
Includes the body of TBE1 tri-1 (unpublished data).
The 310 allele was excluded by virtue of its carrying TBE1 tri-1 (see text).
Unexpectedly, IESRv... fails to amplify 510b MIC DNA in the presence of 510... MIC DNA, but the basis of this discrimination is not understood, since the primer matches the 510b sequence nearly perfectly.
The 310 allele was demonstrated in JRB319 with a nested set of PCR reactions like that used for 310 MIC.

Internal Eliminated Sequence Allelic Evolution

lete, either by DNA cloning or by PCR amplification from whole-cell DNA templates. Micronuclear DNA PCR products were generated using micronuclear sequence-specific primers. These primers match short IES sequences or match sequence near one end or the other of the TBE1 insertion in the 310 allele (fig. 1). Because these regions are absent in macronuclear DNA, only micronuclear DNA is amplified. Single-allele products were generated by one of three methods: (1) Allele-specific primers were used to amplify single alleles from heterozygous whole-cell DNA. (2) The 510a and 510b alleles could not be amplified separately from JRB510 (510a/510b). They could be amplified, however, from the F1 progeny of JRB510 and JRB310. Products templated from the 310 micronuclear DNA were then excluded in PCR because the large size (4.1 kb) of the TBE1 that interrupt

[Figure: sequence alignment panels "C. IES-L" and "E. Arm Junction" for alleles vA, vB, vC, 310, and 510a]

[Figure 3: gene tree with tips vB, vA, vC, 310, 510a, and 510b]

FIG. 3. Maximum-likelihood phylogeny for six alleles of the CR-MSC gene. The topology and proportions of the gene tree were inferred using DNAML for the CR-MSC codons of each allele (start and stop codons excluded). The length of each branch is written below the branch. The length of the dashed branch connecting vC to vA and vB was not significantly different from zero. DNAML was run 10 times with randomized taxon input orders, an expected ratio of transitions to transversions r = 1.7, and different relative rates of change for first-, second-, and third-position sites; the tree with the highest likelihood is shown. The r = 1.7 value was chosen as follows: the ... (transitions and transversions, respectively) were estimated using MacClade; for each category of sites, the average number of required changes was determined over all topologies which separated the O. fallax and O. trifallax alleles. Bootstrap support values (PAUP, 100 replicates) are shown in italics above and to the left of the nodes to which they refer. Arrows indicate hypothesized TBE1 insertion events. Unweighted parsimony (PAUP), maximum-likelihood with equal rates for all sites, and a distance method (NEIGHBOR) all recovered the above topology, failing to resolve the vA-vB-vC relationship (not shown); i.e., vA and vC were grouped only 51 of 100 times in parsimony bootstrap analysis.

coding regions that should have diverged without the action of purifying selection (and in the absence of unlikely diversifying selection) should have accumulated a maximum extent of differences, whereas lower extents of divergence in some regions should indicate relative intensities of purifying selection acting on those regions. Several of the noncoding regions show high extents of divergence. These regions are the four IESs, intron 2, and the region between the CR-MSC and right arm genes ("Arm Jnct," fig. 4A). These regions not only show high extents of divergence, but the extents of divergence are quite similar (~0.3 to 0.4 changes/site, or "c/site"), suggesting that most nucleotides in these regions have diverged without selection against accumulation of mutations. Consistent with this view, intron 2 shows this high extent of divergence, and introns can diverge essentially as fast as pseudogenes, which are assumed to evolve without selection (excepting the few nucleotides required for splicing; Li and Graur 1991, pp. 71-73). Correlated with such high divergence is a dearth of G and C nucleotides in these noncoding regions; e.g., the four IESs have an average of 82% A + T. The coding regions of the CR-MSC and right arm genes show suppressed levels of variation. The extent of synonymous divergence in CR-MSC codons (Ds) is slightly depressed below that of the maximally diverged noncoding regions (fig. 4A), suggesting that not all synonymous changes are silent. Such Ds depression is seen

[Figure 4: bar charts of divergence (panels A, B, C) above a map of the locus; map labels: CR-MSC synon., IES-L, i3, i2, i1, IES-R, Arm Jnct]

FIG. 4. Divergences between allelic pairs of various regions of the 81 locus. Bar heights represent divergence distances between pairs of sequences. Black bar values were computed by DNADIST and expressed as the estimated number of changes per site since the divergence of the sequence pair. White bar values are computed Ds divergence values between CR-MSC open reading frame sequence pairs (start and stop codons excluded), where Ds is the estimated number of synonymous changes per synonymous site. We verified that the DsDn and DNADIST programs generate the same divergence value for randomly diverged sequences generated by simulation (Ds = Dn = DNADIST value). The map at the bottom indicates the regions analyzed, represented as in figure 1. "Arm Jnct" indicates the region beginning

95% of the germline sequences (for reviews see Yao, 1989; Klobutcher and Jahn, 1991; Prescott, 1992; Herrick, 1993). Following or during a period of chromosome polytenization, internal eliminated sequences (IESs) are deleted; each is bounded by short direct repeats, suggesting that IESs are derived from transposons (reviewed by Klobutcher and Jahn, 1991). While most known IESs are small and not detectably repetitive, large repetitive transposon-like elements also are eliminated as IESs; these include Oxytricha

TBE1s and ...

[Figure caption fragment] ... sequences align with those of the ...; the dominant nucleotide sequences ... shown in panel A are represented; the most frequent nucleotide is underlined ...

is indistinguishable in size from all four characterized fal elements. The full sequence of fal-1 is 4073 bp long (F. P. Doerder and G. Herrick, in preparation), indicating that 4.1 kbp is the canonical size for TBE1 elements. Southern blots of restricted O. fallax and O. trifallax DNAs support the conclusion that most TBE1s are the same size and share many restric

A TBE1 primer directed 'off shore', such as 9, 10 or ... (Figure 1B), was coupled with an unrelated primer. We expected such a primer, designated a 'Flanks' primer, might fortuitously match flanking sequences within amplifiable distances of a small subset of the few thousand TBE1 ends in MIC DNA (Figure 1B). Four such sets of heterogeneous PCR products were made, primed with different 'Flanks' primers. Nested PCR tests indicated that such a reaction produces primarily a collection of TBE1 flanks (see Materials and methods), as subsequent analyses bear out. Sequencing three of these collections gave gels that were easily readable through the ITR sequence, indicating their tight conservation ('Flanks', Figure 3A). A few further nucleotides also were readable, into the collection of TBE1 flanks; further positions were fully ambiguous, as expected (Figure 3A). Flank positions 1 and 3 are unambiguously A and T, respectively. Flanks from the fourth PCR collection were cloned and 11 were sequenced (tri-... through tri-163, Figure 3). In each case, positions 1 and 3 again are A and T. A consensus table of flank sequences (Figure 3B) shows positions 1 and 3 are always A and T. We refer to these invariant flanking positions as 'A1' and 'T3'.

... HMSC genotypes of JRB310 and its partner, JRB510; PCR then allowed quick genotyping of progeny (K. Williams, T.G. Doak, A. Seegmiller, T. Messick and G. Herrick, in preparation). JRB310 is homozygous for the tri-1 bearing allele '310', whereas JRB510 is heterozygous, carrying two further alleles, '510a' and '510b', neither of which bears a TBE1. To analyze exconjugants inheriting tri-1, we mated JRB310 and JRB510 cells and cloned exconjugants that showed overt and persistent MAC anlagen. Fourteen F1 lines were identified that received the 310 allele from JRB310 and hence processed tri-1 during the development of their MAC anlagen. The MAC HMSC DNAs of each of the F1 lines were analyzed by PCR and direct sequencing to learn the fate of the tri-1 flanks. The result for one of those clones, 'SLC32', is shown in Figure 4C. The sequence of the 310 MAC product is identical in the excision region to JRB510 MAC DNA (Figure 4A), which was derived from MIC alleles lacking tri-1. Hence, tri-1 and one flanking repeat (AAT on the strand sequenced, ATT as shown in Figure 2) have been eliminated, exactly reverting the germline HMSC mutation created by tri-1 insertion. Parallel analyses of the remainder of the 14 F1 lines showed that tri-1 precise excision had occurred in each episode of MAC development. Thus, tri-1 excision is regularly precise. The exconjugant line SLC32, like three others, is special, having the same homozygous 310 genotype as JRB310. In

Fig. 4. HMSC sequences at the tri-1 excision region (the sequence of the opposite strand is shown). At the dimorphic position marked, alleles 510a and 510b have a C, while allele 310 has a T. (A and C) The PCR templates were cell equivalents of DNAs purified from, respectively, JRB510 (alleles 510a + 510b) and a homozygous JRB310 progeny line. (B) Ten exconjugants from a JRB310 and JRB510 mating were lysed directly in the PCR reaction.

these uniparental cases MAC development and tri-1 precise excision occurred in the absence of an 'empty' allele (see Discussion). [Besides the four 310 + 310 uniparental progeny, two 510a + 510b progeny also were found, as were 10 typical biparental progeny, in the expected Mendelian ratio, five 310 + 510a and five 310 + 510b. Uniparental progeny are observed in a variety of different ciliates and can arise in at least three different ways: selfing, cytogamy or autogamy (Grell, 1973).] In the F1 DNAs no sequence heterogeneity was detected at or beyond (above) the target site, and thus there was no evidence for imprecise excision (Figure 4C versus A), even though the full mixture of PCR products was directly sequenced. However, imprecise excisions still might be common (discussed by Hunter et al., 1989): first, multiple chromatids of the TBE1-bearing locus may be generated during polytenization before excisions occur (see below) and second, fal-1, fal-2 and tri-1 interrupt the HMSC coding region. Thus, imprecise excision products might have been generated in the anlage, but selection for HMSC function during subsequent clonal growth could have caused biased retention of precise products, explaining our results analyzing MAC DNA from clonally propagated cells.

To assess the fidelity of tri-1 excision, its flanks were examined directly in JRB310 × JRB510 exconjugants, before any exconjugant growth occurred. Individual exconjugants were hand picked, as above, and later harvested for PCR as they resumed the vegetative form (3-... days after pairing), but before they grew or divided. This stage occurs ~1 day after the old MAC already has been resorbed and no old MAC DNA was detected by PCR: unpublished con-


Developmental precise excision and circles of TBE 1


products to unaltered 510a + b products is ~1:1 (intensities of marked 'T' and 'C' bands in Figure 4B), comparable with the expected input ratio of 310 and 510a + b chromatids. These results also establish that tri-1 excision occurs during MAC development.

Free 4.1 kbp TBE1 circles in exconjugants

Fig. 5. TBE1 circles in exconjugant DNA. Autoradiograms of agarose films hybridized with an internal 1.3 kbp section of fal-2. (A) Film from a 0.7% agarose gel, each lane with DNA from exconjugant cells harvested after maximal pairing at the times indicated (h). Bands marked: open circles; 'X', probable supercoils (not detected in subsequent experiments, where we suspect supercoils suffered artifactual nicking). (B and C) Blots from 0.5 and 1.3% agarose gels. Lanes B2 and C2 carry exconjugant DNA; the TBE1 band is indicated by open circles. Outside lanes carry a variously treated 4.0 kbp plasmid (the probe contained a trace of the plasmid). Plasmid bands marked include the 4.0 kbp supercoils.

... and Mizuuchi, 1992), tailored for TBE1 in


K.Williams, T.G.Doak and G.Herrick

the \-lIe ,Figcre 7.-\ - DL After a replica[Jon tork ~.­ GGGG.:. (iT ,GGGGTTTiGGGGTTTTJ fEEl ime:-:1al 9nmc~s I} (A TCC.-\"-\">'AGTGC.~ TTITGAGTG', "nc 10 I AGCTTGT.~A TTTTT:

GTCTCGCAJ Me Jjrccted outWaros ·.vlm melr ;' I!nds l.l';-: J.Ild

to:

JP

ends, see Figure 5.'11. ?:1ffier 5 ICCT M TT.-\..-\GTACGTAC C.-\ATTT.-\J 'Jinds -1.3 kbp ~·rom J1e ?rima \l :;ue .lild pri..r:1~r i IGAAX;-GGT.ATC.>.CIT.~C.~AAGGI omds -0." kbp from coe pnmcr 10 ·me ' '. ,MCTTKGCAG.-\..>. TAGM TTITCTGAGTCC.~l. VHO' IHumer et aI., 19891. PEB ,Williams and Hemek. 19911. 1-157 - IMTAAAAT_~TCGMTCAT­ TGAITTC). from ihe

TBE1-fal-3 and -fal-4 isolation
Phage λ ...

PCR, sequencing and cloning
PCR reactions contained Taq DNA polymerase and provided buffer (Boehringer). Products were isopropanol-precipitated and sequenced directly (GIBCO dsDNA Cycle Sequencing kit), or cloned into a plasmid vector. The circle junction PCR reaction (35 cycles: 97°C for 15 s, 54°C for 30 s, 72°C for 90 s) contained gel-purified circles (~200 cell equivalents) and primers 9 and 10. Flanks PCR reactions (35 cycles: 97°C for 15 s, 56°C for 3 min) contained ~2 ng of gel-purified limit-mobility JRB510 DNA, the ITR primer 1 and a 'Flanks' primer, either PEA, PEB or LCR1 (see Results and Figure 1B). Each product had a limited set of heterogeneous-sized DNAs and was sequenced from primer 1. A fourth PCR reaction (35 cycles: 97°C for 15 s, 54°C for 3 min) contained limit-mobility JRB310 DNA, primer 10 and the 'Flanks' primer LCR1'. The reaction also produced a limited set of heterogeneous-sized products. They were used as template (1 µl) in each of four nested secondary reactions. Each nested reaction again used LCR1', coupled with either 1, A, B or GT, instead of primer 10. These primers should bind the ITR of bona fide terminal TBE1 products progressively closer to the ITR end (Figure 1B), giving progressively ...

... T.G.Doak, A.Seegmiller, T.Messick and G.Herrick, in preparation). The sequences have been submitted to GenBank (accession numbers L02S55 - 56 and U120235 - _'91) and relevant segments are presented in Figure 3; further MIC sequences have accession numbers M 13029 - 3'1, \\ 13035 -':2 and '100056.

To perform PCR directly on DNA in individual cells, the cells were hand isolated into water in the PCR reaction tube, lysed with 0.01 N NaOH, brought to pH 8.1 with Tris-HCl, and the full PCR reaction was constituted. Motile cells and cysts both lyse readily; MAC intervals are easily amplified from a single cell; > 50