J Mol Evol (1998) 46:272–276

© Springer-Verlag New York Inc. 1998

Patterns of Gene Duplication in Lepidopteran Pheromone Binding Proteins

Thomas J.S. Merritt,1 Siana LaForest,7 Glenn D. Prestwich,5,6* Joseph M. Quattro,1,2,3,4 Richard G. Vogt1

1 Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA
2 Program in Marine Sciences, University of South Carolina, Columbia, SC 29208, USA
3 Baruch Institute, University of South Carolina, Columbia, SC 29208, USA
4 School of the Environment, University of South Carolina, Columbia, SC 29208, USA
5 Department of Chemistry, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
6 Departments of Biochemistry and Cell Biology, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
7 Departments of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, NY 11794, USA

Received: 19 March 1997 / Accepted: 11 July 1997

Abstract. We have isolated and characterized cDNAs representing two distinct pheromone binding proteins (PBPs) from the gypsy moth, Lymantria dispar. We use the L. dispar protein sequences, along with other published lepidopteran PBPs, to investigate the evolutionary relationships among genes within the PBP multigene family. Our analyses suggest that the presence of two distinct PBPs in genera representing separate moth superfamilies is the result of relatively recent, independent gene duplication events rather than a single, ancient duplication. We discuss this result with respect to the biochemical diversification of moth PBPs.

Key words: Lepidopteran pheromone binding proteins; Molecular evolution; Gene duplication; Lymantria dispar

*Present address: Department of Medicinal Chemistry, University of Utah, Salt Lake City, UT 84112, USA
Correspondence to: T.J.S. Merritt; e-mail: [email protected]

Introduction

Pheromone binding proteins (PBPs) function in lepidopteran mate location by facilitating the delivery of hydrophobic sex-pheromone molecules to specific neural receptors in male antennae (Vogt 1987; Vogt et al. 1991a; Prestwich et al. 1995). PBPs are members of a larger gene family, the insect odorant binding proteins (OBPs; Vogt et al. 1991a), which are small, water-soluble proteins expressed in the olfactory tissue of insects. PBPs were initially defined by their ability to bind sex pheromone. They are found at very high concentrations (10 mM) in the sensillar lymph, the aqueous fluid that surrounds the dendrites of pheromone-sensitive neurons within the olfactory sensilla of the antennae (Vogt and Riddiford 1981; Vogt et al. 1989). Binding and expression studies have led to a model of PBP function. In this model, pheromone is bound in a hydrophobic pocket of the PBP. This allows the long-chain hydrocarbons to cross the aqueous barrier of the sensillar lymph and interact with specific receptors on the surface of pheromone-sensitive neurons (Vogt et al. 1989; Prestwich 1993; Du and Prestwich 1995). By specifically binding and solubilizing the appropriate pheromone molecules, PBPs may act as selective filters for female pheromones.

Pheromone binding proteins have been described in eight species of moths. Single proteins have been identified in five species (Krieger et al. 1993; Krieger et al. 1996; Gyorgyi et al. 1988; Raming et al. 1989), and pairs of PBPs have been described in the remaining three (Nagnan-Le Meillour et al. 1996; Vogt et al. 1989; Krieger et al. 1991). In species expressing two PBPs, the individual proteins have been shown to bind the chemical components of the female pheromone differentially (Vogt et al. 1989; Du and Prestwich 1995; Nagnan-Le Meillour et al. 1996). This differential binding represents a fine-tuning of the PBP-pheromone interaction, making greater specificity of pheromone signal recognition possible (Vogt et al. 1989; Krieger et al. 1991; Du and Prestwich 1995; Prestwich et al. 1995).

Duplicate PBPs allow the potential for greater binding and substrate specificity. Has selection therefore acted to alter PBP pheromone binding? Genes encoding proteins involved in the recognition of other molecules are often under diversifying selection (e.g., major histocompatibility complex, Hughes et al. 1990; olfactory receptors, Ngai et al. 1993; Hughes and Hughes 1993; immunoglobulins, Tanaka and Nei 1989). In the cases where selection has been documented, the pattern of gene evolution has been firmly established. The first step in investigating possible selection in the evolution of PBPs is therefore to establish the pattern of evolution of this multigene family.

The duplication of genes and their subsequent functional divergence is a fundamental but incompletely understood process in the generation of protein diversity and adaptive evolution (Li 1983; Hughes 1994). It has been hypothesized that PBPs evolved by gene duplication (Vogt et al. 1989; Krieger et al. 1991), but the pattern of duplication has not been reconstructed by phylogenetic analysis. At least two alternatives present themselves:
(1) a single, ancient duplication event followed by losses in multiple taxa, or
(2) multiple, relatively recent, independent duplication events.
These two scenarios have different implications for the role of duplication in the evolution of these proteins. We report here the cloning and sequencing of the complete coding regions of the two Lymantria dispar PBP genes (LdisPBP1 and LdisPBP2), and we present a phylogeny of the published PBP genes. We show that, for the examples in which two PBPs have been identified, the duplicate PBPs are products of recent, independent gene duplication events.

Materials and Methods

LdisPBP1 Cloning and Sequencing. Complementary DNA (cDNA) was synthesized according to the manufacturer's instructions (Superscript Preamplification System, Gibco BRL). Three hundred fifty-four base pairs (bp) of the LdisPBP1 coding region were amplified from this cDNA by the polymerase chain reaction (PCR; Saiki et al. 1988). Degenerate oligonucleotide primers were designed from published lepidopteran PBP sequences. The forward primer (LdPBP44F) was designed from the first 30 amino acid residues of LdisPBP1 (Vogt et al. 1989). The reverse primer (LPBP123R) was designed from an alignment of all published PBP sequences:

LdPBP44F: 5′-TTYGCNAARCCNATGGARGC-3′
LPBP123R: 5′-GGNRCCCANTTIARMTYRAG-3′

(I = 2′-deoxyinosine; M = A/C; N = A/C/G/T; R = A/G; Y = C/T). PCR was carried out for 40 cycles, each with denaturation at 95°C for 1 min, annealing at 48°C for 1 min, and extension at 72°C for 1 min. PCR products were cloned into the pGEM-T vector (Promega) and sequenced manually. Dideoxy DNA sequencing was performed with the US Biochemicals Sequenase 2.0 kit, using 35S-dATP as the label. At least three independent clones per PCR fragment were sequenced on both strands. The RACE method (rapid amplification of cDNA ends; Frohman 1990) was used to amplify the extreme 5′ and 3′ coding and untranslated regions of the LdisPBP1 cDNA. RACE amplifications used gene-specific primers designed from the LdisPBP1 cDNA clones:

LdPBP3′: 5′-GACGACCCATGCCAGACTATG-3′
LdPBP5′: 5′-CTTAGCGAGACAGAGGATGAC-3′

Conditions for PCR, cloning, and sequencing were as described above.
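Two properties of the primers above can be checked with a few lines of arithmetic. A degenerate primer is a pool of distinct oligonucleotides. The pool size is the product of the IUPAC degeneracy at each position, with inosine counted as a single synthesized base. The gene-specific RACE primers are fixed sequences, so a crude melting temperature can be estimated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The Python sketch below is a hypothetical aid for reading the primer designs, not part of the authors' protocol. The function names are illustrative, and the paper does not state how the 48°C annealing temperature was chosen.

```python
# Number of alternative bases encoded by each IUPAC symbol.
# Inosine (I) is counted as one base: it is a single synthesized nucleotide
# that can pair with several partners, so it does not enlarge the pool.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_DEGENERACY[base]
    return total


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature (deg C) of a non-degenerate primer."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(degeneracy("TTYGCNAARCCNATGGARGC"))   # forward degenerate primer: 128
    print(degeneracy("GGNRCCCANTTIARMTYRAG"))   # reverse degenerate primer: 512
    print(wallace_tm("GACGACCCATGCCAGACTATG"))  # RACE primer: 66
    print(wallace_tm("CTTAGCGAGACAGAGGATGAC"))  # RACE primer: 64
```

The Wallace rule ignores salt concentration and nearest-neighbor effects. It is only a rough guide for oligonucleotides of about this length, and it is not meaningful for the degenerate pools.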

LdisPBP2 Cloning and Sequencing. First- and second-strand cDNA were synthesized from 100 µg of total RNA by oligo(dT) priming, using the Stratagene ZAP-cDNA Synthesis kit. The double-stranded DNA was methylated to protect internal EcoRI sites, blunt-end ligated to EcoRI linkers, digested with EcoRI, and size-selected to obtain cDNAs >500 bp in length. The cDNAs were ligated into lambda ZAP and amplified on XL1-Blue cells (Stratagene). Approximately 100,000 plaque-forming units (PFU) were screened under expression conditions. The screen used rabbit antiserum raised against LdisPBP1 and LdisPBP2 (anti-LdisPBP; Vogt et al. 1989). After overnight growth, duplicate plaque lifts were taken on 137-mm, 0.45-µm nitrocellulose filters (Schleicher & Schuell) presoaked in 100 mM IPTG. Positive clones were identified with a 1:250 dilution of the primary anti-LdisPBP antiserum and a Promega immunoblot kit based on goat anti-rabbit IgG coupled to alkaline phosphatase. Following plasmid rescue from lambda ZAP, plasmid DNA was isolated using Qiagen miniprep reagents and protocols. Dideoxy DNA sequencing was performed with the US Biochemicals Sequenase 2.0 kit, using 35S-dATP as the label. Primers included the T7 and T3 oligonucleotides for the coding and anticoding strands, respectively, followed by internal primers based on the initial sequence data.
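Whether a plaque screen of this size is likely to recover a given clone depends on how common that cDNA is in the library. The standard Clarke-Carbon estimate gives a rough sense of scale. If a clone makes up a fraction f of the library, the chance of seeing it at least once among N plaques is 1 - (1 - f)^N, and the number of plaques needed to reach probability P is ln(1 - P)/ln(1 - f). The sketch below applies this estimate. The abundances are hypothetical, since the paper does not report them. The estimate also ignores factors specific to an expression screen, such as insert orientation, reading frame, and antibody reactivity.

```python
import math


def detection_probability(f: float, n_plaques: int) -> float:
    """P(at least one of n_plaques carries a clone present at fraction f)."""
    return 1.0 - (1.0 - f) ** n_plaques


def plaques_needed(f: float, p: float = 0.99) -> int:
    """Clarke-Carbon estimate of plaques to screen to find the clone with probability p."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


if __name__ == "__main__":
    # Hypothetical clone abundances, screened at roughly 100,000 PFU.
    for f in (1e-3, 1e-4, 1e-5):
        print(f, round(detection_probability(f, 100_000), 3), plaques_needed(f))
```

Under these assumptions, a clone present at 1 in 10,000 would almost certainly be sampled in 100,000 plaques. A clone at 1 in 100,000 would be sampled only about 63% of the time.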

Phylogenetic Analyses. In addition to the two L. dispar PBP sequences reported here (LdisPBP1 and LdisPBP2), the insect OBP sequences used in the phylogenetic analysis were obtained from the GenBank database or the primary literature:
- Agrotis segetum PBP (AsegPBP; Prestwich et al. 1995)
- Antheraea pernyi PBP1 (AperPBP1; Raming et al. 1990) and PBP2 (AperPBP2; Krieger et al. 1991)
- Antheraea polyphemus PBP (ApolPBP1; Raming et al. 1989)
- Bombyx mori PBP (BmorPBP; Krieger et al. 1996)
- Heliothis virescens PBP (HvirPBP; Krieger et al. 1993)
- Lymantria dispar PBP2 (Prestwich et al. 1995)
- Manduca sexta PBP (MsexPBP; Gyorgyi et al. 1988)
- Antheraea pernyi GOBP2 (Breer et al. 1990)
- Heliothis virescens GOBP1 and GOBP2 (Krieger et al. 1993)
- Manduca sexta GOBP1 and GOBP2 (Vogt et al. 1991b)

The amino acid sequences of each mature protein were aligned using the ClustalV multiple alignment program (Higgins and Sharp 1992), and minor adjustments were made to the alignment manually (Fig. 1). The putative signal sequences were excluded from the phylogenetic analyses; these are the first 19 and 20 amino acid residues of LdisPBP1 and LdisPBP2, respectively. There were two reasons for the exclusion. First, homology across the PBPs is difficult to recognize in this region. Second, signal sequences have been shown to evolve at different rates from the rest of a protein (Garcia-Maroto et al. 1991). A phylogeny was constructed by the neighbor-joining method (Saitou and Nei 1987) as implemented in MEGA (version 1.0; Kumar et al. 1993). Pairwise gamma distances were calculated according to the formula of Nei et al. (1976), with the gamma parameter a set to 2.05 (Uzzell and Corbin 1971). The pairwise-deletion option for gaps and missing data was used throughout the distance analyses. Bootstrapping (Felsenstein 1985) was used to evaluate the degree of support for particular groupings in the neighbor-joining analyses.
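The distance, tree-building, and bootstrap steps can be illustrated with a compact Python sketch. It makes the following choices:
- It assumes the usual gamma distance for amino acid sequences, d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing sites and a = 2.05.
- It computes p with pairwise deletion of gaps ('-') and missing data ('?').
- It joins nodes with the neighbor-joining pair criterion.
- It reports bootstrap support as the percentage of column-resampled replicates in which a group appears as one side of a split.

This is a simplified illustration, not MEGA's implementation. Branch lengths are dropped, ties are broken by input order, and the function names and toy sequences are hypothetical.

```python
import random


def gamma_distance(s1, s2, a=2.05):
    """Gamma distance d = a * ((1 - p) ** (-1 / a) - 1) between aligned proteins.

    Sites with a gap ('-') or missing data ('?') in either sequence are skipped
    for this pair only (pairwise deletion).
    """
    compared = diffs = 0
    for x, y in zip(s1, s2):
        if x in "-?" or y in "-?":
            continue
        compared += 1
        diffs += x != y
    p = diffs / compared  # p-distance over the sites both sequences share
    if p >= 1.0:
        raise ValueError("gamma distance undefined when every site differs")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


def neighbor_joining(names, dmat):
    """Unrooted neighbor-joining topology as nested tuples (no branch lengths)."""
    d = {u: {v: dmat[i][j] for j, v in enumerate(names)}
         for i, u in enumerate(names)}
    active = list(names)
    while len(active) > 2:
        n = len(active)
        r = {u: sum(d[u][v] for v in active) for u in active}
        best = None
        for i, u in enumerate(active):
            for v in active[i + 1:]:
                q = (n - 2) * d[u][v] - r[u] - r[v]  # NJ pair criterion Q(u, v)
                if best is None or q < best[0]:
                    best = (q, u, v)
        _, u, v = best
        new = (u, v)  # internal node joining the selected pair
        d[new] = {new: 0.0}
        for w in active:
            if w not in (u, v):
                d[new][w] = d[w][new] = 0.5 * (d[u][w] + d[v][w] - d[u][v])
        active = [w for w in active if w not in (u, v)] + [new]
    return tuple(active)


def clades(tree):
    """Leaf sets of every subtree of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def bootstrap_support(names, seqs, group, reps=500, seed=1):
    """Percentage of bootstrap NJ trees that split `group` from the other taxa."""
    rng = random.Random(seed)
    length = len(seqs[0])
    target, everyone = frozenset(group), frozenset(names)
    hits = 0
    for _ in range(reps):
        cols = [rng.randrange(length) for _ in range(length)]  # resample columns
        pseudo = ["".join(s[c] for c in cols) for s in seqs]
        dmat = [[gamma_distance(x, y) for y in pseudo] for x in pseudo]
        found = clades(neighbor_joining(names, dmat))
        # Unrooted tree: the group, or its complement, must appear as a subtree.
        if target in found or (everyone - target) in found:
            hits += 1
    return 100.0 * hits / reps


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    seqs = ["A" * 30, "A" * 29 + "C", "A" * 20 + "C" * 10, "A" * 20 + "C" * 9 + "D"]
    print(bootstrap_support(names, seqs, {"A", "B"}))
```

Pairwise deletion means each distance may rest on a different set of sites. That keeps more data than removing every gapped column, at the cost of distances that are not all measured on the same positions.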


Fig. 1. Alignment of all published lepidopteran PBPs; putative signal sequences are not included (see Methods). Question marks represent missing data. Note that the extra amino acids separate proteins along phylogenetic lines (see text for details).

Fig. 2. Neighbor-joining phylogeny relating lepidopteran PBPs. Bootstrap values (500 replicates) from the neighbor-joining analysis are shown above and to the left of each node. The tree is drawn unrooted, but an analysis with insect odorant binding proteins as outgroup taxa suggests that the root lies between BmorPBP and the HvirPBP/AsegPBP clade. Arrows indicate the locations of the two putative gene duplication events.

Results

The two Lymantria dispar PBPs previously identified by protein purification (LdisPBP1 and LdisPBP2; Vogt et al. 1989) were cloned and sequenced through a collaborative effort between two laboratories. An L. dispar antennal expression library was screened with an antibody raised against both L. dispar PBPs. The screen identified two independent clones of LdisPBP2 but no clones of LdisPBP1. The LdisPBP1 gene was instead obtained by PCR with degenerate primers based on published lepidopteran PBP sequences.

Sequence analysis of the two clones identified by antibody screening revealed identical sequences, differing only in the length of their poly(A) tails. Both clones contained the DNA sequence encoding the previously identified LdisPBP2 N-terminus. An ATG start codon preceded the N-terminus-encoding region by 57 nucleotides. This defines a region encoding a signal peptide typical of insect OBPs and other secreted proteins (Vogt et al. 1991b; von Heijne 1986). An in-frame stop codon (TAG) was present in both clones 495 bp downstream of the start ATG. The sequence encodes a mature protein of 145 amino acids (Fig. 1).

Sequence analysis of the LdisPBP1 PCR products reveals a cDNA of 672 nucleotides. This includes an ATG start codon, a TAG stop codon, and 123 nucleotides of 3′ untranslated sequence. The open reading frame encodes a 162-amino-acid protein (Fig. 1). As in LdisPBP2, the N-terminal region of LdisPBP1 appears to form a signal sequence: the first 19 amino acids form the signal peptide, and the remaining 143 amino acids form the mature protein. Comparison of the predicted amino acid sequence with the N-terminal L. dispar PBP sequences (Vogt et al. 1989) identified this protein as LdisPBP1.

LdisPBP1 and LdisPBP2 were aligned with all other published moth PBPs using ClustalV (Fig. 1). A neighbor-joining analysis of the aligned PBPs is shown in Fig. 2. The tree topology reflects currently accepted relationships among the higher Lepidoptera (Minet 1991). It shows two major groups, corresponding to the superfamilies Bombycoidea (Bombyx, Manduca, Antheraea) and Noctuoidea (Lymantria, Heliothis, Agrotis).

Bootstrapping provided strong support for two conclusions:
- the separation of the two superfamilies, and
- an independent origin of the duplicate pairs of PBPs, evidenced by the sister-group relationships of PBPs within genera.

Conversely, bootstrapping provided little support for groupings that united the A. pernyi and L. dispar PBPs. AperPBP1, AperPBP2, and LdisPBP1 grouped together in only 2% of the bootstrap trees, and AperPBP1, AperPBP2, and LdisPBP2 in only 1%. Interestingly, ApolPBP1 and AperPBP1 formed a highly supported clade that grouped with AperPBP2. This suggests that the duplication that generated the two Antheraea PBPs preceded the divergence of these two Antheraea species.
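The reading-frame bookkeeping in the Results involves a start codon, an in-frame stop codon, and a precursor made up of a signal peptide plus the mature protein. It can be sketched in a few lines of Python. This sketch makes two simplifying assumptions: the coding region begins at the first ATG of the cDNA, and the signal-peptide length is already known. The helper names and the toy sequence are hypothetical. In the study itself, the assignment was confirmed against N-terminal protein sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf_length(cdna: str):
    """Codons from the first ATG up to, but not including, the first in-frame
    stop codon; None if there is no ATG or no in-frame stop."""
    seq = cdna.upper()
    start = seq.find("ATG")
    if start < 0:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None  # reading frame runs off the end of the sequence


def mature_length(precursor_aa: int, signal_aa: int) -> int:
    """Residues left after removal of an N-terminal signal peptide."""
    return precursor_aa - signal_aa


if __name__ == "__main__":
    toy = "GGATGAAATTTTAGCC"  # made-up cDNA: ATG AAA TTT, then TAG
    print(first_orf_length(toy))   # 3
    print(mature_length(162, 19))  # LdisPBP1 figures from the Results: 143
```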

Discussion

We have cloned and sequenced the complete cDNAs of the pair of pheromone binding proteins expressed in male L. dispar. The L. dispar PBPs were initially discovered as two male-specific proteins in a non-SDS PAGE analysis of antennal protein extracts. On the basis of their N-terminal sequences (30 amino acids), it was proposed that they were products of separate gene loci (Vogt et al. 1989). Southern blotting has shown that the two AperPBP proteins are encoded by separate gene loci (Krieger et al. 1991). We compared the sequence divergence between the LdisPBP cDNAs reported here directly with the divergence between the two AperPBP proteins. This comparison further supports our conclusion that the LdisPBPs are products of two separate gene loci rather than allelic variants of a single locus.

With the sequencing of the two Lymantria PBPs, complete coding sequences are now available for duplicate PBPs from two separate lepidopteran superfamilies. This allows us to ask whether the PBP pairs are the product of a single, relatively ancient duplication event that preceded the diversification of the two superfamilies, or of two independent duplications within them. The answer affects our interpretation of how multiple PBPs are employed by different taxa. Maintenance of a single ancestral duplication implies relatively ancient selective pressure and a similar function for the two PBPs in the different species. More recent, independent duplications imply independent selective pressures and more distinct functions for the PBPs.

The ancestral duplication hypothesis requires four loss events, in addition to the single duplication, to account for the pattern of single and duplicate genes in Fig. 2:
- a loss in B. mori;
- another loss in M. sexta;
- an apparent loss in A. polyphemus (protein sequencing identifies only one PBP in this species; Vogt and Riddiford 1981);
- a loss in the common ancestor of H. virescens and A. segetum.

The recent duplication, or independent origin, hypothesis requires only two duplication events: one in the common ancestor of A. polyphemus and A. pernyi, and one in L. dispar. To these it adds the single apparent loss in A. polyphemus. The ancestral hypothesis therefore requires five events (one duplication plus four losses), whereas the independent origin hypothesis requires three (two duplications plus one loss). The independent origin hypothesis is thus the simpler, more parsimonious of the two. It indicates that the duplication events producing the LdisPBPs and the AperPBPs occurred late in the diversification of the two lepidopteran superfamilies.

The earliest work on PBPs used protein purification and a limited number of species, and found a single protein in each species (Vogt and Riddiford 1981). As more species have been examined, by both protein purification and PCR analysis, duplicate PBPs have often been identified (Vogt et al. 1989; Krieger et al. 1991). Recent protein purification and sequencing has identified two PBPs in another noctuid moth, Mamestra brassicae (Nagnan-Le Meillour et al. 1996). The N-terminal sequence data are too short to include in our phylogenetic analysis, but these proteins may represent a separate duplication within the Noctuoidea. Interestingly, N-terminal protein sequencing indicates that Orgyia pseudotsugata, a moth in the same family as Lymantria dispar, has only a single PBP (Vogt et al. 1991a). Too few sequence data are available to include O. pseudotsugata in our analysis. Its single PBP could reflect either the loss of one PBP gene or a duplication, leading to the two L. dispar genes, that occurred after the separation of these two genera. Both of these observations, together with our phylogenetic analysis of the L. dispar and A. pernyi PBP genes, argue that duplications of PBP genes are relatively common and relatively recent events in the evolution and diversification of moth species.

Implicit in this argument for multiple, recent duplications are changes in selective pressure that led to the maintenance of duplicate PBPs. These changes have arisen independently in at least two separate instances. The selective pressures involved are intriguing and, to date, elusive. Given the number of modern taxa with two PBPs, the lack of any evidence for ancient duplicate PBPs is just as puzzling. Overall, the evolution of PBPs appears to be a story of multiple late, independent gene duplications. The actual frequency of these duplications and their exact biological implications remain to be determined.

Acknowledgments. We thank D. Kluger and G. Du (Stony Brook) for assistance in LdisPBP2 cDNA cloning, and T.A. Mousseau, W.J. Jones, K. Oswald, N. Schizas, M. Rogers, and M. Franco for valuable comments on an earlier version of this manuscript. Financial support for this research was provided by the USDA Competitive Research Grants program (85-CRCR-11736 and 94-37302-0398, GDP), the NSF (INT9014102, GDP; OCE-9402855, JMQ), the Herman Frasch Foundation (305HF92, GDP), the National Institutes of Health (NIDCD DC-00588, RGV), and the United States Department of Agriculture (CRGP 9437302-0615, RGV).

References

Breer H, Krieger J, Raming K (1990) A novel class of binding proteins in the antennae of the silk moth Antheraea pernyi. Insect Biochem 20:735–740
Du G, Prestwich GD (1995) Protein structure encodes the ligand binding specificity in pheromone binding proteins. Biochemistry 34:8726–8732
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791
Frohman MA (1990) RACE: rapid amplification of complementary DNA ends. In: Innis MA, Gelfand DH, Sninsky JJ, White TJ (eds) PCR protocols: a guide to methods and applications. Academic Press, New York, pp 28–38
Garcia-Maroto F, Castagnaro A, Sanchez de la Hoz P, Maraña C, Carbonero P, García-Olmedo F (1991) Extreme variations in the ratios of non-synonymous to synonymous nucleotide substitution rates in signal peptide evolution. FEBS Lett 287(1–2):67–70
Gyorgyi TK, Roby-Shemkovitz AJ, Lerner MR (1988) Characterization and cDNA cloning of the pheromone-binding protein from the tobacco hornworm, Manduca sexta: a tissue-specific developmentally regulated protein. Proc Natl Acad Sci USA 85:9851–9855
Higgins DG, Sharp PM (1992) Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5:151–154
Hughes AL (1994) The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond B 256:119–124
Hughes AL, Hughes MK (1993) Adaptive evolution in the rat olfactory receptor gene family. J Mol Evol 36:249–254
Hughes AL, Ota T, Nei M (1990) Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol Biol Evol 7(6):515–524
Krieger J, Gänssle H, Raming K, Breer H (1993) Odorant binding proteins of Heliothis virescens. Insect Biochem Mol Biol 4:449–456
Krieger J, Raming K, Breer H (1991) Cloning of genomic and complementary DNA encoding insect pheromone binding proteins: evidence for microdiversity. Biochim Biophys Acta 1088(2):277–284
Krieger J, von Nickisch-Rosenegk E, Mameli M, Pelosi P, Breer H (1996) Binding proteins from the antennae of Bombyx mori. Insect Biochem Mol Biol 26:297–307
Kumar S, Tamura K, Nei M (1993) MEGA: molecular evolutionary genetics analysis, version 1.0. Pennsylvania State University, University Park
Li WH (1983) Evolution of duplicate genes and pseudogenes. In: Nei M, Koehn RK (eds) Evolution of genes and proteins. Sinauer Associates, Sunderland, MA
Minet J (1991) Tentative reconstruction of the ditrysian phylogeny (Lepidoptera: Glossata). Ent Scand 22:69–95
Nagnan-Le Meillour P, Huet J, Maibeche M, Pernollet CD (1996) Purification and characterization of multiple forms of odorant/pheromone binding proteins in the antennae of Mamestra brassicae. Insect Biochem Mol Biol 26:59–67
Nei M, Chakraborty R, Fuerst PA (1976) Infinite allele model with varying mutation rate. Proc Natl Acad Sci USA 73:4164–4168
Ngai J, Dowling MM, Buck L, Axel R, Chess A (1993) The family of genes encoding odorant receptors in the channel catfish. Cell 72:657–666
Prestwich GD (1991) Photoaffinity labeling and biochemical characterization of binding proteins for pheromones, juvenile hormones, and peptides. Insect Biochem 21:27–40
Prestwich GD (1993) Bacterial expression and photoaffinity labeling of a pheromone binding protein. Protein Sci 2:420–428
Prestwich GD, Du G, LaForest S (1995) How is pheromone specificity encoded in proteins? Chem Senses 20:461–469
Raming K, Krieger J, Breer H (1989) Molecular cloning of an insect pheromone-binding protein. FEBS Lett 256:215–218
Raming K, Krieger J, Breer H (1990) Primary structure of a pheromone-binding protein from Antheraea pernyi: homologies with other ligand-carrying proteins. J Comp Physiol B 160:503–509
Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487–491
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Tanaka T, Nei M (1989) Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol Biol Evol 6:447–459
Uzzell T, Corbin KW (1971) Fitting discrete probability distributions to evolutionary events. Science 172:1089–1096
Vogt RG (1987) The molecular basis of pheromone reception: its influence on behavior. In: Prestwich GD, Blomquist GJ (eds) Pheromone biochemistry. Academic Press, Orlando, FL, pp 385–431
Vogt RG, Riddiford LM (1981) Pheromone binding and inactivation by moth antennae. Nature 293:161–163
Vogt RG, Kohne AC, Dubnau JT, Prestwich GD (1989) Expression of pheromone binding proteins during antennal development in the gypsy moth Lymantria dispar. J Neurosci 9:3332–3346
Vogt RG, Prestwich GD, Lerner MR (1991a) Odorant-binding-protein subfamilies associate with distinct classes of olfactory receptor neurons in insects. J Neurobiol 22:74–84
Vogt RG, Rybczynski R, Lerner MR (1991b) Molecular cloning and sequencing of general odorant-binding proteins GOBP1 and GOBP2 from the tobacco hawk moth Manduca sexta: comparisons with other insect OBPs and their signal peptides. J Neurosci 11:2972–2984
von Heijne G (1986) A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 14:4683–4690