Insect mitochondrial genomics 2: the complete mitochondrial genome sequence of a giant stonefly, Pteronarcys princeps, asymmetric directional mutation bias, and conserved plecopteran A+T-region elements

James Bruce Stewart and Andrew T. Beckenbach
Abstract: Mitochondrial (mt) genome sequences of insects are receiving renewed attention in molecular phylogenetic studies, in studies of mt-genome rearrangement, and in studies of other unusual molecular phenomena, such as translational frameshifting. At present, the basal neopteran lineages are poorly represented by mt-genome sequences. Complete mt-genome sequences are available in the databases for only the Orthoptera and Blattaria; 9 orders are unrepresented. Here, we present the complete mt-genome sequence of a giant stonefly, Pteronarcys princeps (Plecoptera; Pteronarcyidae). The 16 004 bp genome is typical in its genome content, gene organisation, and nucleotide composition. The genome shows evidence of strand-specific mutational biases, correlated with the time between the initiation of leading-strand replication and the initiation of lagging-strand replication. Comparisons with other insects reveal that this trend is seen in other insect groups, but is not universally consistent among sampled mt-genomes. The A+T region is compared with that of 2 stoneflies in the family Peltoperlidae. Conserved stem-loop structures and sequence blocks are noted between these distantly related families.

Key words: mitochondrial genomics, directional mutation pressure, A+T-rich region, Plecoptera, stonefly.
Stewart and Beckenbach
Introduction

Complete mitochondrial (mt-) genome sequencing of insects and other hexapods has recently received renewed attention from a number of research groups. At present, 42 insect species are represented by complete mt-genome sequences in public sequence databases. Of the 30 recognized insect orders (Kristensen 1991; Klass et al. 2002), 11 are represented by at least 1 complete mt-genome sequence; an additional 2 orders are represented by near-complete mt-genome sequences. This sampling still leaves 17 insect orders not represented in the mitochondrial sequence databases.
Received 29 July 2005. Accepted 26 February 2006. Published on the NRC Research Press Web site at http://genome.nrc.ca on 22 August 2006.
Corresponding Editor: L. Bonen.
J.B. Stewart.¹ Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, B.C., V5A 1S6.
A.T. Beckenbach. Department of Biological Sciences, Simon Fraser University, 8888 University Drive, Burnaby, B.C., V5A 1S6.
¹Corresponding author (e-mail: [email protected]).
Genome 49: 815–824 (2006)
© 2006 NRC Canada
The subsequent use of the resulting “mitogenomic” data sets for phylogenetic reconstruction is only one motivation for obtaining complete mt-genome sequences. Interesting molecular evolutionary phenomena have also been reported in insect mt-genomes. In contrast to the generally stable gene order and arrangement observed in most insect mt-genomes, some insect lineages appear to have sustained multiple major genome rearrangements. Among the Paraneoptera, many different patterns of gene rearrangement have been observed, including the apparent evolution of up to 4 independent rearrangements within the whitefly family (Hemiptera, Sternorrhyncha, Aleyrodidae) (Shao et al. 2001, 2003; Shao and Barker 2003; Thao et al. 2004). A similar phenomenon has been discussed in detail in relation to the frequent rearrangement of tRNA clusters within the Hymenoptera (Dowton and Austin 1999; Dowton et al. 2003). Obtaining sequences of the whole, or the majority, of the mt-genome has also led to the identification of unusual coding of functional genes that might have been interpreted as the amplification of a nuclear pseudogene copy had only single-gene sequences been amplified and sequenced. One such example is the programmed translational frameshifting in the mitochondrial cytb gene of an ant genus (Beckenbach et al. 2005).

An important grouping of orders, referred to as the Neoptera “incertae sedis” by Kristensen (1991) or as the basal Neoptera, is poorly represented by complete mt-DNA sequences. At present, the complete mt-genome sequences of the migratory locust Locusta migratoria (Flook et al. 1995) and the oriental mole cricket Gryllotalpa orientalis (Kim et al. 2005) represent the single order Orthoptera. The near-complete mt-DNA sequence of a cockroach, Periplaneta fuliginosa (order Blattodea) (Yamauchi et al. 2004), increases the representation to 2 of the 11 Neoptera incertae sedis orders.
Given the ancient origin and unknown interorder relationships of most of these groups, it is surprising that these orders have been neglected for this long. The stoneflies are an ancient group of insects. Their fossil record extends into the Lower Permian, from 285 to 240 million years ago, where the 5–6 distinct families indicate an earlier origin of the group (reviewed by Wootton 1981). Some of the competing evolutionary hypotheses place the stoneflies in a single basal group, diverging early after the split between the Paleoptera and the neopteran ancestors, and before other neopteran diversification (Fig. 1).

In this work, we present the complete mt-genome sequence of the giant stonefly Pteronarcys princeps (Plecoptera; Pteronarcyidae). This sequence provides the first complete mt-genome from a representative of the ancient order Plecoptera.
Methods

Specimen collection, identification, and DNA extraction
A giant stonefly (Pteronarcys princeps) was collected at Stoney Creek, in Burnaby, B.C. The specimen was identified using the key developed by Baumann et al. (1977). Two legs were removed from the specimen, and DNA was extracted from each leg separately. Proteinase K digestion and extraction with phenol, followed by chloroform : isoamyl alcohol, was used to extract DNA. The protocol is described in detail elsewhere (Stewart and Beckenbach 2005). The DNA pellets were each dissolved in 100 µL of ddH2O. The specimen is archived in the Canadian National Collection.

PCR and long-PCR amplification
The genome, except for the A+T-rich region, was amplified in overlapping PCR fragments, using Taq DNA polymerase (Qiagen Inc., Mississauga, Ont.). Heterologous primers were designed using aligned insect and hexapod mt-genome sequences. At later stages in the sequencing effort, primers designed from the obtained stonefly sequence were used in PCR and sequencing reactions. Primer pairs are listed in the supplementary data (Table S1).² PCR amplifications were carried out in an Eppendorf Mastercycler 5333 thermocycler. Taq DNA polymerase was used in 50 µL reactions (containing 2.0 mmol MgCl2/L, 0.2 mmol of each dNTP/L, 400 nmol of each primer/L, and 1.25 U of Taq polymerase in a 1/10 dilution of the supplied reaction buffer, with 1 µL of a 1/10 dilution of the DNA extract). PCR cycling consisted of 2 min of denaturation at 94 °C, followed by 5 cycles of denaturation at 94 °C for 30 s, annealing for 30 s, and elongation at 72 °C for 120 s. Thirty additional cycles were carried out as above, with the annealing temperature increased to 55 °C. Optimized primer annealing temperatures varied from 45 to 55 °C, depending on the primer pair used.

The A+T-rich region of the genome was amplified using a GeneAmp XL PCR Kit (Applied Biosystems, Foster City, Calif.). Specific high-annealing primers were designed for use with the XL PCR reaction (Table S1).² DNA template was 10 times as concentrated as in standard PCRs.
Reaction-mixture concentrations were as suggested by the supplier, but scaled to 50 µL (0.9 mmol dNTP mix/L, 1.25 mmol magnesium acetate/L, and 0.25 µmol of each primer/L in 1× supplied reaction buffer). Long-PCR cycling was conducted in an Eppendorf Mastercycler 5333 thermocycler. The template and primer master mixes were each incubated for 1 min at 94 °C before they were combined, and the combined mix was then denatured for 90 s at 94 °C. The first 16 cycles consisted of denaturation at 90 °C for 20 s, followed by annealing/elongation at 62 °C for 6 min. The final 23 cycles were similar, but with an additional 15 s per cycle of annealing/elongation time.

Purification of PCR products
Amplification products were isolated using the QIAquick PCR Purification Kit (Qiagen), as directed by the supplier, but eluted in 40 µL of ddH2O. If optimization of the PCR protocol did not eliminate secondary bands from the PCR reaction, the band of the expected size was separated with agarose-gel electrophoresis. The target band was then cut from the gel and transferred to a 1.5 mL Eppendorf tube. The gel slice was allowed to freeze at –20 °C overnight, and
²Supplementary data for this article are available on the journal Web site (http://genome.nrc.ca) or may be purchased from the Depository of Unpublished Data, Document Delivery, CISTI, National Research Council Canada, Building M-55, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada. DUD 5068. For more information on obtaining material refer to http://cisti-icist.nrc-cnrc.gc.ca/irm/unpub_e.shtml.
Fig. 1. Differing phylogenetic views of the placement of Plecoptera within the insects. (A) Summary of relationships presented in Boudreaux (1979). (B) Summary of relationships presented in Hennig (1981).
Table 1. Nucleotide composition of stonefly (Pteronarcys princeps) mitochondrial genome features.

Annotated genome feature      A       T       G       C       A+T     No. of nucleotides
Whole genome (J strand)       0.3710  0.3435  0.1108  0.1746  0.7146  16004
J-strand protein genes        0.3008  0.3872  0.1299  0.1821  0.6879  6896
  1st codon position          0.2881  0.3042  0.2293  0.1784  0.5923  2298
  2nd codon position          0.1915  0.4339  0.1375  0.2372  0.6254
  3rd codon position          0.4230  0.4230  0.0231  0.1310  0.8460
N-strand protein genes        0.2717  0.4603  0.1734  0.0946  0.7320
  1st codon position          0.2713  0.4282  0.2124  0.0881  0.6995
  2nd codon position          0.1811  0.4684  0.1749  0.1756  0.6459
  3rd codon position          0.3622  0.4844  0.1332  0.0201  0.8466
tRNA genes – coding           0.3369  0.3631  0.1670  0.1316  0.7146  1413
tRNA genes – J                0.3563  0.3609  0.1425  0.1402  0.7172  870
tRNA genes – N                0.3094  0.3665  0.2063  0.1179  0.6759  543
rRNA genes – N                0.3310  0.3885  0.1862  0.0943  0.7195  2121
A+T-rich region (J strand)    0.4387  0.3739  0.0639  0.1235  0.8126  1158
Noncoding nucleotides         0.5000  0.3500  0.0500  0.1000  0.8500  60

Note: Bolded values represent the most frequently used nucleotide for the described feature; italicized numbers represent the least frequently used nucleotide.
then subjected to centrifugation at top speed in a desktop centrifuge for 20 min. The remaining agarose was extracted with the QIAquick Gel Extraction Kit (Qiagen). The liquid recovered from the frozen gel slice was purified using the same column as was used for the agarose extraction.

DNA sequencing
Sequencing was conducted by the University of Calgary DNA Sequencing Center, using an ABI Prism model 377 sequencing machine with the BigDye Terminators, version 3.0 or 3.1, sequencing kits (Applied Biosystems).

Sequence assembly, annotation, and analysis
Sequence alignments, assembly of PCR fragments, and nucleotide-sequence analysis were conducted using BioEdit,
version 7.0.1 (17 August 2004) (Hall 1999). Dot-plot analyses were conducted to search for repeats and inverted repeats in the A+T-rich region sequence, using DOTTER for Windows (October 1999) (Sonnhammer and Durbin 1995). Alignment was carried out using Clustal X (Thompson et al. 1994). Searches for tRNA genes were conducted using a stand-alone installation of tRNAScan-SE, version 1.23 (April 2002) (Lowe and Eddy 1997). Searches were confined to organellar tRNAs only, using the invertebrate mitochondrial genetic code with a COVE cut-off score of 5. Codon usage tables for A–T and G–C skew calculation and relative synonymous codon usage (RSCU) were calculated using CodonW, version 1.4.2 (Peden 2005). RSCU calculations counted only 3rd-codon-position synonymous changes, and thus assumed 2 independent 4-fold serines and a 4-fold
Table 2. Comparison of relative synonymous codon usage for the giant stonefly, a silverfish, and a locust.

Amino          Stonefly*              Silverfish†            Locust‡
acid   Codon   All    J      N        All    J      N        All    J      N
K      AAA     1.135  1.556  0.737    1.506  1.704  1.043    1.414  1.829  0.414
       AAG     0.865  0.444  1.263    0.494  0.296  0.957    0.586  0.171  1.586
N      AAC     0.297  0.384  0.122    0.464  0.653  0.080    0.484  0.632  0.080
       AAT     1.703  1.616  1.878    1.536  1.347  1.920    1.516  1.368  1.920
Q      CAA     1.684  1.922  1.200    1.726  1.852  1.368    1.905  2.000  1.571
       CAG     0.316  0.078  0.800    0.274  0.148  0.632    0.095  0      0.429
H      CAC     0.420  0.515  0        0.775  0.848  0.429    0.704  0.906  0.111
       CAT     1.580  1.485  2.000    1.225  1.152  1.571    1.296  1.094  1.889
E      GAA     1.619  1.960  1.118    1.526  1.773  1.188    1.714  1.960  1.259
       GAG     0.381  0.040  0.882    0.474  0.227  0.813    0.286  0.040  0.741
D      GAC     0.338  0.490  0        0.685  0.958  0.160    0.325  0.520  0
       GAT     1.662  1.510  2.000    1.315  1.042  1.840    1.675  1.480  2.000
Y      TAC     0.408  0.707  0.080    0.517  0.709  0.294    0.378  0.692  0.137
       TAT     1.592  1.293  1.920    1.483  1.291  1.706    1.622  1.308  1.863
W      TGA     1.886  2.000  1.676    1.714  1.853  1.459    1.920  1.972  1.793
       TGG     0.114  0      0.324    0.286  0.147  0.541    0.080  0.028  0.207
C      TGC     0.191  0.500  0.067    0.500  1.077  0.258    0.364  1.000  0.067
       TGT     1.809  1.500  1.933    1.500  0.923  1.742    1.636  1.000  1.933
M      ATA     1.656  1.796  1.462    1.553  1.727  1.184    1.687  1.836  1.270
       ATG     0.344  0.204  0.538    0.447  0.273  0.816    0.313  0.164  0.730
I      ATC     0.256  0.340  0.081    0.412  0.532  0.155    0.225  0.311  0.049
       ATT     1.744  1.660  1.919    1.588  1.468  1.845    1.775  1.689  1.951
L      TTA     1.772  1.974  1.592    1.453  1.893  1.118    1.757  1.966  1.584
       TTG     0.228  0.026  0.408    0.547  0.107  0.882    0.243  0.034  0.419
F      TTC     0.348  0.580  0.045    0.510  0.834  0.150    0.515  0.938  0.125
       TTT     1.652  1.420  1.955    1.490  1.166  1.850    1.485  1.062  1.875
L      CTA     1.643  1.694  1.333    2.170  2.792  0.625    2.149  2.706  0.375
       CTG     0.119  0.083  0.333    0.395  0.226  0.813    0.119  0.157  0
       CTC     0.310  0.333  0.167    0.269  0.327  0.125    0.149  0.157  0.125
       CTT     1.928  1.889  2.167    1.166  0.654  2.438    1.582  0.980  3.500
V      GTA     1.636  1.926  1.177    1.558  2.433  0.566    1.749  2.731  0.456
       GTG     0.363  0.207  0.612    0.478  0.233  0.755    0.066  0.038  0.101
       GTC     0.346  0.415  0.235    0.407  0.500  0.302    0.109  0.154  0.051
       GTT     1.655  1.452  1.977    1.558  0.833  2.377    2.077  1.077  3.392
T      ACA     1.425  1.646  0.853    1.929  2.154  0.952    2.556  3.066  0.316
       ACG     0.146  0.025  0.459    0.089  0.044  0.286    0.059  0.072  0
       ACC     0.384  0.506  0.066    0.804  0.923  0.286    0.234  0.287  0
       ACT     2.046  1.823  2.623    1.179  0.879  2.476    1.151  0.575  3.684
P      CCA     1.227  1.382  0.800    2.306  2.815  0.778    2.294  2.969  0.615
       CCG     0.133  0.146  0.100    0.139  0.074  0.333    0.088  0.082  0.103
       CCC     0.640  0.836  0.100    0.611  0.593  0.667    0.118  0.165  0
       CCT     2.000  1.636  3.000    0.944  0.519  2.222    1.500  0.784  3.282
A      GCA     0.873  0.821  0.986    1.696  2.281  0.571    2.107  2.781  0.533
       GCG     0.127  0      0.406    0.196  0.066  0.444    0.053  0.038  0.089
       GCC     0.491  0.689  0.058    0.826  1.091  0.317    0.160  0.190  0.089
       GCT     2.509  2.490  2.551    1.283  0.562  2.667    1.680  0.990  3.289
S      TCA     1.550  1.714  1.301    1.934  2.759  0.701    1.897  3.194  0.548
       TCG     0.115  0.032  0.241    0.083  0.110  0.041    0.063  0.062  0.065
       TCC     0.402  0.635  0.048    0.529  0.800  0.124    0.111  0.124  0.097
       TCT     1.933  1.619  2.410    1.455  0.331  3.134    1.929  0.620  3.290
S      AGA     2.050  2.109  2.000    2.029  2.571  1.650    2.945  3.491  2.400
       AGG     0      0      0        0.412  0.071  0.650    0.073  0      0.145
       AGC     0.298  0.363  0.242    0.235  0.357  0.150    0.109  0.073  0.145
       AGT     1.653  1.527  1.758    1.324  1.000  1.550    0.873  0.436  1.309
R      CGA     2.323  2.769  1.565    1.931  2.526  0.800    2.473  3.657  0.400
       CGG     0.258  0.103  0.522    0.828  0.632  1.200    0.145  0.114  0.200
       CGC     0.129  0.205  0        0.207  0.316  0        0.073  0.114  0
       CGT     1.290  0.923  1.913    1.034  0.526  2.000    1.309  0.114  3.400
G      GGA     2.050  2.282  1.708    1.600  1.971  1.121    2.019  3.079  0.390
       GGG     0.571  0.254  1.042    0.980  0.870  1.121    0.192  0.159  0.244
       GGC     0.235  0.366  0.042    0.196  0.261  0.112    0.019  0      0.049
       GGT     1.143  1.099  1.208    1.224  0.899  1.645    1.769  0.762  3.317
No. of codons  3729   2291   1438     3724   2289   1435     3725   2296   1429

Note: Values in bold type represent the most commonly used codon for the given amino acid.
* Pteronarcys princeps. † Thermobia domestica. ‡ Locusta migratoria.
and a 2-fold leucine codon group. Asymmetric directional mutation bias of the majority-coding strand (J strand) was investigated by determining the nucleotide usage at the neutral 3rd positions of 4-fold degenerate codons (the UCN, CUN, CCN, CGN, ACN, AGN, GUN, GCN, and GGN codons). The frequencies of the short genes atp8, nad3, and nad4L were pooled with those of atp6, cox3, and nad4, respectively. The A–T, G–C, and A+C–G+T skews were plotted against the average distance of each gene or gene group from the tRNAIle gene, and linear regression was performed on the plots. The negative value of the calculated skews for N-strand-encoded genes was used to represent J-strand mutational bias. Nine additional genomes were also analyzed in this manner (Drosophila melanogaster NC_001709, Antheraea pernyi NC_004622, Apis mellifera NC_001566, Crioceris duodecimpunctata NC_003372, Philaenus spumarius NC_005944, Gryllotalpa orientalis NC_006678, Locusta migratoria NC_001712, Thermobia domestica NC_006080, and Tetrodontophora bielanensis NC_002735).

Sequence comparisons
Primer design, gene annotation, and comparative analyses relied on the alignment of complete mt-genomes from other hexapods (available in GenBank). Mt-genome sequences available before July 2004 were used. The A+T-rich regions of 2 stoneflies from the genus Peltoperla (family Peltoperlidae) were obtained from GenBank (accession Nos. AY142073 and AY142074). The giant stoneflies (family Pteronarcyidae) are considered a sister group to the families Peltoperlidae + Styloperlidae (Thomas et al. 2000; Zwick 2000). The complete mt-genome sequence is available from GenBank (accession No. AY687866, with associated RefSeq NC_006133).
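The RSCU values reported in Table 2 follow from the codon grouping described under "Sequence assembly, annotation, and analysis": each codon count is scaled by the size of its synonymous family, with families defined by 3rd-position changes only. The following is a minimal sketch of that calculation, not the CodonW implementation used in this study. The function names (`family`, `rscu`) are hypothetical, and the sketch assumes in-frame DNA coding sequences and the invertebrate mitochondrial code (TAA and TAG as stops, AGN as a 4-fold serine family).

```python
from collections import Counter

# First two bases of the 4-fold families in the invertebrate mitochondrial code.
# TCN and AGN are kept as two separate serine families; CTN is the 4-fold
# leucine family and TTR the 2-fold one, as described in the Methods.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG", "AG"}
STOP_CODONS = {"TAA", "TAG"}


def family(codon):
    """Return a label for the synonymous family defined by 3rd-position changes."""
    prefix = codon[:2]
    if prefix in FOURFOLD_PREFIXES:
        return prefix + "N"
    # 2-fold families split into purine-ending (R) and pyrimidine-ending (Y)
    return prefix + ("R" if codon[2] in "AG" else "Y")


def rscu(cds_list):
    """Compute RSCU for every codon observed in a list of in-frame sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in STOP_CODONS or set(codon) - set("ACGT"):
                continue  # skip stops and ambiguous bases
            counts[codon] += 1
    totals = Counter()
    for codon, n in counts.items():
        totals[family(codon)] += n
    values = {}
    for codon, n in counts.items():
        fam = family(codon)
        size = 4 if fam.endswith("N") else 2
        # observed count divided by the count expected under equal usage
        values[codon] = n * size / totals[fam]
    return values
```

Under this definition the RSCU values within a family sum to the family size (2 or 4), which is the pattern seen in each codon group of Table 2. Codons that never occur are simply absent from the returned dictionary and correspond to the zeros in the table.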
Results and discussion

Genome annotation
The complete genome of the giant stonefly P. princeps is a circularly amplifiable molecule of 16 004 bp. The same gene content and relative order observed in the Drosophila yakuba mt-genome is present in the stonefly sequence (Clary and Wolstenholme 1985). This sequence organisation is considered the ancestral genome arrangement because of
the conservation of this exact gene order in noninsect hexapods (Nardi et al. 2003) and crustaceans (Crease 1999). Twenty tRNA genes were identified using tRNAscan-SE. The tRNASer-AGN and tRNAArg genes were identified by visual inspection, alignment with other insect tRNA genes, and the hand-folding of the putative sequences; tertiary structures of the tRNASer-AGN gene were also considered (Steinberg and Cedergren 1994; Steinberg et al. 1994, 1997) (see Fig. S1 in supplementary data²). The lrRNA and srRNA genes were identified on the basis of conserved relative genome position and sequence alignment with the orthologues in other insect mitochondria. This annotation of the ribosomal genes is based on the assumption that there are no noncoding nucleotides between the rRNA genes and their abutting tRNA genes. The boundary between the 5′ end of srRNA and the major noncoding region (or the A+T-rich region) could be detected by the alignment of a conserved TTNAAGTTNTAARANCG 5′ motif for hexapod srRNA sequences.

All 13 expected protein-coding genes were identified in their conserved relative position. Start codons used were ATG (for cox2, atp6, cox3, nad4, nad4L, and cytb), ATT (atp8, nad3, and nad6), GTG (nad2 and nad5), and TTG (nad1). The cox1 gene has been tentatively annotated to start with a CGA codon at positions 1445–1447. This annotation agrees with the alignment proposed by Beard et al. (1993). A putative ATT start codon is observed at positions 1436–1438; using it would require the cox1 and tRNATyr genes to overlap by 7 nucleotides and would extend the 5′ end of cox1 by only 3 amino acids. This codon might serve as the initiation site of cox1. Only 2 protein-coding genes encode complete stop codons (TAA for nad4 and TAG for nad1). The cox1, cox2, and nad5 genes do not encode their stop codons in their DNA sequence, but evidently make use of single in-frame T nucleotides that directly abut the neighbouring tRNA genes.
It is assumed that these stop codons are completed posttranscriptionally by the polyadenylation of mature mRNA (Ojala et al. 1980, 1981). The remaining 6 protein-coding genes encode a putative stop codon that overlaps, by 1 or 2 nucleotides, with the 5′ end of their downstream abutting gene. These genes could be interpreted to encode only the in-frame T or TA nucleotides that are polyadenylated as mRNAs to construct their functional stop codons.
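The completion of truncated stop codons by polyadenylation can be illustrated with a short sketch. It is only an illustration of the logic described above, not a tool used in this study: `completed_stop` is a hypothetical helper, the sequence is written in the DNA alphabet of the sense strand, and TAA and TAG are assumed to be the only stop codons (invertebrate mitochondrial code).

```python
STOP_CODONS = {"TAA", "TAG"}


def completed_stop(cds):
    """Return the stop codon a coding sequence carries once its transcript is
    polyadenylated, or None if no stop codon can be formed.

    A sequence whose length is a multiple of 3 must already end in a full stop
    codon. A sequence ending in a single in-frame T or in TA is completed to
    TAA by the A residues added at the 3' end of the mRNA.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in STOP_CODONS else None
    partial = cds[len(cds) - remainder:]
    if partial in ("T", "TA"):
        # poly(A) tail supplies the missing A nucleotides
        return (partial + "AA")[:3]
    return None
```

For example, a gene ending in a single in-frame T that abuts a tRNA gene would return TAA under this sketch, whereas a sequence ending in an in-frame G would return None.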
Fig. 2. Illustration of the directional mutational pressure observed in the mitochondrial genome of the giant stonefly. (A) Summary of predicted nucleotide biases due to deamination. Genes closest to (clockwise from) ON will remain single stranded for less time, resulting in lower mutational bias toward T and C nucleotides at N-strand silent sites. (B) Nucleotide skew values calculated for Pteronarcys princeps 3rd codon positions of 4-fold degenerate codons plotted against the genomic position of the gene or gene cluster’s midpoint.
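The skew values plotted in Fig. 2B can be reproduced in outline as follows. This is a minimal sketch under stated assumptions rather than the workflow used in this study: the function names are hypothetical, sequences are in-frame DNA coding sequences written in the sense orientation of each gene, and the regression is an ordinary least-squares fit summarized by R².

```python
from collections import Counter

# First two bases of the 4-fold degenerate families listed in the Methods
# (UCN, CUN, CCN, CGN, ACN, AGN, GUN, GCN, and GGN), in the DNA alphabet.
FOURFOLD_PREFIXES = {"TC", "CT", "CC", "CG", "AC", "AG", "GT", "GC", "GG"}


def fourfold_third_positions(cds):
    """Count the nucleotides at the 3rd position of 4-fold degenerate codons."""
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            counts[codon[2]] += 1
    return counts


def skew(plus, minus):
    """(plus - minus) / (plus + minus), or 0.0 when both counts are zero."""
    total = plus + minus
    return (plus - minus) / total if total else 0.0


def skews(counts, n_strand=False):
    """A-T, G-C, and A+C-G+T skews; the sign is flipped for N-strand genes so
    that every value describes the J strand."""
    a, c, g, t = (counts[base] for base in "ACGT")
    values = {"AT": skew(a, t), "GC": skew(g, c), "AC_GT": skew(a + c, g + t)}
    if n_strand:
        values = {name: -value for name, value in values.items()}
    return values


def r_squared(xs, ys):
    """Coefficient of determination of a simple least-squares regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)
```

In use, `skews` would be called once per gene or pooled gene group, and `r_squared` would be applied to the gene midpoints and the corresponding skew values. Negating the skew of an N-strand gene is equivalent to computing it on the complementary J-strand sequence, because complementation swaps A with T and G with C.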
Nucleotide composition
Analyses of nucleotide composition for various genome features are summarized in Table 1. The RSCU values were calculated for the protein-coding genes of the giant stonefly (Table 2). The results for the 2-codon tRNA families show a distinct bias for the use of the A or T nucleotide in the 3rd codon position, regardless of the identity of the anticodon encoded by the tRNA. If the majority-coding strand alone (the J strand) is inspected, the 3rd-codon-position sites for 4-codon tRNA families show a preponderance of A nucleotides (and T nucleotides for the minority-coding or N-strand proteins). This bias for A on the J strand is common for most insect mt-genomes investigated (see silverfish and locust comparisons in Table 2).

Asymmetric directional mutation bias has been carefully explored for vertebrate mt-DNA (Reyes et al. 1998; Saccone et al. 1999; Faith and Pollock 2003). In mammals, it has been hypothesized that the slow initiation of lagging-strand replication leaves the nucleotides exposed, increasing the rate of deamination of C to U, resulting in a bias toward the pyrimidine T on the lagging-replication strand (Francino and Ochman 1997). Also, A is a target for deamination to hypoxanthine, which is complemented on the leading strand by C. The majority of replicating mt-DNA molecules in Drosophila have been observed, using electron microscopy coupled with restriction mapping, to undergo extreme asymmetric replication. Up to 99% of the leading strand (the J strand of the genome) is replicated before replication is initiated for the lagging strand (the N strand), presumably within the A+T-rich region prior to the tRNAIle gene (Goddard and Wolstenholme 1978, 1980). It is expected that such a replication mechanism would leave obvious patterns of nucleotide substitution, in which genes near the origin of replication that are exposed the longest accumulate more nucleotide substitutions and exhibit a more extreme nucleotide bias (Reyes et al.
2005). We predict that fewer T and G nucleotides will persist in silent sites, such as the 3rd-codon positions of 4-fold degenerate codons, near the site of J-strand initiation (such as the cytb gene), relative to genes near the site of N-strand initiation (such as nad2), because of the loss of A and C nucleotides on the N strand during replication (Fig. 2A).

To test whether this asymmetric directional mutation signature is present in the giant stonefly, the nucleotide frequencies of the 3rd-codon positions of 4-fold degenerate codons were compared for each gene, with the frequencies of atp8, nad3, and nad4L being pooled with those of the atp6, cox3, and nad4 genes, respectively. The A–T skew (the difference between the number of A and T nucleotides divided by their sum) and the G–C skew were calculated for the protein-coding genes of the giant stonefly mt-genome (Perna and Kocher 1995). The skew values for each gene were then plotted against the midpoint of their coding region (Fig. 2B). The A–T skew increases as the estimated time that the particular region of DNA is expected to persist in the single-stranded state increases (R² = 0.6377). This result is consistent with observations in mammals (Reyes et al. 1998). The AC–GT skew (the difference between the number of A+C and G+T nucleotides divided by their sum) also reflects this trend (R² = 0.7133). The G–C skew also shows the expected trend of decreased G usage, but with very poor support by linear regression (R² =
Fig. 3. Predicted secondary-structure folds for potential stem-loop regions identified within the A+T-rich region of the giant stonefly. Secondary-structure folds and ΔG values were calculated by DNA mfold (Zuker 2003). Numbers represent the genome positions of the noted nucleotides. Watson–Crick base pairs are represented by a dash (–); G–T pairing is represented by “+”. (A) Stem-loop from positions 15 124–15 175, shown on the N strand. (B) Repeat from positions 15 252–15 291.
0.1301). In insect mitochondria, the number of 3rd-codon G and C nucleotides is very low (typically 30%), providing very few G- or C-containing sites and limiting the power of the analysis.

The analysis was repeated for 8 additional representative insect mt-genome sequences and 1 collembolan mt-genome sequence. Mt-genomes with protein-gene rearrangements relative to the D. yakuba mt-genome were not included in the analyses. Generally, the trend of the increasing A–T skew from the nad2 gene toward the cytb gene was observed for all but 1 of the sequences examined, but often with low R² support (Table S2).² The opposite trend was observed for the Apis mellifera sequence, but this sequence is known to have an unusual bias toward A and T nucleotides that could obscure directional-mutation effects. The G–C skew trendline was found to run opposite to the expected direction for the Thermobia and Antheraea mt-DNA sequences, but this might be the result of the low number of 3rd-codon G or C nucleotides encoded in these genomes.

Stonefly A+T-rich regions
The 1158-bp noncoding region was observed in the conserved location between the srRNA and the tRNAIle genes and was composed of 81.26% A and T nucleotides. The region was searched for open reading frames (ORFs) of 50 or more amino acids (the approximate size of the atp8 gene product), with the 7 potential start codons suggested for insect mt-genomes (ATN and NTG codons). Four ORFs that met these criteria were found: a 64 aa ORF (positions 15 928 to 15 737), a 56 aa ORF (positions 14 898 to 15 065), a
Fig. 4. Conserved sequence blocks (CSBs) from the A+T-rich regions of Pteronarcys princeps and 2 Peltoperla species. Dots represent a nucleotide identical to the P. princeps (Pp) sequence (top); dashes represent gaps introduced for the alignment. Numbers represent the genomic position of the right-most nucleotide. Peltoperla tarteri (Pt) and Peltoperla arcuata (Pa) are shown as the reverse complement of the GenBank flat files (AY142074 and AY142073). CSBs are highlighted by the consensus sequence beneath the aligned regions. (A) Stem-loop-containing region. Shaded sequences represent areas containing putative stem-loop structures. (B) Region containing CSB2. (C) Region containing CSB3 and CSB4.
54 aa ORF (positions 15 157 to 14 996), and a 50 aa ORF (positions 15 362 to 15 511). Blastp searches of these ORFs gave a small number of hits (1, 2, or 3) with high E values (2.3 to 8.6) to fragments of annotated nuclear and bacterial genes. Given their small size and lack of similarity to any known protein, they were not considered to be functional ORFs.

The region was searched against itself, using DOTTER (Sonnhammer and Durbin 1995), to identify repeat and inverted-repeat regions within the A+T-rich region. Two regions were identified: stem-loop 1 (positions 15 126 – 15 178) and stem-loop 2 (positions 15 252 – 15 290). They showed sequence similarity to each other and the potential to fold into stem-loop structures. The identified regions were folded using DNA mfold (Fig. 3) (Zuker 2003). A comparison of 2 stonefly A+T-rich regions from genus Peltoperla has been published previously (Schultheis et al. 2002); it revealed conserved repeat structures and secondary
structures within that genus. Interestingly, the sequence encompassing one of the putative stem-loops for the giant stonefly aligns, with Clustal X, to the regions coding “inverted repeat 1” for both Peltoperla species (Fig. 4). The other putative Pteronarcys princeps stem-loop aligned between the 2 remaining inverted repeats for Peltoperla tarteri.

We compared the A+T region of the giant stonefly with that from Peltoperla arcuata and P. tarteri. Dot-plot analysis revealed 4 conserved sequence blocks (CSBs) shared by the A+T-rich regions of the 2 different genera. In addition, the dot plots revealed a shared low-sequence-complexity region, ranging in length from 168 nt (in P. princeps) to 188 nt (in P. tarteri), where the A+T compositions are elevated to between 94.15% and 96.43%. Generally, the A+T regions of these 2 Peltoperla species are not immediately alignable with that of the giant stonefly. Clustal X alignments show very low levels of sequence identity when comparing the giant stonefly to P. arcuata (61.44%) or P. tarteri (60.44%), despite the
very high A+T richness for all 3 species (81.10% to 82.56% A and T nucleotides). The 4 CSBs identified in the dot-plot analyses were compared in the sequence alignment. CSB1 is a 32 to 36 nt sequence (positions 15 381 – 15 408) that colocalizes with the putative stem-loop structures predicted to occur at the same aligned position in all 3 species. CSB2 is 28 nt in length; it is identical in sequence to that in P. tarteri and differs from the P. arcuata sequence by only 2 A–G transitions. CSB3 occurs immediately after the low-complexity regions, and differs by a single-nucleotide insertion in the 2 Peltoperla species over the region’s 31 nt. CSB4 occurs closest to the tRNAIle gene, and shows only 8 transitions and 1 insertion over its 55 nt length.

Although the conservation of these sequences and structures is intriguing, it is important to remember that the mechanism of replication initiation in insect mt-DNA has still not been characterized. As such, it is not possible to assign functionality to these conserved segments, nor is it possible to determine whether the observed secondary structures have a function homologous to the D-loop structures observed in the replication of other animal mt-genomes (Jacobs et al. 1989).
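The open-reading-frame screen of the A+T-rich region described in this section (7 candidate start codons, a minimum of 50 amino acids, both strands) can be sketched as follows. This is an illustrative reconstruction, not the procedure actually used: `find_orfs` is a hypothetical function, the ORF length is assumed to exclude the stop codon, TAA and TAG are assumed to be the only stop codons, ORFs that run off the end of the region without a stop are ignored, and coordinates are 0-based offsets on whichever strand was scanned rather than genome positions.

```python
# ATN and NTG codons: 7 distinct potential start codons
START_CODONS = {"ATA", "ATC", "ATG", "ATT", "CTG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Return a list of (strand, start, end, n_aa) tuples.

    start and end are 0-based, end-exclusive offsets on the scanned strand
    (the input for "+", its reverse complement for "-"); end includes the stop
    codon, and n_aa counts codons from the start codon up to the stop.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon in START_CODONS:
                        start = i  # first start codon after the previous stop
                elif codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        hits.append((strand, start, i + 3, n_aa))
                    start = None
    return hits
```

Because only the first start codon after each stop is recorded, the sketch reports the longest candidate ORF for each stop codon in each frame. Candidates found this way would still need a similarity search, such as the Blastp searches reported above, before any function could be proposed.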
Conclusions
The complete mt-genome of the first plecopteran representative is presented here. The size, nucleotide composition, and genome arrangement are typical of a “standard” insect mt-genome, such as the D. yakuba mt-genome (Clary and Wolstenholme 1985). The coding of the rRNA, tRNA, and protein-coding genes is also canonical. Nucleotide biases are observed to be strand specific and positional in nature. These observations support the idea that the replication of the mt-genome in the giant stonefly follows the same highly asymmetrical and asynchronous pattern as replication in Drosophila species (Goddard and Wolstenholme 1978, 1980). The asymmetric directional mutation bias supports the notion that the N strand is replicated first, from the A+T region toward the ribosomal gene cluster, as it is in Drosophila. Comparisons with the A+T-rich regions of 2 stoneflies from another family (Peltoperlidae) reveal the conservation of some putative secondary structure elements, as well as regions of very high sequence conservation. These results are surprising, given the highly variable nature of the insect A+T-rich region. Sampling of a broader selection of stonefly families might reveal deeper conservation of these elements, and might lead to investigations of their involvement in insect replication and potentially transcription initiation.
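Strand-specific composition bias of the kind summarized above is commonly expressed as AT skew, (A − T)/(A + T), and GC skew, (G − C)/(G + C), following Perna and Kocher (1995). The Python sketch below computes both in fixed windows along one strand. It is a simplified stand-in rather than the analysis performed in this study: the function names, window size, and toy sequence are assumptions made for illustration.

```python
# Illustrative sketch only: function names, window settings, and the toy
# sequence are assumptions; this is not the analysis performed in the paper.


def skew(seq: str, x: str, y: str) -> float:
    """Return (x - y) / (x + y) for base counts in seq; 0.0 if neither occurs."""
    s = seq.upper()
    nx, ny = s.count(x), s.count(y)
    total = nx + ny
    return (nx - ny) / total if total else 0.0


def windowed_skews(seq: str, window: int, step: int):
    """Return a list of (start, AT skew, GC skew) for successive windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        results.append((start, skew(chunk, "A", "T"), skew(chunk, "G", "C")))
    return results


if __name__ == "__main__":
    toy = "AAATGGGC"  # invented sequence, read 5' to 3' on one strand
    for start, at_skew, gc_skew in windowed_skews(toy, window=4, step=4):
        print(start, at_skew, gc_skew)
```

Plotting such values against distance from the A+T region is one way to look for a positional gradient; restricting the counts to particular codon positions or degenerate sites would require gene annotations that this sketch does not model.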
Acknowledgements
This research was funded by an NSERC Discovery Grant to A.T.B.
References
Baumann, R.W., Arden, R.G., and Surdick, R.F. 1977. The Stoneflies (Plecoptera) of the Rocky Mountains. Memoirs of the American Entomological Society, 31: 1–228.
Beard, C.D., Hamm, D.M., and Collins, F.H. 1993. The mitochondrial genome of the mosquito Anopheles gambiae: DNA sequence, genome organisation, and comparisons with the mitochondrial sequences of other insects. Insect Mol. Biol. 2: 103–124.
Beckenbach, A.T., Robson, S.K., and Crozier, R.H. 2005. Single nucleotide +1 frameshifts in an apparently functional mitochondrial cytochrome b gene in ants of the genus Polyrhachis. J. Mol. Evol. 60: 141–152.
Boudreaux, H.B. 1979. Arthropod phylogeny with special reference to insects. Wiley-Interscience, New York.
Clary, D.O., and Wolstenholme, D.R. 1985. The mitochondrial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol. Evol. 22: 252–271.
Crease, T.J. 1999. The complete sequence of the mitochondrial genome of Daphnia pulex (Cladocera: Crustacea). Gene, 233: 89–99.
Dowton, M., and Austin, A.D. 1999. Evolutionary dynamics of a mitochondrial rearrangement “hot spot” in the Hymenoptera. Mol. Biol. Evol. 16: 298–309.
Dowton, M., Castro, L.R., Campbell, S.L., Bargon, S.D., and Austin, A.D. 2003. Frequent mitochondrial gene rearrangements at the hymenopteran nad3-nad5 junction. J. Mol. Evol. 56: 517–526.
Faith, J.J., and Pollock, D.D. 2003. Likelihood analysis of asymmetrical mutation bias gradients in vertebrate mitochondrial genomes. Genetics, 165: 735–745.
Flook, P.K., Rowell, C.H., and Gellissen, G. 1995. The sequence, organization, and evolution of the Locusta migratoria mitochondrial genome. J. Mol. Evol. 41: 928–941.
Francino, M.P., and Ochman, H. 1997. Strand asymmetries in DNA evolution. Trends Genet. 13: 240–245.
Goddard, J.M., and Wolstenholme, D.R. 1978. Origin and direction of replication in mitochondrial DNA molecules from Drosophila melanogaster. Proc. Natl. Acad. Sci. U.S.A. 75: 3886–3890.
Goddard, J.M., and Wolstenholme, D.R. 1980. Origin and direction of replication in mitochondrial DNA molecules from the genus Drosophila. Nucleic Acids Res. 8: 741–757.
Hall, T.A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41: 95–98.
Hennig, W. 1981. Insect phylogeny. Wiley-Interscience, New York.
Jacobs, H.T., Herbert, E.R., and Rankine, J. 1989. Sea urchin egg mitochondrial DNA contains a short displacement loop (D-loop) in the replication origin region. Nucleic Acids Res. 17: 8949–8965.
Kim, I., Young-Cha, S., Hee-Yoon, M., Sam-Hwang, J., Mong-Lee, S., Dae-Sohn, H., and Rae-Jin, B. 2005. The complete nucleotide sequence and gene organization of the mitochondrial genome of the oriental mole cricket, Gryllotalpa orientalis (Orthoptera: Gryllotalpidae). Gene, 353: 155–168.
Klass, K.D., Zompro, O., Kristensen, N.P., and Adis, J. 2002. Mantophasmatodea: a new insect order with extant members in the Afrotropics. Science (Washington, D.C.), 296: 1456–1459.
Kristensen, N.P. 1991. Phylogeny of Extant Hexapods. In The Insects of Australia, a textbook for Students and Research Workers (Commonwealth Scientific and Industrial Research Organization, ed.). Melbourne University Press, Victoria, Australia. pp. 125–142.
Lowe, T.D., and Eddy, S.R. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25: 955–964.
Nardi, F., Spinsanti, G., Boore, J.L., Carapelli, A., Dallai, R., and Frati, F. 2003. Hexapod origins: monophyletic or paraphyletic? Science (Washington, D.C.), 299: 1887–1889.
Ojala, D., Merkel, C., Gelfand, R., and Attardi, G. 1980. The tRNA genes punctuate the reading of genetic information in human mitochondrial DNA. Cell, 22: 393–403.
Ojala, D., Montoya, J., and Attardi, G. 1981. tRNA punctuation model of RNA processing in human mitochondria. Nature (London), 290: 470–474.
Peden, J.F. 2005. CodonW codon usage analysis package [online]. Available from http://codonw.sourceforge.net [cited 15 April 2005].
Perna, N.T., and Kocher, T.D. 1995. Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes. J. Mol. Evol. 41: 353–358.
Reyes, A., Gissi, C., Pesole, G., and Saccone, C. 1998. Asymmetrical directional mutation pressure in the mitochondrial genome of mammals. Mol. Biol. Evol. 15: 957–966.
Reyes, A., Yang, M.Y., Bowmaker, M., and Holt, I.H. 2005. Bidirectional replication initiates at sites throughout the mitochondrial genome of birds. J. Biol. Chem. 280: 3242–3250.
Saccone, C., De Giorgi, C., Gissi, C., Pesole, G., and Reyes, A. 1999. Evolutionary genomics in Metazoa: the mitochondrial DNA as a model system. Gene, 238: 195–209.
Schultheis, A.S., Weigt, L.A., and Hendricks, A.C. 2002. Arrangement and structural conservation of the mitochondrial control region of two species of Plecoptera: utility of tandem repeat-containing regions in studies of population genetics and evolutionary history. Insect Mol. Biol. 11: 605–610.
Shao, R., and Barker, S.C. 2003. The highly rearranged mitochondrial genome of the Plague Thrips, Thrips imaginis (Insecta: Thysanoptera): convergence of two novel gene boundaries and an extraordinary arrangement of rRNA genes. Mol. Biol. Evol. 20: 362–370.
Shao, R., Campbell, N.J.H., and Barker, S.C. 2001. Numerous gene rearrangements in the mitochondrial genome of the Wallaby Louse, Heterodoxus macropus (Phthiraptera). Mol. Biol. Evol. 18: 858–865.
Shao, R., Dowton, M., Murrell, A., and Barker, S.C. 2003. Rates of gene rearrangement and nucleotide substitution are correlated in the mitochondrial genomes of insects. Mol. Biol. Evol. 20: 1612–1619.
Sonnhammer, E.L., and Durbin, R. 1995. A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene, 167: GC1–GC10.
Steinberg, S., and Cedergren, R. 1994. Structural compensation in atypical mitochondrial tRNAs. Struct. Biol. 1: 507–510.
Steinberg, S., Gautheret, D., and Cedergren, R. 1994. Fitting structurally diverse animal mitochondrial tRNASer to common three-dimensional constraints. J. Mol. Evol. 236: 982–989.
Steinberg, S., Leclerc, F., and Cedergren, R. 1997. Structural rules and conformational compensations in the tRNA L-form. J. Mol. Biol. 266: 269–282.
Stewart, J.B., and Beckenbach, A.T. 2005. Insect mitochondrial genomics: the complete mitochondrial genome sequence of the meadow spittlebug Philaenus spumarius (Hemiptera: Auchenorrhyncha: Cercopoidae). Genome, 48: 46–54.
Thao, M.L., Baumann, L., and Baumann, P. 2004. Organization of the mitochondrial genomes of whiteflies, aphids, and psyllids (Hemiptera, Sternorrhyncha). BMC Evol. Biol. 4: 25.
Thomas, M.A., Walsh, K.A., Wolf, M.R., McPheron, B.A., and Marden, J.H. 2000. Molecular phylogenetic analysis of evolutionary trends in stonefly wing structure and locomotor behavior. Proc. Natl. Acad. Sci. U.S.A. 97: 13178–13183.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 11: 4673–4680.
Wootton, R.J. 1981. Palaeozoic insects. Ann. Rev. Entomol. 36: 319–344.
Yamauchi, M.M., Miya, M.U., and Nishida, M. 2004. Use of a PCR-based approach for sequencing whole mitochondrial genomes of insects: two examples (cockroach and dragonfly) based on the method developed for decapod crustaceans. Insect Mol. Biol. 13: 435–442.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31: 3406–3415.
Zwick, P. 2000. Phylogenetic system and zoogeography of the Plecoptera. Annu. Rev. Entomol. 45: 709–746.
© 2006 NRC Canada