
Prospects & Overviews Review essays

What drives parallel evolution? How population size and mutational variation contribute to repeated evolution

Susan F. Bailey1), François Blanquart2), Thomas Bataillon1) and Rees Kassen3)

Parallel evolution is the repeated evolution of the same phenotype or genotype in evolutionarily independent populations. Here, we use evolve-and-resequence experiments with bacteria and yeast to dissect the drivers of parallel evolution at the gene level. A meta-analysis shows that parallel evolution is often rare, but there is a positive relationship between population size and the probability of parallelism. We present a modeling approach to estimate the contributions of mutational and selective heterogeneity across a genome to parallel evolution. We show that, for two experiments, mutation contributes between 10 and 45%, respectively, of the variation associated with selection. Parallel evolution cannot, therefore, be interpreted as a phenomenon driven by selection alone; it must also incorporate information on heterogeneity in mutation rates along the genome. More broadly, the work discussed here helps lay the groundwork for a more sophisticated, empirically grounded theory of parallel evolution.


Keywords: bacteria; evolve and resequence experiment; experimental evolution; mutation; parallel evolution; selection; yeast

Additional supporting information may be found in the online version of this article at the publisher’s web-site.

DOI 10.1002/bies.201600176

1) Bioinformatics Research Centre, University of Aarhus, Aarhus, Denmark
2) Department of Infectious Disease Epidemiology, Imperial College London, London, UK
3) Department of Biology, University of Ottawa, Ottawa, Canada

*Corresponding author: Rees Kassen E-mail: [email protected] Abbreviation: ER experiment, evolve and resequence experiment.

Introduction

Parallel evolution, the repeated evolution of the same phenotype or genotype in different populations, seems so improbable that it demands explanation. There ought to be so many different genetic and phenotypic routes to adaptation, and so many distinct ways a lineage might be derailed from taking one route in particular, that parallel evolution should rarely happen. When it does, evolutionary biologists take special notice and often interpret a parallel evolution event as the result of strong selection. This inference rests on the assumption that parallel evolution requires strong selection and cannot plausibly be explained by stochastic processes like mutation and genetic drift (i.e. chance; see [1] for further considerations).

A compelling case is often made that strong selection causes parallel evolution. The repeated reduction in body armor accompanying the transition from marine to fresh water in threespine sticklebacks is one often cited example [2], and there are many others (e.g. [3–5]). Yet, the evidence for a role of strong selection in parallel evolution at the genetic level is often more equivocal, for the simple reason that there are often many different genetic routes that can produce similar phenotypic outcomes. For example, clinical isolates of the opportunistic pathogen Pseudomonas aeruginosa evolve fluoroquinolone resistance via chromosomal mutations in just a few genes like the DNA gyrases gyrA and gyrB, the topoisomerases parC and parE, and efflux pump regulators like nfxB [6]. Yet many more genes (over 100 by one estimate based on a transposon-mutagenesis screen) can confer resistance to this same drug [7]. This small subset of genes may be preferentially favored by selection because they confer higher and less costly resistance than others [8, 9], but they may equally be more prevalent because they have a higher mutation rate. Thus, selection need not be the only mechanism causing parallel evolution. Indeed, recent theory, discussed in more detail below, suggests that under some scenarios selection may even be unimportant for parallel evolution. Deciding which mechanism is the main driver of parallel evolution requires additional data that are not always easy to come by. Reliable estimates of the frequency of parallel evolution in

Bioessays 38: 0000–0000, © 2016 The Authors. BioEssays Published by WILEY Periodicals, Inc. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

www.bioessays-journal.com


Table 1. Species, strains, and the number of different experimental treatments from each study from which data were used in the meta-analysis

Study   Species                     Strain             No. experimental treatments
Bacteria
[41]    Escherichia coli            REL1206            1
[42]    Escherichia coli            K12MG1655          1
[43]    Escherichia coli            K12MG1655          1
[44]    Escherichia coli            K12 deletions      22
[45]    Escherichia coli            MC100              3
[11]    Pseudomonas fluorescens     SBW25              5
[39]    Pseudomonas aeruginosa      PA14               4
[46]    Streptococcus pneumoniae    Serotype 22F       1
[47]    Yersinia pestis             Yp1945, Yp2126     2
Fungi
[48]    Saccharomyces cerevisiae    288c               1
[49]    Saccharomyces cerevisiae    288c               2
[50]    Saccharomyces cerevisiae    288c               1
[37]    Saccharomyces cerevisiae    DBY15084           2
[51]    Saccharomyces cerevisiae    BY4741             1
[52]    Saccharomyces cerevisiae    BY4741             1
[53]    Saccharomyces cerevisiae    BY4741 deletions   14

For the full dataset, see Supplementary Table S1.

natural systems are difficult to obtain, since we cannot usually know how often parallel evolution did not happen. Additionally, strong hypothesis tests about the factors driving variation in the degree of parallel evolution require extensive ecological and genetic experimentation, which limits the number of convincing examples. The best we can say to date is that parallel evolution is more likely to occur among closely related populations sharing similar environments [10, 11], probably because closely related lineages experience similar selection pressures [12].

Unpacking the mechanisms driving parallel evolution therefore requires a different approach. Ideally we would track the evolution of variation among independently evolving populations, and calculate the frequency with which parallel evolution occurs directly from data. So-called “evolve-and-resequence” (ER) experiments, where replicate populations adapt to identical, defined conditions and the mutations fixed are identified through whole-genome sequencing [13, 14], make this calculation possible (see for example [15, 16]). Moreover, with an appropriate experimental design, these experiments can be used to test hypotheses about the drivers of parallel evolution.

This review presents a framework for interpreting parallel evolution from ER experiments. We collate data documenting parallel evolution at the gene level to estimate the frequency of parallel evolution across a range of study systems and, wherever possible, to test theory about the factors driving parallel evolution. We then sketch a strategy for disentangling the effects of bias in the generation of genetic variation and selection on parallel evolution in ER experiments. We focus on mutational heterogeneity, both because this is a fundamental parameter for understanding the outcome of adaptive evolution, and because the vast majority of ER experiments to date have studied mutation-driven adaptation. We use, as illustrative examples, two of the data sets included in our collated ER experiment data.

A meta-analysis reveals population size is a key driver of parallel evolution

Quantitative statements about the probability of parallel evolution can be made using models that treat adaptation as a sequence of moves or steps via mutation in either phenotype (e.g. [17–20]) or DNA sequence (e.g. [21–24]) that increase fitness relative to the prevailing wild type. A series of these steps forms an adaptive walk (the sequential substitution of beneficial mutations by selection) that stops once the prevailing genotype reaches a fitness optimum and is incapable of generating further beneficial mutants. The most important insight from these models is that any factor that introduces bias in the number of beneficial mutations available to selection increases the probability of parallel evolution [1, 20, 24]. Variation in the mutation rate across the genome is one factor, as is the level of adaptation (i.e. distance, in fitness space, of the current population from a fitness optimum) and total population size. The probability of parallel evolution in subsequent adaptive steps should increase because, as a population approaches a fitness optimum, fewer beneficial mutations are available. The probability of parallel evolution should also increase with population size, because a larger population gives rise to a larger supply of beneficial mutations, making it more likely for a rare, large-effect, beneficial mutation to occur and be fixed by selection. Clonal interference, the competition that occurs between beneficial mutations occurring simultaneously on different genetic backgrounds within large populations, can further increase the likelihood that the same large-effect mutations will fix in independent populations.


Figure 1. A: The distribution of levels of parallelism, as estimated by mean Jaccard index, across all 62 independent estimates from 16 studies (see Table 1 for a summary). B: Mean Jaccard index estimated for yeast (n = 22) and bacteria (n = 40) replicate populations. Bold lines show the means, boxes indicate the lower and upper quartiles (25 and 75%), the whiskers extend to 1.5 times the interquartile range, and the points show outliers.


A meta-analysis of data sets on parallel evolution in ER experiments in bacteria and yeast (summarized in Table 1) can test some predictions of this theory. The data come from published studies (to May 2016) where at least two replicate populations evolved through asexual reproduction from a common ancestor under identical conditions and for which whole genome sequence data were publicly available. We quantify parallel evolution by recording the number of times that a given gene was mutated in independently evolving lines derived from a common ancestor, and calculate the proportion of mutated genes shared between two evolved populations. Specifically, we calculate the Jaccard Index (J) for two evolved populations or strains with sets of mutated genes $G_1$ and $G_2$ as

$$J_{G_1,G_2} = \frac{|G_1 \cap G_2|}{|G_1 \cup G_2|}$$

or, in words, the number of mutated genes shared by both lineages divided by the total number of genes mutated in either lineage. The Jaccard Index ranges from 0 (no mutated genes are shared between the two populations, $G_1 \cap G_2 = \emptyset$) to 1 (the two populations have exactly the same set of mutated genes, $G_1 = G_2$). Supplementary Table S1 reports the average J among all pairwise combinations of replicates from each selection environment within an experiment and its associated standard error, as well as other relevant data such as population size, number of generations, number of replicates, and genome size. We focus on genes rather than higher order pathways or processes, because they can be unambiguously identified, allowing comparisons across studies.

The distribution of J values (Fig. 1A) varies substantially among studies (Fig. S1). While most J values are quite small, being either zero or close to zero, there are also many non-zero values, including a handful greater than 0.5. We are able to identify a few variables that account for some of the extensive variation in this data set. Some variation is attributable to a significantly lower probability of parallel evolution in yeast compared to bacteria (Fig. 1B; main effect of kingdom: mean $J_{\text{bacteria}}$ = 0.24, mean $J_{\text{yeast}}$ = 0.08; p = 0.031), although the reason for this is unclear. While the yeast genome is at least twice the size of the largest bacterial genome in our collated data set (see Supplementary Table S1), the number of genes, on which calculation of J is based, is roughly comparable. In fact, it is two bacteria, Escherichia coli and Pseudomonas fluorescens, that have the smallest and largest number of genes, respectively (E. coli 4,500 genes; P. fluorescens 6,700 genes), suggesting that the difference in parallel evolution between yeast and bacteria is not due to a higher fraction of mutations in non-coding regions in yeast. There is also a positive relationship between the mean population size (N; calculated as the harmonic mean of the initial and final population sizes at and just before transfer, respectively) and the extent of parallel evolution, even after accounting for the difference between yeast and bacteria (Fig. 2; main effect of N: p = 0.0004). Furthermore, the strength of this relationship varies significantly between the two kingdoms (Fig. 2; effect of kingdom × N: p = 0.0015; all p values estimated from 10,000 permutations of a multivariate ANOVA weighted by SE; see footnote 1). This result is consistent with large population sizes generating high mutation supply and clonal interference, both of which bias the spectrum of mutations fixed during adaptation toward rare, strongly beneficial mutations. Again, why this relationship differs between bacteria and yeast remains unclear. Nonetheless, population size is clearly an important factor governing the probability of parallel evolution in both groups.

Footnote 1: Note that the pairwise comparisons involved in calculating J preclude the use of conventional parametric statistics. Details on calculations are provided as Supplementary Information.

Figure 2. Degree of parallelism, estimated by mean Jaccard index, versus log-transformed mean population size. Each open circle shows an estimate of Jaccard index, where the size of the circle is scaled by the log of the weighting used for that estimate (1/SE). The solid lines show linear regression fits for bacteria (green) and fungi (blue) separately, and the dashed line indicates the linear regression fit for the combined data.

Interestingly, other factors, such as the amount of adaptation, number of replicates, or generation time, did not explain statistically significant variation in J. However, parallel evolution was more often observed in experiments with more replicates and more generations (number of replicates: p = 0.022; number of generations: p = 0.013; weighted logistic regression, permutation tested with n = 10,000), underlining the importance of large sample sizes when trying to understand outcomes of stochastic processes. Model fits to these data suggest that to have a 99% chance of observing parallel evolution (i.e. at least one gene hit more than once in at least two replicate populations) one should, on average, evolve at least 15 replicate populations for 540 generations.
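The Jaccard calculation used in this meta-analysis can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function names and the three gene sets are hypothetical, and treating two empty gene sets as J = 0 is an assumption made here to avoid division by zero.

```python
from itertools import combinations


def jaccard(g1, g2):
    """Jaccard index of two sets of mutated genes: |G1 & G2| / |G1 | G2|."""
    g1, g2 = set(g1), set(g2)
    union = g1 | g2
    if not union:
        # Assumption: two replicates with no mutated genes share nothing, so J = 0.
        return 0.0
    return len(g1 & g2) / len(union)


def mean_pairwise_jaccard(replicates):
    """Average J over all pairwise combinations of replicate populations."""
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("need at least two replicate populations")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical mutated-gene sets for three replicate populations.
replicates = [
    {"gyrA", "nfxB", "parC"},
    {"gyrA", "nfxB"},
    {"gyrB"},
]
print(mean_pairwise_jaccard(replicates))
```

Only the first pair shares any genes (two of three), so the mean over the three pairs is small even though one pair is highly parallel. This mirrors the skewed distribution of J values described above.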

The relative contribution of mutation and selection to parallel evolution

We develop a theoretical framework to make predictions about the relative importance of mutational heterogeneity versus selection in driving patterns of parallel evolution in a set of replicated populations evolving under similar conditions. While our focus is on data derived from ER experiments, the principles apply equally to data from genomic sequencing of population samples available for cancer tumor evolution (e.g. [25]) or the progress of chronic infectious disease (e.g. [26]).

The data are genomic sequences derived from independently evolved populations. We count the number of independent mutations affecting each of $n_L$ loci across the genome. Here “locus” is used deliberately in a broad sense to mean a set of non-overlapping gene coding regions along the genome that could be of fixed size (bins), discrete genes, or other types of genomic regions. There are two sources of variability in the number of mutations affecting loci: the propensity of any given locus to undergo mutation, and the probability that a mutation arising at a locus will fix in the population. Both events (the occurrence of new mutations and their evolutionary fate) are affected by chance.

More formally, we make predictions regarding the probability, $p_i$, that a mutation at locus $i$ among the $n_L$ loci will result in an observed fixation event. The key intuition is that if all loci have the same mutation rates, and if all mutations have the same probability of being fixed by selection, then all probabilities $p_i$ are equal and sum to 1, such that by symmetry $p_i = 1/n_L$. In this case $n_L$, the number of loci, is the sole determinant of patterns of parallel evolution: for any pair of independently evolving populations, the larger $n_L$, the lower the chance that the same locus will be involved in the next step of evolution in both. This might seem trivial, but this “scaling” effect of the number of loci applies irrespective of other sources of variability among loci ([20], equation 8b). This means, additionally, that comparisons across taxa must account for variation in $n_L$ to be valid.

Now consider sources of variability in the number of mutations across loci. The probability that locus $i$ contributes a mutation that fixes depends on two quantities: a locus-specific mutation rate, $\mu_i$, and the probability that the mutation that just arose at that locus ultimately goes to fixation, $\Pi_i$. To start, consider two extreme situations: either beneficial mutations are sufficiently rare events that they never compete with each other before being fixed, or they are so common that all possible beneficial mutations are available to selection.
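The scaling with the number of loci can be made explicit with a short calculation. The following is a sketch under a simplifying assumption introduced here, not a result quoted from the paper: two populations each fix one mutation, drawn independently from the same per-locus probabilities $p_i$. The symbols $P_{\text{par}}$ and $\mathrm{CV}(p)$ are notation added for illustration.

```latex
\begin{align*}
P_{\text{par}} &= \sum_{i=1}^{n_L} p_i^2
  && \text{(both populations hit the same locus)} \\
p_i = \frac{1}{n_L} \;\Rightarrow\; P_{\text{par}} &= n_L \cdot \frac{1}{n_L^2} = \frac{1}{n_L}
  && \text{(no heterogeneity among loci)} \\
P_{\text{par}} &= n_L \left( \operatorname{Var}(p) + \frac{1}{n_L^2} \right)
  = \frac{1 + \mathrm{CV}(p)^2}{n_L}
  && \text{(general case)}
\end{align*}
```

Here $\mathrm{CV}(p)$ is the coefficient of variation of the $p_i$ across loci (population standard deviation divided by the mean, $1/n_L$). The factor $1/n_L$ appears whatever the heterogeneity among loci, and any heterogeneity can only raise the chance of parallelism above that baseline.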

Extreme case 1: Beneficial mutations fix sequentially and never meet and compete

The idealized situation where mutations are rare ($N\mu \ll 1$), sometimes termed the strong selection weak mutation (SSWM) limit, has been studied extensively [23, 27–29]. Under SSWM conditions, populations remain essentially monomorphic while they “wait” for the next beneficial mutation that will escape drift to appear. Once this happens, a beneficial mutation sweeps instantaneously through the population. The next mutation to fix is drawn at random among the pool of beneficial mutations, the probability of fixation for each mutation being weighted by its selective coefficient $s_i$ [30]. The probability that a specific locus $i$ yields an observed fixation event scales as the product $p_i \propto \mu_i \Pi_i \propto \mu_i s_i$; heterogeneity in $\mu$ and $s$ can therefore play equal roles in determining $p_i$. Importantly, variation from locus to locus in either quantity can drive patterns of parallel evolution, a point made independently by [20] and [31].

Extreme case 2: Maximal clonal interference

As the mutation supply rate increases, new beneficial mutations will begin to arise in a population before a first beneficial mutation goes to fixation. Independently arising beneficial mutations therefore compete with each other, a process often termed “clonal interference” in nonrecombining lineages [32–36]. Modeling clonal interference accurately can be complicated, but is easier if we begin by considering the extreme situation where the mutation supply rate is so high that mutations stemming from virtually all loci enter the population before any one mutation goes to fixation, a situation we term “maximal clonal interference.” Now, the mutation that ultimately fixes is the one that has the highest selective advantage, and heterogeneity in mutation rates becomes irrelevant. Heterogeneity in selection coefficients among loci is the sole determinant of patterns of parallel evolution (formally, $p_i = 0$ for all loci except the one with the highest selective advantage, where $p_i = 1$).
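The two limiting cases can be compared numerically. The sketch below is illustrative only: the function names and the four-locus parameter values are hypothetical, and the chance of parallelism is taken to be the probability that two independent populations fix a mutation at the same locus (the sum of squared $p_i$), which is an assumption of this example.

```python
def sswm_probs(mu, s):
    """SSWM limit: p_i proportional to mu_i * s_i, normalized to sum to 1."""
    weights = [m * sel for m, sel in zip(mu, s)]
    total = sum(weights)
    return [w / total for w in weights]


def max_interference_probs(s):
    """Maximal clonal interference: the locus with the largest s_i always fixes.

    Ties are broken in favor of the first such locus (a choice made here).
    """
    best = max(range(len(s)), key=lambda i: s[i])
    return [1.0 if i == best else 0.0 for i in range(len(s))]


def prob_parallel(p):
    """Chance that two independent populations fix a mutation at the same locus."""
    return sum(x * x for x in p)


# Hypothetical values for four loci (not taken from any data set).
mu = [1e-9, 1e-9, 4e-9, 1e-9]  # locus-specific mutation rates
s = [0.10, 0.05, 0.05, 0.02]   # selection coefficients

p_sswm = sswm_probs(mu, s)
p_max = max_interference_probs(s)
print("SSWM:", p_sswm, prob_parallel(p_sswm))
print("Maximal interference:", p_max, prob_parallel(p_max))
```

With these made-up numbers the third locus is the most likely to fix under SSWM because of its higher mutation rate, whereas under maximal interference only the first locus, which has the largest selection coefficient, ever fixes.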

A middle ground: Intermediate levels of clonal interference

Most ER experiments in microbes probably lie somewhere along the continuum between the SSWM limit and maximal clonal interference. Under such intermediate conditions, a beneficial mutation on its way to fixation is likely to compete with a handful of other beneficial mutations that arise and reach appreciable frequency [37, 38]. While we are still some way from a quantitative theory of clonal interference, we can operationally define the intensity of interference through its effects on the relative probability that a mutation arising at a locus will go to fixation. Under SSWM conditions, a mutation conferring a fitness advantage $s_i$ goes to fixation with a probability $\Pi_i$ proportional to $s_i$. We posit that the probability of fixation of mutations experiencing some degree of interference can be written as $s_i^{\gamma}$, where the exponent $\gamma$ modulates the relative importance of mutation and selection heterogeneity in driving parallel evolution. If $\gamma = 1$ we recover the SSWM limit; as $\gamma$ goes to infinity, we recover the maximal interference case. This is a heuristic scaling, and is not yet well motivated by theory, but it makes sense as a natural way to tune the effect of clonal interference in determining the fixation probabilities, and bridge two idealized limiting conditions that, on their own, are well understood. In the Supplementary Information we show this scaling is extremely accurate, and that the coefficient $\gamma$ scales linearly with the input of beneficial mutations (Fig. S2), suggesting that under intermediate levels of clonal interference $p_i \propto \mu_i s_i^{\gamma}$.

Empirical evidence that mutational heterogeneity contributes to parallel evolution

The theory outlined above predicts that the relative importance of mutational heterogeneity and selection in driving parallel evolution depends on the mutation supply rate, $N\mu$. When a population evolves under SSWM conditions, mutational heterogeneity and selection play equal roles in contributing to parallel evolution [20, 31]. In a population evolving under clonal interference, where multiple beneficial mutations may compete, mutational heterogeneity should become far less important, and selection heterogeneity far more important, as a driver of parallel evolution. These predictions are consistent with the observation from our meta-analysis that parallel evolution was more common in large populations. Increasing population size means more mutations, more clonal interference [32], and so a higher likelihood of parallel evolution attributable to selection.

We can push the theory further by quantifying the relative importance of mutation versus selection heterogeneity in generating observed patterns of parallel evolution. We fit models incorporating both sources to data consisting of counts of mutations per locus. Our approach first estimates gene-to-gene heterogeneity in mutation rate from mutation accumulation experiments or rates of substitution at synonymous sites; both methods provide insight into the spectrum of mutations fixed under relaxed selection. We then overlay the mutations fixed in the presence of selection and infer directly the relative contribution of mutation and selection heterogeneity to the pattern of parallel evolution.

To estimate mutational heterogeneity, we fit three neutral models: (i) $p_i$ is constant, representing a case where the mutation rate per gene is constant; (ii) $p_i$ scales with gene length, $L_i$, representing a constant per nucleotide mutation rate; and (iii) $p_i$ scales with the product of $L_i$ and $\mu_i$, a situation describing variable per nucleotide mutation rates from gene to gene (we assume mutation rates follow a gamma distribution). Parameter values for each model type are estimated using model-fitting techniques (e.g. maximum likelihood), and model selection criteria (e.g. likelihood ratio tests, Akaike information criterion) determine which model best explains the data.

We then ask which of three further models of selection heterogeneity, with or without clonal interference, best fits the data, given a particular neutral model: (i) $p_i$ determined solely by the fitted mutation parameters, implying constant selection across genes; (ii) $p_i$ scales with the product of the fitted mutation parameters and $s_i$, meaning selection intensity varies from gene to gene under SSWM conditions; and (iii) $p_i$ scales with the product of the fitted mutation parameters and $s_i^{\gamma}$, meaning selection intensity varies from gene to gene and is magnified by some degree of clonal interference. Again, the most appropriate model is identified using model fitting and selection techniques, the details of which are in an upcoming methods-focused paper (in preparation).

Example 1: P. aeruginosa PA14 evolving in cystic fibrosis-like conditions

P. aeruginosa is an opportunistic pathogen that forms chronic infections in 80% of adult cystic fibrosis patients and is a leading cause of morbidity and mortality for this population. Previous work by Wong et al. investigated the extent of parallelism under conditions resembling the initial stages of infection by allowing the widely used PA14 strain to adapt to a synthetic cystic fibrosis lung medium during daily serial transfer [39]. In a separate experiment using the same strain, Dettman et al. [40] performed a mutation accumulation experiment to investigate the spectrum of mutations fixed under relaxed selection. Dettman et al.’s experiment allows us to obtain a distribution of the counts of mutations per locus, $\mu_i$, in the absence of selection (Fig. 3A). In this case, a model assuming heterogeneity in per-gene mutation rate fits best. Data from Wong et al. allow us to obtain a distribution of counts of mutations per locus fixed in the presence of selection and/or clonal interference (Fig. 3B). For this example, a model assuming heterogeneous selection and some degree of clonal interference fits best.

Biologically these results make good sense. Wong et al. report a remarkable amount of parallelism in some genes, especially those like nfxB and gyrA that confer resistance to the antibiotic ciprofloxacin, which was added to some treatments in this experiment. Additionally, clonal interference was important in this experiment: measures of competitive fitness at the population and individual levels give strikingly different results, suggesting that there is substantial genetic variation segregating in the population.

Figure 3. A: The distribution of counts of mutations per gene from a mutation accumulation experiment with P. aeruginosa (gray bars) [40]. Lines show model fits for three models assuming the absence of selection. B: The distribution of counts of mutations per gene from a selection experiment with the same strain of P. aeruginosa (gray bars) [39]. Lines show model fits for three alternate models with the possibility of selection.

Example 2: Saccharomyces cerevisiae in rich media

Data for our second example come from an evolution experiment involving 40 populations of S. cerevisiae evolved for 1,000 generations in identical conditions [37]. The mutation rate is estimated using counts of synonymous mutations (Fig. 4A). For these data, all three mutational models appear to fit the synonymous mutation data quite well, though there is a slight preference for a model where the probability of observing a mutation scales with gene length. When non-synonymous mutations are examined (Fig. 4B), a model including both selection heterogeneity and additional effects of clonal interference fits the data best.
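The family of models compared in the two examples can be illustrated with a small Python sketch. It is a hedged illustration, not the authors' fitting procedure (which is described as forthcoming): it assumes per-locus probabilities proportional to gene length, mutation rate, and selection coefficient raised to an interference exponent, and it scores them with a plain multinomial log-likelihood. The function names, the three-gene parameter values, and the counts are all hypothetical.

```python
import math


def locus_probs(length, mu, s, gamma):
    """p_i proportional to L_i * mu_i * s_i**gamma, normalized over loci.

    gamma = 0 removes selection heterogeneity, gamma = 1 is the SSWM limit,
    and large gamma approaches maximal clonal interference.
    """
    weights = [l * m * sel ** gamma for l, m, sel in zip(length, mu, s)]
    total = sum(weights)
    return [w / total for w in weights]


def multinomial_loglik(counts, p):
    """Log-likelihood of per-locus mutation counts given probabilities p.

    Assumes p_i > 0 wherever a mutation was observed.
    """
    n = sum(counts)
    ll = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    for c, prob in zip(counts, p):
        if c > 0:
            ll += c * math.log(prob)
    return ll


# Hypothetical inputs for three genes (illustration only).
length = [1000, 1500, 900]   # gene lengths L_i in nucleotides
mu = [1.0, 1.0, 1.0]         # relative per-nucleotide mutation rates mu_i
s = [0.10, 0.05, 0.02]       # selection coefficients s_i
counts = [12, 3, 1]          # observed mutations per gene

for gamma in (0.0, 1.0, 3.0):
    p = locus_probs(length, mu, s, gamma)
    print(gamma, multinomial_loglik(counts, p))
```

In practice the selection coefficients and the exponent would be estimated rather than supplied, and competing models would be compared with likelihood ratio tests or the Akaike information criterion as the text describes. The sketch only shows how the exponent shifts probability mass toward the most strongly selected gene.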

The relative importance of mutation versus selection

We next compare the relative contribution of mutation and selection heterogeneity to the observed patterns of parallel evolution. To do this, we calculate the coefficient of variation of per-gene mutation rate, $\mathrm{CV}(L_i \mu_i)$, and the coefficient of variation of per-gene probability of fixation of a new mutation, $\mathrm{CV}(s_i^{\gamma})$, respectively (see also Supplementary Material). There are two results, shown in Table 2, worth noting. The first is that, in both experiments, the CV attributable to selection exceeds that attributable to mutation. This result tells us that selection is the dominant mechanism driving observed patterns of parallel evolution in these experiments, just as it often is in many examples of parallel evolution in nature. The second is that the mutational CV can be a substantial fraction of the selection CV. For the P. aeruginosa example, our models reveal that the CV associated with mutation, $\mathrm{CV}(L_i \mu_i)$, is 9% of the CV associated with selection, $\mathrm{CV}(s_i^{\gamma})$, while for S. cerevisiae, it is 45%. In other words, mutational heterogeneity, while less important than selection, makes a non-trivial contribution as a driver of parallel evolution in these experiments.
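The comparison of coefficients of variation can be sketched as follows. This is a minimal illustration under stated assumptions: the per-gene values are hypothetical, the population standard deviation is used, and the helper names are introduced here rather than taken from the paper or its supplement.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


def cv_ratio(length, mu, s, gamma):
    """Ratio of the mutational CV, CV(L_i * mu_i), to the selection CV, CV(s_i**gamma).

    Raises ZeroDivisionError if there is no selection heterogeneity at all.
    """
    mutation_term = [l * m for l, m in zip(length, mu)]
    selection_term = [sel ** gamma for sel in s]
    return coefficient_of_variation(mutation_term) / coefficient_of_variation(selection_term)


# Hypothetical per-gene values (illustration only).
length = [1000, 1500, 900, 1200]
mu = [1.0, 1.2, 0.8, 1.0]
s = [0.10, 0.05, 0.02, 0.01]

print(cv_ratio(length, mu, s, gamma=2.0))
```

A ratio below 1 corresponds to the situation reported for both experiments, where selection heterogeneity exceeds mutational heterogeneity; the reported values of 9% and 45% would correspond to ratios of 0.09 and 0.45.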

Discussion

We have shown how ER experiments can be used to dissect the drivers of parallel evolution. Two key insights emerge from our analysis. First, there is substantial variation in the degree of parallel evolution, the majority of experiments having low rates of parallel evolution and a few, especially those in bacteria, having much higher rates. At least some of this variation is associated with population size: larger populations have higher rates of parallel evolution independently of whether they are prokaryotic or eukaryotic. Second, our analysis of mutation counts reveals that, while most parallel evolution can be attributed to selection, mutational heterogeneity across genomes also plays an important role. We discuss these results and what they mean for the theory of parallel evolution in more detail below.

How common is parallel evolution?

The frequency and extent of parallel evolution vary widely among ER experiments, and can be substantially higher in bacterial experiments. The reason for this difference is not clear. One possibility is that bacterial experiments are run at higher population sizes than yeast experiments, but inspection of Fig. 2 shows this not to be the case. Another is that mutation rates are higher in bacteria compared to yeast, pushing bacteria more often into a clonal interference regime and so to higher rates of parallelism. However, mutation accumulation experiments suggest the opposite: both Pseudomonas and E. coli (two heavily represented groups of bacteria in our data set) have lower mutation


Figure 4. A: The distribution of counts of synonymous mutations per gene from an evolution experiment with S. cerevisiae (gray bars) [37]. Lines show model fits for three models assuming the absence of selection. B: The distribution of counts of nonsynonymous mutations per gene from the same evolution experiment (gray bars). Lines show model fits for three alternate models with the possibility of selection.

rates at both the whole-genome and per-gene level than S. cerevisiae (summarized in [40]). Finally, it is possible that heterogeneity along the genome in mutation rates and/or selection coefficients is consistently higher in bacteria than in yeast, perhaps as a by-product of differences in the selection regimes chosen by experimenters or because it reflects some real biological feature that distinguishes how adaptive evolution proceeds in the two groups. Further experimentation is required to explore these possibilities.

The majority of experiments reviewed here have rates of parallel evolution that appear to be quite low, the average Jaccard similarity for the entire data set being just 0.14. The only comparable data come from work in bacteriophage ΦX174, where Wichman and Brown [15] estimated the probability of parallel evolution at any pair of sites to be 0.5. The much higher rate in phage likely stems from its having a small, highly compact genome with many overlapping reading frames and a high mutation rate. As our data show, such a high rate is likely not representative of other organisms, whose genomes are larger and whose mutation rates are lower than those of phage by orders of magnitude. Nevertheless, there is no reason to expect phage to be any different from bacteria or yeast in the degree to which parallel evolution is governed by mutational or selective heterogeneity. Our models could therefore be used to estimate the CV associated with mutation and selection from phage experiments, which in turn would allow a calculation of the degree of parallel evolution independent of genome size (and so of the number of loci involved). Given that traditional measures of parallel evolution at the gene or nucleotide level depend strongly on genome size, our approach provides a meaningful way to compare ER studies from diverse sources. Such an analysis is beyond the scope of the present paper but remains an intriguing avenue for future investigation.

Our meta-analysis also revealed a positive relationship between population size and the probability of parallel evolution in both bacteria and yeast. This is an important result, because it suggests that population size can be a key driver of parallel evolution, likely through its effects on mutation supply. As population size increases, the supply of beneficial mutations also increases and generates clonal interference, a process that ensures the mutations that fix are biased toward those with the largest selective advantages. Moreover, large population sizes likely serve to tip the balance away from mutational heterogeneity and toward heterogeneity among loci in selection coefficients as the driver of parallel evolution. Notably, this mechanism is very general, and should apply with equal force to instances of parallel evolution from standing variation as well. Whether or not population size is a useful predictor of the probability of parallel evolution deserves further theoretical and empirical investigation.
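To illustrate how a gene-level Jaccard similarity such as the 0.14 average quoted above can be computed, the following Python sketch takes the mean pairwise Jaccard index across replicate populations. The gene names, the three example populations, and the function names are hypothetical and chosen only for illustration; the meta-analysis may have handled details such as empty gene sets differently.

```python
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard similarity of two gene sets: |A and B| / |A or B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets share nothing.
        return 0.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(populations):
    """Average Jaccard similarity over all pairs of replicate populations."""
    pairs = list(combinations(populations, 2))
    if not pairs:
        raise ValueError("need at least two populations")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sets of mutated genes in three replicate populations.
example_populations = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneD"},
    {"geneB", "geneD", "geneE"},
]

example_mean = mean_pairwise_jaccard(example_populations)
print(f"mean pairwise Jaccard similarity: {example_mean:.3f}")
```

Because the denominator is the union of mutated genes, the index tends to fall as the number of possible target genes grows, which is one way to see why such measures depend on genome size.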

More than just selection can lead to parallel evolution

Evolutionary biologists are often quick to interpret instances of parallel evolution as being driven by selection. While no doubt often true, biases in the direction or magnitude of variation almost certainly play a role as well. The loci most likely to be associated with parallel evolution are those that are both under strong selection and have a tendency to generate higher levels of variation relevant to selection than others. This principle has been understood for a long time [12], but the role of mutational heterogeneity in contributing to parallel evolution is often overlooked and, consequently, understudied.

Bioessays 38: 0000–0000, © 2016 The Authors. BioEssays Published by WILEY Periodicals, Inc.


Table 2. Estimates of the contributions of mutational and selection heterogeneity in two ER experiments

Species          CV (mutation)   CV (selection)
P. aeruginosa    1.02            10.53
S. cerevisiae    0.66            1.47

See text for further explanation.
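As a quick check of how the values in Table 2 relate to the statement in the abstract that mutation contributes between 10 and 45% of the variation associated with selection, the sketch below takes the ratio of the two CV estimates for each species. It assumes that the first CV column of Table 2 refers to mutation and the second to selection; the dictionary layout and function name are illustrative only.

```python
# CV estimates transcribed from Table 2 (column assignment is an assumption:
# first column = mutational heterogeneity, second = selective heterogeneity).
cv_estimates = {
    "P. aeruginosa": {"mutation": 1.02, "selection": 10.53},
    "S. cerevisiae": {"mutation": 0.66, "selection": 1.47},
}


def mutation_to_selection_ratio(cv):
    """Ratio of the CV attributed to mutation to the CV attributed to selection."""
    return cv["mutation"] / cv["selection"]


ratios = {
    species: mutation_to_selection_ratio(cv)
    for species, cv in cv_estimates.items()
}

for species, ratio in ratios.items():
    print(f"{species}: CV(mutation) / CV(selection) = {ratio:.2f}")
```

Under this reading the ratios come out at roughly 0.10 and 0.45, in line with the 10 and 45% figures given in the abstract.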

Our models quantify the relative contribution of mutational heterogeneity and selection to parallel evolution. We find, as expected, that selection makes a larger contribution than mutation to patterns of parallel evolution in the two examples we have studied. That said, mutational heterogeneity contributes a substantial amount of the variance associated with parallel evolution in both studies, and so cannot be discounted. This conclusion gains some compelling support from the P. aeruginosa experiment (Example 1), where we have a good mechanistic understanding of the interaction between mutation rate and selection in generating parallel evolution. One of the most commonly selected genes in this experiment, orfN, confers resistance to an antibiotic – a key selective pressure in the experiment – and the mutations responsible involve single base pair deletions in either poly-T or poly-G repeats, leading to a truncated, and presumably loss-of-function, protein [39]. Homopolymeric repeats of this kind are well known to be prone to such deletions. In other words, a locus prone to high rates of mutation and under strong selection, as in the orfN example, is likely to contribute disproportionately to parallel evolution.

Of course, mutational heterogeneity is just one factor – the one easiest to study with available data from ER experiments in asexual microbes – that can modulate the probability of parallel evolution. Population size is another, as our meta-analysis shows. Additional factors include pleiotropy, epistasis, and recombination. Pleiotropy should increase the probability of parallel evolution by restricting the number of effective phenotypic dimensions available for adaptation [20]. It is harder to make clear predictions about the roles of epistasis and recombination because these depend crucially on the distribution of fitness effects among mutations, both singly and in combination. Further work in this area is needed.
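The point that a locus with both a high mutation rate and strong selection contributes disproportionately to parallel evolution can be made concrete with a small sketch. It is not the model fitted in this paper. It assumes a standard strong-selection, weak-mutation simplification in which the chance that the next fixed mutation falls in gene i is proportional to the product of its mutation rate and selection coefficient, so that two independent populations hit the same gene with probability equal to the sum of the squared shares. Under that assumption the probability equals (1 + CV^2)/n for n genes, where CV is the coefficient of variation of the products. All rates, coefficients, and function names below are hypothetical.

```python
from statistics import mean, pstdev


def gene_weights(mutation_rates, selection_coeffs):
    """Per-gene weight: mutation rate times selection coefficient (assumed model)."""
    if len(mutation_rates) != len(selection_coeffs):
        raise ValueError("one mutation rate and one coefficient per gene")
    return [m * s for m, s in zip(mutation_rates, selection_coeffs)]


def parallelism_probability(mutation_rates, selection_coeffs):
    """Probability that two independent populations fix a mutation in the same gene."""
    weights = gene_weights(mutation_rates, selection_coeffs)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum((w / total) ** 2 for w in weights)


def parallelism_from_cv(weights):
    """Same probability written as (1 + CV^2) / n, using the population CV."""
    cv = pstdev(weights) / mean(weights)
    return (1 + cv ** 2) / len(weights)


# Four hypothetical genes: no heterogeneity, a mutational hotspot,
# and a hotspot that is also under stronger selection.
uniform = parallelism_probability([1.0] * 4, [0.1] * 4)
hotspot = parallelism_probability([10.0, 1.0, 1.0, 1.0], [0.1] * 4)
hotspot_selected = parallelism_probability(
    [10.0, 1.0, 1.0, 1.0], [0.2, 0.1, 0.1, 0.1]
)

print(f"uniform genes:           {uniform:.3f}")
print(f"mutational hotspot:      {hotspot:.3f}")
print(f"hotspot under selection: {hotspot_selected:.3f}")
```

Writing the probability in terms of the CV separates the effect of heterogeneity from the number of loci, which is the sense in which a CV-based comparison is less sensitive to genome size than raw overlap measures.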

Conclusions

The work reviewed here takes us some way toward a more rigorous empirical approach to uncovering the processes contributing to parallel evolution. We have used a combination of meta-analysis of ER experiments and a model-fitting framework to infer more precisely the key factors driving parallel evolution in microbial evolution experiments. Our most important result is that, consistent with the many examples of parallel evolution in higher organisms in more natural settings, parallel evolution in the defined conditions of laboratory microcosms is mainly driven by selection. That said, variation along the genome in the mutations available can also be important in driving parallel evolution, and likely becomes more important as population sizes decline and populations spend more time waiting for beneficial mutations to occur. The contribution of mutation and selection to parallel evolution is thus mediated in part by the mutation supply.

More broadly, the theory of parallel evolution has languished because we had little understanding of how rare or common a phenomenon parallel evolution is. The ER experiments reviewed here are especially important in this regard because they provide direct and unbiased estimates of the frequency of parallel evolution. Of course, these experiments are done in controlled conditions in the laboratory, where selection is usually strong by design and extraneous sources of variation have been eliminated. These conditions would seem to bias laboratory experiments toward high rates of parallel evolution. While this is the case in some instances, more often rates of parallel evolution are quite low in ER experiments. Interestingly, estimates of gene reuse from a meta-analysis of natural populations with similar phenotypic adaptations [10] suggest that rates of gene-level parallel evolution in natural systems may actually be comparable to those in the lab. This result is important because it suggests that there is nothing particularly unusual about in vitro evolution, at least when it comes to parallel evolution. The inferences we make from ER experiments should thus help guide the study of parallel evolution in natura.

Acknowledgments

This work was supported in part by an NSERC Discovery Grant to RK, an NSERC postdoctoral fellowship to SFB, a Marie Skłodowska-Curie Individual Fellowship to FB (grant 657768), and a European Research Council grant from the European Union’s Seventh Framework Program (ERC Grant 311341) to TB.

The authors have declared no conflict of interest.

References

1. Lenormand T, Chevin LM, Bataillon T. 2016. Parallel evolution: what does it (not) tell us and why is it (still) interesting? In Ramsey G, Pence CH, ed; Chance in Evolution. Chicago University Press: Chicago, USA.
2. Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307: 1928–33.
3. Jost MC, Hillis DM, Lu Y, Kyle JW, et al. 2008. Toxin-resistant sodium channels: parallel adaptive evolution across a complete gene family. Mol Biol Evol 25: 1016–24.
4. Liu Y, Cotton JA, Shen B, Han X, et al. 2010. Convergent sequence evolution between echolocating bats and dolphins. Curr Biol 20: R53–4.
5. Dobler S, Dalla S, Wagschal V, Agrawal AA. 2012. Community-wide convergent evolution in insect adaptation to toxic cardenolides by substitutions in the Na,K-ATPase. Proc Natl Acad Sci USA 109: 13040–5.
6. Wong A, Kassen R. 2011. Parallel evolution and local differentiation in quinolone resistance in Pseudomonas aeruginosa. Microbiology 157: 937–44.
7. Breidenstein EBM, Khaira BK, Wiegand I, Overhage J, et al. 2008. Complex ciprofloxacin resistome revealed by screening a Pseudomonas aeruginosa mutant library for altered susceptibility. Antimicrob Agents Chemother 52: 4486–91.
8. Kassen R, Bataillon T. 2006. Distribution of fitness effects among beneficial mutations before selection in experimental populations of bacteria. Nat Genet 38: 484–8.
9. Andersson DI, Hughes D. 2010. Antibiotic resistance and its cost: is it possible to reverse resistance? Nat Rev Microbiol 8: 260–71.
10. Conte GL, Arnegard ME, Peichel CL, Schluter D. 2012. The probability of genetic parallelism and convergence in natural populations. Proc R Soc Lond B Biol Sci 279: 5039–47.
11. Bailey SF, Rodrigue N, Kassen R. 2015. The effect of selection environment on the probability of parallel evolution. Mol Biol Evol 32: 1436–48.
12. Haldane JBS. 1933. The part played by recurrent mutation in evolution. Am Nat 67: 5–19.
13. Bailey SF, Bataillon T. 2016. Can the experimental evolution programme help us elucidate the genetic basis of adaptation in nature? Mol Ecol 25: 203–18.
14. Kassen R. 2014. Experimental evolution and the nature of biodiversity. Roberts: Denver, USA.
15. Wichman HA, Brown CJ. 2010. Experimental evolution of viruses: microviridae as a model system. Philos Trans R Soc B Biol Sci 365: 2495–501.
16. Dettman JR, Rodrigue N, Melnyk AH, Wong A, et al. 2012. Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol 21: 2058–77.
17. Fisher RA. 1922. Darwinian evolution of mutations. Eugen Rev 14: 31–4.
18. Orr HA. 1998. The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52: 935–49.
19. Welch JJ, Waxman D, Houle D. 2003. Modularity and the cost of complexity. Evolution 57: 1723–34.
20. Chevin L-M, Martin G, Lenormand T. 2010. Fisher’s model and the genomics of adaptation: restricted pleiotropy, heterogenous mutation, and parallel evolution. Evolution 64: 3213–31.
21. Orr HA. 2002. The population genetics of adaptation: the adaptation of DNA sequences. Evol Int J Org Evol 56: 1317–30.
22. Gillespie JH. 1983. A simple stochastic gene substitution model. Theor Popul Biol 23: 202–15.
23. Gillespie JH. 1984. Molecular evolution over the mutational landscape. Evolution 38: 1116–29.
24. Orr HA. 2005. The probability of parallel evolution. Evolution 59: 216–20.
25. Yates LR, Campbell PJ. 2012. Evolution of the cancer genome. Nat Rev Genet 13: 795–806.
26. Didelot X, Walker AS, Peto TE, Crook DW, et al. 2016. Within-host evolution of bacterial pathogens. Nat Rev Microbiol 14: 150–62.
27. Gillespie JH. 1994. The Causes of Molecular Evolution. Oxford University Press: Oxford, UK.
28. Rokyta DR, Beisel CJ, Joyce P. 2006. Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. J Theor Biol 243: 114–20.
29. Orr HA. 2005. The genetic theory of adaptation: a brief history. Nat Rev Genet 6: 119–27.
30. Patwa Z, Wahl LM. 2008. The fixation probability of beneficial mutations. J R Soc Interface 5: 1279–89.
31. Streisfeld MA, Rausher MD. 2011. Population genetics, pleiotropy, and the preferential fixation of mutations during adaptive evolution. Evolution 65: 629–42.
32. Gerrish PJ, Lenski RE. 1998. The fate of competing beneficial mutations in an asexual population. Genetica 102–103: 127–44.
33. Muller HJ. 1932. Some genetic aspects of sex. Am Nat 66: 118–38.
34. Crow JF, Kimura M. 1965. Evolution in sexual and asexual populations. Am Nat 99: 439–50.
35. Good BH, Rouzine IM, Balick DJ, Hallatschek O, et al. 2012. Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc Natl Acad Sci USA 109: 4950–5.
36. Maddamsetti R, Lenski RE, Barrick JE. 2015. Adaptation, clonal interference, and frequency-dependent interactions in a long-term evolution experiment with Escherichia coli. Genetics 200: 619–31.
37. Lang GI, Rice DP, Hickman MJ, Sodergren E, et al. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500: 571–4.
38. Schick A, Bailey SF, Kassen R. 2015. Evolution of fitness tradeoffs in locally adapted populations of Pseudomonas fluorescens. Am Nat 186: S48–59.
39. Wong A, Rodrigue N, Kassen R. 2012. Genomics of adaptation during experimental evolution of the opportunistic pathogen Pseudomonas aeruginosa. PLoS Genet 8: e1002928.
40. Dettman JR, Sztepanacz JL, Kassen R. 2016. The properties of spontaneous mutations in the opportunistic pathogen Pseudomonas aeruginosa. BMC Genomics 17: 27.
41. Tenaillon O, Rodríguez-Verdugo A, Gaut RL, McDonald P, et al. 2012. The molecular diversity of adaptive convergence. Science 335: 457–61.
42. Conrad TM, Joyce AR, Applebee MK, Barrett CL, et al. 2009. Whole-genome resequencing of Escherichia coli K-12 MG1655 undergoing short-term laboratory evolution in lactate minimal media reveals flexible selection of adaptive mutations. Genome Biol 10: R118.
43. Herring CD, Raghunathan A, Honisch C, Patel T, et al. 2006. Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat Genet 38: 1406–12.
44. Blank D, Wolf L, Ackermann M, Silander OK. 2014. The predictability of molecular evolution during functional innovation. Proc Natl Acad Sci USA 111: 3044–9.
45. Puentes-Téllez PE, van Elsas JD. 2014. Sympatric metabolic diversification of experimentally evolved Escherichia coli in a complex environment. Antonie Van Leeuwenhoek 106: 565–76.
46. Churton NWV, Misra RV, Howlin RP, Allan RN, et al. 2016. Parallel evolution in Streptococcus pneumoniae biofilms. Genome Biol Evol 8: 1316–26.
47. Leiser OP, Merkley ED, Clowers BH, Kaiser BLD, et al. 2015. Investigation of Yersinia pestis laboratory adaptation through a combined genomics and proteomics approach. PLoS ONE 10: e0142997.
48. Kvitek DJ, Sherlock G. 2013. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet 9: e1003972.
49. Kohn LM, Anderson JB. 2014. The underlying structure of adaptation under strong selection in 12 experimental yeast populations. Eukaryot Cell 13: 1200–6.
50. Payen C, Rienzi SCD, Ong GT, Pogachar JL, et al. 2014. The dynamics of diverse segmental amplifications in populations of Saccharomyces cerevisiae adapting to strong selection. G3 4: 399–409.
51. Gerstein AC, Ono J, Lo DS, Campbell ML, et al. 2015. Too much of a good thing: the unique and repeated paths toward copper adaptation. Genetics 199: 555–71.
52. Gerstein AC, Lo DS, Otto SP. 2012. Parallel genetic changes and nonparallel gene-environment interactions characterize the evolution of drug resistance in yeast. Genetics 192: 241–52.
53. Szamecz B, Boross G, Kalapis D, Kovács K, et al. 2014. The genomic landscape of compensatory evolution. PLoS Biol 12: e1001935.