Neutral genomic regions refine models of recent ... - Semantic Scholar

1 downloads 5355 Views 4MB Size Report
Sequencing Center, Baylor College of Medicine, Houston, TX 77030; and dDepartment of Human ... sequencing to reliably call rare variants and fit an extensive.
Neutral genomic regions refine models of recent rapid human population growth Elodie Gazavea,1, Li Maa,1,2, Diana Changa,1, Alex Coventryb, Feng Gaoa, Donna Muznyc, Eric Boerwinklec, Richard A. Gibbsc, Charles F. Singd, Andrew G. Clarkb, and Alon Keinana,3 Departments of aBiological Statistics and Computational Biology and bMolecular Biology and Genetics, Cornell University, Ithaca, NY 14853; cHuman Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030; and dDepartment of Human Genetics, University of Michigan, Ann Arbor, MI 48105

Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants and provided clear evidence of recent rapid growth in effective population size, although estimates have varied greatly among studies. All these studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced loci far from genes that meet a wide array of additional criteria such that mutations in these loci are putatively neutral. As population structure also skews allele frequencies, we sequenced 500 individuals of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We used very high coverage sequencing to reliably call rare variants and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ∼3.4% growth per generation during the last ∼140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and results also shed light on the discrepancy in demographic estimates among recent studies.

growth than earlier studies (17–19). At the same time, demographic estimates varied by as much as an order of magnitude between these studies (SI Appendix, Table S1). Not all rare SNVs are as recent as others, and a variant’s selective effect plays an important part in its frequency. Purifying (negative) selection acting on deleterious alleles is expected to give rise to an excess of rare variants, which has been demonstrated for human populations (17, 19, 20). Thus, the genetic signature left by purifying selection on the SFS confounds the signature left by recent growth (21). To minimize this confounding effect, recent studies based on protein-coding genes considered for modeling solely synonymous SNVs, which do not modify the amino acid sequence (17–19). However, synonymous mutations have been shown to be targeted by natural selection, e.g., due to their impact on translation efficiency and accuracy, splicing, and folding energy (17, 19, 22–26). Hence, to study recent human genetic history with minimal effects of selection, it is not only desirable to consider accurate sequencing data from a large number of individuals, but also to focus on genomic regions in which mutations are putatively neutral, i.e., not affected by natural selection. Another potential confounder of demographic inference is population structure. Because both Significance

A

rcheological and historical records reveal that modern human populations have experienced dramatic growth, likely driven by the Neolithic revolution about 10,000 y ago (1, 2). Since then, the worldwide human population size has increased at a fast pace, and faster yet in the last ∼2,000 y, giving rise to today’s population in excess of 7 billion people (3, 4). A central question in population genetics is how such demographic events affect the effective size (Ne) of populations over time, and as a consequence, how they have shaped extant patterns of genetic variation. [Effective population size, which is typically smaller than the census size, determines the genetic properties of a population (5).] Focusing often on human populations of European descent, estimates of Ne from genetic variation have been traditionally on the order of 10,000 individuals (6–11), although higher and lower estimates have also been obtained (12–16). More recent studies based on sequencing data from a relatively small number of individuals have considered recent population growth in fitting models to the observed site frequency spectrum (SFS) and reported as much as a 0.5% increase in Ne per generation, culminating in a Ne of a few tens of thousands today (13, 14). It has been recently hypothesized that these studies could not capture the full scope of population growth because a larger sample size of individuals is needed to observe single nucleotide variants (SNVs) that arose during the recent epoch of growth (4). With extreme recent population growth as experienced by human populations, the vast majority of SNVs are expected to be very young and rare, i.e., of very low allele frequency (4). Indeed, several recent sequencing studies with very large numbers of individuals have observed an unprecedented excess in the proportion of rare SNVs (17–19). Fitting models to the SFS, these studies have captured a clearer, more rapid recent population www.pnas.org/cgi/doi/10.1073/pnas.1310398110

Recent rapid growth of human populations predicts that a large number of genetic variants in populations today are very rare, i.e., appear in a small number of individuals. This effect is similar to that of purifying selection, which drives deleterious alleles to become rarer. Recent studies of the genetic signature left by rapid growth were confounded by purifying selection since they focused on genes. Here, to study recent human history with minimal confounding by selection, we sequenced and examined genetic variants far from genes. These data point to the human population size growing by about 3.4% per generation over the last 3,000–4,000 y, resulting in a greater than 100-fold increase in population size over that epoch. Author contributions: A.K. designed research; E.G., L.M., D.C., A.C., and F.G. performed research; D.M., E.B., R.A.G., C.F.S., and A.G.C. contributed new reagents/analytic tools; E.G., L.M., D.C., A.C., F.G., and A.K. analyzed data; and E.G., D.C., and A.K. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. J.M.A. is a guest editor invited by the Editorial Board. Freely available online through the PNAS open access option. Data deposition: The sequences reported in this paper have been deposited in The Single Nucleotide Polymorphism Database (dbSNP) (www.ncbi.nlm.nih.gov/SNP) (accession nos.: ss836177187–ss836179020). More detailed data is available at http://keinanlab.cb.bscb. cornell.edu. 1

E.G., L.M., and D.C. contributed equally to this work.

2

Present address: Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742.

3

To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1310398110/-/DCSupplemental.

PNAS | January 14, 2014 | vol. 111 | no. 2 | 757–762

GENETICS

Edited by Joshua M. Akey, University of Washington, Seattle, WA, and accepted by the Editorial Board November 11, 2013 (received for review June 10, 2013)

large-scale and fine-scale population structures exist in European populations (27–30), pooling individuals of different European ancestries can lead—when not accounted for in modeling—to biases in the observed SFS and, consequently, in estimates of recent history (SI Appendix). In this study, we aim to capture recent demographic history and estimate the magnitude of the recent growth experienced by humans while limiting the confounding by natural selection and population structure. For this purpose, we selected a small set of genomic regions that are putatively neutral based on a wide array of criteria; the SFS of mutations in these regions is likely to reflect historical changes in Ne rather than selection. We sequenced these regions in a large sample of individuals that share a relatively similar genetic ancestry within Europe. Very deep sequencing coverage allowed us to reliably observe even singletons (SNVs with an allele appearing in a single copy in the sample). Based on this dataset, we explored different models of recent European demographic history. Our best-fit model estimates a growth of 3.4% per generation over the last 141 generations, which is more rapid than estimated in recent large-scale studies (17, 19). We show that differences among previous models (17, 19)—and between these models and ours—can be partially explained by a priori assumptions about more ancient demographic events. Results To estimate recent European demographic history with minimal confounding of natural selection, we sequenced genomic regions

that we selected such that mutations therein would be as neutral as possible. Thus, the neutral regions dataset (NR) consists of loci that are at least 100,000 bp and at least 0.1 cM away from coding or potentially coding sequences. The loci are also free of known copy number variants, segmental duplications, and loci shown to have been under recent positive selection, and have minimal amounts of conserved and repetitive elements (Materials and Methods and SI Appendix). We sequenced a final set of 15 loci that met all these criteria, spanning a total of 216,240 bp (SI Appendix, Table S2). In modeling a population’s recent demographic history, if the sample of individuals trace their recent ancestry to different populations, even different European populations, the number of singletons and other rare variants can be biased (SI Appendix, Fig. S1). To sequence the neutral regions in a sample that minimizes such population structure, we applied principal component analysis (PCA) to whole-genome genotyping data of 9,716 individuals of European ancestry from the Atherosclerosis Risk in Communities (ARIC) cohort (31). This analysis revealed extensive population structure, and we consequently sequenced 500 individuals that form a relatively homogenous cluster (Fig. 1A). Validating the choice of samples by comparing to a diverse panel of European populations from POPRES (32), these 500 individuals indeed show much less heterogeneity than the broader ARIC sample (Fig. 1B) and share a northwestern European ancestry that is similar to the POPRES UK sample (Fig. 1C). Sequencing was carried out with Illumina HiSeq: 100-bp pairedend for a median average coverage depth of 295X per individual

Fig. 1. Genetic ancestry of the NR sequenced individuals. (A) PCA of all individuals of European ancestry from the ARIC cohort, with the exception of outlier individuals (SI Appendix). Principal components 1 and 2 are plotted and reveal extensive population structure. Individuals chosen for sequencing in the present study are denoted in blue. (B) PCA of individuals from the POPRES cohort, with the individuals from A subsequently projected onto the resulting principal components (SI Appendix). Principal component 1 (x axis) appears to capture southern vs. northern European ancestry, whereas principal component 2 (y axis) captures western vs. eastern European ancestry, in line with the results of Novembre et al. (27). The full set of individuals from ARIC (ARIC) show greater variability, mostly across the first principal component, than the set of individuals sequenced in this study (ARIC_NR). (C) Same as B, except that the PCA includes 10 randomly chosen individuals from the present study rather than projecting all of them as in B. These 10 individuals cluster with individuals from the POPRES UK and related populations, which is also the case for other random sets of 10 individuals.

758 | www.pnas.org/cgi/doi/10.1073/pnas.1310398110

Gazave et al.

results (SI Appendix, Fig. S5). This set of results combined support our unique choice of regions and the relatively neutral nature of SNVs in these regions, as well as further validate our SNV and genotype calling pipeline. Although the SFS of the NR dataset has a lower proportion of singletons compared with sites under purifying selection, it still exhibits a marked enrichment in this proportion (Fig. 2) compared with the expectation for a population that has remained at constant size throughout history (38.4% vs. 13.6%). This enrichment is consistent with the impact of recent population growth on the distribution of allele frequencies (4, 17–19). However, the SFS predicted by recently published demographic models of European populations that include recent exponential population growth (17, 19) do not closely match the SFS of the NR data (SI Appendix, Fig. S6). These models predict a higher proportion of rare variants and a smaller proportion of common variants than we observed, which is consistent with these models being based on synonymous SNVs with increased effect of purifying selection (17, 19). Hence, we next estimated the magnitude and duration of population growth by fitting the SFS of the NR to several different models of recent history. The first demographic model that we fit to the SFS consists of a recent exponential population growth with two free parameters: the time growth started and the extant Ne, with the growth assumed to continue into the present. More ancient demographic events, before the epoch of growth, including two population bottlenecks, were assumed to follow the model and estimates of ref. 9 (SI Appendix, Table S1). The resulting model (model I) estimates the extant population size and, as a consequence, the growth rate, with very large uncertainty, as evident from the 95% CIs (Table 1), and the model does not fit the data very accurately (Fig. 2). Model I, similar to other recent models of population growth (17–19), assumed more ancient demography as fixed, with the intention of obtaining better resolution for estimating the two parameters of recent history. However, different recent models of population growth have assumed different models of ancient demography (SI Appendix, Table S1): Tennessen et al. (19) assumed what had been estimated previously by Gravel et al. (14), Nelson et al. (17), and Coventry et al. (18), what has been estimated by Schaffner et al. (15), and here what has been estimated by Keinan et al. (9). To test how sensitive model I is to the details of assumed ancient history, we repeated fitting model I while assuming each of the above ancient demographic models. Our results point to large differences among the three resulting models, with some parameters for recent demography being as

Fig. 2. SFS of the NR dataset and demographic models. x axis represents, in log scale, a partition of the number of copies of the minor allele with each number indicating the upper bound of a bin. (Minor allele counts of 1–5 are not binned.) y axis presents, in log scale, the proportion of SNVs that fall into each bin. (Inset) Fraction of variants that are singletons (y axis is presented in linear scale). Data, empirical SFS of NR; model I, a two-parameter model with one epoch of growth, where the duration of growth and final Ne were estimated; model II, an extension of Model I with a third parameter corresponding to Ne before the growth epoch; model III, a model with two separate epochs of growth (Table 1). For clarity, model IV is not presented because its SFS is very similar to that of model III.

Gazave et al.

PNAS | January 14, 2014 | vol. 111 | no. 2 | 759

GENETICS

(after filtering of duplicated reads; SI Appendix, Fig. S2). We used UnifiedGenotyper (GATK) for variant detection and genotype calling (33, 34), and after applying strict filters to its calls, we obtained 1,834 high quality SNVs (SI Appendix). Of these variants, 62.5% have not been reported in dbSNP 135 (novel SNVs). The Ti/Tv (transition/transversion) ratio showed no indication of biases, for both all SNVs (Ti/Tv = 2.22) and for novel SNVs alone (Ti/ Tv = 2.29). None of the called SNVs presented significant departure from Hardy–Weinberg equilibrium. We further validated the quality of variant and genotype calling by comparing to those from whole-genome sequencing of the Cohorts for Heart and Aging Research in Genetic Epidemiology (CHARGES) project, which overlaps with a few individuals from our NR dataset (35). This validation supports the high quality of the NR dataset due to the very high sequencing coverage (SI Appendix). High-quality genotypes for at least 450 individuals were obtained for 95% of SNVs, which are used for presentation of the SFS throughout while probabilistically subsampling 900 random chromosomes for each SNV (SI Appendix). We used the full set of SNVs for demographic modeling, where our approach accounts for missing data (Materials and Methods). We compared the SFS from the NR dataset to that from the Exome Sequencing Project (ESP), which is of equivalent deep sequencing (SI Appendix, Fig. S3). We randomly subsampled the latter to 900 chromosomes and stratified by functional annotation. The more likely to be functional an annotation is, the more the ESP SFS deviates from that of the neutral regions: the highest agreement is observed for intronic and intergenic annotations from the ESP data, although these are still significantly different from the NR (P < 10−3 for a goodness of fit test; SI Appendix, Table S3 and Fig. S4). Agreement is much worse between synonymous SNVs and the NR data (P = 5.4 × 10−19) and worst for missense, splice, and nonsense SNVs (P < 10−30 for each; SI Appendix, Table S3 and Fig. S4). This lack of agreement is due to a higher proportion of very rare variants in the ESP data (SI Appendix, Fig. S4), which is consistent with purifying selection playing a larger role in maintaining alleles at lower frequencies in and around genes. We further examined the NR SNVs in comparison with synonymous SNVs via genomic evolutionary rate profiling (GERP) scores (36), showing our data to be much more narrowly distributed around a score of 0, which corresponds to the absence of functional constraint (SI Appendix, Fig. S5). These low GERP scores might simply reflect the NR regions being chosen to minimize conserved elements, but applying the same conservation criteria to synonymous SNVs leads to similar

Table 1. Four models of recent demographic history and population growth

Model Model Model Model Model

I II* III IV

Number of free parameters

Ne before growth

Duration of earlier growth (generations)

2 3 3 4

10,000 5,633 (4,400, 7,100) 5,633 5,633

NA NA 267.3 200 (200, 600)

Ne after earlier growth

Duration of recent growth (generations)

Ne after recent growth (millions)

NA 112.8 (92.9, 136.8) 5.2 (0.8, 300) NA 140.8 (116.8, 164.7) 0.654 (0.3, 2.87) 5,362 (3,614, 7,955) 132.7 (101.8, 165.5) 0.73 (0.3, 5.7) 5,000 (3,000, 15,000) 140 (80, 160) 0.5 (0.3, 50)

Growth rate during recent growth 5.54% 3.38% 3.70% 3.84%

(3.2, 11.1) (2.4, 5.1) (2.6, 6.4) (0.36, 4.44)

Log likelihood −3,595.141 −3,583.975 −3,584.178 −3,583.501

The table describes the recent history estimated by four models, as well as the goodness of fit of each (log likelihood). Bold values denote the maximum likelihood estimates (and 95% CIs) of the free parameters estimated by each model. Italicized values are assumed as fixed in the model, and regular font denotes parameters that are a direct function of estimated parameters. All four models assume the model of ancient demographic history as estimated in ref. 9, which includes two population bottlenecks (Fig. 3). Results based on other models of ancient demographic history are provided in SI Appendix. *This model is considered as the best-fit model because none of the models with additional parameters provide a significantly improved fit.

much as an order of magnitude different (SI Appendix, Table S4). We stress that both data and methodology underlying these inferences are identical and hence conclude that the assumption of ancient demography has a major effect on estimating the timing and magnitude of recent population growth, which explains some of the differences among the recently published models (17–19). To alleviate some of the sensitivity to ancient demography assumptions, we fit model II that extends model I by adding an additional parameter for the effective population size just before exponential growth. This three-parameter model fits the NR data significantly better than model I (P = 2.3 × 10−6; Fig. 2). It estimates the ancestral Ne before the growth to be 5,633 (CI of 4,400–7,100), markedly lower than the fixed value of 10,000 in model I (Table 1). It estimates growth starting 141 (117–165) generations ago, which is a little earlier than in model I, with a less rapid growth rate of 3.4% (2.4–5.1%) per generation, which culminates in an extant Ne of 0.65 (0.3–2.87) million individuals (Table 1 and Figs. 3 and 4). To test whether this improved model II can explain the differences between different assumed ancient demographic histories, we repeated fitting its three parameters to the NR similarly to above with model I. When ancient demography is assumed from Gravel et al. (14), this model fits the data significantly better (P = 2.9 × 10−7) than the equivalent of model I with the same ancient demography. Parameter estimates become practically identical to those of the above model II based on Keinan et al. (9) (Fig. 4 and SI Appendix, Tables S4 and S5 and Fig. S7). Model II based on Schaffner et al. (15) does not fit the data better than the respective model I (P = 0.52) and provides the poorest fit of all three models. One notable difference in the model of Schaffner et al. (15) is that the timing of the second European population bottleneck is assumed as fixed at over twofold that estimated in the other two models (9, 14). The extant Ne is almost identical for all these three models based on

Fig. 3. Schematic representation of the best fit model. Ne is shown in log scale during the last 5,000 generations, with the last 620 generations as estimated by model II (Table 1), and the preceding period following ref. 9.

760 | www.pnas.org/cgi/doi/10.1073/pnas.1310398110

model II, with best-fit estimates varying between 0.47 and 0.77 million individuals (Fig. 4 and SI Appendix, Table S5). We conclude that by modeling the epoch before growth, the improved model II is much less sensitive to assumptions about more ancient demography, and it goes a long way in closing the gap between different published models of recent growth (17, 19) and between these and model I. Archeological and historical records suggest that the growth of the human census population size has accelerated over time (3, 4). Our models thus far considered a single epoch of exponential growth, estimated to have started ∼3,500 y ago (assuming 25 y per generation). Hence, we considered several additional models in which growth can span two separate epochs with a different growth rate in each: model III with three parameters, model IV with four parameters, and all other possible four-parameter models (Table 1). None of these more detailed models fit the data better than model II (P > 0.86; Table 1 and Fig. 2). They all estimate the second, more recent epoch of growth to be practically identical to the one estimated in model II and the earlier epoch of growth to be equivalent to an epoch of constant population size (Table 1 and SI Appendix, Fig. S8). We investigated whether low statistical power due to a limited amount of data could explain the lack of an acceleration of growth. We simulated a scenario that is identical to model II, except for the addition of an earlier epoch of milder growth, and estimated how often model III provides a significantly better fit than model II (SI Appendix). Our modeling has a nonnegligible statistical power of capturing two separate epochs of growth (SI Appendix, Fig. S9). Power is 86% when the earlier growth rate is about half that of the more recent epoch. It decreases for milder growth during the first epoch—being as low as 25% for a growth rate of only 0.3% per generation—as well as when growth during the earlier epoch becomes similar to the recent rate (SI Appendix, Fig. S9). Overall, power is >60% for detecting two distinct epochs of growth as long as the growth rate during the earlier epoch is in the range of 0.6–1.8% (compared with 3.4% in the recent epoch), which is consistent with archaeological data (SI Appendix). Discussion Recent studies have provided clear evidence that human populations have experienced recent explosive population growth, although detailed estimates of growth varied greatly (17–19). These were in the context of medical genetic studies, hence based on the sequencing of protein-coding genes. Here we generated a dataset with the sole purpose of accurately capturing rare putatively neutral variants for studying recent human history and population growth. As such, our NR data consists of several characteristics. First, because both demography and natural selection shape the distribution of allele frequencies, loci for the NR data were carefully chosen to minimize the influence of natural selection. Second, because population structure also affects the distribution of allele frequencies, the data consist of Gazave et al.

individuals with a homogenous European ancestry, similar to that of individuals from the United Kingdom. Third, because the genetic signature of growth is in rare variants, the data consist of a relatively large sample of 500 individuals. Although some recent studies have considered a larger sample size, this was at the cost of studying a medical cohort with more heterogeneous ancestry. Fourth, to deal with the relatively high error rate of nextgeneration sequencing, we sequenced the neutral loci in all individuals to a very high coverage, which allows strict filtering and yields a set of very high quality SNVs. These characteristics combine to make the NR dataset ideally suited for population genetic studies of rare variants and recent history. We used the NR data to consider an array of models of recent human demographic history while showing that our results are consistent with being less confounded by natural selection. The best-fit model points to Europeans having experienced recent growth from an effective population size of about 4,000–7,000 individuals as recently as 120–160 generations (3,000–4,000 y) ago. Growth over the last 3,000–4,000 y is estimated at an average rate of about 2–5% per generation, resulting in an overall increase in effective population size of two orders of magnitude. This model fits the data very well, but only after the realization that assumptions of ancient demography impact the estimate of recent population growth. We hypothesize that this is the case because previous models of ancient demographic history resulted in parameters that confound more recent and more ancient history (37), with the recent growth indirectly affecting them in a manner dependent on sample size. This realization leads to the model we report here fitting much better than previous models of recent growth, and it sheds light on the discrepancies among the latter. Motivated by archeological evidence of growth starting with the Neolithic revolution ∼10,000 y ago and accelerating in the Common Era, we considered models that allow for acceleration of the rate of growth, but none supported such acceleration. One recent model considered two separate epochs of exponential growth (19). However, the first captures a slow recovery from the Eurasian population bottleneck ∼23,000 y ago, with a weak growth rate of 0.3% that leads to an Ne of only 9,208, which is similar to the instantaneous recovery from the population bottleneck in other models (14). Thus, to date no recent acceleration in the rate of growth that is along the lines suggested by archeological evidence has been observed in genetic data. Power calculations showed that with our data size and modeling framework, the results do not support a milder growth before it accelerated with >60% certainty. Not capturing two separate Gazave et al.

epochs of growth can be due to limited statistical power or overly simplified models. However, another potential explanation is that effective population size increases extremely slowly with the census population size, at least initially. Although several factors contribute to these phenomenon, the particular increase in census population size with the Neolithic revolution has been accompanied by changing social structure that has led to increased variability in reproductive success; the advent of agriculture led to differential accumulation of richness, more notably in males, resulting in differential access to females compared with a hunter-gatherer lifestyle (38). Increased variance in reproductive success results in relatively decreased effective population size. Perhaps jointly with other population processes associated with this social shift, e.g., changing generation time, this can explain either a lack of growth in effective population size initially or a much milder one than in census size. This increased variance, in turn, can lead to our models only capturing the more recent and more rapid growth. In conclusion, we presented refined models of the recent explosive growth of a European population. These models can inform studies of natural selection (21, 39–41), the architecture of complex diseases, and the methods that should best be used for genotype-phenotype mapping. We hope that these models and the public availability of the NR dataset will facilitate additional such studies. (Data available in dbSNP, with more detailed data at http://keinanlab.cb.bscb.cornell.edu.) However, models of recent demographic history are still limited to Europeans (17–19) and African Americans (19), and there is a need to extend them to additional populations. As the vast majority of rare variants are population specific (24, 42, 43), such studies of additional populations will also facilitate better consideration of the replicability of genome-wide association studies results across populations. Materials and Methods Selection of Individuals with Shared European Genetic Ancestry. PCA was run on 9,716 European Americans from the ARIC cohort (31), using EIGENSOFT (44) on whole-genome genotyping data from the Affymetrix 6.0 genotyping array (dbGaP accession phs000090.v1.p1). Outliers of inferred non-European ancestry were removed, in addition to regions of extended linkage disequilibrium such as inversions (SI Appendix). A total of 500 individuals that were densely clustered together based on the first four principal components (PCs) were then chosen for sequencing. We tested for plate effect, which showed no correlation with the localization of the individuals on these PCs. We validated the ancestry of the 500 individuals by merging with data from the Affymetrix 500k genotyping array of 10 individuals from each POPRES population (32) and repeating PCA (SI Appendix).

PNAS | January 14, 2014 | vol. 111 | no. 2 | 761

GENETICS

Fig. 4. Log-likelihood surface of model II using three different models of ancient history. (A–C) Log-likelihood as a function of two parameters— the time growth started and the final Ne—with the third parameter (Ne before the growth, in thousands) fixed on the maximum likelihood estimate. (D–F) Similarly, with Ne before the growth and final Ne presented and the time of growth fixed. The model was estimated by fitting all three parameters concurrently (Materials and Methods). A and D are for the model based on ancient demography from Keinan et al. (9) (Table 1), B and E are from Gravel et al. (14), and C and F are from Schaffner et al. (15) (SI Appendix, Table S5). Colored contours are at intervals of 2 log-likelihood. (Note the different scales between panels.) Red crosses denote the maximumlikelihood estimates.

Choice of Target Putatively Neutral Regions. To minimize the effect of selection, we considered genomic regions located at least 100,000 bp and 0.1 cM from any coding or potentially coding loci. We excluded genomic sequences containing segmental duplications and known copy number variants (SI Appendix), as well as regions under recent positive selection (45). Among contiguous genomic regions of at least 100 kb that satisfy these criteria, we then ranked targets for sequencing by their content of conserved and repetitive elements and removed CpG islands. The resulting NR dataset spans a total of 216,240 bp across 15 regions, each between 5,340 to 20,000 bp long (SI Appendix, Table 2). These and additional criteria for selecting regions are implemented in the Neutral Regions Explorer webserver at http://nre.cb.bscb.cornell.edu (46). Sequencing, Mapping, and Variant Calling. Illumina HiSEq. 2000 with 100-bp paired-end reads was used for sequencing. Reads were mapped to hg18 human reference genome using Burrows-Wheeler Aligner (BWA) (47) (SI Appendix). For each individual, aligned reads were subjected to duplicate removal using Picard v.1.66 (http://picard.sourceforge.net). Subsequent SNV calling, quality control filtering, and genotype calling were performed with the Genome Analysis ToolKit, GATK-1.5–31 (33, 34), as detailed in SI Appendix. Analyses are based on 493 individuals that were successfully sequenced.

growth or an additional, earlier epoch of growth. Models include different combinations of the following parameters: the time recent growth started, final Ne after recent growth, Ne before growth, start time of earlier growth, and Ne after earlier growth (Table 1). We tested whether a model provides a better fit than another using Vuong’s test (48). For each model, we estimated the SFS at different grid points using ms (49), with each grid point being a particular combination of parameter values. We then calculated the composite likelihood of the model following the approach of ref. 9, as the probability of the observed minor allele (the less common of the two alleles) counts of all SNVs while accounting for missing data (SI Appendix). We profiled the likelihood surface using for each parameter 7–16 predefined grid points that span a range of plausible values. We increased the number of grid points for each parameter 10-fold by fitting a smooth spline function for the proportion of SNVs of each allele count as a function of all parameters, which improved accuracy (SI Appendix, Fig. S10), with only a minor increase in computational burden. Two-sided 95% CIs for each parameter was estimated following a χ2(1) distribution that accounts for variation across SNVs (SI Appendix).

Demographic Inference. To obtain estimates of recent demographic history, we calculated the likelihood of the observed SFS conditioned on several demographic models. To reduce parameter space, we fixed ancient history as estimated by previous studies (9, 14, 15) and only estimated parameters of more recent history. These models have either a single epoch of recent

ACKNOWLEDGMENTS. We thank Srikanth Gottipati and Aaron Sams for helpful advice, Matt Rasmussen for code to parse Newick trees, and the editor and reviewers of this manuscript for helpful suggestions. This work was supported in part by National Institutes of Health Grants GM065509, HG003229, and HG005715. E.G. was supported in part by a Cornell Center for Comparative and Population Genomics fellowship. A.K. was also supported by The Ellison Medical Foundation, an Alfred P. Sloan Research Fellowship, and the Edward Mallinckrodt, Jr. Foundation.

1. Cohen JE (1996) How Many People Can the Earth Support? (W. W. Norton & Company, New York), 1st Ed. 2. Roberts L (2011) 9 billion? Science 333(6042):540–543. 3. Haub C (1995) How many people have ever lived on earth? Popul Today 23(2):4–5. 4. Keinan A, Clark AG (2012) Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336(6082):740–743. 5. Hartl D, Clark A (2007) Principles of Population Genetics (Sinauer, Sunderland, MA). 6. Erlich HA, Bergström TF, Stoneking M, Gyllensten U (1996) HLA sequence polymorphism and the origin of humans. Science 274(5292):1552b–1554b. 7. Garrigan D, Hammer MF (2006) Reconstructing human origins in the genomic era. Nat Rev Genet 7(9):669–680. 8. Harding RM, et al. (1997) Archaic African and Asian lineages in the genetic ancestry of modern humans. Am J Hum Genet 60(4):772–789. 9. Keinan A, Mullikin JC, Patterson N, Reich D (2007) Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nat Genet 39(10):1251–1255. 10. Takahata N (1993) Allelic genealogy and human evolution. Mol Biol Evol 10(1):2–22. 11. Yu N, et al. (2001) Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol Biol Evol 18(2):214–222. 12. Tenesa A, et al. (2007) Recent human effective population size estimated from linkage disequilibrium. Genome Res 17(4):520–526. 13. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD (2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet 5(10):e1000695. 14. Gravel S, et al.; 1000 Genomes Project (2011) Demographic history and rare allele sharing among human populations. Proc Natl Acad Sci USA 108(29):11983–11988. 15. Schaffner SF, et al. (2005) Calibrating a coalescent simulation of human genome sequence variation. Genome Res 15(11):1576–1583. 16. Melé M, et al.; Genographic Consortium (2012) Recombination gives a new insight in the effective population size and the history of the old world human populations. Mol Biol Evol 29(1):25–30. 17. Nelson MR, et al. (2012) An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337(6090):100–104. 18. Coventry A, et al. (2010) Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat Commun 1:131. 19. Tennessen JA, et al.; Broad GO; Seattle GO; NHLBI Exome Sequencing Project (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337(6090):64–69. 20. Kiezun A, et al. (2012) Exome sequencing and the genetic basis of complex traits. Nat Genet 44(6):623–630. 21. Gazave E, Chang D, Clark AG, Keinan A (2013) Population growth inflates the perindividual number of deleterious mutations and reduces their mean effect. Genetics 195(3):969–978. 22. Chamary JV, Hurst LD (2005) Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol 6(9):R75. 23. Chamary JV, Parmley JL, Hurst LD (2006) Hearing silence: Non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 7(2):98–108. 24. Fu W, et al.; NHLBI Exome Sequencing Project (2013) Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493(7431):216–220. 25. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (2008) Natural selection has driven population differentiation in modern humans. Nat Genet 40(3):340–345.

26. Waldman YY, Tuller T, Keinan A, Ruppin E (2011) Selection for translation efficiency on synonymous polymorphisms in recent human evolution. Genome Biol Evol 3:749–761. 27. Novembre J, et al. (2008) Genes mirror geography within Europe. Nature 456(7218): 98–101. 28. Humphreys K, et al. (2011) The genetic structure of the Swedish population. PLoS ONE 6(8):e22547. 29. Price AL, et al. (2009) The impact of divergence time on the nature of population structure: An example from Iceland. PLoS Genet 5(6):e1000505. 30. Di Gaetano C, et al. (2012) An overview of the genetic structure within the Italian population from genome-wide data. PLoS ONE 7(9):e43759. 31. The ARIC Investigators (1989) The Atherosclerosis Risk in Communities (ARIC) Study: Design and objectives. The ARIC investigators. Am J Epidemiol 129(4):687–702. 32. Nelson MR, et al. (2008) The Population Reference Sample, POPRES: A resource for population, disease, and pharmacological genetics research. Am J Hum Genet 83(3):347–358. 33. McKenna A, et al. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297–1303. 34. DePristo MA, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43(5):491–498. 35. Morrison AC, et al.; Cohorts for Heart and Aging Research in Genetic Epidemiology (CHARGE) Consortium (2013) Whole-genome sequence-based analysis of high-density lipoprotein cholesterol. Nat Genet 45(8):899–901. 36. Cooper GM, et al.; NISC Comparative Sequencing Program (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15(7):901–913. 37. Myers S, Fefferman C, Patterson N (2008) Can one learn history from the allelic spectrum? Theor Popul Biol 73(3):342–348. 38. Betzig L (2012) Means, variances, and ranges in reproductive success: Comparative evidence. Evol Hum Behav 33(4):309–317. 39. Boyko AR, et al. (2008) Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4(5):e1000083. 40. Yu F, et al. (2009) Detecting natural selection by empirical comparison to random regions of the genome. Hum Mol Genet 18(24):4853–4867. 41. Ayodo G, et al. (2007) Combining evidence of natural selection with association analysis increases power to detect malaria-resistance variants. Am J Hum Genet 81(2):234–242. 42. Altshuler DM, et al.; International HapMap 3 Consortium (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467(7311):52–58. 43. Abecasis GR, et al.; 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65. 44. Price AL, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38(8):904–909. 45. Akey JM (2009) Constructing genomic maps of positive selection in humans: Where do we go from here? Genome Res 19(5):711–722. 46. Arbiza L, Zhong E, Keinan A (2012) NRE: A tool for exploring neutral loci in the human genome. BMC Bioinformatics 13:301. 47. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760. 48. Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2):307–333. 49. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18(2):337–338.

762 | www.pnas.org/cgi/doi/10.1073/pnas.1310398110

Gazave et al.

Supplementary Information

Contents A. Sequencing target and acquisition ............................................................................................................ 2 i. Choice of Neutral Regions to target .................................................................................................. 2 ii. Illumina Library Construction ........................................................................................................... 4 iii. Sequence Capture Methods ............................................................................................................... 4 iv. Illumina Sequencing ............................................................................................................................. 5 v. Illumina Primary Analysis ................................................................................................................... 6 B. Variant and genotype calling ....................................................................................................................... 7 i. Variant Analysis ........................................................................................................................................ 7 ii. Variant filters............................................................................................................................................ 7 C. Calling Validation .......................................................................................................................................... 11 i. Calling validation by comparison to the site frequency spectrum (SFS) of the NHLBI Exome data.................................................................................................................................................. 11 ii. Calling validation by comparison with data from the CHARGE-S project ..................... 12 D. Data homogeneity and coverage ............................................................................................................. 20 i. Coverage effect on variant calling................................................................................................... 20 ii. Assessing population homogeneity with POPRES sample .................................................. 21 iii. The effect of homogeneity of the sampled population on the SFS .................................. 21 E. Demographic modeling and the SFS ...................................................................................................... 23 i. Downsampling data for figures of the SFS .................................................................................. 23 ii. SFS of simulated populations according to previously published demographic models ........................................................................................................................................................... 23 iii. Demographic inference .................................................................................................................... 24 iv. Evaluation of the power of the maximum likelihood approach ....................................... 26 v. Validation of improved accuracy of estimated SFS using smoothing spline................. 27 F. Difference between census population size and effective population size ............................. 29 G. ms command lines ........................................................................................................................................ 32 H. Conservation analysis of the NR data ................................................................................................... 33 I. Figures ................................................................................................................................................................ 38 J. Tables .................................................................................................................................................................. 48

1

A. Sequencing target and acquisition i. Choice of Neutral Regions to target Regions for sequencing were targeted to be as neutral as possible with two constraints: 1) the target regions should consist of 5 to 20 kilobases (kb) of consecutive bases, and 2) the target regions should be arranged by groups of two or three (duplets or triplets) in partial linkage disequilibrium (LD) which is motivated by the design in (1). To achieve this goal, we designed the procedure outlined in the following.

To minimize the confounding by natural selection, we considered only genomic regions located at least 100,000 bp and 0.1 cM away from coding or potentially coding sequences (defined as the union of Known Genes and Genes Bounds UCSC tracks), and free from segmental duplications (http://humanparalogy.gs.washington.edu/build36/build36.htm) and known copy number variants (http://projects.tcag.ca/variation/downloads/variation.hg18.v9.mar.2010.txt). Genomic sequences that may be under recent positive selection were also excluded, based on the consensus set assembled by Akey (2). From the contiguous stretches that fulfilled the above criteria, we only kept those larger than 100kb since these can satisfy the duplet or triplet design criteria. To further minimize the effects of selection, we considered regions with the lowest content of conserved elements (Mammal El, phastConsElement44wayPlacental UCSC track) and repetitive elements (rmskRM327 UCSC track), and with no CpG islands. All these features are now implemented as part of our Neutral Regions Explorer (NRE) (3).

2

Despite the best of our efforts, some of these loci may have unannotated function. Though our filtering procedures aims to minimize the confounder of natural selection on loci in these regions based upon several criteria, loci in these regions may still be under functional constraint. Hence loci in our regions are putatively neutral based only on the above criteria.

A total of 15 target regions, between 5,340 bp to 20,000 bp long, spanning a total of 216,240 bp were used for sequencing (Table S2). They are arranged in 3 duplets and 3 triplets. Measured from their most distant 3' and 5' points, the duplets or triplets cover a physical distance of 195 kb on average (range: 93 kb and 430 kb) and are separated by an average genetic distance of 0.275 cM (range: 0.20cM-0.36 cM).

Target regions were also chosen to cover a range of diverse LD structures within regions and between the 2 regions in duplets or 3 regions in triplets. Within each duplet or triplet, regions are on average 0.154cM apart. Recombination varies across the regions from very low (0.0005 cM/Mb) to moderately high (5.07 cM/Mb), with an average of 1.05cM/Mb, close to the genomewide average.

Mean physical and genetic distance between the neutral regions and the closest gene were 211 kb (range: 100 kb-577 kb) and 0.37 cM (range: 0.12 cM-1.37 cM), respectively. They contain on average 0.81% (0.27%-1.2%) of conserved elements (less than 1/5 of the genome average). Their GC content (39.34%) is close to the overall genomic level.

3

ii. Illumina Library Construction

Genomic DNA samples were constructed into Illumina paired-end pre-capture libraries according to the manufacturer’s protocol (Illumina Multiplexing_SamplePrep_Guide_1005361_D) with modifications as described in the BCMHGSC Illumina Barcoded Paired-End Capture Library Preparation protocol. Libraries were prepared using Beckman robotic workstations (Biomek NXp and FXp models). The complete protocol and oligonucleotide sequences are accessible from the HGSC website (https://hgsc.bcm.edu/sites/default/files/documents/Illumina_Barcoded_PairedEnd_Capture_Library_Preparation.pdf ).

Briefly, 1 µg of genomic DNA in 100µl volume was sheared into fragments of approximately 300-400 base pairs in a Covaris plate with E210 system (Covaris, Inc. Woburn, MA) followed by end-repair, A-tailing and ligation of the Illumina multiplexing PE adaptors. Pre-capture Ligation Mediated-PCR (LM-PCR) was performed for 7 cycles of amplification using the 2X SOLiD Library High Fidelity Amplification Mix (a custom product manufactured by Invitrogen). Universal primer IMUX-P1.0 and a pre-capture barcoded primer IBC were used in the PCR amplification. In total a set of 12 such barcoded primers were used on these samples. Purification was performed with Agencourt AMPure XP beads after enzymatic reactions. Following the final XP beads purification, quantification and size distribution of the pre-capture LM-PCR product was determined using the LabChip GX electrophoresis system (PerkinElmer).

iii. Sequence Capture Methods Twelve pre-capture libraries were pooled together (approximately 84 ng/sample, 1µg total library DNA per pool) and hybridized in solution to the custom probe design (1.1Mb capture 4

region, manufactured by NimbleGen) according to the NimbleGen SeqCap EZ Exome Library SR User’s Guide (Version 2.2) with minor revisions. Human COT1 DNA (100 µg) and 3’dideoxycytosin (ddC)-modified blocking oligonucleotides (5'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC T/3ddC, 5’-CAAGCAGAAGACGGCATACGAGAT/3ddC and 5’ACTGGAGTTCAGACGTGTGCTCTTCCGATCT/3ddC; 650 pmoles each) were added into the hybridization to block repetitive genomic sequences and the Illumina adaptor sequences. Postcapture LM-PCR amplification was performed using the 2X SOLiD Library High Fidelity Amplification Mix with 14 cycles of amplification. After the final AMPure XP bead purification, quantity and size of the capture library was analyzed using the Agilent Bioanalyzer 2100 DNA Chip 7500. The efficiency of the capture was evaluated by performing a qPCR-based quality check on the four standard NimbleGen internal controls. Successful enrichment of the capture libraries was estimated to range from a 6 to 9 of ΔCt value over the non-enriched samples. Aliquots of 10 nM concentration of enriched libraries were submitted for sequencing.

iv. Illumina Sequencing Library templates were prepared for sequencing using Illumina’s cBot cluster generation system with TruSeq PE Cluster Generation Kits (Part no. PE-401-3001). Briefly, these libraries were denatured with sodium hydroxide and diluted to 3-6 pM in hybridization buffer in order to achieve a load density of ~800K clusters/mm2. Each library pool was loaded in a single lane of a HiSeq flow cell, and each lane was spiked with 2% phiX control library for run quality control. The sample libraries then underwent bridge amplification to form clonal clusters, followed by 5

hybridization with the sequencing primer. Sequencing runs were performed in paired-end mode using the Illumina HiSeq 2000 platform. Using the TruSeq SBS Kits (Part no. FC-401-3001), sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional 7 cycles for the index read. Real Time Analysis (RTA) software was used to process the image analysis and base calling. Sequencing runs generated approximately 250-400 million successful reads on each lane of a flow cell, averaging yield of 2.2 Gb per sample. With these sequencing yields, samples achieved an average of 95% of the targeted bases covered to a depth of 20X or greater.

v. Illumina Primary Analysis Illumina sequence analysis was done using the HGSC Illumina analysis pipeline, moving data step by step through various analysis tools from the initial sequence generation on the instrument to finished duplicate-marked BAMs. Firstly, the primary analysis software on the instrument produces .bcl files that are transferred off-instrument into the HGSC analysis infrastructure by the HiSeq Real-time Analysis module. Once the run is complete and all .bcl files are transferred, the pipeline runs the vendor’s primary analysis software (CASAVA v1.7), which demultiplexes pooled samples and generates sequence reads and base-call confidence values (qualities). The next step is mapping of reads to the hg18 Human reference genome using the Burrows-Wheeler aligner (BWA (4), http://bio-bwa.sourceforge.net/) to produce a BAM (binary alignment/map) file (5). We then mark duplicates (using Picard and SAMtools), and where necessary merge separate sequence-event BAMs into a single sample-level BAM. BAM sorting, duplicate read marking, and BAM format validation all occur at this step.

6

B. Variant and genotype calling

i. Variant Analysis In each individual, aligned reads were subjected to “duplicate removal” using PICARD-1.66 (http://picard.sourceforge.net). Downstream analyses were performed with the Genome Analysis ToolKit, GATK-1.5-31 (6, 7). Reads were locally realigned (GATK IndelRealigner). Variant detection and genotyping were performed using the UnifiedGenotyper (UG) tool from the GATK exclusively on the targeted regions. The UG tool generated an initial variant dataset for each sample (in variant call format, VCF) used as “raw” calls. The extreme depth of coverage of our regions allowed us to repeat this procedure three times, generating 3 VCF files based on a randomly chosen subset of reads for each sample, with a maximum of 250x depth per sample. An identical SNV filtering procedure was applied to the three replicates. Only the SNVs that passed the filters in all three replicates were kept for the genotype calling. These replicates can be seen as a partial validation.

ii. Variant filters A threshold on the minimum base quality and the minimum mapping quality of the reads (40) was set to generate the raw calls. The maximum number of alternate alleles to genotype was set to 2. The threshold for 6 quality control variables (HS, QD, FS, MQRS, RPRS, Hrun) was empirically determined by plotting the distribution of the aforementioned variables. Raw calls were filtered out if they showed a high Haplotype Score (HS > 50), low average quality (QD < 2), if they presented a strong strand bias (FS > 900), a strong correlation between mapping 7

quality and alternate allele (|MQRS| > 40), and a strong bias position along the read (|RPRS| > 40) or fell in a homopolymer run (HRun > 5). In addition, all variants falling within 10 bp of an indel were filtered out. Triallelic positions were also removed from the dataset.

In parallel, we ran the UG with the -A RMSMappingQuality and --output_mode EMIT_ALL_SITES options. This outputs the distribution of QUAL (Phred scaled probability that REF/ALT polymorphism exists at this site) and mrsMQ (mean root square of Mapping Quality of the reads across all samples) of the entire targeted region (independently on whether the sites are polymorphic or not). Threshold values for the variants were determined by plotting the empirical distributions of mrsMQ and QUAL values (Fig. B1). QUAL provides an overall quality score for the existence of an alternate allele in each genomic position. Variants with QUAL < 250 and mrsMQ < 30 were filtered out.

8

Figure B1. Distribution of the QUAL scores (on a log10 scale) for each variant before filtering. Black points represent sites where no call was made by the UnifiedGenotyper tool, while cyan points are genomic positions marked as polymorphic by the software. Polymorphic positions are expected to have higher QUAL values, while low QUAL values indicate a low probability of the existence of at least one copy of an alternate allele, as expected in non-polymorphic positions. The red line shows the cut-off value we used to filter out calls (cyan points) with low confidence.

Individual genotype filters Individual genotypes were marked as missing if they presented strong allelic imbalance (>20%), a depth of coverage < 20x or a low individual genotype quality (QC < 30), resulting in only a small amount of missing data. We established that 95% of SNVs have successful calls for at least 900 individual chromosomes out of the possible 986. 9

Variant calling quality control Several quality control steps were applied to the final set of variant calls. When compared to dbSNP135, 62.5% of the variants called in our regions were new. Ti/Tv ratio showed no indication of bias calling or sequencing error, neither for all SNVs (Ti/Tv=2.22) nor for novel SNVs alone (Ti/Tv = 2.29). No SNV presented significant departure from Hardy-Weinberg equilibrium.

10

C. Calling Validation

i. Calling validation by comparison to the site frequency spectrum (SFS) of the NHLBI Exome data The NHLBI exome sequencing project (ESP) data was obtained from the Exome Variant Server of the NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/EVS/). Data was downloaded from the server in December, 2012. We first filtered out indels, variants with an average read depth of less than 20 and variants that were monomorphic in Europeans. Due to increased noise at lower number of variants, functional categories with less than 1000 SNVs were not analyzed. In order to compare the ESP data to the neutral regions (NR) data, we probabilistically downsampled both datasets to a sample size of 900 chromosomes.

NR data plotted against the ESP data from different functional categories reveals that the NR SFS is more similar to the SFS of less conserved functional categories (near genes, intergenic and intronic regions, Fig. S4a). In addition, the NR data has the lowest fraction of singletons when compared to categories that are more conserved (missense, UTR, splice sites, synonymous, synonymous near splice sites, stop-gain, Fig. S4b), reflecting that fewer variants in the NR dataset are under selection.

To test whether the SFS of the NR data and the SFS of a particular functional category of the ESP data were significantly different, we additionally calculated the chi-square test statistic as ∑

. Let the expected SFS be the SFS obtained from combining the ESP data and the

11

NR data under the null hypothesis that the two datasets have the same SFS. Then ei is a vector of size 2, equal to the product of the number of variants in either the NR data or the ESP data and the proportion of variants in minor allele count bin i of the estimated SFS. oi is also a vector of size 2, equal to the observed number of variants in each minor allele count category in either the NR data or the ESP data. To increase the consistency of the result, we binned categories with more than 10 minor alleles into one bin such that the chi-square test statistic has 10 degrees of freedom. We performed a chi-square test and calculated a p-value for each of the pairwise comparisons. Similar to what we showed on Figure S4, we find that categories with greater functional constraint deviate further from the NR data (Table S3).

ii. Calling validation by comparison with data from the CHARGE-S project In order to test for overall calling quality, false positive and false negative rates in the NR data, we compared our data to other sequencing data. An appropriate dataset for this comparison is the 962 individuals with whole-genome sequencing data in the CHARGE-S project (8).

In order to compare the SFS of both the CHARGE-S and the NR data, we subset the CHARGE-S data to an equal sample size as sampled in the NR data (n = 493 individuals). These individuals were chosen either to be samples that were also in the NR data, or individuals from a homogeneous cluster on a principal component analysis (PCA) of the CHARGE-S data. Out of the 493 individuals chosen, 395 are part of the ARIC cohort, of which 20 were also sequenced in the NR analysis. Thus, we had sequences that were obtained both by the CHARGE-S project and by the present study, for the same genomic regions and individuals. Using the list of genotype calls for the 20 overlapping individuals in each study, we could use the genotype consistency as 12

a measure of calling quality. In calculating the proportion of genotypes ascertained in the other dataset, we considered only positions where the derived allele is present in at least one of the 20 individuals and only heterozygous genotypes.

One important property of this validation procedure is the asymmetry of the two datasets. The NR dataset has an average median coverage of 295X, while the average median coverage in CHARGE-S for the corresponding regions in the subset of 493 individuals is 4.6X. This fundamental difference in coverage represents a limitation for the estimation of false positives in the NR dataset, as a call missed in CHARGE-S does not necessarily indicate a calling error in the high coverage NR data. It also represents to a lesser extent a limitation in estimating the false negative rate since the low-coverage data are more prone to sequencing errors.

An estimate of the proportion of calls missed due to low coverage was computed as part of the CHARGE-S project. The adjustment is based on 886 individuals that have both high coverage exome-sequence data and low coverage whole genome sequence data. An exponential decay function was used to fit the discovery rate of the first 20 bins, similar the one used in Gravel et al. (9). This adjustment represents a deviation, for each minor allele count (MAC), from a perfect calling with 100% power. We apply this adjustment to the CHARGE-S data to estimate false positive rate of the NR data.

Overall calling quality First, for each dataset of 493 individuals, we counted the number of copies of the minor allele at all the positions that were polymorphic. Eliminating 8 cases where our data have more than 80%

13

missing genotypes, the consistency in minor allele frequency (MAF) between both datasets is very high (Spearman rho=98.56) for the 1115 polymorphic positions called both by CHARGE-S and the NR (Fig. C1). This shows that the overall NR calling does not present a strong bias related to variant frequency.

Figure C1. Concordance of the minor allele frequency (MAF) of the NR and the CHARGE-S subset. Outliers, marked with a star, are SNVs for which more than 80% of the individuals have missing data in the NR data. Individual genotypes of the CHARGE-S were imputed and have therefore no missing data.

Second, the average median coverage of the positions that show perfect consistency between CHARGE-S and NR is 4.96X and 336.4X, for CHARGE-S and NR respectively, which is above each dataset’s average. Similarly, the average median coverage of positions with one or more genotype found in NR but not called in CHARGE-S is 4.76X and 317.5X, for CHARGE-S and NR respectively. In contrast, the average median coverage of positions with one or more genotype found in CHARGE-S but not called in NR is below average: 4.27X and 223X for CHARGE-S and NR respectively. This suggests that these positions are more likely to be 14

enriched for calling errors. However, the increase in calling error is probably very low for data of 223X coverage, suggesting that the genotypes with discrepancy are more likely to be CHARGES calling errors.

False positive rate in NR As a first step, we considered the proportion of concordant genotypes in the subset of 20 individuals sequenced independently by the CHARGE-S project and the present study (black in Fig. C2). We observe that a low proportion of genotypes in the NR validation subset are consistent with the CHARGE-S genotypes. For example, only 37% of singletons in the NR validation subset are also called by CHARGE-S. The genotype consistency between both datasets increases for increasing MAC. For MAC=8, this proportion reaches 85% and for MAC=41 the proportion is above 95%. Given the high coverage of the NR dataset and the low coverage in CHARGE-S, we expect a difference in the power to call variants, especially for low MAC categories. In particular, we expect a fraction of the NR variants to be absent in CHARGES without necessarily being NR false positives. To explore this idea, we estimated for each MAC the proportion of SNVs that the CHARGE-S data had the power (yellow in Fig. C2) to detect by fitting an exponential decay function to the first 20 MAC categories, similar to Gravel et al. (9). The discovery power can be interpreted as an expected false negative rate in CHARGE-S. We observed that the proportion of NR genotypes observed in CHARGE-S follows very closely the theoretical maximum power line in CHARGE-S. In other words, the proportion of NR calls not observed by CHARGE-S follows closely the expected proportion of false negatives in CHARGE-S. As a consequence, this suggests that the false positive rate in NR is not very high,

15

although the difference in coverage prevents us from providing a reasonable quantitative estimate.

Figure C2. Proportion of concordant genotypes in the subset of 20 individuals sequenced independently by the CHARGE-S project and the present study. The y-axis presents the proportion of NR genotypes that are concordant with CHARGE-S genotypes (black points). The x-axis shows the minor allele count (MAC) of the CHARGE-S SNVs. Data are shown for the first 50 categories of MAC. Note that MAC values go beyond 20 because although only 20 individuals were involved in the genotype comparison, we considered the minor allele count of the SNVs in the whole sample of 493 individuals. We removed part of the noise due to the low sample size involved in the comparison (20 individuals) by fitting a loess function to the data (line). Yellow points show the estimated discovery power in the low coverage CHARGE-S data. The discovery power provides an approximation to the false negative rate in CHARGE-S. The NR genotypes discordant with the CHARGE-S genotypes are more often found in low MAC categories of the CHARGE-S data, for which the discovery power is also lower. The overall agreement between the black and yellow points shows that the SNVs in NR not found in CHARGE-S can be almost entirely explained by the limited discovery power in the lower coverage CHARGE-S data.

False negative rate in NR

16

In another approach to identify potential false negatives in the NR dataset, we counted the genotypes called by CHARGE-S and not discovered by the NR in the 20 individuals sequenced by both studies. The proportion obtained varied between 0% and 50% depending on the MAC. The principal difference with the previous comparison is that the proportion does not increase with the MAF, but is equally distributed between rare and common variants of the NR data, ruling out a systematic bias related to the allele frequency (Fig. C3a). In contrast, the NR genotypes discordant with CHARGE-S are mainly rare variants in CHARGE-S (Fig. C3b).

Figure C3. Genotype consistency among the 20 individuals sequenced both by CHARGE-S and the present study. a) The y-axis presents the proportion of CHARGE-S genotypes that are concordant with NR genotypes. b) The y-axis presents the proportion of NR genotypes that are concordant with CHARGE-S genotypes. On both panels a) and b), the x-axis shows the minor allele count of the CHARGE-S SNVs in the whole sample. Panel b) data are the same as in Figure C2 and is shown for comparison purposes. We observe that the proportion of CHARGE-S genotypes discordant with NR genotypes are equally distributed among rare and common variants of the NR data (panel a), indicating no systematic bias in the NR genotypes, whereas the NR genotypes discordant with CHARGE-S are mainly rare variants in CHARGE-S (panel b).

17

In order to understand the source of the genotype discrepancies, we checked the NR calling before quality control and filtering. We observe that 77.3% (92/119) of CHARGE-S genotypes not observed in NR are either positions not called due to low base quality, low mapping quality or extremely low coverage (12 cases), or positions called in the raw NR that were subsequently filtered in the QC step (19 cases). This includes triallelic positions (9 cases), proximity to indels (14 cases), coverage < 20 X (38 cases), and positions with low quality control variables (Supplementary Note B). The remaining 22.7% (27/119) of inconsistencies are CHARGE-S genotypes absent in NR, though located in regions with good coverage and QC values. The BAM files for these 27 individuals were visually examined at the ambiguous positions with the Integrative Genome Viewer application (IGV version 2.0.23), enabling the filter to remove duplicated reads. The NR genotypes in these 27 cases do not present any sign of a polymorphism.

From this analysis we conclude that our strict filtering may have generated some false negatives, notably due to strict filtering of QC variables and filtering around indels. Filtering around indels is often justified and has become a common procedure (10-12). In addition, it only affects 1.9% of all the raw calls of polymorphic positions in the NR data and does not systematically affect variants of a particular frequency (the average MAC of SNVs around indels is 138, varying from MAC 1 to 350).

Altogether, comparing the genotype calls in the 20 individuals present in both datasets, only 119 heterozygous genotypes in CHARGE-S were not found in the 3929 heterozygous genotypes in the NR data, representing 3.03% of the false negatives. However, this value is an upper bound,

18

with the assumption that all CHARGE-S calls not found in NR were true calls. Due to the difference in coverage, and some visual validation, we can safely assume that the real false negative rate is probably much lower. Importantly, we showed that the CHARGE-S calls not found in NR are spread over all frequency categories, showing that our filtering process does not generate a strong bias in the SFS and therefore in the demographic estimates.

19

D. Data homogeneity and coverage i. Coverage effect on variant calling We empirically investigated the effect of coverage by running the main GATK pipeline 3 times, modifying only the maximum number of reads considered per position. This was achieved by changing the option -dcov to 111, 27 and 15. The two first values (111X and 27X) were chosen from the average median depth reported in Tennessen et al.(13) and Nelson et al. (12), respectively. The 15X maximum depth was chosen arbitrarily to represent lower coverage. The rest of the pipeline is identical to that applied to the NR dataset, with the exception of filtering of individuals with less than 20X, which was only applied to the 111X set.

The SFS derived from the pipeline run on a maximum of 111X does not show any perceptible difference from the NR SFS. In fact they only differ by one SNV (Fig. S3). The SFS from the 27X coverage subset presents a lower proportion of singletons (Fig. S3), though we expect that a relaxed set of filters would allow partial recovery of true singletons. For this reason, we interpret that the comparison between the NR dataset and the 27X subset does not show that 27X coverage represents insufficient coverage, but rather shows that high coverage allows application of very strict quality filters (leading to a reduced false positive rate), which have to be relaxed to call SNVs on data with lower coverage. Finally, we observe that the 15X coverage presents a substantial deviation from the high coverage dataset (Fig. S3). Interestingly, the change in the shape of the SFS from 27X to 15X is more accentuated than the change from 111X to 27X, suggesting that coverage depths below 20X are insufficient for high quality calling.

20

ii. Assessing population homogeneity with POPRES sample We additionally compared the sequenced individuals to individuals from POPRES (14). POPRES genotyping data was obtained from dbGaP. To ensure similar sample sizes across European populations, 10 individuals were chosen from each POPRES population (with the exception of Croatia and Greece, where only 7 and 8 individuals could be chosen). Following Novembre et al. (15), ancestry for each individual was assigned as the reported ancestry of the maternal and paternal grandparents (after removing individuals with mixed grandparental ancestry). Data for POPRES and ARIC individuals were merged, with 369,145 overlapping sites after removing sites with a joint missingness rate above 10%. Linkage disequilibrium based pruning was applied with a maximum r2 threshold of 0.5 in PLINK (16) in addition to removing sites with potential allele flips and regions of extended LD (15). We computed principal components on the remaining 150,497 autosomal variants using EIGENSOFT (17). The PCA of POPRES individuals are qualitatively similar to those previously published (15). We next computed PCA using samples from the POPRES cohort and ten randomly chosen samples from the NR project. We found that individuals from the NR had North-Western European ancestry, similar to that of individuals from the POPRES UK sample. Similar results are observed other random NR samples.

iii. The effect of homogeneity of the sampled population on the SFS In order to test for the importance of using a single homogeneous population to model recent demography, we simulated two populations and compared their SFS (Fig. S1). The first population followed the history of Model II described in Table 1. The other population consisted of 8 subpopulations that mimic European populations and split 400 generations ago: all 8 21

subpopulations share a common history and have the same initial population size (5633) at the time of split. After the split, all 8 subpopulations follow the same demographic history of Model II in Table 1. We sampled 900 individuals from both populations. For the population with 8 subpopulations, we performed two sampling strategies. In the first case, we sampled a balanced number of individuals in each subpopulation. In the second case, the number of samples taken from each subpopulation was 308, 124, 32, 104, 116, 32, 36 and 48, in proportion to the sample sizes from different European populations in Table S10 of (12). When sampling a balanced number of individuals in each subpopulation, we used 112 samples from 6 populations and 114 samples from 2 populations because we could not equally divide 900 samples into 8. All simulations were done using the software ms (18). The results show that although each subpopulation had the same demographic history as the homogeneous population, the SFS derived from a non-homogeneous sample shows a higher proportion of rare variants than the SFS from the homogeneous sample. This is true both in the case of balanced or unbalanced sampling of 900 individuals from 8 subpopulations. These results demonstrate that using nonhomogeneous samples can bias demographic estimates.

22

E. Demographic modeling and the SFS i. Downsampling data for figures of the SFS In order to compare SNVs with differing numbers of successful genotype calls, it was necessary to estimate the allele counts given a lower sample size. Employing the strategy implemented in (19) for a folded site frequency spectrum (SFS), we estimated allele counts for a sample size of 900 chromosomes (as this constituted 95% of our variants). All variants with less than 900 successful genotype counts were removed. Thus, for each SNV, the probability it is of a certain allele count is given by the following formula.

i n  i     d n  m  d  P[ min (i  d, n  i  (n  m  d))] = n    n  m 

where i is the original allele count, m is the number of chromosomes to downsample to, n is the 

number of successful genotype calls for this SNV and d = 0,…,min(i, n-m). When presenting the SFS, variants are binned into the following minor allele count categories: (1, 2, 3, 4, 5, 6-10, 1120, 21-50, 51-100, 101-200, 201-450). The proportion of variants is then weighted by the number of minor allele count categories in each bin.

ii. SFS of simulated populations according to previously published demographic models We compared the SFS of the NR data to other SFS of previously published models (11, 12). For each of these published scenarios, we simulated 10,000 independent regions for 900 23

chromosomes (to match the sample size of the NR data) with the coalescent simulator ms (18). We observe that the fraction of singletons in the NR data is below the estimates given by the two models (Fig. S6). The models over-predict the number of singletons (55.65% and 54.23%, respectively, compared to 38.4% in the SFS of the NR data) and extremely rare variants, a pattern that could be expected if less purifying selection acted on the loci sequenced in the NR data. Though these previous studies modeled recent demography based on the SFS derived solely from synonymous variants in order to minimize the confounding effect of selection, the data of these are nonetheless derived from sequencing of genic regions (11, 12), which may partially explain the differences in estimates with the NR data.

iii. Demographic inference In all demographic inference models, we fixed ancient demographic history following previously described scenarios (9, 20, 21), and focused on the parameters associated with recent human population growth. When using the ancient demography model of (9), we assumed an instantaneous recovery from the second bottleneck instead of a gradual recovery. We estimated parameters of interest using the maximum likelihood approach of Keinan et al. (20). For each of the models considered, we profiled the likelihood surface as a function of the multiple parameters using predefined grid points, which corresponds to a particular combination of multiple parameters. Grid points were carefully selected to cover the range of most likely parameter values with better resolution through an initial set of broader grid points and later zooming in (Table S6). For each grid point, we calculated the expected SFS of the model using simulated genealogies from ms (18). For each grid point, 200,000 ms simulations were conducted and the length of branches that lead to a different allele count was calculated for each. 24

Branch lengths were then summed up across the 200,000 genealogies and standardized to obtain the expected SFS.

Given the expected SFS, we calculated the likelihood of the observed SFS as follows. Let s denote the number of SNVs and n denote the number of chromosome samples. Also let xi and mi denote the number of minor alleles and the total number of successful genotypes at SNV i (xi ≤ mi/2 ≤ n/2). Then, a composite likelihood for the data is given by L 



s i 1

 P ( x i of

m i )  P ( m i  x i of m i ) 

where P(a of b) is the probability of observing a alleles

out of b, conditioned on demography. In order to account for the sampling of the mi successful genotypes out of the possible n, we derived L by the law of total probability to be

s

L 

 i 1

   j    n  j    n  m i  x i        x i   m i  x i    P ( j of n )   n   j  xi     m i   

n xi

 j  m i xi

  n  j    j          x i   m i  x i    P ( j of n ) n       m i   



To explore models of recent history, we started with a simple recent growth model with two parameters capturing the time the growth started and final Ne (‘Model I’). Since we assume an exponential growth with constant rate, the growth rate is no longer a free parameter and can be derived for any given time of growth and final Ne. We also studied two three-parameter models: similar to Model I, with the ancestral Ne before growth as an additional free parameter (‘Model II’), and a model with two epochs of growth with the start time of the first growth fixed at 400 generations ago (‘Model III’). We additionally explored a four-parameter model by allowing the time of the first growth in Model III to be another free parameter (‘Model IV’).

25

To improve resolution, we applied a smooth spline approach to estimate the frequency of SNVs belonging to a MAC category as a function of each parameter (see below). In all models where the smooth spline approach was used, the 95% CI for a given parameter is defined as all values of the parameter for which the log-likelihood is comprised between the maximum value minus 1.92 (half of the 95% percentile of the

distribution) and the maximum, while fixing all other

parameters at the maximum likelihood estimates. CI for the 4-parameter models, where the smooth spline approach was not applied for computational efficiency, were obtained from the original grid points.

iv. Evaluation of the power of the maximum likelihood approach One epoch of growth To evaluate the power of our maximum-likelihood approach, we used the coalescent simulator ms (18) to simulate a population with an ancient history as described in Keinan et al. (20), and with a single epoch of growth starting 400 generations ago. During this time, the population grows from Ne = 10,000 to Ne = 1,100,000, representing a growth rate of 1.18% per generation. We chose a value of 1,100,000 for the final Ne as described in (22) because it is between the other two recent estimates of European Ne: 512,000 7(11) and 4,000,000 (12). For this scenario, we simulated 986 chromosomes with 500,000 SNVs. We randomly subsampled 1800 SNVs (comparable to the number of SNVs in the NR dataset), and applied the likelihood method to estimate three parameters: initial Ne before the growth, time of the onset of growth, and growth rate. We repeat the above procedure and define power as the proportion of 10,000 replicates of resampling for which the three parameter estimates are a minimum of one grid point away from 26

the true value. The result shows a reasonable power of ~80% to capture the three parameters for the scenario we tested.

Two epochs of growth As above, we simulated a population with ancient demography as described Keinan et al. (20) with a 2-epoch growth model. We varied the rate of the first epoch of growth while fixing all other parameters, including the length of the two epochs of growth (280 and 120 generations respectively), the rate of the second growth (3%), Ne before growth (5,633), and ancient demographic parameters (20). For each parameter combination, we simulated a sample of 986 chromosomes with 500,000 SNVs, from which we randomly subsampled 1800 SNVs and conducted a likelihood ratio test between a reduced Model II (2 parameters with fixed ancestral population size of 5633) and Model III (three parameters). We repeated this procedure and defined power as the proportion of 1000 replicates for which the likelihood ratio test p-value is less than 0.05. Power was estimated for each parameter combination.

As shown in Fig. S9, we have reasonably good power (over 60%) for detecting the first epoch of growth when the rate of growth falls between 0.6~1.8%.

v. Validation of improved accuracy of estimated SFS using smoothing spline

A smoothing spline (smooth.spline in R (23)) was used to estimate the frequency of SNVs belonging to a specific MAC category as a function of one of the parameters each time. To validate the performance of the smoothing spline, we conducted a simulation study using one 27

specific demographic history with one epoch of growth at 100 generations ago and 24 grid parameter values for final Ne varying from 2 to 200 million. For each of the two-parameter combinations, we performed 10,000 ms simulations to get the estimated SFS and an additional 1.2 million ms simulations to get another estimate of the SFS as the "true underlying" SFS. Although the estimate from 1.2 million simulations may not be equal to the true underlying SFS, this was only used as a standard to validate the performance of the smoothing spline by comparing the estimate from 10,000 simulations, the estimate after applying the smoothing spline, and the estimate from 1.2 million simulations.

The squared distance from the 1.2 million simulations SFS was consistently smaller for smoothing spline estimates than the original ones (Fig. S10), indicating the superior performance of the smoothing spline approach. Moreover, by using smoothing spline, we can obtain the estimated SFS for any grid point from a fitted line, which will increase the accuracy of our approach, in addition to saving computing time. In order to further reduce noise and increase accuracy, we binned SNVs with MAC greater than 50 into one category. Smoothing spline and likelihood calculations were applied to the data after binning (22).

28

F. Difference between census population size and effective population size We show that a model with two epochs of growth (Model III) does not provide a better fit to our data than a model with a single epoch of growth (Model II). Besides power considerations, this result suggests that although the census population size (Nc) started to increase with the invention of agriculture 400 generations ago (24, 25), the effective population size (Ne) only began to grow recently (140 generations ago, according to Model II). A high variance in reproductive success is one known factor that explains Nc > Ne (26). Here, we explore how large the variance in reproductive success should be in order to explain alone a constant Ne during the Neolithic Europe.

According to Crow and Kimura (27), for diploid species with two sexes under the assumption of random mating, the effective population size of the tth generation is a function of the census population size of tth generation Nc(t), the average number of offspring per individual of (t - 1)th generation ̅( - 1) and the variance of the number of offspring per individual Vk(t - 1). ̅=2 corresponds to the population maintaining a constant census population size and Vk = 2 for a population of constant population size with a Poisson distribution of offspring number. Thus, we have: e(

)

c(

̅(

1) 1

) V ( 1) ̅( 1)

which further leads to the following equation, e(

1) = e( )

c(

̅ 1) ( 1) 1 ̅( ) 1 c( )

29

V ( 1) ̅( 1) V () ̅( )

If the growth rate of the Nc is (α - 1), i.e., Nc(t + 1) = αNc(t), then the average number of offspring per individual ̅ = α is a constant, and the growth rate of the effective population size is (β - 1), i.e., Ne(t + 1) = βNe(t). Solving for the variance of the number of offspring per individual of tth generation at time growth begins, Vk(t) gives: V

α

= ( β) V 0

α

α( α - 1) [(β) - 1] (Eq.1)

Archeological data can provide estimates for α during the time frame we attempted to detect the first growth epoch (260 generations, from 400 to 140 generations ago). Around 1000 BC, Northern Europe population size estimates vary between 100,000 for the British Islands to roughly 200,000 for a territory overlapping part of the current Poland and Germany (28). Growth in census population size may have been as high as ~1% per generation at that time according to archeological data (24, 29, 30), although other evidence suggests it could be around 0.4% (9). This leads to estimates of α ranging from 1.004 to 1.01 and Nc, 140 (the census population size 140 generations ago) ranging from 2.82x to 13.29x of Nc, 400 (the census population size 400 generations ago). As an illustration, if Nc, 400 is 5633 (as in Model II), population growth results in an Nc, 140 between 15,904 and 74,868 for an α of 1.004 and 1.01, respectively. If we assume that the reproductive success 400 generations ago was Poisson distributed, i.e., Vk, 400 = 2 (the variance in reproductive success 400 generations ago), and there was actually no effective population growth (β = 1), the above scenarios result in a Vk, 140 (the variance in reproductive success 140 generations ago) between 9.34 to 51.91 for α of 1.004 and 1.01, respectively. These values are in the reasonable range for historical civilizations (31). For example, historical data on intensive agriculturalist civilizations report as many as hundreds of children for a very few elite males. Estimates based on contemporary pre-industrialized societies show that the variance in male reproductive success in pastoralists and horticulturalist cultures varies between 8 and 86 30

(31). Thus, the increase in Vk may explain why the increase in Nc was not accompanied with an increase in Ne.

In addition to the case that the increase in Vk leads to a constant Ne, it is also possible that the increase in Vk causes the growth rate of Nc to be accompanied with a milder growth rate of effective population size, which might be in the range for which we have limited statistical power. The following table shows Vk under different combinations of growth rates of Nc (in the range of 0.6% to 1%) and growth rates of Ne (in the range of 0.3% to 0.6%, corresponding to statistical power of 25% and 60% respectively). β

α 1.003 1.004 1.005 1.006

1.006 6.74 4.74 3.19 .00

1.007 9.34 6.74 4.74 3.19

1.008 1 .70 9.33 6.74 4.74

1.009 17.06 1 .70 9.33 6.74

1.01 .71 17.05 1 .69 9.33

Most of the variances above are within reasonable ranges of estimates for human populations (31). Even if the census population size has grown by 1% per generation over this period of 260 generations, Vk of 9.3 would lead Ne to only grow by 0.6% per generation, which we only have a statistical power of 60% to detect.

31

G. ms command lines Commands for running ms for the four models (I through IV, respectively) from Table 1 in the main text, using best-fit parameters, are reported below: Model I: ./ms 986 25000 -T -G 0.11531883E+07 -eG 0.54230771E-05 0 -eN 0.2980769E-04 0.1056635E03 -eN 0.3461538E-04 0.1923077E-02 -eN 0.2221154E-03 0.3642163E-04 -eN 0.2269231E-03 0.1923077E-02 Model II: ./ms 986 25000 -T -G 0.88335800E+05 -eG 0.53822631E-04 0 -eN 0.2370031E-03 0.8401376E-03 -eN 0.2752294E-03 0.1529052E-01 -eN 0.1766055E-02 0.2895910E-03 -eN 0.1804281E-02 0.1529052E-01 Model III: ./ms 986 25000 -T -G 0.10703884E+06 -eG 0.45445210E-04 -0.53861189E+03 -eG 0.13698630E-03 0 -eN 0.2123288E-03 0.7526712E-03 -eN 0.2465753E-03 0.1369863E-01 -eN 0.1582192E-02 0.2594418E-03 -eN 0.1616438E-02 0.1369863E-01 Model IV: ./ms 986 25000 -T -G 0.65788146E+05 -eG 0.70000000E-04 -0.11920425E+04 -eG 0.17000000E-03 0 -eN 0.3100000E-03 0.1098900E-02 -eN 0.3600000E-03 0.2000000E-01 -eN 0.2310000E-02 0.3787850E-03 -eN 0.2360000E-02 0.2000000E-01

32

H. Conservation analysis of the NR data We chose the loci with the aim of minimizing the confounding of selection in our data. To investigate the level of selection, beyond comparisons to other types of loci (Figure S4; Table S3), we utilized GERP (32) to assess the amount of selection or constraint loci in our regions were under. GERP measures the difference between the expected substitution rate under neutrality and the observed rate, where positive scores are given to constrained elements or variants suspected of having undergone purifying selection (32). We lifted-over our data from hg18 to hg19 (33) and used the GERP conservation track on the UCSC genome browser (33) to obtain the score for each of our 1834 SNVs that passed quality control filtering. For comparison, we obtained GERP estimates for synonymous SNVs from the Exome Sequencing Project (ESP) data (34). Comparing the distribution of variants from the NR data to those from the synonymous category within the ESP data, we found that variants from the NR data are more closely distributed around zero (Fig. S5). We additionally filtered synonymous SNVs in the ESP data that lay in conserved elements (Mammal El, phastConsElement44wayPlacental UCSC track) and still observed a wider distribution as compared to variants from the NR data. These results support that the putatively neutral loci in our regions are less confounded by selection than the synonymous SNPs used in some of the previous studies that modeled recent growth in population size. This is reasonable given the criteria we filtered upon when choosing regions to sequence (Supplementary Note A).

We next partitioned SNVs to those with a GERP score between -4 and 0, and those with a positive score. The SFS of both sets are very similar to the SFS of the original data, though the 33

set with positive GERP scores deviates slightly in the direction that the ESP synonymous data deviate from our data (Fig. H1).This is in line with this set being more affected by selection, but still to a much smaller extent than synonymous SNPs are. We additionally re-estimated demographic parameters from the best-fit model II (Table 1 in main text) using each of the two subsets of the NR data. We found that the 95% confidence interval for both subsets include the original estimates on the full data, though estimates from the GERP score > 0 subset were somewhat closer to demographic estimates from previous studies utilizing synonymous variants (13) (Table H1).

Figure H1. SFS of the variants from the NR data separated into variants with a GERP score between -4 and 0 (not including 0), and those with a positive GERP scores. For comparison, the SFS of synonymous variants from the Exome Sequencing Project after filtering SNVs in conserved elements is also presented. The SFS of variants with a positive GERP score deviates slightly from the SFS of the full NR data in the direction of the synonymous variants, consistent with more purifying selection on these sites, but not as extensive as purifying selection on synonymous sites.

34

Ne after earlier growth Duration of growth (generations) Ne after growth (millions)

GERP_Score>0 6519.0 (3949.4, 9000.0) 156.7 (108.8, 216.6) 0.65 (0.2, 112.0)

GERP_Score