Text S1. 1. Text S1: Deriving the Multiallelic Hot-or-Not Model. The greatest difficulty in applying the Wright-Dirichlet distribution to data analysis is computing the ...
Text S1: Deriving the Multiallelic Hot-or-Not Model The greatest difficulty in applying the Wright-Dirichlet distribution to data analysis is computing the normalization constant. Various methods are available, including importance sampling and Taylor series expansion [1]. In a multiallelic PIMS model of directional selection with no dominance, the Wright-Dirichlet distribution simplifies to K
p ( f ) ∝ ∏ eγ i fi fiθi −1 ,
(A1)
i =1
where γi is the population-scaled selection coefficient for allele i. The normalizing constant for this distribution is related to the characteristic function for a Dirichlet distribution with parameter θ. In addition to the methods mentioned above, it can be computed numerically using contour integration in the complex plane [2, 3] K
∫ ∏e
Φ K i =1
γ i fi
fi
θi −1
∏ df =
K i =1
Γ (θ i )
2π I
K
t ∫ e ∏ (t − γ i ) L
−θi
dt ,
(A2)
i =1
where I represents the imaginary unit and ΦK is a (K – 1)-simplex. The contour L is a loop beginning and ending at −∞ , and encircling in the positive direction (anticlockwise when the x and y axes correspond to the real and imaginary lines respectively) all the finite singularities of the integrand (i.e. when t = γi). We have Mathematica code that performs this integration (available on request) but found it to be numerically unstable. Under the hot-or-not model, however, we can use Equation A2 to show that
eγ F ∏ i =1 fiθi −1 K
p(f ) =
Text S1
Β( θ ) 1 F1 ( Θ H ,Θ, γ )
,
(A3)
1
where γ is the population-scaled selective difference between fitness classes, F is the total frequency of alleles in the favored (hot) fitness class, ΘH is the total mutation rate to alleles of that class and Θ is the total mutation rate for all alleles. Β( θ ) is the beta function with vector argument, so that Β( θ ) = ∏ i =1 Γ (θ i ) Γ K
(∑ θ ) , K
(A4)
i =1 i
and 1 F1 ( a,b, c ) is the confluent hypergeometric function. Let H be the set of favored alleles, and N the set of disfavored alleles. The numbers of alleles in the two classes are KH and KN. Define h = { fi / F : i ∈H } , g = { fi / (1 − F ) : i ∈N } , θ( H ) = {θ i : i ∈H } and θ( N ) = {θ i : i ∈N } . By making the
change of variables ( f ) → ( F, g, h ) , Equation A3 can be factorized so that
⎡ eγ F F Θ H −1 (1 − F )Θ N −1 ⎤ p ( F, g, h ) = ⎢ ⎥ ⎢⎣ Β( Θ H ,Θ N ) 1 F1 ( Θ H ,Θ, γ ) ⎥⎦ ⎡ 1 ×⎢ ⎢ Β θ( N ) ⎣
KN
( )∏ i =1
(N )
θi i
g
−1
⎤⎡ 1 ⎥⎢ ⎥ ⎢ Β θ( H ) ⎦⎣
KH
( )∏ i =1
(H )
θi i
h
−1
⎤ ⎥, ⎥ ⎦
(A5)
where Θ N = Θ − ΘH . This demonstrates that F follows a biallelic Wright-Dirichlet distribution with parameters (ΘH, ΘN, γ), and g and h follow independent Dirichlet distributions with parameters θ( N ) and θ( H ) respectively. The Dirichlet distribution arises from the neutral case, and this factorization suggests that in a hot-or-not model, evolution within class H or N can be characterized as neutral. This in turn supports our assumption (see main text) that, in the low-mutation limit, the probability of fixation of
Text S1
2
allele A equals the fixation probability for the whole class multiplied by the neutral fixation probability for allele A within its class. To condition the stationary distribution on identity of the ancestral allele, suppose allele A is ancestral. In a recurrent selection model, the ancestral allele is disfavored. Thus from Equations A5 and the form of Equation 11 where A is disfavored,
1− e ( ))F (1 − F ) g ∏ ( p ( F, g, h | A ) = ( ) )ϕ (1 − ϕ ) dϕ ∫ ζ ∏ ∫ (1 − e − γ 1− F
Θ N −1
Θ H −1
A
1
− ϕ 1− F
Θ H −1
Θ N −1
KN
A
(1 − e =
)F
i =1
ΦK N
0
KN i =1
(N )
(N )
giθi
(N )
ζ iθi
−1
∏
−1
dζ
Θ H −1
(1 − F )Θ
N
−1
KN
KH
(H )
( ) ( )
(H )
hiθi
∫∏
ΦK H
gA ∏ i =1 giθi −1 ∏ i =1 hiθi θ Β( Θ H ,Θ N ) ⎡⎣1 − 1 F1 ( Θ N ,Θ, −γ ) ⎤⎦ A Β θ( N ) Β θ( H ) ΘN − γ (1− F )
KH i =1
−1
KH i =1
(H )
ηiθi
−1
dη
−1
.
(A6) Reversing the change-of-variables,
(
)
− γ (1− F ) f θi −1 ∏ fA Θ N 1 − e i =1 i p ( f | A) = . 1 − F θ A Β( θ ) ⎡⎣1 − 1 F1 ( Θ N ,Θ, −γ ) ⎤⎦ K
(A7)
Assuming that alleles are sampled at random with replacement from the population, we can utilize the multinomial distribution to obtain the conditional likelihood for a sample of size n comprising xi copies of allele i, given that the ancestral allele is A. p (x | A) =
⎛ n ⎞ K xi fi p ( f | A ) d f ∫ ⎜⎝ x⎟⎠ ∏ i =1 ΦK
⎛ n ⎞ ( x + θ A ) Θ N Β( x + θ ) ⎡⎣1 − 1 F1 ( X N + Θ N , n + Θ, −γ ) ⎤⎦ =⎜ ⎟ A , ⎝ x ⎠ θ A ( X N + Θ N ) Β( θ ) ⎡⎣1 − 1 F1 ( Θ N ,Θ, −γ ) ⎤⎦
Text S1
(A8)
3
K ⎛ n⎞ 1 where XN is the number of alleles sampled in class N and ⎜ ⎟ = n!∏ . Note that ⎝ x⎠ i =1 xi !
Equations A7 and A8 are equivalent to Equations 12 and 13 in the main text, using the identities FA = 1 − F , X A = X N , and Θ A = Θ N .
References 1. Donnelly P, Nordborg M, Joyce P (2001) Likelihoods and simulation methods for a class of nonneutral population genetics models. Genetics 159: 853-867. 2. Erdéyli A (1939) Integration of certain systems of linear partial differential equations of hypergeometric type. Proc Roy Soc Edin A 59: 224-241. 3. Phillips PCB (1988) The characteristic function of the Dirichlet and multivariate F distributions. Cowles Foundation Discussion Paper No. 865.
Text S1
4