Microbial Ecology

Modeling Taxa-Abundance Distributions in Microbial Communities using Environmental Sequence Data

William T. Sloan (1), Stephen Woodcock (1), Mary Lunn (2), Ian M. Head (3) and Thomas P. Curtis (3)

(1) Department of Civil Engineering, University of Glasgow, Oakfield Avenue, Glasgow G12 8LT, UK
(2) Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
(3) School of Civil Engineering and Geosciences, University of Newcastle upon Tyne, Newcastle NE1 7RU, UK

Received: 22 June 2006 / Accepted: 10 July 2006 / Online publication: 13 December 2006

Abstract

We show that inferring the taxa-abundance distribution of a microbial community from small environmental samples alone is difficult. The difficulty stems from the disparity in scale between the number of genetic sequences that can be characterized and the number of individuals in communities that microbial ecologists aspire to describe. One solution is to calibrate and validate a mathematical model of microbial community assembly using the small samples and use the model to extrapolate to the taxa-abundance distribution for the population that is deemed to constitute a community. We demonstrate this approach by using a simple neutral community assembly model in which random immigrations, births, and deaths determine the relative abundance of taxa in a community. In doing so, we further develop a neutral theory to produce a taxa-abundance distribution for large communities that are typical of microbial communities. In addition, we highlight that the sampling uncertainties conspire to make the immigration rate calibrated on the basis of small samples very much higher than the true immigration rate. This scale dependence of model parameters is not unique to neutral theories; it is a generic problem in ecology that is particularly acute in microbial ecology. We argue that to overcome this, so that microbial ecologists can characterize large microbial communities from small samples, mathematical models that encapsulate sampling effects are required.

Correspondence to: William T. Sloan; E-mail: [email protected]. DOI: 10.1007/s00248-006-9141-x. Microbial Ecology, Volume 53, 443–455 (2007). © Springer Science + Business Media, Inc. 2006

Introduction

Characterizing large microbial communities from sparse genomic data requires some mathematical model of a pattern in community structure. Identifying such models remains one of the greatest challenges in microbial ecology. One approach, perhaps the ideal one, is to target particular communities and conduct very intensive surveys that will yield sufficient data for patterns in community structure to become readily apparent or for empirical models to be teased out of the data in statistical analyses. However, this approach is speculative and, because we currently do not know what constitutes "sufficient data", such surveys are difficult to plan or cost rationally [6]. The alternative is to postulate a mathematical model in advance of a survey and tailor the sampling accordingly. This approach is also speculative in that the onus is on the microbial ecologist to lay down a priori theoretical conjectures, often backed by little more than intuition, which can be translated into mathematical models.

The latter approach has yielded success. The observation of power-law taxa-area relationships in bacteria [2, 12] and microbial eukaryotes [10] is a major breakthrough in microbial ecology, which did not arise by chance. Researchers reasoned that one of the well-known ecological relationships observed with macroorganisms may also apply to microorganisms and tailored an experimental program to test this hypothesis.

These studies, like all others in microbial ecology, are made difficult because, even using the most up-to-date genomic approaches, we are limited to analyzing a small fraction of the genes in very small environmental samples [28]. The disparity between sample and community size is enormous and far exceeds that for macroorganisms. Take, for example, a 10-g sample of soil: it can comprise as many as $10^{10}$ individual microorganisms (approximately the human population of the world), yet clone libraries generated from soil samples typically represent a random sample of tens to a couple of hundred individuals. Intuitively, such a small sample has the potential to distort our view of the large population.

Although microbial ecologists are well aware of this disparity of scale, it does not routinely affect the way genomics data are interpreted. Taxa-abundance distributions, for example, are used to characterize microbial community structure, but what they actually characterize is the distribution of taxa abundances in a very small sample. The disparity of scale between samples and the communities they aim to represent means that the sample and community distributions can be very different indeed. To demonstrate this, we have derived the sample distribution for 200 individuals (equivalent to clones in a 16S rRNA gene library) selected at random from large populations ($10^{12}$ individuals) in which taxa abundances are distributed in four different ways (Fig. 1), two of which have been previously proposed as plausible theoretical distributions [5, 9] (Fig. 1a,b) and two of which have no biological basis and could be considered ridiculous (Fig. 1c,d). All the sample distributions have a very similar shape that is redolent of the distribution of taxa abundances in real 16S rRNA gene clone libraries. Thus, for example, the fact that clone abundance distributions look like the tail end of a lognormal distribution does not mean that the taxa in the larger community are distributed lognormally (although they might be).

Descriptors such as diversity indices, taxa-abundance distributions, and similarity indices have their roots in the ecology of macroorganisms, which are easier to observe, and rely on a fairly complete census of the organisms at a particular site. Figure 1 demonstrates that for microbial communities these descriptors may differ significantly between sample and community. Molecular methods are rapidly evolving and will offer partial solutions. Thus, when very high throughput sequencing becomes routinely available to microbial ecologists, a complete census of a sample may become possible.
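The sampling effect described above can be explored with a short simulation. The following is only a minimal sketch, not the procedure used to produce Fig. 1: it assumes that drawing 200 clones from a very large community is well approximated by sampling taxa with replacement in proportion to their relative abundance, and the two communities, the function name `sample_taxa_abundance`, and the lognormal parameters are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def sample_taxa_abundance(rel_abundances, n_clones=200, seed=0):
    """Draw n_clones individuals at random; return {abundance: number of taxa}."""
    rng = random.Random(seed)
    taxa = range(len(rel_abundances))
    # Sampling with replacement approximates a small sample from a huge community.
    clones = rng.choices(taxa, weights=rel_abundances, k=n_clones)
    clones_per_taxon = Counter(clones)
    return Counter(clones_per_taxon.values())


# Hypothetical community 1: 200 equally abundant taxa (in the spirit of Fig. 1c).
equal_community = [1.0] * 200

# Hypothetical community 2: lognormal-like abundances (illustrative parameters only).
rng = random.Random(1)
lognormal_community = [rng.lognormvariate(0.0, 3.0) for _ in range(4000)]

for name, community in [("equal", equal_community), ("lognormal", lognormal_community)]:
    distribution = sample_taxa_abundance(community)
    # Print the number of taxa seen at the five lowest observed abundances.
    print(name, sorted(distribution.items())[:5])
```

Comparing the printed sample distributions for different assumed communities gives a feel for how much information about the community distribution survives in a sample of 200 clones.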
However, improved molecular methods will only take us so far; the number of individuals in samples, even when we can identify all of them, will still be very small in comparison to those in the microbial communities as a whole. It will always be necessary to infer larger-scale descriptors of community structure from very small samples, which requires consideration of sampling effects. Nonetheless, the fact that patterns exist in the common taxa suggests generic patterns that might extend deeper into the community.

Deriving Patterns from a Model of Community Assembly

How then does one sensibly postulate patterns and mathematical models to describe microbial communities that apply to the whole community, not just the abundant taxa? The rationale that is proposed here recognizes that all patterns in community structure derive from the processes of community assembly. Thus, it is the balance between the evolutionary and ecological processes of speciation, environmental selection, dispersal (or immigration), and local competition that shapes the community structure. Therefore, if one can quantify these processes by using small samples and common taxa, and then assume that the processes act upon the whole community, it may be possible to extrapolate patterns such as taxa-area relationships, and simple models of them (e.g., power-law relationships), in whole-community structure. By embarking on such a strategy, microbial ecologists will enter the debate and controversies that are being played out in theoretical ecology. Verifying that a particular model holds true may require extensive surveys and adaptations of current genomic and metagenomic methods. However, with a quantitative hypothesis to test, there is little doubt that the ingenuity of molecular microbial ecologists will prevail and we can begin to test candidate models and iterate toward a predictive theoretical microbial ecology.

The rationale, therefore, is straightforward. Implementing it, however, requires the derivation of a process-based mathematical model of microbial community assembly, which is less straightforward. Our observations on microbial communities are so sparse that we cannot yet aspire to test some of the more subtle hypotheses on the balance of ecological forces that shape communities of macroorganisms. A model that encapsulates every process that is known to affect the structure of complex, diverse, and densely inhabited microbial communities would be rendered useless, in all but the most abstract of analyses, by our inability to parameterize it. Therefore, pragmatism dictates that a model should employ prudent simplification with a view to explaining some of the community structure.

W.T. Sloan et al.: Neutral Assembly of Prokaryote Communities
Here we demonstrate our rationale by using a very simple conceptual model based on a few fundamental truths that hold in any open biological community: organisms multiply, die, immigrate, and emigrate, and some taxa become extinct whereas other, new ones invade. MacArthur and Wilson [17] prudently pared down their model of community assembly in the theory of Island Biogeography to encapsulate these processes alone. Their intuition that such a simple model could explain the variance in diversity in insular communities of macroorganisms has been borne out by its central and enduring position in macroecological theory. Importantly, because their model proffers, in the balance between immigration and extinction, a mechanism for maintaining, enhancing, or depleting diversity, it has been used in conservation ecology as a predictive tool for designing nature reserves [11]. In so many applications of microbial ecology, the ultimate goal is to manipulate microbial communities to our advantage. This requires knowledge of how patterns in communities form, not just an observation of their existence. Thus, a reasonable conjecture is that a model based on the same processes as the theory of Island Biogeography might explain patterns currently observed in microbial communities, and predict patterns that might be observed in the future. However, the theory of Island Biogeography makes predictions on the total diversity of a community and therefore cannot be parameterized by using data from the small samples typically used to characterize microbial communities and patterns in the abundance of common taxa.

Figure 1. Distribution of taxa abundances in communities of $10^{12}$ individuals and in small samples of 200 individuals from them for: (a) a lognormally distributed community ($N_T/N_{max}$ is the ratio of the total number of individuals to the number of individuals belonging to the most abundant taxon, which can be used to index richness [5]); (b) a logseries distributed community ($\theta$ is one of the parameters of the logseries that can be used as an index to species richness [14]); (c) a community where 200 taxa are equally abundant; (d) a bimodal distribution. [Plots not reproduced. Panel labels: (a) Lognormal Distribution, $N_T = 10^{12}$, $N_T/N_{max} = 5$, Diversity = 4290; (b) Logseries, $\theta = 50$, Diversity = 105, $N_T = 10^{12}$; (c) 200 equally abundant taxa, $N_T = 10^{12}$; (d) Bimodal Distribution, Diversity = 1000, $N_T = 10^{12}$. Community panels plot density or number of taxa against Log2(Abundance); sample panels plot Number of Taxa against Abundance.]

A route to applying the principles of the theory of Island Biogeography to microbial communities was offered by the recent neutral community models (NCMs) of Hubbell [14] and Bell [1], which extend the theory to make predictions on the relative abundance of taxa in the community, not just the diversity. Applying these to very large communities and parameterizing them using data collected by molecular microbial ecologists required two adaptations, which we have presented in [23]. First, the mathematics of the original NCMs is discrete, which means every birth, death, and immigration event in the assembly of a community is represented. This becomes impractical in very large populations and therefore a continuous mathematical model was developed. Second, the published methods for calibrating the model rely on an almost complete description of the taxa-abundance distribution for a community, which does not exist for microbial communities in any natural environment. Therefore, a method was developed for calibrating the model using the small-sample taxa-abundance distributions that are typically collated by using molecular approaches to microbial community analysis. This method, which relies on multiple samples, was applied to successfully calibrate the model using published data for functional genes, conserved 16S rRNA sequences, and groups of organisms in wastewater treatment plants, estuaries, lakes, and the human lung (Fig. 2).

Figure 2. Comparing the theoretical and observed relationship between the mean relative abundance of a taxon, $p_i$, and the frequency with which it appears in a fixed population size. Each of the points represents a different taxon. (a) Clone libraries of different ammonia monooxygenase (AMO) genes at 13 different domestic sewage works [26], $m = 0.1$. (b) Clone libraries of different ammonia-oxidizing bacteria 16S rRNA genes at six sites from the Humber Estuary [16], $m = 0.7$. (c) 16S rRNA sequences for 16 different bacterial taxa that are considered to be particular to freshwater environments sampled from 96 different lakes [29]. Before the analysis, we removed data that represented three cyanobacterial lineages, leaving only data from putative heterotrophs, and expressed proportional abundance as a fraction of the overall noncyanobacterial abundance. The lowest relative abundance detected in a single analysis, 1/480, was used to define the detection limit of the technique, $N_T m = 1.36$. (d) Clones from the lungs of 24 patients with and without asthma (Wardlaw and Barer, personal communication), $N_T m = 14.6$. [Plots not reproduced. Each panel plots Frequency against Mean Relative Abundance ($p_i$); the horizontal axis of panel (c) is labeled "Mean proportion of signal detected".]

The evidence presented by Sloan et al. [23] is sufficient to suggest that the continuous NCM is more than just conjecture and that it could potentially inform us of larger-scale patterns in microbial community structure. We do not suggest that NCMs are the only candidate models for community assembly. We merely propose that the evidence warrants their promotion to a theory that deserves further investigation. However, to do this requires a fuller description of the local community structure than has previously been published. In the work of Sloan et al. [23], the model is defined for a single taxon embedded in a neutral community. To infer the whole community structure requires this to be extended to a tractable mathematical description of the taxa-abundance distribution for all taxa within a functional group.

The main result of this article is a generic description for the expected taxa-abundance distribution for a microbial community undergoing neutral dynamics. An analysis of the sensitivity of taxa-abundance distributions to changes in immigration and local population size serves to reinforce the importance of these variables and chance in shaping community structure. In particular, high immigration rates promote diversity, whereas low immigration rates deplete diversity and promote the dominance of common taxa. The analysis also highlights the scale dependence of model parameters, and that further advances are required in the description of how local communities aggregate to form the source community before large-scale patterns in community structure can be inferred.

Taxa-Abundance Distribution for Neutrally Assembled Microbial Communities

The conceptual model that underpins the NCM is very similar to that of the theory of Island Biogeography: there is some larger community (mainland) that acts as a source of immigrants to a local community (island) and it is the balance between immigrations and local extinctions that determines community structure. With the NCM, the local community is assumed to be saturated with $N_T$ individuals. For the assemblage of organisms in the local community to change, an individual must die or leave the system, which occurs at random with a rate $d$. It is then immediately replaced by an individual either by reproduction from within the community or by an immigrant selected at random from a source community comprising $n$ different taxa with abundances $\{p_i\}_{i=1}^{n}$. The probability, $m$, that a vacancy in the local community is filled by an immigrant is a taxon-independent constant. There is an assumption that all vacancies are filled by reproduction or immigration and, therefore, the model does not currently accommodate a population of previously dormant spores suddenly becoming active.

Using only $\{p_i\}_{i=1}^{n}$, $m$ and $N_T$, simple expressions can be derived to track the likely change in abundance of any taxon in the local community through time. For example, consider the probability that the abundance of the $i$th taxon, $N_i$, in the community increases by one individual in a time period $1/d$. This first requires a death or removal of an individual belonging to some other taxon, which occurs with probability $1 - N_i/N_T$. Then, either an individual from the $i$th taxon in the source community has to migrate into the vacant space, which has probability $m p_i$, or there is no immigration but instead a local reproduction of the $i$th taxon in the local community, which occurs with probability $(1 - m) N_i/(N_T - 1)$. Thus the probability that the abundance of the $i$th taxon increases by one individual is simply given by,

$$\Pr(N_i + 1 \mid N_i) = \left(1 - \frac{N_i}{N_T}\right)\left[m p_i + (1 - m)\frac{N_i}{N_T - 1}\right]. \tag{1}$$

Similar expressions can be derived for a decrease and no change in $N_i$, and these form the basis of Hubbell's discrete NCM (Appendix) and our continuous version of it [23] for large microbial populations. The adjective "neutral" derives from the fact that the probability of a reproduction, $N_i/(N_T - 1)$, is solely dependent on the relative abundance of taxa. Thus density-dependent growth is represented in the model but the specific growth rates of all taxa are equivalent. This assumption will be violated over short time periods and is controversial [7]. However, it has been demonstrated, by allowing taxa to have differentiated specific growth rates, that provided there is a constant stream of immigrants the model is robust to modest departure from the purely neutral assumption [23].

Figure 3. Relationship between the mean relative abundance ($p_i$) and the detection frequency: symbols, observed in clone libraries; solid line, given by the neutral model; dashed line, assuming that the abundance of the $i$th taxon in a community of size $N_T$ is distributed binomially, $\Pr(K = k) = \binom{N_T}{k} p_i^{k} (1 - p_i)^{N_T - k}$. (a) For AMO genes at 13 different domestic sewage works [26], with immigration probability in the neutral model $m = 0.1$. (b) AOB 16S rRNA genes at six sites from the Humber Estuary [16], with immigration probability in the neutral model $m = 0.7$. [Plots not reproduced. Both panels plot Frequency against Mean Relative Abundance ($p_i$).]

To describe the whole-community taxa-abundance distribution for a local community requires a substantial extension of our previous mathematical (not conceptual) description of the NCM. In Sloan et al.'s study [23] the model describes how the marginal probability distributions of the abundance of taxa change through time. To infer the whole community structure requires a description of how the joint probability distribution, $\phi(x_1, x_2, \ldots, x_{n-1})$, for the abundance of taxa changes through time, which we show in the Mathematical Appendix is governed by,

$$\frac{\partial \phi}{\partial t} = \sum_{i=1}^{n}\left[-\frac{\partial (M_{x_i}\phi)}{\partial x_i} + \frac{1}{2}\frac{\partial^2 (V_{x_i}\phi)}{\partial x_i^2}\right] + \frac{1}{2}\sum_{i=1}^{n}\sum_{j \neq i}\frac{\partial^2 (C_{x_i x_j}\phi)}{\partial x_i\,\partial x_j} \tag{2}$$

where $x_i$ is the relative abundance of the $i$th taxon and $M_{x_i}$, $V_{x_i}$ and $C_{x_i x_j}$ are simple functions of our model parameters $\{p_i\}_{i=1}^{n}$, $m$ and $N_T$. If the community is in "long-term" dynamic equilibrium, then the joint probability distribution will not change through time and this stationary distribution is described by the solution of Eq. (2) with $\partial\phi/\partial t = 0$ and boundary conditions $\partial\phi/\partial x_i = 0$, where $x_i = 0$ or $x_i = 1$. It is shown in the Mathematical Appendix that this is a Dirichlet distribution,

$$x_1, \ldots, x_n \sim \mathrm{Dir}(N_T m p_1, \ldots, N_T m p_n) \tag{3}$$
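One realization of the stationary distribution in Eq. (3) can be simulated with a few lines of code. The sketch below uses the standard construction of a Dirichlet draw from independent gamma variates normalized to sum to one; this is an assumption made for illustration and is not necessarily the algorithm given in the Mathematical Appendix. The function names and the four-taxon source community are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw: independent gamma variates normalized to sum to 1."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def local_community(source_p, n_t, m, seed=0):
    """Relative abundances x_1..x_n ~ Dir(N_T m p_1, ..., N_T m p_n), as in Eq. (3)."""
    rng = random.Random(seed)
    return sample_dirichlet([n_t * m * p for p in source_p], rng)


# Hypothetical source community of four taxa (values chosen for illustration).
source_p = [0.5, 0.3, 0.15, 0.05]
x = local_community(source_p, n_t=1e9, m=1e-8)  # here N_T * m = 10
print(x)
```

Repeating the draw with different seeds and ranking the resulting abundances gives an empirical picture of the expected taxa-abundance distribution for any assumed source distribution.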

The taxa-abundance distribution can easily be derived from this by using a simple algorithm for simulating realizations of the abundance for each taxon (Mathematical Appendix). Other authors have made significant advances on Hubbell's original discrete NCM [13, 20, 24, 25]. Our formulation is simple and we have used well-established techniques to convert Hubbell's discrete model to a continuous diffusion equation. This has the advantage of making available literature from population genetics and stochastic modeling to those who wish to predict the dynamics of microbial communities. Previously published descriptions of the taxa-abundance distribution for NCMs (e.g., [20, 25]) have been ingenious, but idiosyncratic, which makes them difficult to work with and their general properties obscure. We have shown that the joint probability distribution is, in fact, described by the well-known Dirichlet distribution, for which there is a wealth of statistical literature [4].

One of the greatest advantages of using our formulation in the context of microbial communities is that it does not presuppose any particular distribution for the abundance of taxa in the source community; it is defined for an arbitrary distribution $\{p_i\}_{i=1}^{n}$. Previous analytic descriptions of the taxa-abundance distribution predicted by an NCM have assumed that the distribution of taxa abundances in the source community is logseries, characterized by a single parameter, $\theta$, which Hubbell calls the "fundamental biodiversity number." $\theta$ is an index of diversity; the larger the $\theta$ value for a functional group, the more diverse it is. However, this logseries assumption is based on Hubbell's model for the source community, which assumes neutral dynamics where biodiversity is maintained at equilibrium through speciation rather than immigration. New species appear in the population like rare point mutations; they may spread and become more abundant or, more often, die out quickly. The alternative drivers of speciation that are known to exist for microorganisms, such as lateral gene transfer, must call into question this conceptual picture of the source community. Deriving a mathematical description of the source community by aggregating local communities in some way that can be tested using genomic data remains a significant and exciting challenge. Therefore, the flexibility of being able to condition the predicted local community taxa-abundance distribution on any assumed source distribution is a distinct advantage of our formulation.

What Can We Infer about Prokaryote Community Structure from Small Samples?

How then does our derivation of the taxa-abundance distribution help to infer patterns in microbial communities? This is perhaps best explained by considering two of the example data sets displayed in Fig. 2 in more detail: the clone libraries of ammonia monooxygenase (AMO) genes [21, 26] from 13 different sewage works in Germany; and the ammonia-oxidizing bacteria 16S rRNA gene data from six samples at three different sites in the Humber estuary in England [16]. On average, 13 clones were sampled from each of the sewage works samples and exactly 20 were sampled for each of the estuary samples. As argued previously, this is a small sample from which to draw conclusions on the community structure at any one site. However, using the technique reported by Sloan et al. [23], it is possible to calibrate the NCM based on the distribution of taxa abundance across the 13 sewage works or six estuary samples for the common taxa. If the average relative abundance of a taxon is $p_i$ then, if the community is neutral, its relative abundance in any one of the samples, $x_i$, is beta distributed: $x_i \sim \mathrm{Beta}(N_T m p_i, N_T m (1 - p_i))$. Here, we are assuming that the small random sample of clones constitutes the community; therefore, $N_T$ is the average number of clones and $m$ is the immigration probability as previously defined. Knowing the theoretical probability density distribution for the $i$th taxon allows us to calculate the probability that the taxon exists at an abundance greater than the detection limit of whatever molecular method is being used. It is simply

$$\Pr(x_i > d) = \int_{d}^{1} \mathrm{Beta}(x_i : N_T m p_i, N_T m (1 - p_i))\, dx_i \tag{4}$$

where $d$ is the detection limit. For clone libraries, the detection limit in a sample is one clone and so $d = 1/N_T$.


$N_T$ is known and $p_i$ is the average relative abundance across all the sewage works. Therefore, Eq. (4) gives the theoretical relationship between $p_i$ and the probability (or frequency) of detecting the $i$th taxon in any sample as a function of one unknown taxon-independent parameter, the immigration probability $m$. $m$ can be calibrated simply by adjusting it to minimize the difference between this theoretical probability of detection and the observed relative frequency with which the common taxa are observed (Fig. 2). For the sewage works samples, the calibrated value of $m$ is 0.1, and for the estuary samples it is 0.77; this is the probability that when an ammonia-oxidizing bacterium is lost from the system it is replaced from outside.
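The calibration step described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: Eq. (4) is evaluated by a midpoint rule after the substitution $u = (1 - x)^b$, the fit uses a simple least-squares grid search over $m$, and the "observed" frequencies are synthetic values generated from the model itself with a known $m$. The function names and the data are hypothetical.

```python
from math import exp, lgamma


def detection_probability(p_i, m, n_t, steps=2000):
    """Eq. (4): Pr(x_i > 1/N_T) for x_i ~ Beta(N_T m p_i, N_T m (1 - p_i))."""
    a = n_t * m * p_i
    b = n_t * m * (1.0 - p_i)
    d = 1.0 / n_t  # detection limit of one clone
    # Substituting u = (1 - x)^b removes the endpoint singularity at x = 1.
    upper = (1.0 - d) ** b
    h = upper / steps
    total = sum((1.0 - ((k + 0.5) * h) ** (1.0 / b)) ** (a - 1.0)
                for k in range(steps))
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)
    return total * h / (b * exp(log_beta))


def calibrate_m(mean_abundances, observed_freqs, n_t, grid=99):
    """Grid search over m in (0, 1) minimizing the squared detection-frequency error."""
    best_m, best_err = None, float("inf")
    for j in range(1, grid + 1):
        m = j / (grid + 1)
        err = sum((detection_probability(p, m, n_t) - f) ** 2
                  for p, f in zip(mean_abundances, observed_freqs))
        if err < best_err:
            best_m, best_err = m, err
    return best_m


# Synthetic "observations" generated from the model with m = 0.3 (hypothetical data).
p = [0.4, 0.25, 0.15, 0.1, 0.06, 0.04]
freqs = [detection_probability(p_i, 0.3, 13) for p_i in p]
print(calibrate_m(p, freqs, n_t=13))
```

With real clone-library data, `p` would hold the mean relative abundances of the common taxa and `freqs` the fraction of samples in which each taxon was detected; a finer grid or a one-dimensional optimizer could replace the grid search.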

Figure 4. (a) Ranked relative abundance distribution for a logseries distribution with Hubbell's [14] biodiversity number $\theta = 2.0$, which gives approximately 100 taxa in a source population of $10^{20}$ individuals. $\theta$ was calibrated by using the AMO gene clone libraries from the sewage works. (b) The ranked abundance distribution in a local neutrally assembled community of $10^{9}$ individuals that has been fed with immigrants from the source community in (a). (c) The ranked relative abundance distribution for a lognormal distribution in a source population of $10^{20}$ individuals. The parameters of the lognormal were calibrated to be $a = \ln(2)/\sqrt{2\sigma^2} = 0.085$ and diversity $= 5 \times 10^{5}$. (d) Ranked abundance distribution in a local neutrally assembled community of $10^{9}$ individuals that has been fed with immigrants from the source community in (c). [Plots not reproduced. All panels plot Log10(Relative Abundance) against Log10(Taxon's Rank); panels (b) and (d) show curves labeled $m = 1$, $m = 10^{-7}$, and $m = 1.5 \times 10^{-9}$.]

It would appear from the above analysis that immigration of AOB into German sewage works and into samples from the Humber estuary is high. Thus, perhaps dispersal limitation is not a major driver in shaping community structure in these communities. So, let us ignore immigration and assume that environments are all the same and exactly the same structuring forces act on the communities, and thus the distribution of taxa abundances is the same in each community. In this case, as noted above, the stochastic effects of random sampling will mean that taxa are absent from some clone libraries and present in others purely by chance. For example, if the relative abundance of an organism is 0.5, then we are very likely to see it in all clone libraries drawn from the communities, whereas an organism with relative abundance 0.001 will rarely appear in small clone libraries. This raises the question: how much of an effect does immigration have on the community structure over and above that which is attributable to random sampling from identical communities? If the clone libraries were unbiased random samples from identical communities, then $p_i$, the mean relative abundance, is the probability that an organism picked at random from the sample belongs to the $i$th taxon. Thus if $K$ is the number of clones that belong to the $i$th taxon in a random sample of size $N_T$ clones, it will be distributed binomially,

$$\Pr(K = k) = \binom{N_T}{k} p_i^{k} (1 - p_i)^{N_T - k} \tag{5}$$

and, therefore, the probability of observing the taxon in the sample is

$$\Pr(K \geq 1) = 1 - (1 - p_i)^{N_T} \tag{6}$$
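Eq. (6) is easy to evaluate for the two abundances used as examples in the text. The sketch below is illustrative only; the function name is hypothetical and the library size of 20 clones is taken from the estuary samples.

```python
def binomial_detection_probability(p_i, n_clones):
    """Eq. (6): probability that at least one of n_clones clones belongs to taxon i."""
    return 1.0 - (1.0 - p_i) ** n_clones


# Relative abundances 0.5 and 0.001, in a library of 20 clones.
for p_i in (0.5, 0.001):
    print(p_i, binomial_detection_probability(p_i, n_clones=20))
```

Under pure random sampling, the abundant taxon is detected in essentially every library, whereas the rare taxon is detected in only about two libraries in a hundred.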

This has been plotted for the sewage works and estuary data in Fig. 3. Clearly, immigration has an effect over and above random sampling from identical communities for the sewage works and estuary samples.

So far, it has been assumed that each random sample of only a few clones can be treated as if it were an independent little local community that has been assembled neutrally. However, in the process of constructing the clone library, any spatial structure that existed in the distribution of taxa will have been obliterated. So, perhaps, it is more realistic to consider all the individual organisms in the original environmental sample as constituting the local neutrally assembled community, with the clone library representing a random sample from it. The word "perhaps" is used advisedly here because we do not currently know on what scale it is best to represent and characterize microbial communities. In many other scientific disciplines, characteristic length scales have been identified at which smaller scale variations in the properties of a system begin to average out and effective variables can be employed. For example, in modeling the geomechanics or hydrogeology of a geological formation, the rock is broken into representative elementary volumes, within each of which a single effective variable can describe strength or porosity, despite there being myriad different crystals contained within the volume. For microbial communities, the small-scale spatial and temporal variability in community structure may be large. Determining whether such characteristic length scales exist is therefore important, because it is at such scales that simple models, such as the neutral model presented here, will be most effective in teasing out the ecological mechanisms that drive community assembly.

Nonetheless, we assume here that all the microorganisms in the original environmental sample comprise the local neutrally assembled community and that a clone library drawn from that is a random, unbiased sample. To distinguish between these two, let $N_S$ be the number of clones in the library and retain $N_T$ to represent the number of individuals in the community. We have been able to derive expressions for the first and second moments of the probability density function (pdf) for the abundance of taxa in the clone library (Appendix), but could not derive a neat, exact analytic expression for this distribution. However, by repeatedly randomly sampling from synthetic large neutrally assembled communities, we found, unsurprisingly, that the distribution was also approximately beta distributed. Under the assumption that the distribution is exactly beta,

$$x_i \sim \mathrm{Beta}\left(N_S \hat{m} p_i,\; N_S \hat{m} (1 - p_i)\right),$$

then by matching first and second moments (Appendix), $\hat{m}$ is related to the true probability of immigration into the community, $m$, by

$$\hat{m} = \frac{N_T m}{N_T m + N_S + 1}. \qquad (7)$$
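As a rough numerical illustration of Eq. (7), the following Python sketch evaluates the effective immigration probability for a given true immigration probability and inverts the relationship. The function names and the clone-library size `n_s = 13` are assumptions made purely for illustration; only the community size of $10^9$ organisms is taken from this section.

```python
def effective_immigration(m, n_t, n_s):
    """Eq. (7): effective immigration probability seen in a sample of n_s clones."""
    return n_t * m / (n_t * m + n_s + 1)


def true_immigration(m_hat, n_t, n_s):
    """Eq. (7) solved for m; only defined for 0 <= m_hat < 1."""
    if not 0.0 <= m_hat < 1.0:
        raise ValueError("m_hat must lie in [0, 1)")
    return m_hat * (n_s + 1) / (n_t * (1.0 - m_hat))


if __name__ == "__main__":
    n_t = 1e9   # representative community size discussed in this section
    n_s = 13    # hypothetical clone-library size (assumption, not from the text)
    for m_hat in (0.1, 0.5, 0.9):
        m = true_immigration(m_hat, n_t, n_s)
        print(f"m_hat = {m_hat:.2f} -> m = {m:.3g}")
```

Because $N_T m$ must be comparable to $N_S + 1$ before the effective value departs noticeably from 1, the inversion becomes increasingly sensitive to errors in the calibrated value as it approaches 1.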

$\hat{m}$ can be considered an effective immigration rate into the small sample that encapsulates both the dispersal limitation imposed on the community as a whole and random sampling effects. Equation (7) allows us to extrapolate from our small random samples to the immigration in the larger neutral community. In the case of the sewage works, where the effective immigration probability is 0.1, the immigration probability for a neutral community of $10^9$ organisms would be $1.55 \times 10^{-9}$. For the estuary, where the effective immigration is 0.77, the immigration probability for a neutral community of $10^9$ organisms is an order of magnitude higher, but still low at $7.0 \times 10^{-8}$. This would indicate that immigration for both environments is low if a representative element of the microbial landscape comprises $10^9$ organisms. On inspecting Eq. (7), it is apparent that for the effective immigration into a sample to differ significantly from 1, the product $N_T m$ must be of the order of $N_S$. This means that when the sample size, $N_S$, is small and $N_T$ is large, the immigration probability has to be very small indeed for the effects of dispersal limitation to be apparent in the sample. Conversely, for large microbial communities, it will be impossible to distinguish between high immigration rates and an immigration probability of one. This does not mean that immigration will have no effect on the taxa-abundance distribution of the community. It just means that the effects are difficult to see in small samples unless they are pronounced.

Now we are in a position to use the main innovation presented in this article, the mathematical description of the taxa-abundance distribution for large communities, to investigate the overall community structure. Again, retaining the community of $10^9$ organisms as a representative element, we can demonstrate the sensitivity of the community structure to changes in immigration. However, to do this requires an assumption on the nature of the source community based on very little evidence. We have used two distributions to describe the source community: first, a logseries distribution, which derives from Hubbell's NCM for macroorganisms; second, a lognormal distribution, for which some theoretical and empirical justification has been outlined by Curtis et al. [5]. These were fitted, using least squares, to the average clone abundance data $\{p_i\}$ for the sewage works. The fact that two such different distributions can be fitted equally well to the data highlights our uncertainty in the underlying source community model. Figure 4a and c show the source community abundance distributions for a large source community of $10^{20}$ individuals. Figure 4b and d show what the local neutral community taxa-abundance distributions would be if the immigration probability were 1, that is, if every local replacement were an immigrant. As expected in this case, the distribution closely reflects the source community distribution. When the immigration probability is the calibrated value of $1.55 \times 10^{-9}$, the expected abundance of the common taxa increases and the diversity in samples (indicated by the taxon's rank in Fig. 4) drops from 50 to 10 taxa in the local community fed with immigrants from a logseries distributed source community and from $10^4$ to 10 in the local community fed with immigrants from the lognormally distributed source. There are more rare taxa associated with a lognormally distributed source community and therefore the percentage decrease in biodiversity is much greater.

The subjective result that low immigration will result in a local depletion of biodiversity will hold provided there are rare taxa in the source community, no matter how the taxa are distributed. Now, suppose that the designer of the sewage works analyzed in this study wanted to engineer functional redundancy in the ammonia-oxidizing bacterial community into their system. This might be possible by artificially increasing the immigration rate of AOB into the local community in the treatment reactor. Figure 4 also shows that increasing the immigration probability by a factor of 100 can effect a large change in biodiversity.
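To give a feel for how the immigration probability shapes local diversity, the sketch below draws realizations of a local community using the gamma construction described in the Appendix and counts the taxa whose relative abundance exceeds a threshold. The synthetic lognormal source community, the 0.1% threshold, and the function names are assumptions made for illustration; the sketch does not reproduce the fitted distributions behind Fig. 4.

```python
import numpy as np


def sample_local_community(p, n_t, m, rng):
    """Draw one realization of local relative abundances.

    Independent Gamma(n_t * m * p_i, 1) variables are normalized by their
    sum, which yields a Dirichlet(n_t * m * p) distributed vector.
    """
    shape = n_t * m * np.asarray(p, dtype=float)
    y = rng.gamma(shape)
    return y / y.sum()


def taxa_above(x, threshold):
    """Number of taxa whose relative abundance exceeds the threshold."""
    return int(np.sum(x > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical source community: 1000 taxa with lognormal-like abundances.
    raw = rng.lognormal(mean=0.0, sigma=2.0, size=1000)
    p = raw / raw.sum()
    n_t = 1e9
    for m in (1.55e-9, 1.55e-7, 1.0):
        counts = [taxa_above(sample_local_community(p, n_t, m, rng), 1e-3)
                  for _ in range(200)]
        print(f"m = {m:.3g}: mean number of taxa above 0.1% = {np.mean(counts):.1f}")
```

Counting taxa above a fixed abundance threshold is only one crude proxy for diversity; a rank-abundance plot of the same draws would be closer in spirit to Fig. 4.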

Discussion

This work demonstrates how mathematical modeling is an indispensable guide to the rational exploration of the microbial world. The huge discrepancy between sample size and the size of microbial communities leaves us no option. This is amply demonstrated by the simple


sampling exercise outlined at the start of the article, which clearly demonstrates the dangers of naively extrapolating from small samples. This is important, because a proper understanding of the nature of taxon abundance curves is central to the longstanding conundrum of the extent of prokaryote diversity [6] and the curves may be (rightly or wrongly) interpreted as reflecting underlying ecological processes [18]. The model we have deployed is simple and can be calibrated. We emphasize the importance of these two attributes. A model that cannot be calibrated cannot be used to predict, and prediction is highly desirable in theoretical microbial ecology. This is because we do not know many of the basic patterns in the communities we are dealing with. Thus we need to extrapolate from the data and patterns we can observe, to make predictions about community structure. These can then be tested by using appropriately targeted experimental programs. The low number of parameters deployed in our model arises from its conceptual simplicity; it only considers the size of the community, births, deaths, and immigration. It might be argued that the model is too simple to offer any guidance. However, the model does appear to be consistent with patterns observed in microbial communities [23], and the theory has been successfully applied to higher organisms [14]. This does not preclude the possibility of further refinements, or the necessity of rigorous testing. However, it does suggest that it constitutes a sound foundation for the rational exploration of the microbial world. Our steady-state solution to the multitaxon neutral model affords the opportunity of predicting the whole community taxon abundance distribution on the basis of observations made on the common taxa. In addition, it allows us to predict how changes in immigration will affect local community structure and diversity. 
The ability to extrapolate from patterns in common taxa, which have been identified using sparse molecular data, to patterns in the rarer taxa is essential in addressing the role of diversity and community structure in the ecosystem services provided by microorganisms. Given the foregoing, it is unsurprising that sampling should be considered when calibrating this, or any other, model of community assembly. The relationship between the immigration probability, $\hat{m}$, calibrated in samples and the actual immigration probability, $m$, into a local neutrally assembled community [Eq. (7)] is important. First, it demonstrates that what appear to be high immigration probabilities in samples can translate to very low probabilities of immigration into communities. Second, by using the very small samples that are typical of many microbial ecology surveys, the calibration method [22] can only quantify immigration for systems that are highly dispersal limited, where the immigration probability is very low. Conversely, much larger samples will


be required to distinguish moderately dispersal limited communities from those that are randomly assembled. Finally, it highlights our uncertainty about the scale at which we should be characterizing microbial communities. Our ability to calibrate immigration in samples suggests that an NCM at least partly explains community structure. However, to extrapolate to an immigration probability for the community by using Eq. (7) requires a knowledge of the size, $N_T$, of the neutral community. In the sewage works example, is $N_T$ the population of the whole sewage works, in which case immigration would be very low indeed, or are some smaller units, such as flocs, assembling neutrally? Although we do not know the $N_T$ values for the AOB in sewage works, we can be confident that there are of the order of $10^6$ to $10^8$ [3] in a milliliter. It follows, therefore, that the true $m$ values will be very small even in small samples. This may have important implications for the debate on the biogeography of bacteria [8]. It is, however, undoubtedly true that this controversial field would benefit from the rigor that appropriately parameterized mathematical models can bring to a debate.

By necessity, we have focused on characterization of microbial communities based on a single genetic locus (amoA or 16S rRNA genes); however, the arguments relating to the extent and structuring forces of microbial diversity are equally relevant in the context of environmental genomic studies. It is now well accepted that rRNA genes are a conservative marker of microbial diversity, and diversity at the level of the whole genome is likely to be somewhat greater than suggested by 16S rRNA sequence-based analyses. This makes it all the more pressing that we develop rigorous mathematical approaches to provide a foundation upon which the growing resource of environmental genomic data can be interpreted.

Appendix: Mathematical Appendix

Kolmogorov Backward Equation for the neutral community model. The basis of the model is Hubbell's NCM in which the community is saturated with a total of $N_T$ individuals; for an assemblage to change, an individual must die or leave the system. This occurs at a taxa-independent rate $d$. The dead individual is immediately replaced by an immigrant from a source community, with probability $m$, or by reproduction of a member of the local community, with probability $1 - m$. Thus, the community forms and develops through a continuous cycle of immigration, reproduction, and death. Assuming that deaths are uniformly distributed in time, then during a period of time $1/d$ one death is expected and the $i$th species, with initial absolute


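abundance $N_i$, will either increase by 1, stay the same, or decrease by 1, with probability given by the following three expressions, respectively:

$$\Pr(N_i + 1 \mid N_i) = \left(\frac{N_T - N_i}{N_T}\right)\left[m p_i + (1 - m)\frac{N_i}{N_T - 1}\right] \qquad (8)$$

$$\Pr(N_i \mid N_i) = \frac{N_i}{N_T}\left[m p_i + (1 - m)\frac{N_i - 1}{N_T - 1}\right] + \left(\frac{N_T - N_i}{N_T}\right)\left[m(1 - p_i) + (1 - m)\frac{N_T - N_i - 1}{N_T - 1}\right] \qquad (9)$$

$$\Pr(N_i - 1 \mid N_i) = \frac{N_i}{N_T}\left[m(1 - p_i) + (1 - m)\frac{N_T - N_i}{N_T - 1}\right] \qquad (10)$$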

where $p_i$ is the relative abundance of the $i$th species in the source community. Hubbell used these transition probabilities for relatively small populations to form a finite Markov chain model with which the community dynamics can be investigated and the stationary probability distribution for $N_i$ can be calculated. The computational expense [19] of this discrete Markov chain formulation makes it impossible to apply to the very large, diverse populations that typify the microbial world [27]. Here, we employ Kimura and Ohta's [15] methods to recast the model for large populations. Let $x_i = N_i / N_T$ be the relative abundance of the $i$th species, and assume that $N_T$, the local community size, is large enough that $x_i$ can be considered continuous. Also, let $\phi(x_1, x_2, \ldots, x_n; t)$ be the joint pdf that the relative abundances of species $1, \ldots, n$ at time $t$ are $x_1, \ldots, x_n$, respectively. The continuous model comes from considering the expected change in $\phi$ that will occur in a small time interval $\delta t$. To do this, we define $g(x_1, \delta x_1; \ldots; x_n, \delta x_n; t, \delta t)$ to be the pdf for the relative abundance of species 1 changing from $x_1$ to $x_1 + \delta x_1$, the relative abundance of species 2 changing from $x_2$ to $x_2 + \delta x_2$, ..., and the relative abundance of species $n$ changing from $x_n$ to $x_n + \delta x_n$ during the time period between $t$ and $t + \delta t$. Then,

$$\phi(x_1, \ldots, x_n; t + \delta t) = \int \phi(x_1 - \delta x_1, \ldots, x_n - \delta x_n; t)\, g(x_1 - \delta x_1, \delta x_1; \ldots; x_n - \delta x_n, \delta x_n; t, \delta t)\, d(\delta x_1) \cdots d(\delta x_n)$$

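The transition probabilities in Eqs. (8)–(10) can be checked numerically. The following Python sketch, with a hypothetical function name and arbitrary illustrative parameter values, evaluates the three probabilities for one taxon and confirms that they sum to one and that the expected change in $N_i$ per death event equals $m(p_i - N_i/N_T)$.

```python
def transition_probabilities(n_i, n_t, m, p_i):
    """Return (increase, stay, decrease) probabilities from Eqs. (8)-(10)."""
    frac_other = (n_t - n_i) / n_t   # the dying individual is not of taxon i
    frac_self = n_i / n_t            # the dying individual is of taxon i
    up = frac_other * (m * p_i + (1 - m) * n_i / (n_t - 1))
    stay = (frac_self * (m * p_i + (1 - m) * (n_i - 1) / (n_t - 1))
            + frac_other * (m * (1 - p_i)
                            + (1 - m) * (n_t - n_i - 1) / (n_t - 1)))
    down = frac_self * (m * (1 - p_i) + (1 - m) * (n_t - n_i) / (n_t - 1))
    return up, stay, down


if __name__ == "__main__":
    # Illustrative values only: 1000 individuals, taxon i holds 30 of them.
    n_i, n_t, m, p_i = 30, 1000, 0.05, 0.1
    up, stay, down = transition_probabilities(n_i, n_t, m, p_i)
    print("sum of probabilities:", up + stay + down)
    print("expected change in N_i:", up - down)
    print("m * (p_i - x_i):", m * (p_i - n_i / n_t))
```

The drift term recovered here is the discrete counterpart of the first moment used in the continuous approximation that follows.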

Expanding this as an $n$-dimensional Taylor series about the point $x_1, \ldots, x_n$ and neglecting terms of order 3 and above gives

$$\phi(x_1, \ldots, x_n; t + \delta t) = \int \left[ \phi g - \sum_{i=1}^{n} \delta x_i \frac{\partial}{\partial x_i}(\phi g) + \frac{1}{2}\sum_{i=1}^{n} (\delta x_i)^2 \frac{\partial^2}{\partial x_i^2}(\phi g) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j \neq i} \delta x_i\, \delta x_j \frac{\partial^2}{\partial x_i \partial x_j}(\phi g) \right] d(\delta x_1) \cdots d(\delta x_n) \qquad (11)$$

where $\phi g$ denotes $\phi(x_1, x_2, \ldots, x_n; t)\, g(x_1, \delta x_1; \ldots; x_n, \delta x_n; t, \delta t)$. Because $\int g\, d(\delta x_i) = 1$,

$$\phi(x_1, \ldots, x_n; t + \delta t) - \phi(x_1, \ldots, x_n; t) = -\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left[\phi \int (\delta x_i)\, g\, d(\delta x_i)\right] + \frac{1}{2}\sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2}\left[\phi \int (\delta x_i)^2 g\, d(\delta x_i)\right] + \frac{1}{2}\sum_{i=1}^{n}\sum_{j \neq i} \frac{\partial^2}{\partial x_i \partial x_j}\left[\phi \iint (\delta x_i)(\delta x_j)\, g\, d(\delta x_i)\, d(\delta x_j)\right] \qquad (12)$$

where $\phi$ on the right-hand side is evaluated at $(x_1, \ldots, x_n; t)$; therefore,

$$\frac{\partial \phi}{\partial t} = -\sum_{i=1}^{n} \frac{\partial (M_{x_i}\phi)}{\partial x_i} + \frac{1}{2}\sum_{i=1}^{n} \frac{\partial^2 (V_{x_i}\phi)}{\partial x_i^2} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j \neq i} \frac{\partial^2 (C_{x_i x_j}\phi)}{\partial x_i \partial x_j} \qquad (13)$$

where $M_{x_i}$ and $V_{x_i}$ are the first and second moments of the change in $x_i$ per unit of time and $C_{x_i x_j}$ is the expected product of changes in $x_i$ and $x_j$. This is the $n$-dimensional version of the Kolmogorov equation. By considering the expected changes in relative abundance in the discrete time interval $1/d$ given by Eqs. (8)–(10), $M_{x_i}$, $V_{x_i}$, and $C_{x_i x_j}$ can be approximated by

$$M_{x_i} = \frac{m(p_i - x_i)}{N_T} \qquad (14)$$

$$V_{x_i} = \frac{2 x_i (1 - x_i) + m(p_i - x_i)(1 - 2 x_i)}{N_T^2} \qquad (15)$$

$$C_{x_i x_j} = -\frac{2 x_i x_j + m\left[x_i (p_j - x_j) + x_j (p_i - x_i)\right]}{N_T^2}. \qquad (16)$$

Reasoning that typically either $m$ is small or $p_i$ rapidly converges on $x_i$, we can neglect all but the first term of both $C_{x_i x_j}$ and $V_{x_i}$. Equations (13)–(16) then define the NCM for large populations by describing the change in the joint probability of the relative abundances of the $n$ different taxa in the local community.

Stationary probability density function. The solution to the diffusion equation [Eq. (13)] with $\partial \phi / \partial t = 0$ and reflecting boundaries, where $x_i = 0$ or $x_i = 1$, gives the stationary (long-term equilibrium) joint probability density function (pdf) for the relative abundance of the $n$ taxa in the local community, $\{x_i\}_{i=1}^{n}$. Here, we show that the joint pdf for a Dirichlet distribution,

$$\phi = \frac{\Gamma(N_T m)}{\Gamma(N_T m p_1) \cdots \Gamma(N_T m p_n)}\, x_1^{N_T m p_1 - 1}\, x_2^{N_T m p_2 - 1} \cdots x_n^{N_T m p_n - 1} \qquad (17)$$

where $x_n = 1 - x_1 - \cdots - x_{n-1}$ and $p_n = 1 - p_1 - \cdots - p_{n-1}$, is a solution. Note that if

$$-(M_{x_i}\phi) + \frac{1}{2}\frac{\partial (V_{x_i}\phi)}{\partial x_i} + \frac{1}{2}\sum_{j \neq i} \frac{\partial (C_{x_i x_j}\phi)}{\partial x_j} = 0 \quad \text{for } i = 1, \ldots, n \qquad (18)$$

then $\partial \phi / \partial t = 0$. Therefore, substituting in Eqs. (14)–(16), we require

$$\frac{m(p_i - x_i)}{N_T}\phi - \frac{1}{2}\frac{\partial}{\partial x_i}\left[\frac{2 x_i (1 - x_i)}{N_T^2}\phi\right] = -\frac{1}{2}\sum_{j \neq i} \frac{\partial}{\partial x_j}\left[\frac{2 x_i x_j}{N_T^2}\phi\right] \qquad (19)$$

Substituting $\phi$ into the left-hand side of Eq. (19) gives

$$\begin{aligned}
\frac{m(p_i - x_i)}{N_T}\phi - \frac{\partial}{\partial x_i}\left[\frac{x_i(1 - x_i)}{N_T^2}\phi\right]
&= \frac{m(p_i - x_i)}{N_T}\phi - \frac{\phi}{N_T^2}\left[N_T m p_i - x_i(N_T m p_i + 1) - \frac{x_i(1 - x_i)}{x_n}(N_T m p_n - 1)\right] \\
&= \frac{\phi}{N_T^2}\left\{N_T m(p_i - x_i) - \left[N_T m p_i - x_i(N_T m p_i + 1) - \frac{x_i(1 - x_i)}{x_n}(N_T m p_n - 1)\right]\right\} \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m(1 - p_i) - 1 - \frac{(1 - x_i)}{x_n}(N_T m p_n - 1)\right]
\end{aligned} \qquad (20)$$

Similarly, substituting $\phi$ into the right-hand side of Eq. (19) gives

$$\begin{aligned}
-\sum_{j \neq i}\frac{\partial}{\partial x_j}\left[\frac{x_i x_j}{N_T^2}\phi\right]
&= -\frac{x_i \phi}{N_T^2}\sum_{j \neq i}\left[N_T m p_j - \frac{x_j}{x_n}(N_T m p_n - 1)\right] \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m(1 - p_i - p_n) - \frac{(1 - x_i - x_n)}{x_n}(N_T m p_n - 1)\right] \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m(1 - p_i - p_n) + (N_T m p_n - 1) - \frac{(1 - x_i)}{x_n}(N_T m p_n - 1)\right] \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m(1 - p_i) - 1 - \frac{(1 - x_i)}{x_n}(N_T m p_n - 1)\right]
\end{aligned} \qquad (21)$$

Now, because (20) and (21) are equal, $\phi$ is a solution to the diffusion equation [Eq. (13)] with $\partial \phi / \partial t = 0$ and the reflecting boundary conditions are met.

Algorithm for generating the stationary probability density function. Given the relative abundances of $n$ taxa in the source community, $\{p_i\}_{i=1}^{n}$, a realization of the Dirichlet distributed local abundances can be generated by sampling from a set of gamma distributions. Let $\{Y_i\}_{i=1}^{n}$ be random variables such that $Y_i \sim \mathrm{gamma}(N_T m p_i)$ and let $\{y_i\}_{i=1}^{n}$ be realizations of these variables sampled at random; then

$$x_i = \frac{y_i}{\sum_{j=1}^{n} y_j}, \quad i = 1, \ldots, n \qquad (22)$$

will represent a random sample from the Dirichlet joint probability distribution for a local neutral community [Eq. (17)].

Sampling a neutral community. We have already shown that for the continuous variant of the NCM, the steady-state joint pdf for all species is Dirichlet $\mathrm{Dir}(N_T m p_1, \ldots, N_T m p_n)$, where $p_1, \ldots, p_n$ are the relative abundances of the species in the metacommunity. We can repeat the exact same argument to derive the joint distribution of the relative abundances within a sample of size $N_S$ from such a community. Strictly speaking, selecting a subsample of size $N_S$ from a local community is achieved by simply sampling $N_S$ individuals without replacement from the community of size $N_T$. However, since for almost all microbial samples $N_S \ll N_T$, the problem can be approximated to one of sampling with replacement. Regard the sampling exercise as a continuous process through time. Individuals are selected from the source community one by one until a sample of size $N_S$ has been collected. Once this sample size has been reached, the process of selecting individuals continues at regular intervals in time (generations) but now the selected individual replaces one randomly chosen individual currently in the sample population. This is analogous to the argument used for deriving the joint distribution for the local abundances, except that we have a pure immigration–death process, with immigrants into the sample from the local community. Setting $m = 1$ and regarding our local abundances as the metacommunity from which immigrants are drawn, it is clear that, conditional on knowledge of local abundances $x_1, \ldots, x_n$, the joint distribution of relative abundances $y_1, \ldots, y_n$ within a sample is Dirichlet $\mathrm{Dir}(N_S x_1, \ldots, N_S x_n)$. That is,

$$f(Y \mid X) = \Gamma(N_S) \prod_{i=1}^{n} \frac{y_i^{N_S x_i - 1}}{\Gamma(N_S x_i)} \qquad (23)$$

where $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$ for notational convenience. This allows us to calculate the first and second moments of the sample distribution because we know that the marginal densities of a Dirichlet distribution are beta distributed. Therefore,

$$E(y_i \mid x_i) = x_i \qquad (24)$$

and

$$E(y_i^2 \mid x_i) = \frac{x_i (N_S x_i + 1)}{N_S + 1} \qquad (25)$$

Now, since $x_i \sim \mathrm{Beta}\left(N_T m p_i,\; N_T m (1 - p_i)\right)$, we have that

$$E(y_i) = p_i \qquad (26)$$

and

$$\begin{aligned}
E(y_i^2) &= \frac{1}{N_S + 1}\left[N_S \frac{p_i (N_T m p_i + 1)}{N_T m + 1} + p_i\right] \\
&= \frac{N_S N_T m\, p_i^2 + (N_S + N_T m + 1)\, p_i}{N_S N_T m + N_T m + N_S + 1} \\
&= \frac{\dfrac{N_S N_T m}{N_T m + N_S + 1}\, p_i^2 + p_i}{\dfrac{N_S N_T m}{N_T m + N_S + 1} + 1}
\end{aligned} \qquad (27)$$

letting

$$\hat{m} = \frac{N_T m}{N_T m + N_S + 1} \qquad (28)$$

then

$$E(y_i^2) = \frac{N_S \hat{m}\, p_i^2 + p_i}{N_S \hat{m} + 1} \qquad (29)$$

We were unable to derive a neat analytical solution for the marginal pdfs of abundance in the sample. However, repeated sampling from neutrally assembled synthetic communities confirmed that the marginals were very closely approximated by beta distributions. If we assume that the sample marginal distributions are exactly beta, then, as their first and second moments are given by Eqs. (26) and (29), respectively, the sample distribution is given by

$$y_i \sim \mathrm{Beta}\left(N_S \hat{m} p_i,\; N_S \hat{m} (1 - p_i)\right) \qquad (30)$$
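The two-stage sampling argument can be checked by simulation. The Python sketch below is a minimal illustration under assumed parameter values (five taxa, $N_T m = 20$, $N_S = 15$) and with hypothetical function names: it draws a local community by normalizing gamma variables as in Eq. (22), draws a sample from it according to Eq. (23), and compares the Monte Carlo moments with Eqs. (26) and (29).

```python
import numpy as np


def dirichlet_via_gamma(alpha, rng):
    """Eq. (22): normalize independent Gamma(alpha_i, 1) draws by their sum."""
    y = rng.gamma(alpha)
    return y / y.sum()


def second_moment_eq29(p, n_t, m, n_s):
    """Eq. (29), using the effective immigration probability of Eq. (28)."""
    m_hat = n_t * m / (n_t * m + n_s + 1)
    return (n_s * m_hat * p**2 + p) / (n_s * m_hat + 1)


def simulate_sample_moments(p, n_t, m, n_s, n_draws, rng):
    """Monte Carlo estimates of E(y_i) and E(y_i^2) for the two-stage model."""
    p = np.asarray(p, dtype=float)
    total = np.zeros_like(p)
    total_sq = np.zeros_like(p)
    for _ in range(n_draws):
        x = dirichlet_via_gamma(n_t * m * p, rng)  # local community, Eq. (17)
        y = dirichlet_via_gamma(n_s * x, rng)      # sample given x, Eq. (23)
        total += y
        total_sq += y**2
    return total / n_draws, total_sq / n_draws


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])   # hypothetical source community
    n_t, m, n_s = 2000.0, 0.01, 15              # illustrative values only
    mean, mean_sq = simulate_sample_moments(p, n_t, m, n_s, 20000, rng)
    print("simulated E(y_i):  ", np.round(mean, 3))
    print("simulated E(y_i^2):", np.round(mean_sq, 3))
    print("Eq. (29):          ", np.round(second_moment_eq29(p, n_t, m, n_s), 3))
```

Agreement of the first two moments does not by itself show that the marginals are beta distributed; comparing histograms of the simulated $y_i$ against the density in Eq. (30) would be a natural further check.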

References

1. Bell, G (2000) The distribution of abundance in neutral communities. Am Nat 155: 606–617
2. Bell, T, Agar, D, Song, J, Newman, JA, Thompson, IP, Lilley, AK, van der Gast, CJ (2005) Larger islands house more bacterial taxa. Science 308: 1884
3. Coskuner, G, Ballinger, SJ, Davenport, RJ, Pickering, RL, Solera, R, Head, IM, Curtis, TP (2005) Agreement between theory and measurement in quantification of ammonia-oxidizing bacteria. Appl Environ Microbiol 71: 6325–6334
4. Cox, DR, Miller, HD (1965) The Theory of Stochastic Processes. Methuen, London
5. Curtis, T, Sloan, WT, Scannell, J (2002) Modelling prokaryotic diversity and its limits. Proc Natl Acad Sci 99: 10494–10499
6. Curtis, TP, Sloan, WT (2005) Exploring microbial diversity - a vast below. Science 309: 1331–1333
7. Enquist, BJ, Sanderson, J, Weiser, MD (2002) Modeling macroscopic patterns in ecology. Science 295: 1835–1836
8. Fenchel, T, Finlay, BJ (2005) Bacteria and Island Biogeography. Science 309: 1997–1999
9. Finlay, BJ, Clarke, KJ (1999) Ubiquitous dispersal of microbial species. Nature 400: 828–828
10. Green, JL, Holmes, AJ, Westoby, M, Oliver, I, Briscoe, D, Dangerfield, M, et al. (2004) Spatial scaling of microbial eukaryote diversity. Nature 432: 747–750
11. Harris, LD (1984) The Fragmented Forest. University of Chicago Press
12. Horner-Devine, MC, Lage, M, Hughes, JB, Bohannan, BJM (2004) A taxa-area relationship for bacteria. Nature 432: 750–753
13. Houchmandzadeh, B, Vallade, M (2003) Clustering in neutral ecology. Phys Rev E 68: art. no. 061912
14. Hubbell, SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton
15. Kimura, M, Ohta, T (1971) Theoretical Aspects of Population Genetics. Princeton University Press, Princeton
16. Linacre, CH (2004) Diversity and the quantification of ammonia oxidising bacteria and denitrification from turbidity maximum of estuaries. PhD thesis, Civil Engineering and Geosciences, University of Newcastle upon Tyne
17. MacArthur, RH, Wilson, EO (Eds.) (1967) The Theory of Island Biogeography. Princeton University Press, Princeton
18. May, RM (1975) Patterns of species abundance and diversity. In: Cody, ML, Diamond, JM (Eds.), Ecology and Evolution of Communities. Harvard University Press, Harvard, MA, pp 81–120
19. McGill, BJ (2003) A test of the unified neutral theory of biodiversity. Nature 422: 881–885
20. McKane, AJ, Alonso, D, Sole, RV (2004) Analytic solution of Hubbell's model of local community dynamics. Theor Popul Biol 65: 67–73
21. Purkhold, U, Pommerening-Roser, A, Juretschko, S, Schmid, MC, Koops, HP, Wagner, M (2000) Phylogeny of all recognized species of ammonia oxidizers based on comparative 16S rRNA and amoA sequence analysis: implications for molecular diversity surveys. Appl Environ Microbiol 66: 5368–5382
22. Sloan, WT, Woodcock, S, Lunn, M, Head, IM, Nee, S, Curtis, TP (2005) The roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol, Early Online 28 Nov
23. Sloan, WT, Lunn, M, Woodcock, S, Head, IM, Nee, S, Curtis, TP (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol 8: 732–740
24. Vallade, M, Houchmandzadeh, B (2003) Analytical solution of a neutral model of biodiversity. Phys Rev E 68: art. no. 061902
25. Volkov, I, Banavar, JR, Hubbell, SP, Maritan, A (2003) Neutral theory and relative species abundance in ecology. Nature 424: 1035–1037
26. Wagner, M, Loy, A (2002) Bacterial community composition and function in sewage treatment systems. Curr Opin Biotechnol 13: 218–227
27. Whitman, WB, Coleman, DC, Wiebe, WJ (1998) Prokaryotes: the unseen majority. Proc Natl Acad Sci USA 95: 6578–6583
28. Woodcock, S, Lunn, M, Curtis, TP, Head, IM, Sloan, WT (2006) Taxa area relationships for microbes: the unsampled and the unseen. Ecol Lett 9: 805–812
29. Zwart, G, van Hannen, EJ, Kamst-van Agterveld, MP, van der Gucht, K, Lindstrom, ES, van Wichelen, J, et al. (2003) Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl Environ Microbiol 69: 5875–5883