Marine microbial community structure assessed

1 downloads 0 Views 2MB Size Report
Savvas Genitsarisa, Sebastien Monchya, Jeremie Denonfouxb, Stephanie Ferreirab, Konstantinos Ar. Kormasc,. Telesphore .... long-read chemistry of Roche Applied Sciences sup- ... annotated using the MG-RAST server (Meyer et al. 2008).
MARINE BIOLOGY RESEARCH, 2015 http://dx.doi.org/10.1080/17451000.2015.1084425

ORIGINAL ARTICLE

Marine microbial community structure assessed from combined metagenomic analysis and ribosomal amplicon deep-sequencing Savvas Genitsarisa, Sebastien Monchya, Jeremie Denonfouxb, Stephanie Ferreirab, Konstantinos Ar. Kormasc, Telesphore Sime-Ngandod, Eric Viscogliosie and Urania Christakia a

Laboratoire d’Océanologie et Géosciences (LOG), Université du Littoral Côte d’Opale (ULCO), Wimereux, France; bGenoscreen, Genomic Platform and R&D, Lille, France; cDepartment of Ichthyology and Aquatic Environment, University of Thessaly, Nea Ionia, Greece; d Laboratoire Microorganismes: Génome et Environnement (LMGE), Université Blaise Pascal, Aubière Cedex, France; eInstitut Pasteur of Lille, Inserm U1019, CNRS UMR 8204, Lille, France ABSTRACT

ARTICLE HISTORY

The microbial taxonomic composition of the three domains of life in two coastal plankton samples was assessed by random total community metagenomic sequencing and PCR-based rDNA amplicon deep-sequencing in order to compare the resulting diversity and investigate possible limitations and complementarities of each method. The various universal primer sets, used to amplify different hypervariable rDNA regions, revealed the same major high-level taxonomic groups in Bacteria and unicellular Eukaryota, and showed a scarce Archaea apparent richness. However, significant differences were found between the different primer sets (p-value < 0.05, with the Kolmogorov–Smirnov test), regarding both operational taxonomic unit (OTU) richness and relative abundance of the major high-level taxonomic groups detected. Based on the metagenomic approach, the phylum Bacteroidetes dominated the prokaryotic community, followed by Proteobacteria, while the detected eukaryotic unicellular taxa belonged to the groups of Alveolata, Fungi, Chlorophyta, Stramenopiles and Phaeophyceae. These groups were found to carry genes typically found in microbial communities, which are linked to DNA, RNA and protein metabolism and the synthesis of nucleotides, amino acids, carbohydrates and vitamins. Although our findings suggest that the total community metagenomic approach can provide a more comprehensive picture of the planktonic microbial community structure, a number of issues associated with this approach emerged. These issues include the still relatively high cost compared to amplicon sequencing, the possible low coverage of the full marine diversity, the insufficiency of databases for other gene markers than the small subunit gene, and the bias towards bacterial sequences because of their higher abundance relative to eukaryotes in marine environments.

Received 17 March 2015 Accepted 7 August 2015

Introduction The development of next-generation sequencing (NGS) methodologies have revealed a surprising diversity of planktonic organisms, including a variety of new and/ or previously undetected groups belonging to Bacteria, Archaea and Eukaryota (for a review, see Shokralla et al. 2012). In marine plankton studies, amplicon deepsequencing is a very popular method of targeting intermediate DNA fragments (400–500 bp) of any marker genes, after polymerase chain reaction (PCR) amplification (Bik et al. 2012). It has been used to assess protist diversity (e.g. Cheung et al. 2010; Edgcomb et al. 2011) and succession (e.g. Christaki et al. 2014; Genitsaris et al. 2015), bacterial spatial richness (e.g. Pommier et al. 2010) and seasonal dynamics (e.g. Gilbert et al.

RESPONSIBLE EDITOR

Torsten Struck KEYWORDS

Gene pool; marine microbial community; metagenome; pyrosequencing; SSU rRNA

2012), and the structure of the rare archaeal biosphere (Hugoni et al. 2013). PCR-associated biases in this approach include limited primer universality (Hadziavdic et al. 2014), PCR amplification errors, nucleotide misincorporation and PCR chimera formation (Stoeck et al. 2010; Pawlowski et al. 2011). However, these biases were well known in the fingerprinting era, but were somehow thrown aside when deep-sequencing appeared. Further biases of deep-sequencing, which involve pyrosequencing errors (Kunin et al. 2010), copy-number variations among taxa (Medinger et al. 2010) and limitation when performing meta-analysis of different samples sequenced with different primers (Engelbrektson et al. 2010), further support the need for alternative methods of analysis.

CONTACT Urania Christaki [email protected] Laboratoire d’Océanologie et Géosciences (LOG), UMR CNRS 8187, Université du Littoral Côte d’Opale (ULCO), 32 av. Foch, 62930 Wimereux, France The supplementary material for this article (Figures S1–S5 and Tables SI–SVI) is available via the Supplemental tab of the article’s online page at 10.1080/ 17451000.2015.1084425 © 2015 Taylor & Francis

2

S. GENITSARIS ET AL.

In the continuing growth of sequencing technology, methods aiming to circumvent these issues have been further developed. One of these, the total community metagenomic sequencing, is an approach that produces high-throughput, random genomic sequences, without a PCR step. This approach has largely benefited from the emergence of NGS technologies, as the sequencing of a continuously larger number of sequences is feasible. This method could provide a relatively unbiased way to assess microbial composition, and allow the identification of the gene pool content across various functional categories (Venter et al. 2004; DeLong et al. 2006). It has mainly been used to either assemble bacterial genomes (e.g. Margulies et al. 2005; Farrer et al. 2009), or to explore microbial communities in patients for medical purposes (Lupski et al. 2010), and in some cases has led to several major discoveries of new higher-level taxa (e.g. Di Rienzi et al. 2013). A recent investigation into the microbial community of two marine systems showed that the difference in anthropogenic pressures on these systems could be assessed using this method through combining information on the detected taxonomic groups and metabolic pathways (Kisand et al. 2012). The main objectives of the present study were to assess the potential, the limitations, and the complementarities of the total community metagenomic approach in comparison to the more frequently used rDNA amplicon deep-sequencing. To achieve these three objectives, both approaches were used independently of each other to investigate the microbial taxonomic diversity and composition of the three domains of life (Bacteria, Archaea and Eukaryota) in two replicates from a meso-eutrophic coastal system (the eastern English Channel). In addition, the gene pool content was explored using the metagenomic approach. In this system, the planktonic microbial community has been well documented over the last decade in terms of microscopical (Lamy et al. 2009; Grattepanche et al. 2011a, 2011b) and molecular (using rDNA amplicon deep-sequencing) observations (Monchy et al. 2012; Christaki et al. 2014). The choice to analyse replicates from a single date during the early spring bloom period was considered the best compromise between the high cost of the implemented methodologies, the necessity for replication (e.g. Prosser 2010; Knight et al. 2012) and the intention to target two different hypervariable small subunit (SSU) rRNA gene regions for each of the domains of life. The methodology selected to perform the analysis of the samples was based on commonly used tools in marine environmental studies (see Bik et al. 2012). The present study was to serve as a test case for the

combined methods of rDNA amplicon deep-sequencing and total community metagenomic analyses.

Materials and methods Sample collection The sampling site, representative of the coastal water masses of the eastern English Channel, was located at the SOMLIT (French Network of Coastal Observatories) station (50°40′ 75′′ N, 1°31′ 17′′ E; 20–25 m water depth). The physical, chemical, hydrological and biological properties encountered at this site are sampled and measured monthly. Sampling for this study was carried out on 26 March 2013, during the early spring bloom period (Table SI, supplementary material), at high tide, at a depth of 3 m. This transition period was chosen to investigate the taxonomic composition and the gene pool content of the microbial community as it contains a rich prokaryotic (Lamy et al. 2009) and eukaryotic diversity (Christaki et al. 2014). Two separate 15 l seawater samples were collected in sterile polyethylene bottles and kept at an in situ temperature in the dark until filtration less than 2 h later. Each water sample replicate was screened with a 150 μm mesh to retain larger metazoa, and the flow-through was filtered on 0.2 μm nucleopore filters (142 mm diameter) using a very low filtration pressure peristaltic pump (15 rpm) in order to avoid filter clogging and to minimize cell disruption. Larger membrane filters can improve DNA yield and reduce variation in DNA recovery. The filters were sliced into small pieces using a sterile scalpel and stored in NalgeneTM General Long-Term Storage Cryogenic Tubes (2 ml) at −80°C until analysis. The planktonic organisms’ DNA was extracted and purified with the PowerWater® DNA isolation kit (Mobio Laboratories Inc., CA, USA) into 50 μl final volume of extract, following the manufacturer’s protocol. The two replicates (R1 and R2) contained 7.6 and 7.4 ng μl−1 of DNA, respectively, as measured by the Qubit® 2.0 Fluorometer (Thermo Fischer Scientific Inc., MA, USA). This DNA yield, although higher than DNA concentrations in previous samples from the same study site (see Genitsaris et al. 2015), is relatively low compared to other meso-eutrophic coastal areas. This could be explained by the removal of metazoans during the pre-filtration step, prior to DNA extraction.

Total community metagenomic sequencing The taxonomic composition and the gene pool content of the entire microbial community was examined using

MARINE BIOLOGY RESEARCH

a metagenomic approach based on random total community sequencing of the replicated water samples from the SOMLIT site. For this, we used the 454 GS FLX+ sequencer (Roche Applied Sciences) with the long-read chemistry of Roche Applied Sciences supplied by the GS-FLX+ system, which provides read lengths up to 750 bp. Sequencing was carried out by the sequencing company GenoScreen (Genomic Platform and R&D, Lille, France), according to the manufacturer’s specifications. Library preparation was carried out according to the GS FLX+ Rapid Library Preparation Kit (Roche Applied Sciences) with 300 ng of DNA. The two replicates (R1 and R2) were run in parallel on a single Pico Titer Plate.

Total community metagenomic sequencing read processing and annotation First, the raw reads were quality pre-treated using the PRINSEQ-lite PERL script (Schmieder & Edwards 2011) with the following screen: low-quality ends (having a Phred quality score below 20) were removed; reads with lengths below 200 bp and over 1300 bp were excluded; a maximum of 1% of ambiguous bases (N) was accepted; the ‘shotgun’ key of 4 bp in the 5′ end of the sequences was also removed; and filters on the mean quality of the sequences (threshold set to 20) were applied. Subsequently, the pre- and nontreated sequences were assembled with GS De Novo Assembler version 2.8 (Roche Applied Sciences) using 60 bases overlap and 95% overlap identity, implementing the ‘-rip’ option. This option appends one sequence into one unique contig. The raw reads, as well as both the pre- and non-treated contigs, were processed and annotated using the MG-RAST server (Meyer et al. 2008). The metagenome data are available through the MG-RAST server with ID numbers 4549260.3 (R1, raw), 4549261.3 (R2, raw), 4544952.3 (R1, contigs), 4544953.3 (R2, contigs), 4545118.3 (R1, pre-treated contigs) and 4545139.3 (R2, pre-treated contigs). The raw sequence data were also submitted to GenBankSRA under the accession number SRP039908. In the MG-RAST analysis, duplicate and low-quality reads were removed using the optional initial quality control (QC) filter and the sequence quality scores (data uploaded in FASTQ format). Sequences were classified using the M5RN protein and ribosomal databases applying an e-value threshold of 1.0e−05 (Figure S1, supplementary material). There were 15,945,780 sequences in the M5NR protein database, and 309,342 sequences in the M5RNA ribosomal database, at the time of the analysis (January 2014; for more details on the context of the M5NR database see

3

Table SII, supplementary material). The predicted taxonomic annotation and classification was based on all organism and rRNA databases of M5NR. Predicted functional annotation was based on the Subsystems set of the SEED database, as it provided the highest number of hits for all data sets (Figure S1).

PCR and rDNA amplicon deep-sequencing Total genomic DNA from both replicates was amplified by PCR using two sets of universal primers for each domain of life. The primers were selected from the literature in addition to primers designed by GenoScreen for this study, in order to target different variable regions of the SSU rRNA genes for Prokaryota and Eukaryota (Table I). PCR reactions (50 μl) were carried out according to standard conditions for the Gene Amp High Fidelity PCR system (Life Technologies), with 5 ng of environmental DNA as a template. PCR conditions for each set of primer are described in Table I, where 12 samples for PCR and deep-sequencing were produced (three domains of life × two replicates × two sets of primers). All amplicon products were purified using Agencourt AmPure XP beads (Agencourt Bioscience Corporation, Beverly, MA, USA) and checked on an Agilent Bioanalyzer High Sensitivity DNA chip (Agilent Technologies, Santa Clara, CA, USA). Ribosomal DNA amplicon deep-sequencing was performed by GenoScreen (Lille, France) using the 454 GS-FLX titanium sequencing system (Roche Applied Sciences). Three samples, amplified with primers from each domain of life, were sequenced at a quarter of a plate, according to expected read length, so that all samples were sequenced in a whole plate. Amplicons were submitted to GenBank-SRA under the accession number SRP039908.

rDNA amplicon deep-sequencing quality filtering Downstream sequence processing was performed using the mothur 1.28.0 software following the standard operating procedure (Schloss et al. 2009). First, the flowgrams from each of the 12 samples were extracted and were de-noised using the mothur 1.28.0 implementation of PyroNoise (Quince et al. 2009). Primer sequences and key fragments were removed, and only reads above 200 bp long with homopolymers shorter than 8 bp were kept. Each data set was de-replicated to the unique reads and aligned against the SILVA 108 database. Chimera detection of the unique reads was performed with the UCHIME software (Edgar 2010) and potential chimeras were subsequently removed. The remaining

Hugoni et al., 2013 72°C, 7 min

GenoScreen design 72°C, 7 min

GenoScreen design 72°C, 7 min

72°C, 7 min

94°C, 2 min

94°C, 2 min

94°C, 2 min

94°C, 2 min

94°C, 2 min 481 bp

465 bp

326 bp

337 bp

421 bp

V3–V4

V5–V6

V2–V3

V3–V4

TAReukREV3: ACTTTCGTTCTTGATYRA Euk-516r: ACCAGACTTGCCCTCC S-D-Bact-0785-a-A-21: GACTACHVGGGTATCTAATCC R2r: CGTTRCGGGACTTAACCCAACA 464r: TRTTACCGCGGCGGCTGBCA 806r: GGACTACVSGGGTATCTAAT TAReuk454FWD1: CCAGCASCYGCGGTAATTCC 18S-82F: GAAACTGCGAATGGCTC S-D-Bact-0341-b-S-17: CCTACGGGNGGCWGCAG R2f: CAAACAGGATTAGATACCCTG 123f: CGGRAAACTGGGGATAAT 349f: CCTAYGGGGYGCASCAG

V2–V3

References

Lopez-Garcia et al. 2003 Amannn et al. 1990 Klindworth et al. 2013 72°C, 7 min

72°C, 7 min

94°C, 15 s/50°C, 30 s/ 72°C, 1 min 95°C, 15 s/50°C, 30 s/ 72°C, 1 min 94°C, 30 s/50°C, 30 s/ 72°C, 1 min 95°C, 15 s/50°C, 30 s/ 72°C, 1 min 95°C, 15 s/50°C, 30 s/ 72°C, 1 min 95°C, 15 s/50°C, 30 s/ 72°C, 1 min 94°C, 2 min

Reverse primer (5′ –3′ ) Forward primer (5′ –3′ )

417 bp

Data analysis

V4

Final extension Initialization

reads were clustered into operational taxonomic units (OTUs) at a 97% similarity threshold. Data sets from the two replicates were normalized per primer set using mothur 1.28.0. Over 50% of the sequences were single singletons (unique amplicons occurring only once). Although it has been suggested that these are probably sequencing artefacts and should be removed from the data set (Huse et al. 2010; Kunin et al. 2010), a large amount of the valid sequences can be lost. Thus, the single singletons that had ‘close reference’ (>99% similarity on >80% of their length) with sequences submitted in the SILVA 108 database were retained in the ‘optimized’ data set. About 10% of the single singletons for each sample were considered as valid sequences. The rest of the single singletons were discarded. Concerning the sequences produced by the eukaryotic primers, sequences assigned to metazoa were removed.

Rarefaction curves and the richness estimator SChao1 for all 12 rDNA amplicon sequenced samples, and the metagenomes, were calculated with the PAST 2.17c software (Hammer et al. 2001). For the metagenomes, rarefaction curves and diversity indexes were calculated based on abundance measurements at the genus level. Taxonomic affiliation was performed on the most abundant OTUs produced by rDNA amplicon deep-sequencing (Table SIII, supplementary information), comprising >0.1% of all eukaryotic sequences and >0.5% of all bacterial sequences in the corresponding samples. The threshold was determined in order to include >90% of the total number of sequences in each of the corresponding samples (see Fuhrman 2009). These OTUs were compared with BLASTN queries against the nr/nt nucleotide database of the NCBI for the detection of the closest cultured or uncultured relatives (Table SIII). Concerning eukaryotes, their taxonomic affiliation was also verified by BLASTN against the PR2 database (Guillou et al. 2013). Finally, the two replicates were compared at the OTU level for the rDNA amplicon data sets and at the genus level for the metagenomic data sets with the Kolmogorov–Smirnov test in the R software. The Kolmogorov–Smirnov test was also used to compare the results produced by the different primer sets per domain of life.

Archaea

Bacteria

Eukaryota

Data accessibility Life domain

PCR conditions

Denaturation (30 cycles) Amplicon size Hyper-variable region

Stoeck et al. 2010

S. GENITSARIS ET AL.

Table I. Universal primers used to amplify different hypervariable regions of the SSU rRNA genes for the three domains of life.

4

Pyrotags: GenBank-SRA accession number SRP039908. Metagenomic raw sequence data: GenBank-SRA accession number SRP039908.

MARINE BIOLOGY RESEARCH

MG-RAST metagenomic 4549260.3, 4549261.3, 4545118.3 and 4545139.3.

profiles: ID numbers 4544952.3, 4544953.3,

Results Total community metagenomic sequencing Random metagenomic sequencing was performed in the total community DNA, extracted from two biological replicates on a single sampling date. Overall, about 1.1 × 106 single reads (mean read length between 450 and 500 bps) were generated from the two replicates. These reads produced around 2 × 104 contigs in total, with a mean sequence length between 1700 and 1900 bps. After quality control, 20% of the reads, or contigs, were removed from each replicate (Table SIII). Within this data set, more than 50% of the sequences were annotated into proteins (Table SIV, supplementary material). A good estimation for taxonomic affiliation and gene annotation was observed among all M5NR databases with >95% of hits occurring below the evalue threshold of e−05 (Figure S1). No significant differences on contig abundance and taxonomic distribution were observed between the two replicates based on the Kolmogorov–Smirnov test (P = 0.72). All metagenomic profiles (raw data and contigs, non- and pre-treated data sets) showed similar taxonomic distribution (Figure S2, supplementary information). Therefore, in the following sections, the pre-treated contigs data were used to present the taxonomic composition and gene pool content of the microbial community. This data set displayed a larger sequence length (Table SIV) providing more confident annotations.

Taxonomical composition based on total community metagenomic sequencing In both replicates, more than 94% of the sequences taxonomically annotated were affiliated to Bacteria, while 4% and 3% were affiliated to eukaryotic taxa for R1 and R2, respectively. Moreover, about 50% of the eukaryotic taxa belonged to unicellular organisms (Figure 1). Concerning the bacterial taxonomic groups, Flavobacteria, belonging to Bacteroidetes, dominated the class distribution, comprising 42.3% and 40.2% of the total number of sequences for R1 and R2, respectively. They were followed by α-Proteobacteria (27.6% and 30.8%) and γ-Proteobacteria (10.1% and 11%). Eukaryotic unicellular taxa were comprised of groups of Bacillariophyceae, Ascomycota, Chlorophyta, Basidiomycota, Apicomplexa (found only in R2, Figure 1), and Phaeophyceae (found only in R1, Figure 1). Each of these groups contributed between 10% and 0.8% to

5

the total number of eukaryotic sequences. A small fraction of the sequences were annotated to viruses, and in particular bacteriophages (representing about 75% of the total virus sequences in each replicate), while only few archaeal sequences were detected (0.1% of the total number of sequences in each replicate; Figure 1). Considering a lower taxonomic level, the most abundant sequences affiliated at the genus level (>0.5% of the total number of sequences) belonged mainly to Bacteria, and were dominated by the Bacteroidetes and Proteobacteria groups (Figure S3, supplementary material). The most abundant taxon was the Bacteroidetes Polaribacter, with 1057 and 1243 annotated sequences for R1 and R2, respectively, followed by the α-Proteobacteria Roseobacter (616 and 952) and Ruegeria (295 and 229). Among the most abundant taxa, the genus Nematostella, belonging to Cnidaria, was the only representative of eukaryotic sequences, comprising 40 and 41 sequences for R1 and R2, respectively, while T-like viruses and the Bacteria Cellulophaga, Burkholderia, Loktanella, Shewanella, Robiginitalea, Saccharophagus and Jannoschia appeared among the most abundant taxa in R2 (Figure S3).

Gene pool content based on total community metagenomic sequencing Predicted functional annotation of the assembled sequences using the Subsystems database signified the occurrence of common functional groups in both replicates. In particular, genes associated with clusteringbased subsystems (CBS), linked among others with fatty acid metabolism, cell division and cytochrome biogenesis, were found. In addition, some genes were linked with DNA, RNA and protein metabolism; and genes related to nucleotides, amino acids, carbohydrates and vitamin synthesis were predominantly present in both replicates, comprising >80% of the total annotated sequences. Other functions related to cell wall and capsule biosynthesis, membrane transportation, respiration, virulence, stress response, cell signalling and division, compound metabolism, dormancy and photosynthesis were also detected (Figure S4, supplementary material). The majority of the characteristic metabolic pathways and protein connections in a microbial community, as interpreted by the KEGG mapper, were represented in both replicates (data not shown). According to the best-hit classification of the functional groups, 26 functional groups were distributed among eight assigned high-level taxonomic groups of unicellular organisms (Figure 2). In particular, the most abundant groups of bacteria were associated with compound synthesis (nucleotides, amino acids,

6

S. GENITSARIS ET AL.

Figure 1. Distribution of the major high-level taxonomic groups in the two replicates (R1 and R2). Each slice indicates the percentage of sequences predicted as proteins and rRNA genes within the indicated taxonomic level. The colour of each slice is based on sequence abundance and not phylogenetic relatedness. This information is based on all the annotation source databases used by MG-RAST (see Table SII, supplementary material). Slices with no taxonomic affiliations include several taxonomic groups with a low number of sequences.

carbohydrates, vitamins and prosthetic groups), DNA, RNA, protein and various compounds, metabolism and membrane transportation. The eukaryotic groups were mainly linked with genes associated with respiration processes, which is not surprising due to the high copy nature of the mitochondrial genome, and RNA and protein metabolism. The viruses detected had the putative ability to express DNA metabolism genes and genes associated with phages, plasmids and transposable elements (Figure 2).

Microbial composition based on rDNA amplicon deep-sequencing relative to total community metagenomic sequencing Ribosomal DNA amplicon deep-sequencing was used to explore the diversity and composition of all three domains of life in the two replicates by targeting

different hyper-variable regions of the SSU rRNA gene (Table I). Ribosomal DNA amplicon sequencing produced only five Archaea-related OTUs with the two different primer sets used (Table SIII). This is consistent with the very low presence of archaeal sequences detected in the metagenomic profiles (0.1% of the total number of sequences; Figure 1). Due to this low archaeal diversity in the study area, Archaea were not considered further comparing metagenomic and amplicon deep-sequencing approaches. An overview of the number of taxa of the microbial community with both approaches is shown in Table SV (supplementary material). Concerning the rDNA amplicon data sets, the ratio of observed to expected (SChao1) OTUs was in all cases >80%, while the rarefaction curves started to reach a plateau in all cases, with a ≥97% threshold of similarity (Figure S5, supplementary

MARINE BIOLOGY RESEARCH

7

Figure 2. Heat-maps for the two replicates (R1 and R2) representing gene pool content of high-level taxonomic groups, based on the number of sequences encoding each predicted protein function according to the Subsystems set of the SEED annotation database.

material). In contrast, the ratio of observed to expected taxa calculated for the metagenomes was 65% and 70%, respectively, for R1 and R2. No significant

differences in the abundance and taxonomic distribution of sequences were observed between the two replicates based on the Kolmogorov–Smirnov test for

Figure 3. Composition of the high-level taxonomic groups (percent of total number of OTUs for the rDNA amplicon data sets and taxa at the genus level for the metagenomic (MG) data sets) in the two replicates (R1 and R2). Taxonomic information for the rDNA amplicon deep-sequencing derived from representative relatives of the most abundant OTUs (comprising >0.1% of all eukaryotic sequences and >0.5% of all bacterial sequences in the corresponding samples and including >90% of the total number of sequences in each of the corresponding samples) obtained from BLAST searches against GenBank. The primer corresponding to each code name in the figure is shown in Table I. Taxa of the total community MG data sets were determined based on predicted proteins and rRNA genes using all the annotation source databases of MG-RAST. Other Bacteria include Actinobacteria, Cyanobacteria, Planctomycetia, Firmicutes, Verrucomicrobia, Fusobacteria, Molicutes and unclassified Bacteria. Other Eukaryota include Cercozoa, Haptophyceae, Choanoflagelida, Ichthyosporea, Cryptophyta, Katablepharidophyta and unclassified Eukaryota.

8

S. GENITSARIS ET AL.

bacterial and eukaryotic primers (p-values between 0.69 and 0.92 for all primer sets). The overall Bacteria and Eukaryota distribution of OTUs within the major high-level taxonomic groups was statistically different among primer sets (p-value < 0.05). In particular, concerning bacterial sequences, both primer sets gave a similar taxonomic representation of the major high-level taxonomic groups, except for Bacteroidetes, which had almost double the number of OTUs with the S-D-Bact primers (Figure 3). Concerning eukaryotic sequences, Alveolata, including Dinophyceae and Ciliophora, was the predominant group for both primer sets, while Stramenopiles had almost twice the number of OTUs with the TAReuk primers (Figure 3). However, the 18S-82F primers amplified additional high-level groups, including Apicomplexa, Cryptophyta and Haptophyta (not shown) that were not detected with the TAReuk primers. In the metagenomes, the bacterial community was dominated by Bacteroidetes-related taxa (almost 50% of the total number of bacterial taxa), while fewer sequences were attributed to Proteobacteriarelated taxa in comparison with the rDNA amplicon sequencing. Concerning the eukaryotic metagenome, the Bacillariophyceae-related taxa were almost four times more abundant in comparison with the PCRamplified data sets, while the Alveolata-related taxa were considerably fewer (Figure 3). To highlight this, microscopical observation limited to the free-living plankton (organisms with size >10 μm) revealed that the dominant group of eukaryotic unicellular organisms was Bacillariophyta, comprising >55% of the total number of individuals identified, followed by other Eukaryotes, including mainly nanoflagellates, and Dinophyceae (Table SVI, supplementary material). Regarding lower taxonomic levels, very few of the dominant OTUs were affiliated to the same genus with both primer sets for Bacteria (e.g. Sulfitobacter, Rhodobacteraceae, Sphingopyxis) and Eukaryota (e.g. Gyrodinium, Protaspis, Acineta), although the majority of OTUs belonged to the same higher-level taxonomic group (Table SIII). Similarly, only few of the dominant taxa at the genus level in the metagenomes were also found among the dominant OTUs (e.g. Polaribacter, Roseobacter, Sulfitobacter, Roseovarius; Figure S3, Table SIII).

Discussion Taxonomic composition based on rDNA amplicon deep-sequencing Based on the rDNA amplicon data sets, the same major high-level taxonomic groups were detected in Bacteria

(Bacteroidetes and Proteobacteria) and unicellular Eukaryota (Alveolata and Stramenopiles) with the different primer sets, yet they produced different outcomes when focused on the lower-level taxa (e.g. at the genus level). The accuracy of BLAST-derived taxonomy, especially for low-level taxa, depends on the sequence length, the database coverage for the specific taxonomic group, and the correct identification of the reference sequence (Bik et al. 2012). In addition, statistically significant (p-value < 0.05) differences of the relative taxonomic composition and species richness were detected among the primer sets. Furthermore, among the eukaryotic sequences, the 18S-82F primers amplified sequences from high-level groups that were not detected with the TAReuk primers, such as Apicomplexa, Cryptophyta and Haptophyta. The choice of primers affects the biodiversity assessment of an ecosystem (Hong et al. 2009), as full primer universality is difficult or impossible to achieve (Hadziavdic et al. 2014). Limitations of the primers’ universality produce constraints, as it excludes taxonomic groups in the analysis, and introduces biases in favour of some taxa (Polz & Cavanaugh 1998; Lanzen et al. 2011). In order to investigate this comprehensively, the microbial diversity and composition of two different hyper-variable SSU rRNA gene regions for each domain of life were targeted. The regions V3–V4 and V5–V6 for Bacteria and V2–V3 and V4 for Eukaryota were chosen. Several studies have applied the V4 region in assessments of the microbial composition for both Bacteria and Eukaryota in various ecosystems (e.g. Youssef et al. 2009; Stoeck et al. 2010; Klindworth et al. 2013) because it has been suggested that it is a prominent candidate to reveal a realistic picture of microbial diversity for both domains (Pernice et al. 2013). However, different research groups have used different primer sets to assess microbial diversity, perplexing direct comparisons between samples (Hadziavdic et al. 2014). Regarding single singletons, many researchers tend to discard them as possible errors, choosing a conservative but more reliable estimation of diversity (e.g. Huse et al. 2010; Kunin et al. 2010). However, there is a risk of removing valid sequences. In this study, the low number of samples sequenced produced a large number of singleton reads, some of which could be valid sequences, that could be determined if numerous samples were processed together. In order to retain as much diversity as possible and be confident that no sequencing errors are contained in the data set, ‘close-reference’ reads were included. Based on the concept that ‘all observed sequences must be similar to previously known sequences’ (Bik et al. 2012), the ‘close-reference’ single singletons comprised about

MARINE BIOLOGY RESEARCH

10% of the total number of single singletons. Among uncultured taxa and taxa belonging to dominant high-level taxonomic groups, a few singleton phylotypes belonging to under-represented high-level taxonomic groups, such as Phaeocystaceae for Eukaryota and Cyanobacteria for Bacteria, were included.

Taxonomic composition and gene pool content based on total community metagenomic sequencing Based on the total community metagenomic sequencing data, the bacterial groups of Bacteroidetes, αand γ-Proteobacteria dominated the microbial community. A previous study in the area highlighted the dominance of the same high-level prokaryotic taxonomic groups during the early spring bloom (Lamy et al. 2009). It was suggested that they were major participants in leucine incorporation of the active bacterial community. In our samples these groups included genes encoding several functions usually found in microbial communities, which included protein, RNA and DNA metabolism, biosynthesis and degradation of carbohydrates and lipids, compound metabolism, cell division, cycle and signalling, stress response, and virulence (Figure S4). In particular, Bacteroidetes have genes linked with polymer degradation, including a large number of peptidases, hydrolases and glycosyl transferases (Fernández-Gómez et al. 2013), also found with total community metagenomic analysis in our samples. In addition, α-Proteobacteria are associated among others with the metabolism of a number of organic compounds, sulphur oxidation and the production of secondary metabolites (Brinkhoff et al. 2008), while γ-Proteobacteria comprise several pathogens that include virulence and disease genes (Persson et al. 2009). In our data set, around 3% of the genes were associated with virulence and disease. Analysis of the Global Ocean Sampling metagenomic data (Rusch et al. 2007) has indicated that virulence-related genes are present in up to 8% of the planktonic bacteria, with the highest values detected in eutrophic waters (Persson et al. 2009). The samples’ dominant microorganism belonged to the genus Polaribacter, a marine Bacteroidetes with a dual life strategy. As with many known marine Bacteroidetes (Riemann et al. 2000), Polaribacter benefits from colonization on particle surfaces and polymer degradation. In addition, the presence of the proteorhodopsin gene provides a survival advantage in nutrientdepleted sunlit surface waters (González et al. 2008; Kirchman 2008), and could be an explanation for Polaribacter’s dominance in the microbial community.

9

According to the total community metagenomic approach, eukaryotic sequences comprised only 4% of the total number of sequences, as expected. This relatively low abundance is in accordance with the metagenomic analysis of marine microbial communities by Kisand et al. (2012) and it can be attributed to the relatively high different abundances of Prokaryota and Eukaryota in marine samples of the same volume. Furthermore, this low number of sequences could cause the discrepancies among the low-level eukaryotic taxa identified in the two replicates, as the coverage of the eukaryotic community was limited. These discrepancies could also be attributed to factors related to sequence sampling, such as uneven sampling of low abundance taxa, but they could also be due to differences occurring at the stage of DNA extraction (e.g. cells are not always equally well disrupted), library preparation and sequencing. All these steps can randomly or systematically introduce deviations between replicates. The eukaryotic community was dominated by Alveolata-related sequences, followed by Stramenopiles and Fungi. This composition resembles marine protist community compositions reported in previous studies, using microscopy and molecular techniques (e.g. Countway et al. 2010; Grattepanche et al. 2011a, 2011b; Christaki et al. 2014). A low number of archaeal sequences were detected, in accordance with the rDNA amplicon analysis and a previous study in the area using micro-FISH analysis (Lamy et al. 2009). In addition, a low number of sequences affiliated with viruses and mainly phages (75% of the viral sequences) were detected. This was not surprising, as most environmental free virus-like particles (VLPs) in the water column are phages (Breitbart & Rohwer 2005). A noticeable functional category that was associated with phages was phosphorus metabolism (Figure 2). This is consistent with many marine phages, which encode enzymes involved in phosphate metabolism (e.g. Chen & Lu 2002).

Potential and limitations of total community metagenomic sequencing in relation to rDNA amplicon deep-sequencing Both methods detected the same major high-level taxonomic groups in Bacteria and unicellular Eukaryota and corroborated the low presence of archaeal phylotypes; however, the results did not concur when focused on the lower-level taxa (e.g. at the genus level). The two approaches also showed different relative taxonomic composition among the high-level taxa. With amplicon sequencing, Alveolata were over-represented and at the same time Bacillariophyceae-

10

S. GENITSARIS ET AL.

related taxa were under-represented, as shown by microscopy identification. On the contrary, total community metagenomic sequencing does not include an initial PCR step (except during the sequencing library clonal amplification) and the related possible biases such as the primer universality, and presented a similar picture with microscopy regarding the taxonomic diversity and abundance of these groups. The under-representation of Bacillariophyceae with rDNA amplicon sequencing may suggest that they are ‘masked’ by the high number of SSU rRNA gene copies of other groups, such as the Alveolata (Prokopowich et al. 2003; Zhu et al. 2005), and can also be associated with the low Bacillariophyceae affiliation of the universal primers used. On the other hand, these differences in estimation of taxonomic diversity could also be attributed to problems associated with the metagenomic approach. With the metagenomic sequencing, more than 50% of the sequences did not get annotated. A part of those could be non-coding regions, but also the annotation heavily depends on the coverage of sequence data for a taxon or a gene function in the query database. Thus, biases or low sequence coverage in the reference databases might reflect biases in taxonomic representation observed between the two approaches at lower levels. Total community metagenomic sequencing could circumvent the non-universality of the existing primers, making microbial community meta-analysis feasible. Furthermore, by including other gene markers in the analysis additional to the SSU rRNA gene, it can be used more assuredly to assess the diversity and composition of an environmental sample than conventional PCR-based rRNA studies (Venter et al. 2004). Still, databases for other genes are not extensive enough. One exception when investigating protistan diversity might be to use the mitochondrial cytochrome oxidase I gene (COI), which is routinely used in different studies (e.g. Nassanova et al. 2010; Lara et al. 2011). Finally, total community metagenomic sequencing also offers identification of the gene pool content, and as a consequence, a more comprehensive picture of the structure of microbial communities (Venter et al. 2004). However, a number of associated limitations need to be addressed. (i) First, the cost of total community metagenomic sequencing can be 10 times higher than rDNA amplicon sequencing, reducing the number of samples that can be analysed. (ii) An environmental sample can typically contain thousands of genomes, consisting of thousands of genes. Obtaining around 1.2 million raw sequences per plate run with the existing 454 technology, or even millions

with Illumina, may still not be sufficient to unveil all members of a community. (iii) The coverage of marine communities’ eukaryotic diversity can be further challenging, as in marine samples the majority of sequences are associated with Bacteria (see results; also Kisand et al. 2012). A possible solution for this could be the differential filtration of the water sample according to size. Differential filtration of the sample according to size may increase the recovery of protists, over bacteria, and serve the purpose of covering a higher eukaryote marine diversity, when using a metagenomic approach. However, differential filtration may also lead to other biases, such as inconsistency between replicates. Fenchel & Finlay (2004) hypothesized that smaller protists are more likely to have a wider and more homogeneous dispersal. Consistent with this, it was shown that smaller fraction (1.6–30 µm) replicates were more similar to each other in terms of composition, compared to replicates from the larger size fraction (> 30 µm; Duret et al. 2015). Also, size fractionation might result in finding larger taxa in smaller fractions due to the breaking of bodily parts, so such practices should be cautiously assessed. (iv) In highly complex metagenomes, such as those produced by total community metagenomic sequencing, the probability of observing matching sequences originating from the exact same locus is very low (Gomez-Alvarez et al. 2009), so in this study’s case, the term OTU is not applicable. Therefore, the assignment of reads or contigs to species, and the direct comparison of loci, remains a challenge. New methodologies, such as CONCOCT (Alneberg et al. 2014), have been developed to improve reads assembly, thus allowing better reads assignment, quantification of their abundance and revealing strain-level variation. Overall, as the cost of sequencing decreases and data processing pipelines optimize, total community metagenomic sequencing has been shown to be a promising method in integrating sequencing data into ecological processes and offering insights into the complex interaction between genes, organisms, communities, and environmental features. Thus, this method could be ideal to explore unfamiliar and extreme systems and, along with in-depth targeted sequencing, to address questions about microbial structure and functioning.

Acknowledgements This study was supported by the ‘Nord-Pas de Calais’ FRBDEMO (FRB_2013) and the ANR-ROME (ANR 12 BSV7 0019 02) projects, and the SOMLIT network. We are grateful to

MARINE BIOLOGY RESEARCH

Eric Lecuyer for running the logistics and the SOMLIT sampling and Elsa Breton for validating the physical and chemical values. The authors would also like to thank those at www.englisheditor.webs.com for their help in the English proof-reading of this paper. We are thankful to the three referees who helped improve the original manuscript with their constructive comments.

Disclosure statement No potential conflict of interest was reported by the authors.

References Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, et al. 2014. Binning metagenomic contigs by coverage and composition. Nature Methods 11:1144–46. Amann RI, Binder BJ, Olson RJ, Chisholm SW, Devereux R, Stahl DA. 1990. Combination of 16S rRNA-targeted oligonucleotide probes with flow cytometry for analyzing mixed microbial populations. Applied and Environmental Microbiology 56:1919–25. Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R, Thomas WK. 2012. Sequencing our way towards understanding global eukaryotic biodiversity. Trends in Ecology and Evolution 27:233–43. Breitbart M, Rohwer F. 2005. Here a virus, there a virus, everywhere the same virus? Trends in Microbiology 13:278–84. Brinkhoff T, Giebel HA, Simon M. 2008. Diversity, ecology, and genomics of the Roseobacter clade: a short review. Archives of Microbiology 189:531–39. Chen F, Lu J. 2002. Genomic sequence and evolution of marine cyanophage P60: a new insight on lytic and lysogenic phages. Applied and Environmental Microbiology 68:2589–94. Cheung MK, Au CH, Chu KH, Kwan HS, Wong CK. 2010. Composition and genetic diversity of picoeukaryotes in sub-tropical coastal waters as revealed by 454 pyrosequencing. The ISME Journal 4:1053–59. Christaki U, Kormas KA, Genitsaris S, Georges C, Sime-Ngando T, Viscogliosi E, Monchy S. 2014. Winter-summer succession of unicellular eukaryotes in a meso-eutrophic coastal system. Microbiology Ecology 67:13–23. Countway PD, Vigil PD, Schnetzer A, Moorthi SD, Caron DA. 2010. Seasonal analysis of protistan community structure and diversity at the USC Microbial Observatory (San Pedro Channel, North Pacific Ocean). Limnology and Oceanography 55:2381–96. DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard NU, et al. 2006. Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311:496–503. Di Rienzi SC, Sharon I, Wrighton KC, Koren O, Hug LA, Thomas BC, et al. 2013. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria. eLife 2:e01102. 25 pages. Duret MT, Pachiadaki MG, Stewart FJ, Sarode N, Christaki U, Monchy S, et al. 2015. Size-fractionated diversity of eukaryotic microbial communities in the Eastern Tropical Pacific oxygen minimum zone. FEMS Microbiology Ecology 91(5). 13 pages. doi:10.1093/femsec/fiv037.

11

Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–61. Edgcomb V, Orsi W, Bunge J, Jeon S, Christen R, Leslin C, et al. 2011. Protistan microbial observatory in the Cariaco Basin, Caribbean. I. Pyrosequencing vs Sanger insights into species richness. The ISME Journal 5:1344–56. Engelbrektson A, Kunin V, Wrighton KC, Zvenigorodsky N, Chen F, Ochman H, Hugenholtz P. 2010. Experimental factors affecting PCR-based estimates of microbial species richness and evenness. The ISME Journal 4:642–47. Farrer RA, Kemen E, Jones JDG, Studholme DJ. 2009. De novo assembly of the Pseudomonas syringae pv. Syringae B728a genome using Illumina/Solexa short sequence reads. FEMS Microbiology Letters 291:103–11. Fenchel T, Finlay BJ. 2004. The ubiquity of small species: patterns of local and global diversity. BioScience 54:777–84. Fernández-Gómez B, Richter M, Schüler M, Pinhassi J, Acinas SG, González JM, Pedrós-Alió C. 2013. Ecology of marine bacteroidetes: a comparative genomics approach. The ISME Journal 7:1026–37. Fuhrman JA. 2009. Microbial community structure and its functional implications. Nature 459:193–99. Genitsaris S, Monchy S, Viscogliosi E, Sime-Ngando T, Ferreira S, Christaki U. 2015. Seasonal variations of marine protist community structure based on taxon-specific traits using the eastern English Channel as a model coastal system. FEMS Microbiology Ecology 91(5). 15 pages. doi:10.1093/ femsec/fiv034. Gilbert JA, Steele JA, Caporaso JG, Steinbrück L, Reeder J, Temperton B, et al. 2012. Defining seasonal marine microbial community dynamics. ISME Journal 6:298–308. Gomez-Alvarez V, Teal TK, Schmidt TM. 2009. Systematic artifacts in metagenomes from complex microbial communities. The ISME Journal 3:1314–17. González JM, Fernández-Gómez B, Fernàndez-Guerra A, Gómez-Consarnau L, Sánchez O, Coll-Lladó M, et al. 2008. Genome analysis of the proteorhodopsin-containing marine bacterium Polaribacter sp. MED152 (Flavobacteria). Proceedings of the National Academy of Sciences 105:8724–29. Grattepanche JD, Breton E, Brylinski JM, Lecuyer E, Christaki U. 2011a. Succession of primary producers and micrograzers in a coastal ecosystem dominated by Phaeocystis globosa blooms. Journal of Plankton Research 33:37–50. Grattepanche JD, Vincent D, Breton E, Christaki U. 2011b. Phytoplankton growth and microzooplankton grazing during a spring bloom in the eastern English Channel. Journal of Experimental Marine Biology and Ecology 404:87–97. Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, et al. 2013. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic Acids Research 41:597–604. Hadziavdic K, Lekang K, Lanzen A, Jonassen I, Thompson EM, Troedsson C. 2014. Characterization of the 18S rRNA gene for designing universal eukaryote specific primers. PLoS One 9(2):e87624. 10 pages. Hammer Ø, Harper DAT, Ryan PD. 2001. PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica 4:9. Computer program.

12

S. GENITSARIS ET AL.

Hong SH, Bunge J, Leslin C, Jeon S, Epstein SS. 2009. Polymerase chain reaction primers miss half of rRNA microbial diversity. The ISME Journal 3:1365–73. Hugoni M, Taib N, Debroas D, Domaizon I, Dufournel IJ, Bronner G, et al. 2013. Structure of the rare archaeal biosphere and seasonal dynamics of active ecotypes in surface coastal waters. Proceedings of the National Academy of Sciences 110:6004–09. Huse SM, Welch DM, Morrison HG, Sogin ML. 2010. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology 12: 1889–98. Kirchman DL. 2008. New light on an important microbe in the ocean. Proceedings of the National Academy of Sciences 105:8487–88. Kisand V, Valente A, Lahm A, Tanet G, Lettieri T. 2012. Phylogenetic and functional metagenomic profiling for assessing microbial biodiversity in environmental monitoring. PLoS One 7:e43630. 14 pages. Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, Glöckner O. 2013. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Research 41:e1. 11 pages. Knight R, Jansson J, Field D, Fierer N, Desai N, Fuhrman JA, et al. 2012. Unlocking the potential of metagenomics through replicated experimental design. Nature Biotechnology 30:513–20. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology 12:118–23. Lamy D, Obernosterer I, Laghdass M, Artigas LF, Breton E, Grattepanche JD, et al. 2009. Temporal changes of major bacterial groups and bacterial heterotrophic activity during a Phaeocystis globosa bloom in the eastern English Channel. Aquatic Microbial Ecology 58:95–107. Lanzen A, Jørgensen SL, Bengtsson MM, Øvreås L, Urich T. 2011. Exploring the composition and diversity of microbial communities at the Jan Mayen hydrothermal vent field using RNA and DNA. FEMS Microbiology Ecology 77: 577–89. Lara E, Heger TJ, Sceihing R, Mitchell EAD. 2011. COI gene and ecological data suggest size-dependent high dispersal and low intra-specific diversity in free-living terrestrial protists (Euglyphida: Assulina). Journal of Biogeography 38:640–50. Lopez-Garcia P, Philippe H, Gail F, Moreira D. 2003. Autochthonous eukaryotic diversity in hydrothermal sediment and experimental microcolonizers at the MidAtlantic Ridge. Proceedings of the National Academy of Sciences 100:697–702. Lupski JR, Reid JG, Gonzaga-Jauregui C, Deiros DR, Chen DCY, Nazareth L, et al. 2010. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. New England Journal of Medicine 362:1181–91. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–80. Medinger R, Nolte V, Pandey RV, Jost S, Ottenwalder B, Schlotterer C, Boenigk J. 2010. Diversity in a hidden world: potential and limitation of next-generation

sequencing for surveys of molecular diversity of eukaryotic microorganisms. Molecular Ecology 19(suppl 1):32–40. Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal T, et al. 2008. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386. Monchy S, Grattepanche JD, Breton E, Meloni D, Sanciu G, Chabé M, et al. 2012. Microplankton community structure in a coastal system relative to a Phaeocystis bloom inferred from morphological and tag pyrosequencing methods. PLoS One 7:e39924. 12 pages. Nassanova E, Smirnov A, Fahrni J, Pawlowski J. 2010. Barcoding amoebae: Comparison of SSU, ITS and COI genes as tools for molecular identification of naked lobose amoebae. Protist 161:102–15. Pawlowski J, Christen R, Lecroq B, Bachar D, Shahbazkia HR, Amaral-Zettler L, Guillou L. 2011. Euakryotic richness in the abyss: insights from pyrotag sequencing. PLoS One 6: e18169. 10 pages. Pernice MC, Logares R, Guillou L, Massana R. 2013. General patterns of diversity in major marine microeukaryote lineages. PLoS One 8:e57170. 11 pages. Persson OP, Pinhassi J, Riemann L, Marklund BI, Rhen M, Normak S, et al. 2009. High abundance of virulence gene homologues in marine bacteria. Environmental Microbiology 11:1348–57. Polz MF, Cavanaugh CM. 1998. Bias in template-to-product in multitemplate PCR. Applied and Environmental Microbiology 64:3724–30. Pommier T, Neal PR, Gasol JM, Coll M, Acinas SG, Pedrós-Alió C. 2010. Spatial patterns of bacterial richness and evenness in the NW Mediterranean Sea explored by pyrosequencing of the 16 rRNA. Aquatic Microbial Ecology 61:221–33. Prokopowich CD, Gregory TR, Crease TJ. 2003. The correlation between rDNA copy number and genome size in eukaryotes. Genome 46:48–50. Prosser JI. 2010. Replicate or lie. Environmental Microbiology 12:1806–10. Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, et al. 2009. Accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods 6:631–41. Riemann L, Steward GF, Azam F. 2000. Dynamics of bacterial community composition and activity during a mesocosm diatom bloom. Applied and Environmental Microbiology 66:578–87. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, et al. 2007. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biology 5:e77. 33 pages. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537–41. Schmieder R, Edwards R. 2011. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27:863–64. Shokralla S, Spall JL, Gibson JF, Hajibabaei M. 2012. Nextgeneration sequencing technologies for environmental DNA research. Molecular Ecology 21:1794–805. Stoeck T, Bass D, Nebel M, Christen R, Jones MD, Breiner HW, Richards TA. 2010. Multiple marker parallel tag

MARINE BIOLOGY RESEARCH

environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Molecular Ecology 19(suppl 1):21–31. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, et al. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66–74. Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, Elshahed MS. 2009. Comparison of species richness

13

estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Applied and Environmental Microbiology 75:5227–36. Zhu F, Massana R, Not F, Marie D, Vaulot D. 2005. Mapping of picoeukaryotes in marine ecosystems with quantitative PCR of the 18S rRNA gene. FEMS Microbiology Ecology 52:79–92.