High throughput in-situ metagenomic measurement of bacterial replication at ultra-low sequencing coverage Emiola and Oh
[Figure: Supplementary Figure 1, panels A-F. Recoverable labels: Genome Location (Kbp); Log % coverage; Replication origin; Peak; Trough; Cov@origin; Cov@terminus; Growing bacteria; Non-growing bacteria; Increasing reads coverage; Average coverage; Number of genomes; Log delta; Coverage (0.2x, 0.4x); Refined; Unrefined; Distance from ori (Mb); Count; PTR; OD600; Time (min); Time (hr); S. epidermidis; C. simulans. Annotations: Refined GRiD = 1.81; Unrefined GRiD = 3.73; Species heterogeneity (1 - Refined/Unrefined) = 0.515.]
Supplementary Figure 1: GRiD benchmark (A) Growing bacteria have higher read coverage in regions close to the origin of replication (ori) than in the terminus (ter) region. Growth rate can be measured as the ratio of coverage at the ori region to coverage at the ter region. (B) Average coverage of genomes calculated from a metagenomic skin dataset (n = 698) with a median read count of 17.9 million reads per sample (ref. 10). The red vertical line represents the 5x coverage cutoff required by iRep. (C) To minimize noise during GRiD estimation, GRiD uses the lowest point of the expected variance of the mean as the peak value and the upper point of the variance of the mean as the trough value. The lower figure shows GRiD calculations for S. epidermidis in a skin sample with and without refinement. Without refinement, growth rate can be overestimated. (D) Reproducibility of S. epidermidis GRiD estimates from a skin dataset after subsampling, with and without refinement. GRiD estimates are significantly (p < 0.001, Wilcoxon rank-sum test) more reproducible when refinement is included. (E) Bar plot showing the distance of dnaA from ori in 2561 bacterial genomes obtained from the Database of Replication Origins (DoriC) (http://tubic.tju.edu.cn/doric/index.php). (F) In vitro growth curves of S. epidermidis and C. simulans obtained from pure cultures, with the corresponding PTR. Both microbes had an exponential doubling time of 30 min. Source data are provided as a Source Data file.
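The two ratios described in panels A and C can be illustrated with a minimal Python sketch. It is not the GRiD implementation: the function names are hypothetical, the coverage inputs are made-up numbers, and only the refined and unrefined values (1.81 and 3.73) come from the panel C annotation.

```python
def coverage_ratio(cov_origin: float, cov_terminus: float) -> float:
    """Ratio of coverage near ori to coverage near ter (panel A)."""
    if cov_terminus <= 0:
        raise ValueError("terminus coverage must be positive")
    return cov_origin / cov_terminus


def species_heterogeneity(refined: float, unrefined: float) -> float:
    """1 - Refined/Unrefined, as written on the panel C annotation."""
    if unrefined <= 0:
        raise ValueError("unrefined estimate must be positive")
    return 1.0 - refined / unrefined


# Hypothetical coverages, chosen only to illustrate the ratio.
print(coverage_ratio(30.0, 20.0))

# Values annotated in panel C: refined = 1.81, unrefined = 3.73.
print(round(species_heterogeneity(1.81, 3.73), 3))
```

With the panel C values, the second call reproduces the annotated species heterogeneity of 0.515.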
[Figure: Supplementary Figure 2, panels A-C. Recoverable labels: Delta; (dnaA/ori) x (ter/dif) / species heterogeneity; Fragments/Mbp (bins at 55, 89, 92, 104, 110, 123, 130, 140, 148, 168, 175 and 202); Completeness (%); Coverage (0.2, 0.4, 1).]
Supplementary Figure 2: Assessment of GRiD parameters (A) The combined role of dnaA coverage, dif coverage, and species heterogeneity on GRiD accuracy. PTR was first calculated for S. epidermidis from a skin dataset using a closed circular reference genome. GRiD was then calculated using the same reference genome, fragmented into 100 Kb fragments and reshuffled. The differences in growth estimates (delta) between PTR and GRiD (x-axis) are displayed as a function of dnaA coverage, dif coverage, and species heterogeneity (y-axis). The red vertical line represents a delta cutoff of 0.15, which we considered the threshold for high accuracy, while the horizontal line is the y-axis cutoff for high accuracy. (B) The effect of genome fragmentation on GRiD reproducibility using 12 high-quality bacterial bins (≥ 95% completeness and ≤ 5% contamination) with varying degrees of fragmentation, ranging from 55 to 202 fragments/Mbp. The horizontal line represents a delta cutoff of 0.15. (C) The effect of genome completeness on GRiD reproducibility. A genome bin with 89 fragments/Mbp, which is at the boundary of the accuracy cutoff at ultra-low coverage as shown in ‘B’ above, was randomly subsampled prior to GRiD analysis. This subsampling step was conducted 10 times. The horizontal line represents a delta cutoff of 0.15. Source data are provided as a Source Data file.
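The delta cutoff and the fragmentation measure used in this figure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: delta is taken to be the absolute difference between the two estimates, the 0.15 cutoff is treated as inclusive, and the function names and example numbers are hypothetical.

```python
def growth_delta(ptr: float, grid: float) -> float:
    """Difference between PTR and GRiD estimates (assumed absolute)."""
    return abs(ptr - grid)


def is_high_accuracy(ptr: float, grid: float, cutoff: float = 0.15) -> bool:
    """True when delta is within the 0.15 cutoff (inclusive is an assumption)."""
    return growth_delta(ptr, grid) <= cutoff


def fragments_per_mbp(n_fragments: int, total_bp: int) -> float:
    """Fragmentation density: number of fragments per megabase of sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return n_fragments / (total_bp / 1_000_000)


# Hypothetical estimates for one genome in one sample.
print(is_high_accuracy(ptr=1.80, grid=1.70))
print(is_high_accuracy(ptr=1.80, grid=1.50))

# A hypothetical 2 Mbp bin split into 178 fragments.
print(fragments_per_mbp(178, 2_000_000))
```

The last call gives 89 fragments/Mbp, the fragmentation level of the bin subsampled in panel C.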
[Figure: Supplementary Figure 3. Recoverable labels: GRiD; Significant; Non-significant; x-axis labelled with ERR/SRR sample accession numbers.]
Supplementary Figure 3: GRiD analysis from metagenomic dataset (A) Inter-individual bacterial growth differences and association with psoriasis patient characteristics. Statistical differences between population groups were determined using the Wilcoxon rank-sum test. (B) GRiD values of Bdellovibrio species in each environmental sample. (C, D, E) Growth rate correlation between bacterial species in dry (C), moist (D) and oily (E) sites from a skin metagenomic dataset. Blue and red circles indicate positive and negative Spearman correlation respectively. Larger circles and darker colors indicate a higher correlation. Source data are provided as a Source Data file.
Concatenate all samples
Sample pool
Contigs
Extract contigs > 1 Kb
Contigs pool
Bowtie mapping
De novo assembly (MEGAHIT)
Unmapped reads
Concatenate unmapped reads
De novo assembly (SPAdes)
Extract contigs/scaffolds > 1 Kb, concatenate with previous contig pool
Binning (MetaBAT)
Genome bins
Quality check, cutoff at 75% completeness and 5% contamination (CheckM)
High quality bins
GRiD
Supplementary Figure 4: Flowchart for the identification of uncultivated bacteria prior to GRiD analysis.
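The two numeric filters named in the flowchart (contigs longer than 1 Kb, and bins passing the 75% completeness and 5% contamination cutoff) can be sketched in Python as below. This is a minimal illustration, not part of the published pipeline: it does not call MEGAHIT, SPAdes, MetaBAT or CheckM, the `GenomeBin` type and all example values are hypothetical, and treating the quality cutoffs as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    """Hypothetical record holding quality estimates for one genome bin."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def keep_long_contigs(contigs: dict[str, str], min_len: int = 1000) -> dict[str, str]:
    """Keep contigs strictly longer than min_len bases (> 1 Kb in the flowchart)."""
    return {name: seq for name, seq in contigs.items() if len(seq) > min_len}


def high_quality_bins(bins: list[GenomeBin],
                      min_completeness: float = 75.0,
                      max_contamination: float = 5.0) -> list[GenomeBin]:
    """Keep bins meeting both cutoffs (inclusive bounds are an assumption)."""
    return [b for b in bins
            if b.completeness >= min_completeness
            and b.contamination <= max_contamination]


# Made-up inputs for illustration only.
contigs = {"c1": "A" * 1500, "c2": "A" * 800}
bins = [
    GenomeBin("bin_1", 96.2, 1.1),
    GenomeBin("bin_2", 70.4, 0.8),
    GenomeBin("bin_3", 88.0, 7.5),
]

print(sorted(keep_long_contigs(contigs)))
print([b.name for b in high_quality_bins(bins)])
```

In this toy input only `c1` and `bin_1` pass their respective filters.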