Downloaded - Journal of Clinical Microbiology - American Society for ...

4 downloads 40980 Views 3MB Size Report
Sep 9, 2015 - Fax: +44 131 242 6008 Email: [email protected]. 13. 14 ... silico analyses (VirulenceFinder, Resfinder and local BLAST ... detection by providing identification and typing services for E. coli O157 and other STEC.
JCM Accepted Manuscript Posted Online 9 September 2015 J. Clin. Microbiol. doi:10.1128/JCM.01066-15 Copyright © 2015, American Society for Microbiology. All Rights Reserved.

1

The utility of Whole Genome Sequencing of Escherichia coli O157 for outbreak detection and epidemiological surveillance

2 3

Running Title: WGS of E. coli O157

4 5

Anne Holmes1, Lesley Allison1, Melissa Ward3, Timothy J. Dallman4, Richard Clark2, Angie Fawkes2, Lee Murphy2, Mary Hanson1

6 7

1

Scottish E. coli O157/VTEC Reference Laboratory (SERL), Royal Infirmary of Edinburgh, Edinburgh EH16 4SA.

8

2

Wellcome Trust Clinical Research Facility (WTCRF), University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU.

9

3

Centre for Immunity, Infection and Evolution, School of Biological Sciences, University of Edinburgh, King’s Buildings, Edinburgh EH9 3FL.

11

4

Gastrointestinal Bacterial Reference Unit, Microbial Services Division, Public Health England, 61 Colindale Avenue, London, NW9 5EQ.

12

Corresponding Author: Dr Anne Holmes, Scottish E. coli O157/VTEC Reference Laboratory (SERL), Royal Infirmary of Edinburgh,

13

Edinburgh EH16 4SA. Tel: +44 1312426013. Fax: +44 131 242 6008 Email: [email protected]

10

14

1

15

ABSTRACT

16

Detailed laboratory characterisation of Escherichia coli O157 is essential to inform epidemiological investigations. This study assessed the

17

utility of whole genome sequencing (WGS) for outbreak detection and epidemiological surveillance of E. coli O157, and the data was used to

18

identify discernable associations between genotype and clinical outcome. One hundred and five E. coli O157 strains isolated over a five year

19

period from human faecal samples in Lothian, Scotland were sequenced using the Ion Torrent PGM. A total of 8721 variable sites in the core

20

genome were identified among the 105 isolates; 47% of the single nucleotide polymorphisms (SNPs) were attributable to 6 ‘atypical’ E. coli

21

O157 and included recombinant regions. Phylogenetic analyses showed WGS correlated well with the epidemiological data. Epidemiological

22

links existed between cases whose isolates differed by ≤ 3 SNPs. WGS also correlated well with MLVA typing data, with only three discordant

23

results observed, all among isolates from cases not known to be epidemiologically related. WGS produced a better supported, higher resolution

24

phylogeny compared with MLVA confirming the method is more suitable for epidemiological surveillance of E. coli O157. A combination of in

25

silico analyses (VirulenceFinder, Resfinder and local BLAST searches) were used to determine stx subtype, MLST type (15 loci) and the

26

presence of virulence and acquired antimicrobial resistance genes. There was a high level of correlation between the WGS data and our routine

27

typing methods, although some discordant results were observed, mostly related to the limitation of short sequence read assembly. The data was

28

used to identify sub-lineages and clades of E. coli O157, and when correlated with the clinical outcome data showed one clade, Ic3, was

29

significantly associated with severe disease. Together, the results show WGS data can provide higher resolution of the relationships between E.

2

30

coli O157 isolates compared with MLVA. The method has the potential to streamline the laboratory workflow and provide detailed information

31

for the clinical management of patients and public health interventions.

32 33

INTRODUCTION

34

Shiga toxin-producing Escherichia coli (STEC) are important gastrointestinal pathogens and a common cause of acute renal failure in

35

children worldwide (1,2,3). In Scotland, the epidemiology and clinical outcome of E. coli O157 infection in humans has been closely monitored

36

by Health Protection Scotland (HPS) since the introduction of enhanced surveillance in 1999, following a rapid increase in the number of

37

microbiologically confirmed cases of STEC infection during the mid-1990s. Enhanced surveillance was extended in 2003 to include non-O157

38

STEC and enhanced surveillance of Haemolytic Uraemic Syndrome (HUS) was established in 2003. The most common STEC serogroup

39

isolated from patients in Scotland is E. coli O157. In the past five years, the number of confirmed and cultured cases of E. coli O157 in Scotland

40

ranged from 195-263 per year compared with 22-75 per year of non-O157 STEC (4, 5). On average, 43% of STEC cases in Scotland require

41

hospitalisation and 9% progress to (HUS), mostly children (5). The reported rate of E. coli O157 infection in Scotland (4.9 cases per 100,000

42

population in 2014, HPS personal communication) is higher than most countries, including other countries within the United Kingdom (5, 6).

43

The reasons for this are unclear and likely to be multifactorial, but may be related to the relative density of humans to cattle, which are the main

44

reservoir of E. coli O157. Although large outbreaks of infection associated with food products have been reported in Scotland (7, 8, 9), the

3

45

majority of infections are apparently sporadic and contact with animal faeces, exposure to farm animals or farm environments and drinking

46

water from private water supplies have all been identified as a strong risk factor for infection (10, 11).

47

The continued use of appropriate control measures, which is dependent on the prompt diagnosis of cases and detection of outbreaks, is essential

48

to limit the spread of this pathogen. The Scottish Escherichia coli O157/Verotoxigenic E. coli Reference Laboratory (SERL) works closely with

49

Scottish National Health Service (NHS) Boards and with Health Protection Scotland, and plays a pivotal role in case confirmation and outbreak

50

detection by providing identification and typing services for E. coli O157 and other STEC. Putative isolates of E. coli O157 are referred to SERL

51

from all fourteen Scottish health boards for confirmation of identity and typing. In addition, faecal samples from cases with typical symptoms of

52

STEC, but culture negative in local diagnostic laboratories, are sent to the SERL for further analysis. Currently the SERL uses a range of

53

phenotypic and genotypic typing techniques, including real-time-PCR, phage typing, disc diffusion susceptibility testing, multi-locus variable

54

number tandem repeat analysis (MLVA) and pulsed-field gel electrophoresis (PFGE). However, there is an ever increasing body of evidence

55

demonstrating the potential benefits of WGS as a tool which may replace these methods (12, 13, 14, 15, 16). Bacterial WGS is becoming

56

increasingly affordable, and has demonstrated improved resolution over more conventional typing methods that only sample a fraction of the

57

genome (17,18,19). As a reference laboratory, it is essential that we continually aim to improve our services to provide precise and timely data to

58

support public health interventions. In this study we have compared our current methods with WGS for the typing and characterisation of E. coli

4

59

O157 for epidemiological surveillance and detection of outbreaks. In addition, we have correlated genome content with pathogencity and

60

ancestry to try to achieve a better understanding of the genetic factors associated with virulence in STEC.

61

MATERIALS AND METHODS

62

Bacterial isolates

63

Isolates of E. coli O157 (n=105) from human faecal samples received in Lothian laboratories between 01/04/2007 and 31/03/2012 were analysed

64

in this study (Table S1). These included: isolates from 10 cases received over an 11 month period in 2011 associated with a UK-wide outbreak,

65

which was linked to unwashed vegetables (20); isolates that were epidemiologically linked by place and time (eight single household clusters;

66

one cluster involving two households that was farm related; and one travel associated cluster); and sporadic isolates from 27 patients thought to

67

have acquired their infection outside the UK. The patients had all been interviewed using standardised questionnaires to collect information,

68

including severity of infection and travel history, to identify any epidemiological links and try to pinpoint the source of infection. The age of the

69

patients sampled ranged between 1 – 85 years, with a gender distribution of 42% male and 58% female.

70

DNA isolation. E. coli O157 were incubated on Sorbitol MacConkey agar (Oxoid Ltd, Basingstoke, UK) overnight at 37°C. DNA was extracted

71

using the Wizard® Genomic DNA Purification Kit (Promega Ltd UK, Southampton, UK), as described by the manufacturer. The quality of the

72

genomic DNA was checked by gel electrophoresis, and the quantity and purity was measured using the NanoDropTM 1000 apparatus

73

(NanoDropTM Products, Thermo Scientific).

5

74

Phenotypic testing

75

Phage typing was performed using 16 phages as previously described (21). β-glucuronidase production was assessed by using TBX agar

76

(tryptone bile X-glucuronide agar; E&O Laboratories); ATCC25922 was used as a control strain. Antibiotic sensitivity patterns were determined

77

using the disc diffusion method with 15 antibiotics routinely tested in SERL for surveillance purposes: chloramphenicol, ciprofloxacin,

78

ampicillin, gentamycin, streptomycin, meropenem, nalidixic acid, kanamycin, tetracycline, trimethoprim, piperacillin/tazobactam, cefotaxime,

79

ceftazidime, co-amoxyclav and co-trimoxazole. The European Committee on Antibiotic Susceptibility Testing (EUCAST) criteria were used to

80

determine resistance.

81

MLVA

82

MLVA was performed as previously described (22). Raw data (.fsa files) from the ABI3130 (Applied Biosystems) were imported and analysed

83

in Bionumerics v6.6 (Applied Maths, Belgium) using the MLVA plugin. A minimum spanning tree was produced using the MLVA allele

84

numbers.

85

Shigatoxin Gene Subtyping

86

PCR and gel electrophoresis was performed as described by Scheutz et al. (23) with a few modifications. A sample volume of 15µl with

87

HotstarTaq Master Mix Kit (Qiagen UK Ltd., Crawley, UK), 0.2µM primer(s), and 1.5µl of DNA. stx1a, -c and -d primers were multiplexed in a

88

single reaction. stx2 primers for -b, -e and -g were multiplexed while -a, -c, -d and -f were used in singleplex reactions. The thermocycler

6

89

conditions were 95°C for 15 min then 35 cycles of 94°C for 50 sec, 66°C for 40sec and 72°C for 3min. Amplicons were run on a 2% agarose

90

gel.

91

Whole Genome Sequencing

92

Sequencing was performed using the Ion Torrent Personal Genome Machine (PGM, Life Technologies, Carlsbad, CA, USA) at the Wellcome

93

Trust Clinical Research Facility, Edinburgh. Samples were sequenced retrospectively over a period of ~1 year on the same machine, by the same

94

personnel, and analyzed using the same version of the software to limit the effects of batching. Libraries were generated using 1 µg of the

95

gDNA and enzyme fragmented using the Ion XpressPlus Fragment Library Kit. Specifically, gDNA was fragmented to 200-300bp blunt-ended

96

DNA fragments. The fragmented DNA was ligated to Ion-compatible barcoded adapters, followed by nick-repair to complete the linkage

97

between adapters and DNA inserts. The adapter-ligated library was then size-selected for optimum length (330bp) and the final library was

98

amplified. An aliquot of the library was analyzed on the Bioanalyzer instrument with an Agilent High Sensitivity DNA Kit, to assess the size

99

distribution and determine the molar library concentration. Two barcoded libraries were pooled at a concentration of 100pM, 10pM of the pool

100

was added into an emulsion PCR based template reaction; in this reaction the fragments generated during the library prep were attached to Ion

101

sphere particles (ISPs) and clonally amplified. This process was carried out using the Ion One Touch 2 system and the Ion PGM Template OT2

102

200 Kit. Quality control was performed on the Qubit 2.0 Fluorometer using the Ion sphere Quality Control assay. The optimal amount of library

103

corresponds to the library dilution that gives percent template ISPs between 10–30%. The template positive ISPs were then enriched and

7

104

sequenced on the PGM using the Ion PGM Sequencing 200 Kit v2 and an Ion 316 chip. Average coverage was determined using an estimated

105

genome size of 5.528Mb and found to be 42x (23- 106x; Table S1).

106

Bioinformatics

107

Sequence reads were mapped to the reference strain (Sakai, GenBank accession number NC_002695) using BWA-MEM (24) and isolates with

108

an average coverage of < 20x were excluded from the analysis. Single Nucleotide Polymorphisms (SNPs) were then identified using GATK2

109

(25) in unified genotyper mode. Core genome positions (defined as sites for which a base was called for all isolates) that had a high quality SNP

110

(>90% consensus, minimum depth 10x, GQ >= 30) or were the same as the reference base were extracted to produce a whole core genome

111

alignment for phylogenetic analysis. Sequence reads were also assembled de novo using the Torrent Suite™ Software and the assemblies

112

(contigs) were used for in silico analysis as described below.

113

In Silico Analysis

114

Local BLAST databases were developed in either Bioedit (http://www.mbio.ncsu.edu/bioedit/bioedit.html) or Bionumerics v6.6 (Applied Maths)

115

for MLST, stx subtyping, virulence gene detection and LSPA-6 typing. The databases were then queried using the de novo assemblies. For

116

MLST, the database consisted of the alleles of 15 housekeeping genes (arcA, aroE, aspC, clpX, cyaA, dnaG, fadD, grpE, icdA, lysP, mdh, mtlD,

117

mutS, rpoS, uidA), which were downloaded from the EcMLST website (www.shigatox.net/ecmlst/cgi-bin/index); for stx subtyping, the database

118

consisted of 600bp sequences (342 bp of the C-terminal part of subunit A and 258bp of the N-terminal of subunit B) of 106 stx variants used by

8

119

Scheutz et al. (23). For LSPA-6 typing, partial gene sequences of the six genes (folD-sfmA, Z5935, yhcG, rbsB, rtcB and arp-iclR) were

120

extracted from Sakai and used to form the database (26, 27). Isolates with genotype 111111 were classified as LSPA-6 lineage I, those with

121

genotype 211111 were classified as LSPA-6 lineage I/II, while those with other derivations were classified as LSPA-6 lineage II.

122

Strains belonging to Clade 8 (28) were identified based on the discriminatory SNP (C/A at position 539 in gene ECs2357) described by Riordan

123

et al. (29).

124

VirulenceFinder and Resfinder (http://cge.cbs.dtu.dk/services/) was used to determine the presence of E. coli virulence genes and acquired

125

antibiotic resistance genes using identity thresholds of 85% and 98% respectively (30,31).

126

Data Analysis

127

Ridom EpiCompare (http://www3.ridom.de/epicompare/) was used to calculate the discriminatory power of the typing methods. Odds ratios

128

were calculated using MedCalc (http://www.medcalc.org/calc/odds_ratio.php).

129

Recombination detection

130

The number of variable sites across the core genome alignment was calculated in sliding windows of 10,000 base pairs and plotted using custom

131

Python and R scripts to identify areas of the genome with an unusually high density of SNPs. The entire E. coli O157 core genome alignment

132

was screened for recombination using BratNextGen (32), and was also screened for recombination with BratNextGen when the two highly

133

divergent and putative recombinant sequences (XH18570E and XH22083W) were excluded from the alignment.

9

134

Maximum Likelihood Phylogenetic Analysis

135

Maximum likelihood core genome phylogenies were constructed using the RaxML software (33; Linux version 8) and PhyML (34). For the

136

published RaxML phylogeny, the general time-reversible model of nucleotide substitution was used, with gamma-distributed rate heterogenetity

137

across sites and 1,000 bootstrap replicates. Neighbor-joining phylogenies were also produced using MEGA version 5 (35). We confirmed that

138

the tree topology was robust to the use of different software frameworks as well as different choices of evolutionary model. In the absence of an

139

obvious outgroup to O157, phylogenies were rooted at the midpoint between the two most divergent taxa in the trees.

140

RESULTS

141

Analysis of core-genome

142

Our core genome alignment for the 105 E. coli O157 strains was 4,122,236 base pairs in length and contained 8721 variable sites. However, a

143

large number (4110; 47%) of the SNPs were attributable to 6 ‘atypical’ E. coli O157 strains, including a sorbitol fermenting (SF) E. coli O157

144

(XH25052C) and three β-glucoronidase positive (β-GUD +) E. coli O157 (XH20443W, XH16200L and XH24967A). When looking across the

145

whole alignment, a ~150 kilobase region with a high density of SNPs (an average of 148.33 variable sites per 10,000bp window compared to a

146

genome-wide average of 21.16 variable sites per 10,00bp window) was observed near the end of the genome. Two of the atypical strains

147

(XH18570E and XH22083W) were identified by the BratNextGen software as recombinant in this region, suggesting that genetic material has

148

entered from a donor strain. Figure S1 shows plots of the distribution of variable sites across the genome in the presence and absence of

10

149

recombinant strains XH18570E and XH22083W, as well as with and without recombinant regions excluded. Sanger sequencing of 792bp of the

150

mutL gene, which lies within the putative recombinant region confirmed the SNPs detected by WGS were genuine and a BLAST analysis of this

151

gene region showed it was identical to the mutL gene of E. coli K12.

152

Removal of recombinant strains XH18570E and XH22083W from the alignment followed by removal of additional, small, putative recombinant

153

regions identified by BratNextGen for other strains resulted in a 4,114,451bp alignment containing 6626 variable sites.

154

Epidemiological Concordance

155

Eleven groups of two or more isolates known to be epidemiologically-linked were present in our analysis, and the epidemiologically linked

156

isolates formed clusters with 100% bootstrap support in the core genome phylogeny (Figure 1). Within each epidemiologically linked cluster,

157

there were fewer than four core SNPs separating the isolates. Three SNPs were identified among 10 isolates from a UK-wide PT8 outbreak

158

spanning 11 months in 2011. A maximum of two SNPs were identified among isolates from the other epidemiologically-linked cases received

159

by SERL within a shorter time frame (0 to 14 days). The SNPs detected among epidemiologically related cases were confirmed by Sanger

160

sequencing. Table S2 provides the SNP distance between all pairs of isolates. In three instances, isolates from cases with no apparent

161

epidemiological link clustered together in the core SNP phylogeny and were separated by a maximum of 1 SNP. In two cases (XH11963F &

162

XH12059K; XH15449Q & XH15521V) the isolates were temporally related (isolated within 8 days), had the same MLVA profile and did not

163

differ at any core SNP positions (i.e. their core genome sequences were identical), suggesting that there may have been an unidentified common

11

164

source of infection. Another isolate (XH16734G) clustered with XH15449Q & XH15521V (differing to these isolates by one SNP) but was a

165

MLVA double locus variant (DLV) and was isolated ~ 4 months later. In the third case (XH11856D & XH12193B), the two isolates differed by

166

one SNP and clustered in the core genome phylogeny with 100% bootstrap support, were MLVA single locus variants (SLVs) and isolated 39

167

days apart. Two additional MLVA SLVs (XH14013B & XH17884G; XH14653V & XH18908N) were observed in the dataset; however they

168

were not related in time and/or space and differed by 36 and 126 SNPs, indicating they were not closely related. The genetic variation among all

169

pairs of isolates with no epidemiological links was 9 - 1632 SNPs (Table S2).

170

Concordance with MLVA and Phage typing

171

A comparison of the minimum spanning tree produced by MLVA and the phylogeny based on the core genome showed that, although

172

epidemiologically-linked isolates clustered together in both trees, the overall topologies and the clustering of epidemiologically unrelated

173

isolates differed between the trees. Within the core genome phylogeny, all of the sub-lineages we defined, and the majority of clades within

174

these, were supported by bootstrap values of 100% (Figure S2 and S3). By contrast, the minimum spanning tree based upon the MLVA data

175

(Figure S4) had relatively low bootstrap values, suggesting MLVA only provides meaningful phylogenetic relationships for closely related

176

isolates. Similarly, although Phage Typing was broadly congruent with the core genome phylogeny, identical phage types were distributed

177

across the core genome phylogeny in different clades (Figure 1). For example PT 4 was distributed in four of our defined sub-lineages across

178

the tree, suggesting a lack of specificity of Phage Typing.

12

179

Discriminatory power of typing methods

180

The SNP data separated the 105 isolates into 81 different types (Simpson’s Index of Diversity [SID] 0.989; 95% Confidence Interval [CI] 0.979-

181

0.998), where isolates that differed by > 3 SNPs were considered a different type. MLVA produced 80 different allelic profiles (SLVs were

182

considered the same type) and had the same discriminatory index as the SNP method (SID 0.989; 95% CI 0.979-0.998). The types were

183

concordant except for the two sets of MLVA SLV’s that fell in different parts of the core genome phylogeny and isolate XH16734G that differed

184

at two MLVA alleles (VNTR3 and VNTR9 by one repeat) to XH15449Q and XH15221V (Table S1 and Figure 1). Phage typing produced 14

185

different lysis patterns and had a much lower SID than the other two methods (SID 0.853; 95% CI 0.817-0.89).

186

stx Subtyping

187

The stx subtypes were determined by PCR and in-silico analysis. A relatively low diversity of subtypes was detected among the 105 isolates

188

(Figure 1, Table S1). The most common subtype was stx2a/stx2c (n=42), followed by stx2c only (n=26) and stx1a/stx2c (n=24). Eleven isolates

189

carried stx2a only, while 2 had stx1a only.

190

The PCR results were concordant with the local BLAST and VirulenceFinder for 102 (97%) and 97 (92%) isolates respectively. The local

191

BLAST did not detect the stx2c gene in three isolates found to carry stx2a and stx2c by PCR; in one case only a partial sequence was identified,

192

while in another, two different stx2a subtypes were detected. VirulenceFinder produced similar results for these isolates, however in one case it

193

detected the presence of a stx2c subunit A but the adjacent subunit B was reported as stx2a. Similarly, in 4 other cases, VirulenceFinder detected

13

194

different stx2 variants in subunits that lay beside each other in the genome. In one case, a stx2d subunit A was reported in one isolate, which was

195

not detected by PCR or the local BLAST.

196

Virulence Gene Detection

197

VirulenceFinder combined with local BLAST searches were used to determine the presence of virulence genes, and the complete list of genes

198

detected in each isolate is shown in the supplementary material (Table S1). A total of 23 virulence genes were detected among the 105 isolates.

199

Twenty of the genes (astA, eae, ehxA, espA, espB, espF, espJ, espP, etpD, gad, iha, iss, katP, nleA, nleB, nleC, prfB, tccP, tir, and toxB) were

200

present, or partially present (