AEM Accepts, published online ahead of print on 30 August 2013 Appl. Environ. Microbiol. doi:10.1128/AEM.01455-13 Copyright © 2013, American Society for Microbiology. All Rights Reserved.
From green to red: horizontal gene transfer of phycoerythrin gene cluster between Planktothrix strains
Running title: Transfer of phycoerythrin gene cluster
Ave Tooming-Klunderud1#, Hanne Sogge1,2, Trine Ballestad Rounge1,3, Alexander J. Nederbragt1, Karin Lagesen1, Gernot Glöckner4,5, Paul Hayes6, Thomas Rohrlack7,8, Kjetill S. Jakobsen1,2#
(1) University of Oslo, Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, P.O.Box 1066, 0316 Oslo, Norway
(2) University of Oslo, Microbial Evolution Research Group (MERG), Department of Biosciences, P.O.Box 1066, 0316 Oslo, Norway
(3) Cancer Registry of Norway, P.O. box 5313 Majorstuen, N-0304 Oslo, Norway
(4) Institute for Biochemistry I, Medical Faculty, University of Cologne, Joseph-Stelzmann-Straße 52; D-50931 Köln, Germany
(5) Leibniz-Institute of Freshwater Ecology and Inland Fisheries, IGB, Müggelseedamm 301; D-12587 Berlin, Germany
(6) Faculty of Science, University of Portsmouth, St Michael’s Building, White Swan Road, Portsmouth, PO1 2DT, United Kingdom
(7) NIVA, Norwegian Institute for Water Research, 0411 Oslo, Norway
(8) Norwegian University of Life Sciences, Department of Plant and Environmental Sciences, P.O.Box 5003, 1432 Ås, Norway
# Corresponding authors:
Kjetill S. Jakobsen: [email protected]
Ave Tooming-Klunderud: [email protected]
Keywords: Cyanobacteria, Planktothrix, horizontal gene transfer, phycoerythrin gene cluster, chemotype, recombination, phycobilisome
Abstract:
Horizontal gene transfer is common in cyanobacteria and transfer of large gene clusters may lead to acquisition of new functions and conceivably niche adaptation. In the present study, we demonstrate that horizontal gene transfer between closely related Planktothrix strains can explain the production of the same oligopeptide isoforms by strains of different color. Comparison of the genomes of eight Planktothrix strains revealed that strains producing the same oligopeptide isoforms are closely related, regardless of color. We have investigated genes involved in the synthesis of the photosynthetic pigments phycocyanin and phycoerythrin, which are responsible for green and red appearance, respectively. Sequence comparisons suggest the transfer of a functional phycoerythrin gene cluster generating a red phenotype in a strain that is otherwise more closely related to green strains. Our data show that the insertion of a DNA fragment containing the 19.7 kb phycoerythrin gene cluster has been facilitated by homologous recombination, also replacing a region of the phycocyanin operon. These findings demonstrate that large DNA fragments spanning entire functional gene clusters can be effectively transferred between closely related cyanobacterial strains and result in a changed phenotype. Further, the results shed new light on the discussion of the role of horizontal gene transfer in the sporadic distribution of large gene clusters in cyanobacteria, as well as the appearance of red and green pigmented strains.
Introduction
Horizontal gene transfer (HGT), the exchange of genetic information between two organisms that do not share a recent ancestor–descendant relationship, is now recognized as a major force shaping the evolutionary history of prokaryotes (e.g. [1-4]). HGT is considered to be common in cyanobacteria [5]. Through the availability of bacterial genome sequences it has become clear that HGT can occur throughout the genome, and that a substantial fraction of genes have been horizontally transferred [5,6]. The quantity of genetic material that can be horizontally transferred may range from small gene fragments (e.g. [7-9]) to fragments spanning complete genes (e.g. [10-12]) and whole operons encoding complex biochemical pathways (e.g. [13-15]). As even the transfer of a single or a few genes can give recipient organisms the opportunity to implement a new function and exploit new ecological niches, HGT contributes to the rapid creation of biological novelty that otherwise, through mutations and gene duplications, might have taken millions of years to appear.
According to Andam and co-workers [1], HGT is the norm and not the exception, while others call the transfer of genes between bacteria ‘both rare and promiscuous’ [4]. Successful HGT depends on transfer of genetic material to the cell (via transformation, conjugation, transduction or gene transfer agents), survival of the DNA in the cell, integration of foreign DNA via recombination and finally fixation of the integrated DNA in the population (involving for example selection). Since the rate of recombination decreases with increased sequence dissimilarity [16,17], HGT events are more common among close relatives, as shown by a recent analysis of 657 sequenced prokaryotic genomes [18].
For a newly transferred gene to become fixed in the population, it should provide a relevant function, and this function must operate within the native machinery of the host cell. Since bacterial genomes are subject to deletional bias [19], genes that do not contribute to fitness of the organism will eventually be removed from the genome. Integration of new genes into existing cellular networks can be facilitated by acquisition of an operon containing all genes and regulatory regions required for function [20]. For single-gene acquisitions, the fate of a new gene depends largely upon the existing genes in its new host. Experimental studies have shown that most HGT events are deleterious [21,22]. However, rare HGT events and mutations can be selected for under particular conditions and thus contribute to bacterial adaptation and evolution [23-25].
Horizontal gene transfer events have also been demonstrated for the filamentous cyanobacterium Planktothrix (e.g., [26-29]), which occurs in deep and stratified lakes in temperate regions of the northern hemisphere. Traditionally, Planktothrix isolated from different lakes have been classified into species according to morphological characteristics, such as cell dimension and pigmentation. Following the first description of the genus Planktothrix including 14 distinct species by Anagnostidis and Komárek [30], the number of different species has been heavily disputed. Studies based on molecular data like sequences of gas vesicle genes and 16S rRNA have suggested that the whole Planktothrix genus is monospecific [31,32], while Suda and co-workers [33] described four Planktothrix species based on several genetic and phenotypic properties.
Planktothrix strains isolated from Norwegian lakes and classified as distinct species at the Algal Culture Collection of the Norwegian Institute for Water Research cannot be separated by 16S rRNA. Recently, Rohrlack and co-workers [34,35] reported that strains of Planktothrix showing >99% 16S rRNA gene sequence similarity may produce distinct cellular patterns of oligopeptides, bioactive secondary metabolites synthesized mostly by non-ribosomal peptide synthetases. Using the oligopeptide profiles produced by each strain as markers, they grouped strains into distinct chemotypes (Cht). Based on field studies of the Norwegian Lake Steinsfjorden, four coexisting Planktothrix chemotypes differing considerably in seasonal dynamics, depth distribution and participation in loss processes were identified [34]. Since the production of oligopeptides is facilitated by several large and independently evolving operons [36,37], strains associated with a distinct chemotype are assumed to be more closely related. This hypothesis is also supported by data showing that Planktothrix strains associated with the same chemotype generally have the same color, either red or green [35]. However, in Lake Steinsfjorden, one chemotype was shown to comprise both red and green strains [34,35]. The red and green appearance of Planktothrix strains is associated with the content of accessory light harvesting pigments, the phycobiliproteins, involved in the photoautotrophic machinery. Phycobilisomes, the macromolecular complexes formed from phycobiliproteins, have an allophycocyanin core that links to the photosystems, and peripheral light harvesting rods that comprise either phycocyanin or phycocyanin and phycoerythrin (for review, see e.g. [38]). Phycocyanin, common to all cyanobacteria, imparts a green appearance to the cell and absorbs red light (620-630 nm). Phycoerythrin absorbs green light (560-570 nm) and imparts a dominant red color when present. The co-existence of red and green strains within the same chemotype can be explained by acquisition or loss of genes coding for phycoerythrin, as suggested earlier for Synechococcus and other picocyanobacteria [39-41].
The aim of this study was to investigate the genome arrangements leading to the co-occurrence of red and green strains within the same oligopeptide chemotype. For that purpose, the genomes of eight different Planktothrix strains classified as four different species were sequenced, four red and four green strains, including a red and two green strains from the same chemotype. We address the following questions: (1) how similar are the genomes of closely related Planktothrix strains, and is there any evidence for genetic substructuring according to color or chemotype; (2) are the structure and chromosomal location of genes encoding phycocyanin and phycoerythrin pigments the same in all strains; (3) in the light of results from the first two questions, can the co-occurrence of red and green strains within the same chemotype be explained by altered phycoerythrin genes, and is this because of a) an acquisition of the phycoerythrin gene cluster by the red strain, or b) mutations leading to non-functional phycoerythrin genes in the two green strains.
Our results show that all eight Planktothrix genomes are highly similar, and that strains associated with the same chemotype are the most closely related, regardless of color. Furthermore, we reveal that a red strain from a chemotype dominated by green strains has acquired the 19.7 kb phycoerythrin gene cluster. Our data indicate that the DNA fragment containing the phycoerythrin operon originated from a strain associated with a “red” chemotype.
Materials and Methods
Planktothrix strains and DNA isolation
Eight Planktothrix strains isolated from lakes Steinsfjorden and Kolbotnvatnet (Norway) have been investigated. All strains have been kept in continuous, non-axenic cultures at the Algal Culture Collection of the Norwegian Institute for Water Research (NIVA), in Z8 medium under light at a photon flux density of 10 μmol m⁻² s⁻¹ and a light-dark cycle of 12:12 hours. Prior to genomic DNA isolation, cells were centrifuged and resuspended in TE buffer. DNA was extracted by the following procedure: cells were treated with lysozyme (final concentration 15 mg/ml), followed by RNaseA and Proteinase K (5 mg/ml, 1 % LDS) treatment. Samples were then incubated at 60°C (shaking at 300 rpm) for 60 min. In cases where the solution did not clear after 60 minutes, the incubation time was prolonged with additional Proteinase K. Subsequently, 1 volume phenol:chloroform:isoamyl alcohol [25:24:1 (v/v)] was added. The solution was mixed by inversion at 37°C for 30 min to remove pigments and proteins. After centrifugation, the upper layer was treated twice with 1 volume chloroform–isoamyl alcohol [24:1 (v/v)]. DNA was precipitated using 0.1 volume 3 M sodium acetate and 2.5 volumes ice-cold 96 % ethanol on ice for 1 hour. The DNA pellet was washed twice with ice-cold 70 % ethanol, dried at room temperature and dissolved in Tris-HCl buffer (pH 8.0). All DNA samples were purified using Amicon Ultra-0.5 mL 50K centrifugal columns to ensure high-quality DNA for paired-end library preparation using the 454 Life Sciences protocols. DNA was concentrated according to the manufacturer’s instructions and washed twice using Tris-HCl buffer, pH 8.0.
Sequencing
Seven out of eight Planktothrix genomes were sequenced using 454 pyrosequencing at the Norwegian Sequencing Centre (http://www.sequencing.uio.no/). Strain NIVA-CYA 34 was initially sequenced using Sanger sequencing at the Max Planck Institute for Chemical Ecology. The DNA sample of NIVA-CYA 34 was amplified using the REPLI-g® kit (Qiagen). The resulting DNA was randomly sheared and the fragment size range from 2500 to 3000 bp was selected for cloning into pUC18 standard vectors. The resulting clones were sequenced from both ends with the standard sequencing primers on an ABI 3700 machine, generating 20 066 sequencing reads comprising 16 Mb. An additional 454 shotgun library was prepared and sequenced at the Norwegian Sequencing Centre to ensure satisfactory quality of the NIVA-CYA 34 genome assembly. For the remaining seven strains, both shotgun and paired-end libraries were prepared and sequenced to 23–41x coverage (see Supplementary Table 1 for information about libraries and number of reads/bases sequenced for each strain).
Assembly and annotation
The newbler program (gsAssembler, Roche-454, Branford, USA) was used to assemble the 454 reads into scaffolds and contigs (the newbler version used for assembly of each strain, together with assembly statistics, is shown in Supplementary Table 1). Since all strains had been cultivated non-axenically, reads from co-cultured bacteria were present in the dataset. To decrease the chance of co-assembling these contaminating reads with the genome of the strain of interest, stringent overlap settings with a minimum of 98 % overlap identity and a minimum of 60 bp overlap length were used. However, the assemblies of most strains still contained some short scaffolds with low average read depth (below 5-10x) and/or GC percentages that diverged from those of high read depth scaffolds. These scaffolds were considered to be derived from co-cultured bacteria present in the sample used for DNA extraction [42]. The low read depth scaffolds were compared to the non-redundant NCBI protein database using BLASTX, and all non-cyanobacterial matching scaffolds were removed before annotation.
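The depth and GC screen described above can be illustrated with a minimal Python sketch. The function names, the record layout and the default thresholds (`min_depth=10.0`, `max_gc_deviation=0.05`) are assumptions made for illustration only; the text states merely that scaffolds with read depth below 5-10x and/or diverging GC percentages were treated as suspect. Flagged scaffolds would still need the BLASTX comparison before any removal.

```python
from statistics import mean

def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def flag_suspect_scaffolds(scaffolds, min_depth=10.0, max_gc_deviation=0.05):
    """Flag scaffolds that may derive from co-cultured bacteria.

    scaffolds: list of dicts with keys 'name', 'seq' and 'depth'
    (average read depth). Both thresholds are illustrative assumptions.
    """
    # Reference GC: mean over scaffolds at or above the depth threshold.
    high = [gc_fraction(s["seq"]) for s in scaffolds if s["depth"] >= min_depth]
    if not high:
        raise ValueError("no high-depth scaffolds to define a reference GC")
    ref_gc = mean(high)
    flagged = []
    for s in scaffolds:
        low_depth = s["depth"] < min_depth
        odd_gc = abs(gc_fraction(s["seq"]) - ref_gc) > max_gc_deviation
        if low_depth or odd_gc:
            flagged.append(s["name"])
    return flagged
```

The sketch only nominates candidates; the decision to discard a scaffold rests on the database comparison, as in the procedure described in the text.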
Assembly of the NIVA-CYA 34 genome was done using both the Sanger and 454 reads. Newbler was given a trimming file (-vt option) to remove pUC18 plasmid vector sequences from the Sanger reads. The ‘-stopjoin’ and ‘-repfill’ options were used for assembly. One low-coverage non-cyanobacterial scaffold was removed from the newbler assembly. Four of the unscaffolded contigs were identified as containing cyanobacterial sequences; these were added to the scaffolds before annotation.
According to the classification of genome sequence standards [43], we consider the assemblies to be improved high-quality drafts. Annotation of all genome sequences was performed using the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) integrated microbial genomes database and comparative analysis system (IMG) [44].
Comparison of genomes
The IMG system was used for pairwise comparison of genomes and calculation of Pearson correlation coefficients of COG (Clusters of Orthologous Groups) profiles. Homologous genes were defined as genes having a minimum of 80% sequence identity and identified using BLASTP.
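As an illustration of the profile comparison, the following minimal Python sketch computes the Pearson correlation coefficient between two COG count vectors. It is not the IMG implementation; the function name and input format (one gene count per COG, in the same order for both genomes) are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Covariance and variances are left unnormalized; the factors cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)
```

Two genomes with proportional gene counts across all COGs give a coefficient of 1, so values close to 1 indicate very similar functional profiles.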
192
Hierarchical clustering by COG profiles was performed by IMG system using the tool
193
Cluster (http://www.falw.vu/~huik/cluster.htm). For construction of the pairwise hierarchical
194
tree, a function profile vector (gene counts per COG) was generated for each genome. The
195
distance metrics between these profile vectors was calculated by means of un-centered
196
correlation. Pairwise single-linkage clustering was performed, grouping the two closest
197
profile vectors first to form a group, then grouping the next pair of closest groups or vectors
198
until all genomes were incorporated into the hierarchical tree.
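The clustering procedure can be sketched in a few lines of Python. This is a simplified illustration, not the Cluster tool itself: the distance is assumed to be one minus the uncentered correlation (the cosine of the angle between two profile vectors), and the function names and input format are hypothetical.

```python
from math import sqrt

def uncentered_distance(x, y):
    """1 minus the uncentered correlation (cosine similarity) of two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        raise ValueError("zero vector has no defined correlation")
    return 1.0 - dot / (nx * ny)

def single_linkage(profiles):
    """Agglomerative single-linkage clustering.

    profiles: dict mapping genome name -> COG count vector.
    Returns the merge history as (group_a, group_b, distance) tuples,
    where each group is a tuple of genome names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    uncentered_distance(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

The merge history read from first to last entry corresponds to the tree built from the tips inward: the two most similar genomes join first, and the last merge is the root.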
199
Phylogenetic analyses of genomes were performed using a set of core genes. Core
200
genes were defined as genes present in all genomes and having a minimum of 90% sequence
201
identity, resulting in data set of 3 690 genes. The final data set (after removing 100% identical
202
genes in all genomes together with transposases, oligopeptide synthetases, retron type reverse
203
transcriptases, phage-associated proteins and proteins shorter than 50 aa) contained 2 914
204
genes. Ten sub-sets of genes for generating phylogenies were created by random sampling of
205
20 genes from the core gene set (2914 genes) repeated 10 times using R (R development core
206
team 2012. R: A language and environment for statistical computing, reference index version
207
2.10.1. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
208
http://www.R-project.org). Sequences were aligned using Mega 5 software [45] and
209
concatenated for each of the ten data sets. List of genes for each data set is shown in
210
Supplementary Table 2. Maximum likelihood (ML) analyses were carried out using RAxML
211
[46] with 1000 bootstrapped resamplings with GTRCAT model. The resulting individual
212
phylogenetic trees are shown in Fig. S1. Finally, the aligned sequences of all 200 genes were
213
concatenated
resulting
in
217
546
bp
long
alignment
(available
through
9
214
http://dx.doi.org/10.6084/m9.figshare.719100) and Maximum likelihood (ML) analyses were
215
carried out using RAxML [46] with 1000 bootstrapped resamplings with GTRCAT model.
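The random sub-sampling step can be sketched as follows. The original sampling was done in R; this Python version is only an illustration of the idea. It assumes that the ten sub-sets are disjoint, which is consistent with the 200 genes of the final concatenation but is not stated explicitly in the text. The function name, its parameters and the gene identifiers are hypothetical.

```python
import random

def draw_gene_subsets(core_genes, n_subsets=10, subset_size=20, seed=None):
    """Draw disjoint random subsets of genes from a core gene list."""
    needed = n_subsets * subset_size
    if needed > len(core_genes):
        raise ValueError("not enough core genes for the requested subsets")
    rng = random.Random(seed)
    # Sample without replacement so that no gene appears in two subsets
    # (an assumption of this sketch, see the note above).
    picked = rng.sample(list(core_genes), needed)
    return [picked[i:i + subset_size] for i in range(0, needed, subset_size)]

# Usage with placeholder identifiers standing in for the 2 914 core genes.
core = [f"gene_{i}" for i in range(2914)]
subsets = draw_gene_subsets(core, seed=1)
```

Fixing the seed makes the draw reproducible, which is useful when the same gene lists must be regenerated for a supplementary table.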
Analyses of phycobilisome genes
All gene sequences analyzed (phycoerythrin, phycocyanin and flanking genes) were downloaded from IMG. Sequences were aligned using Mega 5 software [45]. Nucleotide diversity was analyzed using the computer program DNA Sequence Polymorphism (DnaSP) [47]. Maximum likelihood (ML) analyses of cpc genes were carried out using RAxML [46] with 1000 bootstrapped resamplings. GTRCAT was determined to be the best evolutionary model using ModelTest [48].
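For readers unfamiliar with the statistic, nucleotide diversity (π) is commonly defined as the average number of nucleotide differences per site between two sequences, taken over all sequence pairs. The following Python sketch illustrates that definition; it is not DnaSP's implementation, and the pairwise skipping of gapped or ambiguous sites is an assumption of this example.

```python
from itertools import combinations

def nucleotide_diversity(alignment):
    """Average pairwise differences per site (pi) for aligned sequences.

    Sites where either sequence in a pair has a gap ('-') or 'N' are
    skipped for that pair; this handling is an illustrative assumption.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "-N" or y in "-N":
                continue
            compared += 1
            diffs += x != y  # True counts as 1
        if compared:
            per_pair.append(diffs / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0
```

Identical sequences give a value of 0, and the value rises with the proportion of sites that differ between sequence pairs.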
Detection of recombination events
Recombination events were detected by visual analysis of informative sites (variable sites where each variant occurs in at least two sequences) as described by Rudi and co-workers [7]. In order to detect the recombination breakpoints, concatenated sequences of cpcBA (reverse-complement) and CHAP domain genes were analyzed using the RDP 4 software [49]. Sequences from strains NIVA-CYA 406 and NIVA-CYA 15 were discarded as these are identical with sequences from strains NIVA-CYA 98 and NIVA-CYA 34, respectively. Recombination signals were accepted if at least three different methods detected statistically significant (P