Horizontal gene transfer depends on gene content of the host

Vol. 21 Suppl. 2 2005, pages ii222–ii223 doi:10.1093/bioinformatics/bti1136

BIOINFORMATICS Systems Biology

Csaba Pál1,2, Balázs Papp3 and Martin J. Lercher1,4,∗

1 European Molecular Biology Laboratory, 69012 Heidelberg, Germany
2 MTA, Eötvös Loránd University, Budapest H-1117, Hungary
3 School of Biological Sciences, University of Manchester, Manchester M13 9PT, UK
4 Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK

ABSTRACT

Horizontal gene transfer is a major contributor to the evolution of bacterial genomes. We examine this process through a combination of comparative genomics and in silico analysis of the Escherichia coli metabolic network. We validate our horizontal transfer estimates by confirming the predicted gradual amelioration of GC content over time. We find that the chance of acquiring a gene by horizontal transfer is up to six times higher if an enzyme that catalyses a coupled metabolite flux is already encoded in the host genome.

Contact: [email protected]

1 INTRODUCTION

The gene content of different bacterial species is highly variable. A major route of genomic evolution in bacteria is the acquisition of new genes, often from very distantly related organisms, through horizontal transfer (Lawrence and Hendrickson, 2003; Ochman et al., 2000). An initial transfer event can be viewed as a macro-mutation, and the fate of the newly acquired gene will depend both on external factors (e.g. changes in the environment of the host species) and on internal factors, in particular the gene content of the host species. Bacteria appear to be under evolutionary pressure to reduce their genome sizes, possibly to allow fast growth rates. Thus, to survive and become fixed in a bacterial population, a transferred gene must confer a selective advantage to its host. Consequently, if a protein's action depends on the presence of other proteins, then the gene encoding that protein will be fixed only if those other proteins are also present in the host. From this, we predict that the likelihood of a successful horizontal gene transfer should depend on the genomic presence of its partners. Here, we consider genes that encode enzymes in the metabolic network of Escherichia coli K12. If any metabolite flux catalysed by one enzyme in the network depends on a non-zero flux catalysed by a second enzyme, we call these enzymes physiologically coupled. From the above arguments, we predict that the horizontal acquisition of an enzyme-encoding gene should depend strongly on the genomic presence of those proteins to which it is physiologically coupled.

2 METHODS

2.1 Identification of horizontal gene transfers

We used STRING (von Mering et al., 2005), which extends the COG database (Tatusov et al., 2003), to identify orthologs of the E.coli K12 metabolic genes among its closest relatives, including 54 fully sequenced bacterial species. We aligned each gene family present in exactly one copy in all genomes using MUSCLE (Edgar, 2004). From the concatenated alignments, we reconstructed a maximum-likelihood phylogeny using PHYML (Guindon and Gascuel, 2003) with an empirical substitution matrix (Jones et al., 1992) and a Γ-model for rate variation among sites. The resulting tree was supported by high bootstrap values except for the E.coli–Shigella branch, which we resolved by separately investigating the phylogeny of five close relatives of E.coli K12 under the same protocols. Using PAUP* 4.0b10, we then estimated the most parsimonious scenarios (Boussau et al., 2004) for gene loss and horizontal transfers from gene presence/absence data (von Mering et al., 2005) on this tree, setting the penalty for a gene gain to twice that for a gene loss (gain:loss ratio 2:1; Snel et al., 2002).

∗To whom correspondence should be addressed.

2.2 Flux coupling analysis of the E.coli network

To find physiologically coupled gene pairs, we analysed the metabolic network of E.coli K12 (Reed et al., 2003) after eliminating duplicate reactions. If multiple proteins in the network mapped to the same COG, these were excluded from further analysis. We implemented a flux coupling analysis (Burgard et al., 2004) with linear programming under CPLEX 9.5. For each pair of proteins, we maximized the flux catalysed by one protein after removal of the other. We consider a protein pair to be directionally coupled (A→B) if the fluxes catalysed by protein A are shut down after removal of protein B. If this is true in both directions (A→B and B→A), we consider the pair to be tightly coupled.
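The coupling definitions above can be illustrated on a toy network. The paper solved linear programs under CPLEX; as a hedged stand-in that needs no LP solver, the sketch below swaps in a simpler topological test. It deletes an enzyme's reactions, repeatedly prunes reactions that touch a dead-end metabolite, and calls enzyme A coupled to B (A→B) if none of A's reactions survive the removal of B. Because it ignores stoichiometric coefficients, this only approximates LP-based flux coupling analysis and can miss couplings that the LP formulation would detect. The network, the enzyme names E1 to E4 and the one-enzyme-per-reaction mapping are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (substrates, products, enzyme or None).
# All reactions are irreversible; exchange reactions (EX_*) need no enzyme.
REACTIONS = {
    "EX_a": (set(), {"a"}, None),
    "R1": ({"a"}, {"b"}, "E1"),
    "R2": ({"b"}, {"c"}, "E2"),
    "R3": ({"b"}, {"d"}, "E3"),
    "R4": ({"c"}, {"g"}, "E4"),
    "EX_d": ({"d"}, set(), None),
    "EX_g": ({"g"}, set(), None),
}
ENZYMES = sorted({enz for *_, enz in REACTIONS.values() if enz})


def active_reactions(removed=None):
    """Reactions surviving dead-end pruning after deleting enzyme `removed`.

    A reaction is pruned if a substrate is produced, or a product consumed,
    by no other remaining reaction; pruning repeats until nothing changes.
    """
    active = {r for r, (_, _, enz) in REACTIONS.items()
              if enz is None or enz != removed}
    changed = True
    while changed:
        changed = False
        for r in sorted(active):
            subs, prods, _ = REACTIONS[r]
            others = [REACTIONS[o] for o in active if o != r]
            produced = set().union(*(o[1] for o in others))
            consumed = set().union(*(o[0] for o in others))
            if not (subs <= produced and prods <= consumed):
                active.discard(r)
                changed = True
    return active


def carries_flux(enzyme, active):
    """True if at least one reaction catalysed by `enzyme` is still active."""
    return any(REACTIONS[r][2] == enzyme for r in active)


def depends(a, b):
    """A -> B: every reaction of enzyme a is blocked once enzyme b is removed."""
    return not carries_flux(a, active_reactions(removed=b))


def classify(a, b):
    ab, ba = depends(a, b), depends(b, a)
    if ab and ba:
        return "tight"        # A -> B AND B -> A
    if ab or ba:
        return "directional"  # A -> B XOR B -> A
    return "uncoupled"


for a, b in combinations(ENZYMES, 2):
    print(a, b, classify(a, b))
```

On this toy network, E2 and E4 come out tightly coupled (each one's removal strands metabolite c), E1 is directionally coupled to each of E2, E3 and E4 (removing E1 starves b, while E1 can still route flux through whichever branch remains), and the other pairs are uncoupled.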

2.3 Statistical analysis

Analyses were performed in R (R development core team, 2004, http://www.R-project.org). We constructed 2 × 2 contingency tables for all pairs of physiologically coupled proteins, with columns corresponding to a horizontal transfer of one of the genes (A) along a given branch and rows corresponding to the presence of the other gene (B) in the descendant node of the branch. Only branches along which gene A was absent in the ancestral node were considered. We used Fisher's two-sided exact test to estimate statistical significance and 95% confidence intervals of the odds ratio.
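The table construction and test described above can be sketched in Python as follows. This is a hypothetical illustration rather than the authors' R code: the per-branch record format and its keys are invented for the example, the odds ratio is the simple cross-product estimate (R's fisher.test instead reports a conditional maximum-likelihood estimate, which can differ slightly), and the exact confidence interval is not reproduced. The p-value uses the usual two-sided definition: the sum of the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table.

```python
from math import comb


def tally(branches):
    """2x2 table for one coupled pair (A, B) from per-branch records.

    Each record is a dict with hypothetical boolean keys 'a_in_ancestor',
    'a_transferred' and 'b_in_descendant'. Rows: B absent / present in the
    descendant node; columns: A not transferred / transferred on the branch.
    """
    table = [[0, 0], [0, 0]]
    for br in branches:
        if br["a_in_ancestor"]:
            continue  # keep only branches where A was absent in the ancestor
        table[int(br["b_in_descendant"])][int(br["a_transferred"])] += 1
    return table


def odds_ratio(table):
    """Sample (cross-product) odds ratio ad / bc."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def fisher_two_sided(table):
    """Two-sided Fisher exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables no more probable than the observed one; the small
    # relative tolerance mirrors the one used by R's fisher.test.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


table1 = [[6865, 61], [10088, 251]]  # Table 1 counts (rows: B absent/present)
print(round(odds_ratio(table1), 2), fisher_two_sided(table1))
```

For the Table 1 counts the cross-product odds ratio rounds to 2.8, in line with the value reported in the Results; exact agreement with R's estimate and interval should not be expected for the reason given above.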

3 RESULTS AND DISCUSSION

Based on differences in GC content between horizontally transferred genes and the host genome, it was previously hypothesized that the GC content of transferred genes gradually approaches that of the host over evolutionary time (Lawrence and Ochman, 1998). To validate the horizontal gene transfer events that we inferred from generalized parsimony, we analysed the GC content at third codon positions (GC3) for all E.coli genes. As predicted, we found a gradual decay of the absolute difference between the GC3 of individual genes and the host average (data not shown).

How does the acquisition of genes by horizontal transfer depend on the presence of physiologically coupled genes in the genome? To answer this question, we examined all coupled protein pairs in the metabolic network of E.coli K12 (Reed et al., 2003), as identified by flux coupling analysis (Burgard et al., 2004). The contingency tables for the dependence of the acquisition of one gene of the pair on the presence of the other gene are given as Table 1 for directionally coupled pairs (i.e. flux catalysed by protein A depends on flux catalysed by protein B or vice versa, but not both) and as Table 2 for tightly coupled pairs (i.e. flux catalysed by each protein depends on the flux catalysed by the other). The odds ratio for directional coupling (Table 1) is 2.80 (95% confidence interval 2.11–3.77, P = 5 × 10⁻¹⁵), i.e. horizontal acquisition of one of the genes is almost three times more likely if the physiologically coupled partner is present in the host genome. As expected, this dependence becomes even stronger for tightly coupled protein pairs (Table 2): here, the odds ratio reaches 6.10 (95% confidence interval 4.06–9.47, P = 2 × 10⁻²⁴). Thus, the successful acquisition of a gene into the metabolic network depends on the genomic presence of physiologically coupled enzymes. Detailed study of physiological interactions may eventually allow us to predict the probability of the adaptation of a given species to a new environment, e.g. the transition of a gut bacterium to a parasitic intracellular lifestyle.

Table 1. Contingency table for horizontal transfers of directionally coupled enzyme pairs (A→B XOR B→A)

                             A not transferred   A transferred
B not present in descendant               6865              61
B present in descendant                 10 088             251

Table 2. Contingency table for horizontal transfers of fully coupled enzyme pairs (A→B AND B→A)

                             A not transferred   A transferred
B not present in descendant               4388              29
B present in descendant                   3446             139

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
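The GC3 amelioration check used to validate the inferred transfers can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes in-frame coding sequences, takes the host average to be the mean per-gene GC3 of native genes, and groups transferred genes by a hypothetical integer age class (larger means an older transfer). All sequences are made up; under the amelioration hypothesis, the mean deviation should shrink as the age class increases.

```python
from statistics import mean


def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS."""
    thirds = cds.upper()[2::3]  # third base of every complete codon
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc3_deviation_by_age(transferred, host_genes):
    """Mean |GC3(gene) - host average| per (hypothetical) transfer age class.

    transferred: iterable of (age_class, cds) pairs; larger age_class = older.
    host_genes: native coding sequences defining the host average (assumption).
    """
    host_mean = mean(gc3(s) for s in host_genes)
    devs = {}
    for age, cds in transferred:
        devs.setdefault(age, []).append(abs(gc3(cds) - host_mean))
    return {age: mean(values) for age, values in sorted(devs.items())}


host = ["AAGAAC", "AAAAAT"]            # GC3 = 1.0 and 0.0, host mean 0.5
transferred = [(1, "AAAAAA"),          # recent transfer, GC3 = 0.0
               (2, "AAGAAAAAAAAA")]    # older transfer, GC3 = 0.25
print(gc3_deviation_by_age(transferred, host))  # {1: 0.5, 2: 0.25}
```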

ACKNOWLEDGEMENTS

We thank Peer Bork, Christian von Mering and Shannon McWeeney for valuable discussions. M.J.L. is a Heisenberg Fellow of the Deutsche Forschungsgemeinschaft.

Conflict of Interest: none declared.

REFERENCES

Boussau,B. et al. (2004) Computational inference of scenarios for alpha-proteobacterial genome evolution. Proc. Natl Acad. Sci. USA, 101, 9722–9727.
Burgard,A.P. et al. (2004) Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res., 14, 301–312.
Edgar,R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
Guindon,S. and Gascuel,O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol., 52, 696–704.
Jones,D.T. et al. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci., 8, 275–282.
Lawrence,J.G. and Hendrickson,H. (2003) Lateral gene transfer: when will adolescence end? Mol. Microbiol., 50, 739–749.
Lawrence,J.G. and Ochman,H. (1998) Molecular archaeology of the Escherichia coli genome. Proc. Natl Acad. Sci. USA, 95, 9413–9417.
Ochman,H. et al. (2000) Lateral gene transfer and the nature of bacterial innovation. Nature, 405, 299–304.
R development core team (2004) R: A language and environment for statistical computing. Vienna, Austria.
Reed,J.L. et al. (2003) An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol., 4, R54.
Snel,B. et al. (2002) Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res., 12, 17–25.
Tatusov,R.L. et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4, 41.
von Mering,C. et al. (2005) STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res., 33 (Database issue), D433–D437.
