news and views - Nature

Partners in crime

Mark J Daly & David Altshuler

The genetic culprits that contribute to common diseases remain at large, despite dedicated sleuthing by many laboratories. A new study evaluates the power of genome-wide searches for variants acting in combination, with results that are both unexpected and encouraging.

Mark J. Daly is at the Massachusetts General Hospital, Harvard Medical School, the Broad Institute of Harvard and MIT, and the Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA. David Altshuler is at the Massachusetts General Hospital, Harvard Medical School, and the Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. e-mail: [email protected] or [email protected]

NEWS AND VIEWS

NATURE GENETICS | VOLUME 37 | NUMBER 4 | APRIL 2005

© 2005 Nature Publishing Group http://www.nature.com/naturegenetics

Cartoon by Sean Taverna

The usual suspects

Using the traditional dragnet of genome-wide linkage analysis and candidate gene association studies, only a tiny fraction of the genetic culprits in common disease have been identified. In the face of uncertainty, two distinguishable but complementary models drive much of today’s research. The first, in analogy to mendelian disease, is based on the assumption that disease can be caused by a heterogeneous collection of individually rare, highly penetrant mutations. To find such alleles, deep resequencing in cases must be done¹,² to identify mutations. Until resequencing becomes much cheaper and faster, and until functional noncoding changes can be identified from primary sequence data, this approach will probably be restricted to culprits lurking in the coding regions of candidate genes who carry smoking guns³.

Second is the idea of comprehensively testing common variation for association to disease. This approach is motivated by the observations that most human heterozygosity is due to common ancestral polymorphisms and that common, late-onset, environmentally triggered diseases might not have been disadvantageous during human history. Two trends make it practical to start carrying out hypothesis-free, genome-wide association studies⁴–⁶: the public catalog of common sequence variants now contains more than nine million SNPs, with more than one million typed in population samples to determine frequencies and correlations⁷,⁸, and SNP genotyping technology is becoming sufficiently high-throughput, accurate and affordable so as to allow large collections of markers to be tested without restriction to hypotheses about the identities of candidate genes or whether the changes are coding or regulatory⁴–⁶.

On the basis of the very limited association studies done to date, a modest number of common variants have reproducible influences on common diseases⁹–¹¹. In all confirmed cases, the individual effect of each variant is weak, on its own explaining only a tiny fraction of the overall heritability of disease.

Conspiracy theories

Common variants with small individual effects might contribute more substantially to disease risk through nonadditive interactions among loci (epistasis). Such a model raises the concern that by examining only a single locus at a time, these effects might be missed. Because only very common variants will be found in combination at a measurable frequency, the study of gene-gene interactions in common disease is implicitly most relevant to the second approach of studying common variants.
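To make the frequency argument concrete, here is a minimal sketch of how quickly a two-locus combination becomes rare as the individual variants become rare. It assumes the two unlinked variants occur independently in the population, so that the joint carrier frequency is simply the product of the two individual frequencies. The function name, the carrier frequencies and the cohort size of 1,000 are illustrative assumptions, not figures from the article.

```python
def expected_joint_carriers(freq_a, freq_b, sample_size):
    """Expected number of people carrying both variants.

    Assumes the two variants are unlinked and independent, so the joint
    carrier frequency is the product of the individual carrier frequencies.
    """
    if not (0.0 <= freq_a <= 1.0 and 0.0 <= freq_b <= 1.0):
        raise ValueError("frequencies must lie between 0 and 1")
    return freq_a * freq_b * sample_size


if __name__ == "__main__":
    cohort = 1000  # hypothetical cohort size
    for freq in (0.30, 0.05, 0.01):  # hypothetical carrier frequencies
        n_both = expected_joint_carriers(freq, freq, cohort)
        print(f"carrier frequency {freq:.2f}: about {n_both:.1f} joint carriers per {cohort}")
```

With these made-up numbers, two variants each carried by 30% of people co-occur in roughly 90 of 1,000 individuals, whereas two variants each carried by 1% co-occur in about one person in 10,000.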


Large-scale (ultimately, genome-wide) data collection makes it possible to carry out unbiased screens for combinations of unlinked genetic variants that may together predispose to or protect from disease¹²–¹⁴. An overriding concern in the development of these approaches has been whether the penalty due to testing a vast number of hypotheses might, even in the presence of true interactions, decrease rather than increase the overall power of the study to discover true associations.

To evaluate this set of questions, on page 413 of this issue, Jonathan Marchini and colleagues¹⁵ present simulated genome-wide association studies under a range of scenarios and analytical approaches. The authors focused on biologically motivated scenarios of gene-gene interactions: (i) models in which disease risk increases multiplicatively with each additional risk allele, whether at the same or a different locus, and (ii) models in which each allele acts only in the presence of the other factor. In both cases, statistical associations would be observed in single-locus tests, driven partially or entirely (respectively) by a subset of individuals carrying both factors. The authors compare three distinct analytical strategies, in each case using a correction for multiple testing: (i) locus-by-locus scanning, requiring a stringent genome-wide level of single-locus significance; (ii) complete evaluation of all pairs of loci, declaring success if any given pair surpasses the genome-wide threshold; and (iii) an intermediate, two-stage strategy, first carrying out single-locus tests and then examining those loci that surpass a much more modest level of significance in all pairwise combinations. In addition to showing that genome-wide interaction analyses can be computationally tractable, these simulations offer several important results.
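The multiple-testing burden behind the three strategies can be sketched with simple counting. The example below is an illustration under stated assumptions only: it uses a plain Bonferroni correction (the text does not say which correction was applied), a hypothetical panel of 300,000 markers and a hypothetical first-stage threshold of 0.1, and for the two-stage strategy it counts only the markers expected to pass the first stage by chance.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a plain Bonferroni correction."""
    # With no tests to correct for, fall back to the uncorrected level.
    return alpha / n_tests if n_tests > 0 else alpha


def strategy_summary(n_markers, alpha=0.05, stage1_p=0.1):
    """Return (number of tests, per-test threshold) for each scanning strategy.

    stage1_p is a hypothetical first-stage threshold; the number of loci
    carried into the second stage is approximated by the number expected
    to pass that threshold under the null hypothesis.
    """
    n_pairs = comb(n_markers, 2)  # all unordered pairs of markers
    n_stage2_loci = round(n_markers * stage1_p)
    n_stage2_pairs = comb(n_stage2_loci, 2)
    return {
        "single_locus": (n_markers, bonferroni_threshold(alpha, n_markers)),
        "all_pairs": (n_pairs, bonferroni_threshold(alpha, n_pairs)),
        "two_stage_pairs": (n_stage2_pairs, bonferroni_threshold(alpha, n_stage2_pairs)),
    }


if __name__ == "__main__":
    for name, (n_tests, threshold) in strategy_summary(300_000).items():
        print(f"{name:>16}: {n_tests:>14,d} tests, per-test threshold {threshold:.2e}")
```

Under these assumptions the all-pairs scan involves about 4.5 × 10¹⁰ tests against 3 × 10⁵ for the single-locus scan, which is the quadratic penalty at issue; a real two-stage analysis would need a more careful correction than this count suggests.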
First, across a range of reasonable scenarios, screening all pairwise combinations has comparable or greater power than locus-by-locus testing, despite the substantial penalty for the O(n²) pairwise tests. Second, for loci with independent main effects as well as interactions, single-locus scanning remains the most powerful approach. In contrast, where locus effects are entirely dependent on alleles at unlinked sites, the pairwise testing procedure was preferable and resulted in good power. These results offer considerable reassurance that, under reasonable models of gene-gene interactions, epistasis is not an overwhelming barrier to genome-wide association analysis.

Implications for law enforcement

Though perhaps unexpected, these conclusions are in retrospect quite intuitive. Given a sufficiently well-powered sample, the correctly specified model results in the most powerful testing procedure, regardless of statistical penalties. This is because the increase in significance for hitting the right model scales more favorably with sample size than does the penalty for multiple comparisons. Thus, multilocus scanning performs best when the effect is entirely due to an allelic combination, whereas single-locus scanning wins when each individual locus has an effect on its own, even if multiplied by other loci.

The authors suggest that the two-stage strategy may be the most powerful of those considered: the locus-by-locus test is done, and all pairs of loci reaching a very modest nominal level of association are examined in pairwise combination. Another natural strategy not explicitly considered is to carry out the locus-by-locus screen with a more stringent threshold and then repeat the genome-wide analysis conditional on any individually significant locus. Such an approach most closely resembles standard practice today, where once individual loci are discovered and confirmed, subsequent scans are routinely examined for interactions with those loci by the addition of n two-locus tests (i.e., each new marker in combination with the confirmed locus). Because both the single-locus tests and tests conditional on confirmed positives will probably remain the first two steps of any whole-genome association analysis, exploration of methods that optimally complement and supplement the results of these tests may provide further insight.

Finally, the authors suggest that the widespread irreproducibility of genetic association results might be explained (at least in part) by gene-gene interactions.
Specifically, they point out that if the frequencies of each genetic risk factor vary across populations, and if these variants act in combinations, then the frequencies of specific combinations may vary to an even greater degree, contributing to failure of replication. Although this principle is sound and must certainly have a role, we offer a word of caution as a counterbalance. Given the very modest statistical significance of most irreproducible association results and the low prior probability for each locus in the genome, it seems to us more likely that statistical fluctuation is the predominant explanation for failure of replication as compared to true heterogeneity. We would argue that each indicted (but not yet convicted) genetic culprit should be considered innocent until proven guilty beyond reasonable statistical doubt, lest overzealous prosecutors (i.e., investigators), who have unfortunately strong incentives to claim success in nabbing the crook, claim a gene is guilty even in the face of exculpatory evidence.

At root, a sound approach to apprehending genetic culprits depends on knowing how many there are, how frequent they are in the population and how often they band together to commit crimes. The challenge for the field is to balance the virtue of considering a broad range of scenarios (thereby avoiding overlooking guilty parties) with the risk of inappropriately incarcerating the innocent. Marchini and colleagues reassure us that gene-gene interactions are not show-stoppers in the fight against genetic crime and point the way towards design of analytical approaches that are robust to multiple models of gene effects. Answers about genetic architecture can only come from empirical data, which we are thankfully soon to have in abundance.

1. Botstein, D. & Risch, N. Nat. Genet. 33 Suppl, 228–237 (2003).
2. Cohen, J.C. et al. Science 305, 869–872 (2004).
3. Hirschhorn, J.N. & Altshuler, D. J. Clin. Endocrinol. Metab. 87, 4438–4441 (2002).
4. Wang, W.Y., Barratt, B.J., Clayton, D.G. & Todd, J.A. Nat. Rev. Genet. 6, 109–118 (2005).
5. Hirschhorn, J.N. & Daly, M.J. Nat. Rev. Genet. 6, 95–108 (2005).
6. Altshuler, D. & Clark, A.G. Science 307, 1052–1053 (2005).
7. Hinds, D.A. et al. Science 307, 1072–1079 (2005).
8. The International HapMap Consortium. Nature 426, 789–796 (2003).
9. Lohmueller, K., Pearce, C.L., Pike, M., Lander, E. & Hirschhorn, J.N. Nat. Genet. 33, 177–182 (2003).
10. Florez, J.C., Hirschhorn, J. & Altshuler, D. Annu. Rev. Genomics Hum. Genet. 4, 257–291 (2003).
11. Daly, M.J. & Rioux, J.D. Inflamm. Bowel Dis. 10, 312–317 (2004).
12. Nelson, M.R., Kardia, S.L., Ferrell, R.E. & Sing, C.F. Genome Res. 11, 458–470 (2001).
13. Hoh, J. & Ott, J. Nat. Rev. Genet. 4, 701–709 (2003).
14. Moore, J.H. & Ritchie, M.D. JAMA 291, 1642–1643 (2004).
15. Marchini, J., Donnelly, P. & Cardon, L.R. Nat. Genet. 37, 413–417 (2005).


Sirtuins for healthy neurons

David Sinclair


Sir2 deacetylases are believed to promote the survival and longevity of organisms during times of adversity. A new study shows that activation of Sir2 by small molecules called sirtuin-activating compounds increases neuronal survival in two different models of Huntington disease, possibly opening new avenues for treatment.

David Sinclair is in the Department of Pathology, Harvard Medical School, Boston, Massachusetts, 02115 USA. e-mail: [email protected]

Scientists should avoid using words like ‘miraculous’. But if ever there was a reason to make an exception, resveratrol is it. This small nontoxic molecule from Asian medicinal herbs and red wine is currently in human clinical trials to treat colon cancer and oral herpes; in rodents, it protects against inflammatory disorders, stroke, myocardial infarction, spinal cord injury and heart disease and is one of the most effective cancer chemopreventive agents known¹. No one really knows how resveratrol achieves such feats, but there is little doubt that this knowledge could open new avenues to develop truly revolutionary drugs. Now, on page 349 of this issue, Alex Parker and colleagues² show that resveratrol might be effective in treating hereditary polyglutamine (polyQ) disorders such as Huntington disease; moreover, they provide evidence for a mechanism.

STACs come of age

The foundation for understanding how resveratrol works came from a small group of researchers examining why yeast cells grow old and which genes, if any, control the process. They found that genomic instability of DNA repeats is a key cause of yeast aging³ and that overexpression of a gene called SIR2 suppresses genomic instability, leading to an increase in lifespan of ∼30% (ref. 4). Now we know that SIR2 genes are found in all eukaryotes (humans have seven, SIRT1–SIRT7) and that many of them encode NAD+-dependent deacetylases that direct the behavior of target proteins by removing acetyl groups from specific lysines. Overexpression of SIR2 homologs also extends the lifespans of organisms that age by mechanisms ostensibly unrelated to those in yeast, namely Caenorhabditis elegans⁵ and Drosophila melanogaster⁶.

A set of 18 small polyphenolic molecules including resveratrol was recently found to increase the affinity of the SIRT1 enzyme for certain protein targets, probably through an allosteric mechanism (Fig. 1). These molecules, known as sirtuin-activating compounds (STACs), extend the lifespan of yeast, worms and flies in a Sir2-dependent manner. Sir2 and STACs seem to work by increasing cell defenses against stress, through the same or similar pathways as caloric restriction, the strict diet that increases the life expectancy of mammals by delaying diseases of old age such as cancer, heart disease, diabetes and even neurodegeneration⁷.

Resveratrol versus huntingtin

This brings us to the work of Parker and colleagues². There are four main polyQ disorders, one of which is Huntington disease, a progressive neurodegenerative disorder that typically presents around age 40 with uncontrolled movements and cognitive deterioration. PolyQ disorders are so named because they stem from mutations in DNA repeats of CAG, which encode the amino acid glutamine, resulting in a given protein (in this case, huntingtin) having many more sequential glutamines than normal. For reasons that are not known, mutant huntingtin is toxic to cells, leading to neuronal dysfunction and cell death.

Parker and colleagues² investigated two models of Huntington disease. The first involved overexpressing a pathogenic version of huntingtin in C. elegans touch receptor neurons. In these worms, a subset of mechanosensory neurons accumulates huntingtin aggregates, and the worms often fail to twitch when prodded. The second model involved culturing neurons derived from transgenic mice engineered to overexpress mutant (109Q) huntingtin. In both model systems, resveratrol suppressed the deleterious effects of mutant huntingtin, as assessed by loss of the twitch response in the worms and cell death of the mouse neurons.

Resveratrol is commonly referred to as a ‘dirty’ molecule in the pharmaceutical industry, meaning that it seems to interact with many different proteins, including COX1/2, ribonucleotide reductase and SIRT1. What makes this new study important is that resveratrol is almost certainly working through Sir2 enzymes. Overexpression of Sir2 in the worm suppresses cell toxicity, whereas worms lacking Sir2 are no longer protected by resveratrol. Similarly, in mouse neurons, the action of resveratrol is blocked by the Sir2 inhibitors sirtinol and nicotinamide. Although the authors did not test whether protection is provided by SIRT1 specifically, this seems a reasonable bet, given that SIRT1 has previously been linked to cell survival in a variety of cell types, including neurons⁸. Fisetin, a chemically distinct class of STAC, worked even better than resveratrol at protecting worm neurons, consistent with the finding that fisetin is a considerably more potent activator of C. elegans Sir2 than is resveratrol⁹.

The present study comes on the heels of a string of papers linking the effects of resveratrol in mammals to SIRT1 activation, including increased survival of neurons whose axons are cut⁸, mobilization of fatty acids in adipocytes¹⁰ and modulation of NF-κB transactivation and TNFα-induced apoptosis¹¹.

To your health

Probably the most pressing question for many readers is whether a glass of red wine can provide enough resveratrol to activate SIRT1. Older literature indicates that it does not. Red wine results in serum levels of resveratrol that barely reach the micromolar concentrations thought to be required for SIRT1 activation; what’s more, resveratrol is metabolized into sulfated and glucuronidated forms within ∼15 min of entering the bloodstream¹². But Parker et al.² show that we may need a concentration of only ∼500 nM of resveratrol to protect our neurons. We should also be aware that the metabolites of resveratrol, which circulate in serum for some 9 h (ref. 12), might also activate SIRT1.

One overarching question remains: why does a group of structurally related polyphenols produced by stressed plants protect cells, improve health and, in some cases, promote longer life in various organisms? STACs may resemble an as-yet-unidentified endogenous activator. Another explanation, known as the xenohormesis hypothesis¹³, holds that plants synthesize STACs during stressful times to activate their own Sir2-mediated defenses and that animals benefit from picking up on these chemical stress cues from the plant world because it allows them to mobilize defenses in anticipation of a deteriorating environment. Which of these theories is correct may not be known for many years, but whether STACs can effectively treat the major diseases of our time, including neurodegenerative disorders, may be known far sooner than that. Wouldn’t that be a miracle?

1. Aggarwal, B.B. et al. Anticancer Res. 24, 2783–2840 (2004).
2. Parker, J. et al. Nat. Genet. 37, 349–350 (2005).
3. Sinclair, D.A. & Guarente, L. Cell 91, 1033–1042 (1997).
4. Kaeberlein, M., McVey, M. & Guarente, L. Genes Dev. 13, 2570–2580 (1999).
5. Tissenbaum, H.A. & Guarente, L. Nature 410, 227–230 (2001).
6. Rogina, B. & Helfand, S.L. Proc. Natl. Acad. Sci. USA 101, 15998–16003 (2004).
7. Wang, J. et al. FASEB J. (in the press).
8. Araki, T., Sasaki, Y. & Milbrandt, J. Science 305, 1010–1013 (2004).
9. Wood, J.G. et al. Nature 430, 686–689 (2004).
10. Picard, F. et al. Nature 429, 771–776 (2004).
11. Yeung, F. et al. EMBO J. 23, 2369–2380 (2004).
12. Walle, T., Hsieh, F., DeLegge, M.H., Oatis, J.E. Jr. & Walle, U.K. Drug Metab. Dispos. 32, 1377–1382 (2004).
13. Lamming, D.W., Wood, J.G. & Sinclair, D.A. Mol. Microbiol. 53, 1003–1009 (2004).

[Figure 1 diagram. Labels: plant-derived STACs (resveratrol) (1); NAM derivative STACs (isonicotinamide) (2); PNC1/PBEF (3); NMNAT (4); NAD salvage pathway (NAM, NAD+); NAD precursors (NAM riboside); secreted factors (PBEF (visfatin)); sirtuins.]

Figure 1 Routes to increasing the activity of sirtuin deacetylases. Increasing the abundance or activity of sirtuin (class III) deacetylases protects neurons from the toxic effects of mutant huntingtin. There are four known ways to increase the activity of sirtuins. STACs, which work by contacting the enzymes directly, fall into two classes: polyphenols produced by stressed plants, such as resveratrol and fisetin, lower the Km of sirtuins, possibly through an allosteric mechanism (1), and nicotinamide (NAM) analogs increase the Vmax by blocking inhibition by nicotinamide, a physiological regulator of sirtuin activity (2). Other routes include increasing the amount or activity of the NAD+ salvage pathway gene PNC1 or its functional equivalent in mammals, known as PBEF (visfatin) (3), or increasing the levels of NMNAT, as exemplified by the WldS mouse mutant that is resistant to axonal degeneration due to an NMNAT gene duplication (4). It is also feasible that sirtuin activity is stimulated by NAD+ precursors such as nicotinamide riboside or by secreted PBEF (visfatin).
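The distinction that the Figure 1 legend draws between lowering Km and raising Vmax can be illustrated with the textbook Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). This is a sketch only: it assumes sirtuin kinetics are adequately described by that simple rate law, and the function name and every number below are hypothetical rather than taken from the article.

```python
def mm_rate(substrate, vmax, km):
    """Michaelis-Menten reaction rate for a given substrate concentration."""
    if substrate < 0 or vmax < 0 or km <= 0:
        raise ValueError("substrate and vmax must be non-negative, km positive")
    return vmax * substrate / (km + substrate)


if __name__ == "__main__":
    # Hypothetical units and values, chosen only to show the qualitative effect.
    for substrate in (10.0, 5000.0):  # sub-saturating vs near-saturating
        baseline = mm_rate(substrate, vmax=100.0, km=50.0)
        lower_km = mm_rate(substrate, vmax=100.0, km=10.0)    # route (1) in the legend
        higher_vmax = mm_rate(substrate, vmax=200.0, km=50.0)  # route (2) in the legend
        print(f"[S]={substrate:>6.0f}: baseline {baseline:6.1f}, "
              f"lower Km {lower_km:6.1f}, higher Vmax {higher_vmax:6.1f}")
```

In this toy setting a lower Km helps most when substrate is scarce and makes little difference near saturation, whereas a higher Vmax scales the rate by the same factor at every substrate concentration.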

Defining stroke risks in sickle cell anemia

James F Meschia & V Shane Pankratz

Children and young adults with sickle cell anemia at risk for stroke are identified principally by screening for cerebral vasculopathy using transcranial Doppler ultrasonography. Investigators now show how Bayesian networks can generate useful predictive models and highlight relationships between genes and the occurrence of stroke in those with sickle cell anemia.

James F. Meschia is in the Department of Neurology, Mayo Clinic, Cannaday Building, 2 East, Jacksonville, Florida 32224, USA. V. Shane Pankratz is in the Division of Biostatistics, Mayo Clinic, Rochester, Minnesota, 55905 USA. e-mail: [email protected]

Sickle cell anemia (SCA) is an autosomal recessive disorder caused by a missense mutation in the beta polypeptide chain of hemoglobin. Symptomatic stroke can occur in as many as 11% of affected individuals by the age of 20 years (ref. 1), and many more will show evidence of silent infarction by magnetic resonance imaging (Fig. 1). The number of families with SCA with at least two children with stroke is greater than would be expected by chance alone². A study of 29 families with more than one child with SCA found that having a child with elevated cerebral blood flow velocities on transcranial Doppler ultrasonography (TCD) increased the odds of siblings having elevated cerebral blood flow velocities by a factor of 50, a finding consistent with familial predisposition to cerebral vasculopathy³. These studies signal that genetic factors may influence the risk of stroke in individuals with SCA; the search for risk-modifying genes is ongoing⁴,⁵.

There are increasing amounts of clinical and genetic information about risk factors; the challenge remains how best to analyze and interpret this information. On page 435 of this issue, Paola Sebastiani and colleagues⁶ use the new approach of Bayesian networks to analyze variations in 108 SNPs in 39 candidate genes in 1,398 individuals with SCA. Their findings show the potential for Bayesian networks to generate predictive models and highlight relationships between genes and a disease phenotype.

Stratifying stroke risk

The current approach to stratifying stroke risk among SCA cases is to test noninvasively for presymptomatic cerebral vasculopathy with TCD ultrasonography. The technique measures flow velocity in large proximal intracranial arteries; velocity increases where arteries are narrowed. A cohort study of 190 children and adolescents with SCA monitored for an average of 29 months showed that abnormal blood flow velocities of 170 cm s⁻¹ or greater were associated with a 44-times-higher relative risk of stroke⁷. Longer follow-up with an additional 125 affected individuals confirmed that elevated TCD-detected velocities, particularly average mean maximum blood flow velocities of 200 cm s⁻¹ or more, are strongly associated with stroke risk⁸. The STOP trial in individuals with SCA with time-averaged mean blood flow velocity in the internal carotid or middle cerebral artery of at least 200 cm s⁻¹ showed that blood transfusions reduced the risk of stroke by 92% (ref. 9). TCD monitoring and transfusion therapy might explain the recent marked decline in stroke rates in California¹⁰.

In the future, multiplex genetic testing may prove superior to TCD for stratifying risk of stroke in SCA. Bayesian networks, and other modern data analytical methods, may leverage information generated by multiplex gene testing. For example, the model developed by Sebastiani and colleagues showed great predictive value, correctly classifying all 7 subjects who experienced a stroke and 105 of the 107 who did not⁶. Testing for risk-modifying genes might reduce the need for on-site technical expertise in ultrasonography and thereby decentralize the process of screening for at-risk individuals. This could also reduce the need for long-term follow-up of at-risk individuals, which has been necessary to optimize the usefulness of TCD screening¹¹. Testing for risk-modifying genes could identify individuals at risk for stroke earlier in the course of their disease, before they develop ultrasonographic evidence of cerebral vasculopathy, a condition that may not be benign in the stage preceding symptomatic stroke. In a study of children with SCA but no history of stroke, children with abnormal TCD readings performed less well on certain cognitive tests than those with conditional or normal TCD readings¹².
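The classification figures quoted above for the model of Sebastiani and colleagues (all 7 stroke cases and 105 of 107 non-stroke cases correctly classified) can be restated as sensitivity and specificity. The sketch below also attaches a 95% Wilson score interval to each proportion; that interval is our own choice of illustration and is not something the article or the original study reports.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.959964):
    """95% Wilson score interval for a binomial proportion (z is the normal quantile)."""
    if total <= 0 or not (0 <= successes <= total):
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Counts as quoted in the text.
    counts = {"sensitivity": (7, 7), "specificity": (105, 107)}
    for name, (correct, total) in counts.items():
        low, high = wilson_interval(correct, total)
        print(f"{name}: {correct}/{total} = {correct / total:.3f} "
              f"(95% Wilson interval {low:.3f} to {high:.3f})")
```

Because only seven stroke cases were available, the interval around the perfect sensitivity is wide, which is one way of seeing why replication in larger samples matters.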
Genetic testing might help not only individuals at risk for stroke but also those at risk for cerebral vasculopathy before they manifest symptomatic stroke.

Sebastiani and colleagues started with 235 SNPs in 80 candidate genes and used a Bayesian network approach to ultimately identify 31 SNPs in 12 genes that were associated with the occurrence of stroke in individuals with SCA. Among these were genes that are good candidates for underlying stroke, including genes in the TGF-β pathway. Agonist and antagonist studies on stroke models show that TGF-β1 can be neuroprotective, reducing neuronal cell death and infarct volumes¹³. TGF-β plasma concentrations are elevated in SCA in steady state, and the elevation is contingent on whether individuals have high or low fetal hemoglobin, an inhibitor of polymerization of sickle hemoglobin in red cells¹⁴.

Bayesian and other approaches

Bayesian networks, and other modern methods of evaluating multilocus data, are promising alternatives to commonly used model selection procedures such as stepwise logistic regression (e.g., ref. 15). One strength of these methods is their flexibility in accounting for genetic interactions among the selected genes. This is in contrast to logistic regression, where one must explicitly model any single-gene effects and multigene interactions of interest. These methods therefore show promise in identifying potential structure that may be useful in interpreting large amounts of genetic information.

Despite their promise, it is important to temper enthusiasm for Bayesian networks and similar analytical techniques. Results from these analytical techniques are dependent on the structure of the model, and their statistical properties have not yet been fully studied. The complex data analytical techniques that can be applied to large data sets make it possible to obtain persuasive results for any given data set. If nuances in sample set identification, data collection and application of the method are overlooked, then the results that are obtained may be difficult to replicate, or even misleading. We endorse the continued study of these new analytical techniques. To validate more fully the methods and the results obtained with them, however, we recommend that extensive detail be provided about every component of the study design and analysis, including a detailed description of the intermediate steps in model creation. This would facilitate a full evaluation of the analytical methods and permit an objective weighing of findings.

1. Ohene-Frempong, K. et al. Blood 91, 288–294 (1998).
2. Driscoll, M.C. et al. Blood 101, 2401–2404 (2003).
3. Kwiatkowski, J.L. et al. Br. J. Haematol. 121, 932–937 (2003).
4. Hoppe, C. et al. Blood 103, 2391–2396 (2004).
5. Adams, G.T. et al. BMC Med. Genet. 4, 6 (2003).
6. Sebastiani, P. et al. Nat. Genet. 37, 435–440 (2005).
7. Adams, R. et al. N. Engl. J. Med. 326, 605–610 (1992).
8. Adams, R.J. et al. Ann. Neurol. 42, 699–704 (1997).
9. Adams, R.J. et al. N. Engl. J. Med. 339, 5–11 (1998).
10. Fullerton, H.J. et al. Blood 104, 336–339 (2004).
11. Adams, R.J. et al. Blood 103, 3689–3694 (2004).
12. Kral, M.C. et al. Pediatrics 112, 324–331 (2003).
13. Dhandapani, K.M. & Brann, D.W. Cell Biochem. Biophys. 39, 13–22 (2003).
14. Croizat, H. & Nagel, R.L. Am. J. Hematol. 60, 105–115 (1999).
15. Hoh, J. & Ott, J. Nat. Rev. Genet. 4, 701–709 (2003).

[Figure 1 diagram. Labels: HbSS; cerebral infarcts; stroke, cognitive deficits, school problems; modifier genes: TGFBR2, TGFBR3, BMP6 and others.]

Figure 1 Multiple genes probably modify the risk of stroke in individuals with SCA. HbSS, hemoglobin in an individual who is homozygous with respect to the mutation that causes SCA; TGFBR2, transforming growth factor, beta receptor II; TGFBR3, transforming growth factor, beta receptor III; BMP6, bone morphogenetic protein 6.


The first steps in adaptive evolution

James J Bull & Sarah P Otto


The first empirical test of an evolutionary theory provides support for a mutational landscape model underlying the process of adaptation. The study shows that it is possible to predict at least the first step in an adaptive walk and also shows the importance of incorporating mutation bias in the fitness effects of mutations. Adaptation is everywhere. Sometimes it gets in our way, as with drug-resistant microbes, pesticide-resistant insects and cancer. Sometimes it does us good, as in the domestication of plants and animals and industry’s use of directed evolution to create useful molecules. Nevertheless, although the products of adaptation are well known, the mechanism and quantitative nature of the adaptive process remain poorly understood. Early attempts to describe the adaptive process on geometrical grounds1 did not lead much beyond a rudimentary understanding of whether smalleffect or large-effect mutations contribute most to adaptation2. Real progress towards understanding the adaptive process came from explicit models of genetic sequences3–6. These models predicted the fitness effects of mutations that arise and fix within a population as it adapts to its environment. On page 441 of this issue, Rokyta et al.7 now provide an empirical test of this theory, finding that the first steps that adaptation takes are consistent with the theoretical predictions of the adaptive process.

James J. Bull is in the Section of Integrative Biology and Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas 78712-0253, USA. Sarah P. Otto is in the Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada. e-mail: [email protected]

VOLUME 37 | NUMBER 4 | APRIL 2005 | NATURE GENETICS

© 2005 Nature Publishing Group http://www.nature.com/naturegenetics

Testing evolution

Theories that make real, a priori predictions about adaptation have been gaining momentum, and the report from Rokyta et al. in this issue provides the empirical support needed to launch these theories into the spotlight. This paper presents a quantitative experimental test of a theory of adaptation initially developed by John Gillespie in the 1980s (refs. 3–5) and recently extended by Allen Orr6. Gillespie realized that in a population slightly displaced from its closest fitness optimum, there will be but a handful of mutations that improve fitness compared with an overwhelming number that reduce fitness3–5. Therefore, beneficial mutations that lead to adaptation should represent the most-fit tail of the distribution of all possible mutations. Gillespie’s model made use of a common statistical theory called extreme-value theory, which indicates that samples drawn from the tail of a distribution have properties that do not depend on the exact nature of the distribution. Applied to evolution, extreme-value theory predicts an ordered progression of fitness effects among the handful of beneficial alleles: the best allele should be substantially fitter than the next-best allele, and fitness differences between pairs of next-most beneficial alleles should decline so that most of the beneficial alleles have small effects. Using this insight, Orr6 derived predictions about the distribution of fitness effects of the mutations that arise and fix during an adaptive walk. Orr’s model allowed for testable predictions about the course of adaptive evolution. In particular, he derived the probability that a mutated allele with a given fitness rank would be fixed at the next adaptive step.

The first adaptive step

Rokyta et al.7 now supply the needed empirical test of this theory. The authors used a single-stranded DNA virus to determine whether the beneficial mutation fixed at the first substitution in an adaptive walk agreed with that predicted by Orr’s theory (Fig. 1). The empirical test was not trivial, requiring considerable replication as well as detailed information on the identity and fitness of substituted alleles. To accomplish this, Rokyta et al. focused on the first step in adaptation, allowing for greater replication and predictive ability.

Rokyta et al. carried out 20 replicate single-step adaptations using a wild relative of ΦX174 grown in liquid culture, allowing each line to adapt independently to the same conditions. In each replicate, the first mutation to both arise and spread in the line was identified by whole-genome sequencing. The fitness effect of each mutation was measured as the growth rate of the virus, and the 20 fitness effects were ranked. Of the 20 first adaptive steps examined, all mutations were nonsynonymous, involving nine distinct amino acid changes, and all increased fitness (from 11% to 39%).

Comparing these experimental results with previous theory, Rokyta et al. found that Orr’s model fit the observed fitness distribution of the mutations reasonably well. The predictions of Orr, however, are only expectations over all possible genomic starting points and over all possible adaptive walks.

Figure 1 A model bacteriophage takes its first adaptive step on a fitness landscape. An adaptive walk along a mutational landscape, reflecting all possible mutations deriving from the initial sequence, can represent the evolution of a virus. Now, a virus’s first steps in an adaptive walk have been defined to within a likely mutational landscape. Photo by J. Palmersheim; phage sculpture by H. Wichman; landscape by A. Johnston.
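The rank-based prediction described above can be made concrete with a short calculation. The sketch below assumes that the transition probability in Orr's model takes the form P(i → j) = [1/(i − 1)] Σ_{k=j}^{i−1} 1/k, where the wild type has fitness rank i and the allele fixed next has rank j (rank 1 being the fittest). This formula is not stated in the text above and should be checked against ref. 6; the function names and the choice of i = 10 (a wild type with nine fitter alleles, loosely echoing the nine observed amino acid changes) are purely illustrative.

```python
from fractions import Fraction


def orr_transition_probs(i):
    """Probability that the allele of fitness rank j fixes next, for j = 1..i-1.

    Assumed form (not given in the article):
        P(i -> j) = 1/(i-1) * sum_{k=j}^{i-1} 1/k
    where i is the fitness rank of the current wild type and rank 1 is fittest.
    """
    if i < 2:
        raise ValueError("wild type must have rank >= 2 for a beneficial step to exist")
    return {
        j: Fraction(1, i - 1) * sum(Fraction(1, k) for k in range(j, i))
        for j in range(1, i)
    }


def expected_rank_after_step(i):
    """Mean fitness rank of the allele fixed at the next step."""
    return sum(j * p for j, p in orr_transition_probs(i).items())


if __name__ == "__main__":
    i = 10  # hypothetical: wild type is tenth-best, so nine alleles are fitter
    for j, p in orr_transition_probs(i).items():
        print(f"rank {j}: {float(p):.3f}")
    print("expected rank after one step:", expected_rank_after_step(i))
```

Under this assumed form the fittest allele is the single most likely one to fix, but it is far from certain to do so, and the expected rank after one step works out to (i + 2)/4. Neither property depends on the shape of the underlying fitness distribution, which is what makes the theory testable with only rank information.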

Rokyta et al. found a substantially improved fit to the data by incorporating mutation rate differences between their starting sequence and each of the nine observed amino acid changes. A slightly better fit was obtained using all the available data (including the mutation rate differences, fitness effects and population size dynamics). Thus, the authors found that models tailored to the specifics of the population could better describe the process of adaptation. It is notable that, without this additional knowledge of the starting and mutant sequences, Orr’s predictions fared so well.

Rigorous biological tests of these models are presently limited to small genomes (viruses, plasmids and single molecules subjected to in vitro selection) and to computer models of fitness landscapes. Tests using bacteria, yeast and higher eukaryotes await the cost-feasibility of sequencing large genomes with multiple replicates of a single experiment. But even now, tests are feasible and of obvious relevance for predicting drug resistance evolution in viruses, including HIV and influenza, two viruses for which drugs have or could have a crucial role in treatment and for which we already know that evolution causes problems. In some cases, we might want the theory tailored to the individual genome, a combination of Orr’s model and the modifications offered by Rokyta et al.

Next steps

In showing the relevance of existing adaptation theories to real experimental conditions, this work brings new excitement to adaptive walks. For any particular system, general properties about the course of adaptive walks can now be predicted based on only a few parameters. To borrow from Fisher, no practical biologist would have dared imagine that the details of adaptive walks might be largely independent of the biology, yet that is precisely what the current results suggest. This work shows that it is possible to predict the spectrum of possible first steps of an adaptive walk. The next steps will be to extrapolate this work to the full course of an adaptive walk. Such predictive power would be extremely valuable when anticipating the evolutionary response of pathogens to new antimicrobial drugs and when using directed evolution to create molecules with specific functions for industry.

1. Fisher, R.A. The Genetical Theory of Natural Selection (Oxford University Press, Oxford, UK, 1930).
2. Orr, H.A. Nat. Rev. Genet. 6, 119–127 (2005).
3. Gillespie, J.H. Theor. Popul. Biol. 23, 202–215 (1983).
4. Gillespie, J.H. Evolution 38, 1116–1129 (1984).
5. Gillespie, J.H. The Causes of Molecular Evolution (Oxford University Press, Oxford, UK, 1991).
6. Orr, H.A. Evolution 56, 1317–1330 (2002).
7. Rokyta, D.R., Joyce, P., Caudle, S.B. & Wichman, H.A. Nat. Genet. 37, 441–444 (2005).

The X chromosome: not just her brother’s keeper

Eric J Vallender, Nathaniel M Pearson & Bruce T Lahn

The X chromosome has traditionally been characterized as a conscientious sister to her derelict brother that is the Y. Beyond dutifully maintaining the family heritage, however, the X has developed its own unique identities. Now, the complete sequence of the human X allows us to appreciate its distinctiveness at an unprecedented resolution.

Since its discovery, the human X chromosome has been defined by its relationships to other chromosomes. Its hemizygosity in males and unusual patterns of inheritance immediately separate it from its autosomal cousins, and its length and gene content set it apart from its Y chromosome sibling. The two sex chromosomes in mammals descended from a pair of autosomes1. The Y underwent massive degeneration, losing size and gene content, whereas the X was maintained, retaining its size and most of its genes. But the X is much more than a faithful copy of its autosomal progenitor; it also evolved many distinctive features2. The most notable of these include X inactivation, the extensive flux (both accretion and loss) of sex-specific genes, and a deficit in polymorphism. Additionally, the X has a disproportionately large representation of genes involved in mendelian diseases, probably owing to the relative ease of identifying such genes when X-linked. As recently reported by Mark Ross and colleagues in Nature3, the sequence of the human X brings these and other features of this chromosome into sharp focus.

Eric J. Vallender, Nathaniel M. Pearson and Bruce T. Lahn are in the Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA. e-mail: [email protected]


Origin of the sex chromosomes

The mammalian sex chromosomes arose from autosomal progenitors ∼300 million years ago. Before then, sex determination probably relied on environmental cues such as egg incubation temperature, as is the case in many extant reptiles. Ross et al.3 confirmed that much of the long arm of the human X is homologous to the short arm of chicken chromosome 4, whereas most of the short arm of the human X matches a stretch of chicken chromosome 1. The bird sex chromosomes Z and W, on the other hand, show homology to human chromosome 9. These observations demonstrate the independent origins of genetic sex determination in mammals and birds. Furthermore, comparison of the human X with that of the mouse, rat and dog indicates that gene order of the X in human and dog largely represents that of the ancestral X, whereas many rearrangements have occurred on the murine X.

After the sex chromosomes were established, recombination between X and Y was suppressed progressively over time in a block-by-block manner along the chromosomes4. This was probably accomplished by a series of large-scale inversions on the Y. Previous studies posited at least four evolutionary ‘strata’ on the X chromosome, each corresponding to a block of the chromosome where recombination was suppressed at once (or within a short period)4. Ross et al.3 now identify a fifth stratum on the telomeric end of the X short arm. On the whole, the data portray an unrelenting advance of the pseudoautosomal boundary towards the telomeres over evolutionary time, with the most recent shift occurring ∼30 million years ago.

X inactivation

As the sex chromosomes diverged, they developed their own identities. The Y underwent massive genic atrophy and a corresponding size reduction (and also accumulated male-beneficial genes5,6). The X remained more stable in terms of gene content and, in response to Y decay, evolved a mechanism of inactivating one of its two copies in each somatic cell to compensate for the gene dosage difference between males and females. Although most genes on the X are subject to this haplo-inactivation, many escape it7. Typically, these escapees have functional (or very recently decayed) Y homologs. This is consistent with the notion that X inactivation evolved gene by gene and, in each case, as a delayed response to the degeneration of a corresponding Y homolog8.

The mechanisms by which X inactivation occurs are only partly understood. In particular, it is uncertain how X inactivation spreads from the X inactivation center (where the gene XIST resides) across the rest of the chromosome. One hypothesis holds that LINE1 (L1) repetitive elements may act as ‘way stations’ to facilitate the spread of X inactivation9. Ross et al.3 show not only that L1 repeats are enriched on the X relative to the rest of the genome, but also that their density in a given region roughly correlates with the region’s age of divergence from the Y chromosome and its completeness of inactivation. Regions long diverged from the Y show high L1 density and more thorough X inactivation, whereas regions recently diverged from the Y show low L1 density and greater tendency to escape inactivation. These data are consistent with (but do not yet prove) the hypothesis that L1 repeats are the way stations of X inactivation.

Table 1 General properties of the human sex chromosomes and autosomes

| Property | X chromosome | Y chromosomeᵃ | Autosomes |
| Size of euchromatic region (Mb) | 150 | 23 | 2,863 |
| Gene number (count per Mb) | ∼1,098 (7) | ∼78 (3.5) | ∼29,800 (11) |
| Copy number in population as scaled to autosomes | 3/4 | 1/4 | 1 |
| Heterozygosityᵇ | 4.7 × 10⁻⁴ | 1.5 × 10⁻⁴ | 7.5 × 10⁻⁴ |
| Fraction of LINE1 sequence | 29% | 23% | 16% |
| Unique evolutionary forces affecting gene content | Sexual antagonism; hemizygous exposure; evasion of male germline X inactivation | Asexual degeneration; constant directional selection; sexual antagonism | None |
| Genes that show enrichment | Early spermatogenesis genes; brain genes; skeletal muscle genes; ovary genes; placenta genes | Spermatogenesis genes | None |
| Genes that show deficit | Late spermatogenesis genes | Most types of genes | None |

ᵃ Pseudoautosomal regions are excluded.
ᵇ A measure of polymorphism defined as the average number of nucleotide differences per base between two randomly sampled copies of the chromosome.
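The copy-number values given in Table 1 (3/4 for the X and 1/4 for the Y, relative to an autosome) follow from simple counting in a population of XX females and XY males. The sketch below works through that arithmetic, assuming an idealized population with no further complications such as unequal variance in reproductive success between the sexes; the function names and the population sizes are illustrative only.

```python
from fractions import Fraction


def relative_copy_number(n_females, n_males):
    """Copies of the X and Y in the population, scaled to one autosome.

    Females carry two X chromosomes, males carry one X and one Y,
    and every individual carries two copies of each autosome.
    """
    autosome_copies = 2 * (n_females + n_males)
    x_copies = 2 * n_females + n_males
    y_copies = n_males
    return {
        "X": Fraction(x_copies, autosome_copies),
        "Y": Fraction(y_copies, autosome_copies),
        "autosome": Fraction(1),
    }


def fraction_of_x_in_females(n_females, n_males):
    """Share of all X chromosomes that reside in females at any one time."""
    return Fraction(2 * n_females, 2 * n_females + n_males)


if __name__ == "__main__":
    # Hypothetical population with an equal sex ratio.
    print(relative_copy_number(500, 500))
    print(fraction_of_x_in_females(500, 500))
```

With an equal sex ratio this gives 3/4 and 1/4 for the X and Y. The same counting shows that two thirds of all X chromosomes are carried by females at any one time.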

Selective forces on the X

The evolutionary history of the sex chromosomes differs categorically from that of the autosomes. Notably, both the X and the Y are present less frequently than an autosome in the population. As predicted by population genetics theories, this should lead to fewer polymorphisms on the sex chromosomes. Ross et al.3 report that, as expected, the polymorphism level of the X is only ∼57% that of an autosome.

Selective regimes also differ greatly between the two sex chromosomes. For the Y, two particularly prominent forces, asexual degeneration and constant directional selection, have been postulated to result in a chromosome that is generally gene-poor but specifically enriched for male-beneficial functions2. For the X, hemizygous exposure is theorized to drive a subtler accumulation of male-beneficial genes on the assumption that recessive male-beneficial alleles can more readily manifest their benefit from the X, a chromosome that is hemizygous (or ‘exposed’) in males, than from autosomes2,10. Extending previous findings11,12, Ross et al. showed that the X is enriched for cancer-testis antigen genes, a subset of testis genes that are also expressed in cancer cells. Of note, a recent study confirmed that the mammalian X is enriched for genes involved in early stages of spermatogenesis, consistent with the influence of hemizygous exposure13. But the same study also showed a deficit of genes implicated in late stages of spermatogenesis, which was hypothesized to be an evolutionary response to male germline X inactivation that occurs at the onset of male meiosis13. Thus, the cancer-testis antigen genes shown by Ross et al. to be enriched on the X are probably early spermatogenesis genes. The human X is also mildly enriched for brain and skeletal muscle genes, which could potentially be explained by hemizygous exposure2.

In addition to the accretion of certain male-beneficial genes, the mammalian X also seems to be enriched for female-beneficial genes such as those expressed in ovary and placenta2,13. This could be explained by sexual antagonism: because the X spends most of its time (two thirds) in females, genes benefiting females but harming males may tend to move from autosomes to the X, whereas genes benefiting males but harming females may tend to move from the X to autosomes (or the Y)2,11,14. Regardless of the selective forces, the X is predicted to be a hotbed of gene traffic. This was demonstrated by a recent study showing that the X has disseminated as well as recruited a disproportionately high number of functional retroposed genes15.

Thus, opposing selective forces have driven the gene content of the X to change in a manner far more complex than that of autosomes. The results are a suite of evolutionary outcomes that includes masculinization fostered by hemizygous exposure (e.g., enrichment of early spermatogenesis genes), demasculinization driven by sexual antagonism or the need to evade male germline X inactivation (deficit of late spermatogenesis genes) and feminization propelled by sexual antagonism (accretion of ovary and placenta genes).

Compare and contrast

The X and Y chromosomes have such tightly intertwined evolutionary histories that each can only be viewed in the light of the other (Table 1). The X chromosome gained early fame for its role in the discovery of recessive disease-associated alleles. Beyond this illustrious contribution to medical genetics, the X has come to have an important role in the understanding of genome evolution. The newest work by Ross et al.3 now allows virtually every nucleotide of the X chromosome to be studied. Such a resource will surely produce additional insights into the function and evolution of the X chromosome and, by extension, the Y chromosome.

1. Ohno, S. Sex Chromosomes and Sex-Linked Genes (Springer, Berlin, 1967).
2. Vallender, E.J. & Lahn, B.T. Bioessays 26, 159–169 (2004).
3. Ross, M.T. et al. Nature 434, 325–337 (2005).
4. Lahn, B.T. & Page, D.C. Science 286, 964–967 (1999).
5. Lahn, B.T. & Page, D.C. Science 278, 675–680 (1997).
6. Skaletsky, H. et al. Nature 423, 825–837 (2003).
7. Carrel, L., Cottle, A.A., Goglin, K.C. & Willard, H.F. Proc. Natl. Acad. Sci. USA 96, 14440–14444 (1999).
8. Jegalian, K. & Page, D.C. Nature 394, 776–780 (1998).
9. Lyon, M.F. Cytogenet. Cell Genet. 80, 133–137 (1998).
10. Rice, W.R. Evolution 38, 735–742 (1984).
11. Saifi, G.M. & Chandra, H.S. Proc. R. Soc. Lond. B Biol. Sci. 266, 203–209 (1999).
12. Wang, P.J., McCarrey, J.R., Yang, F. & Page, D.C. Nat. Genet. 27, 422–426 (2001).
13. Khil, P.P., Smirnova, N.A., Romanienko, P.J. & Camerini-Otero, R.D. Nat. Genet. 36, 642–646 (2004).
14. Wu, C.I. & Xu, E.Y. Trends Genet. 19, 243–247 (2003).
15. Emerson, J.J., Kaessmann, H., Betran, E. & Long, M. Science 303, 537–540 (2004).
