Research

Genomic selection for growth and wood quality in Eucalyptus: capturing the missing heritability and accelerating breeding for complex traits in forest trees

Marcos D. V. Resende1,2, Márcio F. R. Resende Jr2, Carolina P. Sansaloni3,4, Cesar D. Petroli3,4, Alexandre A. Missiaggia5, Aurelio M. Aguiar5, Jupiter M. Abad5, Elizabete K. Takahashi6, Antonio M. Rosado6, Danielle A. Faria3, Georgios J. Pappas Jr.3,7, Andrzej Kilian8 and Dario Grattapaglia3,7

1EMBRAPA Forestry Research, Colombo, PR, 83411-000, Brazil; 2Universidade Federal de Viçosa – Viçosa MG, 36570-000, Brazil; 3EMBRAPA Genetic Resources and Biotechnology – EPqB, 70770-910, Brasilia, DF, Brazil; 4Universidade de Brasilia – Campus Darcy Ribeiro Brasília, DF, 70910-900, Brazil; 5FIBRIA Celulose S.A., Rod. Aracruz / Barra do Riacho, km 25, Aracruz, ES, 29197-900, Brazil; 6CENIBRA Celulose Nipo Brasileira S.A, Belo Oriente, MG, 35196-000, Brazil; 7Universidade Catolica de Brasília - SGAN, 916 modulo B, Brasilia, DF, 70790-160, Brazil; 8DArT - Diversity Arrays Technology, POB 7141, Yarralumla, ACT, Australia 2600

Author for correspondence: Dario Grattapaglia Tel: +55 61 34484652 Email: [email protected]

Received: 26 October 2011 Accepted: 6 December 2011

New Phytologist (2012) 194: 116–128 doi: 10.1111/j.1469-8137.2011.04038.x

Key words: applied genomics, DArT, effective population size, Eucalyptus, genome-wide selection (GWS), genomic selection (GS), tree breeding.

Summary

• Genomic selection (GS) is expected to cause a paradigm shift in tree breeding by improving its speed and efficiency. By fitting all the genome-wide markers concurrently, GS can capture most of the ‘missing heritability’ of complex traits that quantitative trait locus (QTL) and association mapping classically fail to explain. Experimental support of GS is now required.

• The effectiveness of GS was assessed in two unrelated Eucalyptus breeding populations with contrasting effective population sizes (Ne = 11 and 51) genotyped with > 3000 DArT markers. Prediction models were developed for tree circumference and height growth, wood specific gravity and pulp yield using random regression best linear unbiased predictor (BLUP).

• Accuracies of GS varied between 0.55 and 0.88, matching the accuracies achieved by conventional phenotypic selection. Substantial proportions (74–97%) of trait heritability were captured by fitting all genome-wide markers simultaneously. Genomic regions explaining trait variation largely coincided between populations, although GS models predicted poorly across populations, likely as a result of variable patterns of linkage disequilibrium, inconsistent allelic effects and genotype × environment interaction.

• GS brings a new perspective to the understanding of quantitative trait variation in forest trees and provides a revolutionary tool for applied tree improvement. Nevertheless, population-specific predictive models will likely drive the initial applications of GS in forest tree breeding.

Introduction

Genomic selection (GS) or genome-wide selection (GWS) was a landmark proposal made 11 yr ago providing a genome-wide paradigm for marker-assisted selection (MAS) in plants and animals (Meuwissen et al., 2001; Goddard & Hayes, 2009). This idea, based on the forecast of rapid technological advances and dropping costs of high-throughput genotyping, has now become reality, revolutionizing animal breeding (Hayes et al., 2009a,b; Hayes & Goddard, 2010) and raising awareness in plant breeding (Bernardo & Yu, 2007; Resende et al., 2008; Grattapaglia et al., 2009; Heffner et al., 2009; Jannink et al., 2009). Three major insights characterize this ground-breaking proposition. First, instead of the standard two-step MAS approach where marker–trait associations are first discovered and validated in one or a few representative families or populations and then used for selection, GS estimates all marker effects simultaneously, retaining all of them as predictors of performance, thus precluding the prior search for significant marker–trait associations. Secondly, GS assumes that linkage disequilibrium (LD) provided by dense genotyping is sufficient to inform about QTL effects which are expected to be in LD with at least some of the queried markers. Thirdly, it avoids prior marker selection for the development of a prediction model so that estimated marker effects are sufficiently precise, unbiased and accurate to mitigate the quandary of how to capture the ‘missing heritability’ of complex traits (Manolio et al., 2009; Yang et al., 2010; Makowsky et al., 2011) likely explained by large numbers of small-effect QTLs that association genetics studies do not capture.

No report exists to date on the actual use of QTL or association genetics (AG) information for operational tree breeding by MAS, in spite of the large volume of QTL mapping information and an increased number of AG reports in forest trees (Grattapaglia et al., 2009; Neale & Kremer, 2011). As proposed early on, reasons for this derive mainly from the rapid decay of LD in undomesticated tree genomes, such that marker–trait associations detected in specific mapping families do not hold in unrelated pedigrees (Strauss et al., 1992). Initial predictions that QTLs detected in biparental tree pedigrees would only be useful within the same or related families (Grattapaglia et al., 1995) have been confirmed by several reports showing that many more QTLs with small and variable effects across backgrounds and environments typically underlie complex traits in forest trees (Dillen et al., 2009; Rae et al., 2008; Ukrainetz et al., 2008; Mamani et al., 2010). Although AG was presented as a way to develop more efficient methods of MAS in forest trees (Neale & Savolainen, 2004), experimental results based on candidate genes (Gonzalez-Martinez et al., 2007; Wegrzyn et al., 2010) have not captured satisfactory fractions of the genetic variance to be valuable to breeding. These results challenge the significance of such mapping efforts for breeding practice that deals with a broad diversity and must capture large proportions of valuable allelic combinations for multiple traits.

In GS an estimation or ‘training’ population involving several hundreds or thousands of individuals is genotyped for a genome-wide marker panel and phenotyped for target traits of interest. From these data sets, prediction models are derived and validated in a ‘validation’ set using adequate methods to avoid overfitting. This model is subsequently used to calculate the genome-estimated breeding values (GEBVs) of the selection candidates for which only genotypes are recorded (Goddard & Hayes, 2009).
Although these GEBVs, just as QTLs, do not say much about the function or identity of the underlying genes, they have provided accurate selection criteria both in simulation studies (Meuwissen et al., 2001; Bernardo & Yu, 2007; Jannink et al., 2009; Heffner et al., 2010; Grattapaglia & Resende, 2011) and in an increasing number of experimental validation reports in domestic animals (Luan et al., 2009; VanRaden et al., 2009; Daetwyler et al., 2010; Moser et al., 2010), mice (Lee et al., 2008; Legarra et al., 2008), biparental populations of maize, barley and Arabidopsis (Lorenzana & Bernardo, 2009), trials of wheat and maize lines (Crossa et al., 2010) and advanced breeding populations of maize (Albrecht et al., 2011). As high-throughput genotyping has experienced constantly decreasing cost, while phenotyping costs are relatively stable, GS has become routine in some dairy cattle (Hayes et al., 2009a,b) and private plant breeding (Eathington et al., 2007) programs.

Genomic selection in forest trees and perennial crops is particularly attractive because of the potential of increasing gain per unit time and enhancing selection for low heritability traits (Wong & Bernardo, 2008; Jannink et al., 2010; Grattapaglia & Resende, 2011). We originally proposed GS in the context of forest tree genetics and breeding as a promising way to capture larger proportions of variation for growth traits (Resende et al., 2008; Grattapaglia et al., 2009). Using deterministic simulations we subsequently showed that the extent of marker–QTL LD, modeled by varying the effective population size (Ne) and genotyping density, had the largest impact on the accuracy of GS. By shortening the breeding cycle we also predicted that GS could radically improve the efficiency of forest tree breeding practice (Grattapaglia & Resende, 2011), a conclusion also reached by a subsequent study using stochastic simulations for a conifer breeding program (Iwata et al., 2011). While the limited extent of LD in natural populations of forest trees (Neale & Savolainen, 2004) would make GS economically unfeasible, marker–QTL LD is readily increased by limiting Ne, a standard practice in tree breeding. The generation of new LD as a result of the reduction in Ne is therefore a key element for the prospects of GS in tree breeding at currently economically viable genotyping densities of two to five markers per cM (Eckert et al., 2010; Sansaloni et al., 2010), although higher marker densities are becoming practical at competitive costs (Elshire et al., 2011).

In contrast to animal breeding, where little control over the effective population size is possible, tree breeders have ample autonomy to establish elite populations. Tree breeding populations with Ne between 20 and 50 support selection with appreciable genetic gains for several generations (Namkoong et al., 1988). Such elite breeding populations have increasingly been used in advanced programs worldwide (McKeand & Bridgwater, 1998; Resende & de Assis, 2008). In Eucalyptus, major breakthroughs in productivity and wood quality have been achieved by advanced hybrid breeding involving populations of 10–50 elite parents exploiting the wide interspecific variation for cold and drought tolerance, growth and wood quality coupled to clonal propagation (Grattapaglia & Kirst, 2008; Resende & de Assis, 2008). Such strategies are particularly suited to the application of GS. It is in the framework of such specialized Eucalyptus breeding programs that the operational application of GS is envisaged.
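The dependence of marker–QTL LD on effective population size discussed above can be illustrated with a small numerical sketch. It assumes the commonly quoted form of Sved's (1971) approximation, E[r2] = 1 / (1 + 4 Ne c), with c the recombination distance in Morgans. The function name `expected_r2` and the distances in the loop are illustrative choices, not part of the analyses reported in this paper.

```python
def expected_r2(ne, c_morgans):
    """Expected LD (r^2) between two loci under Sved's approximation.

    ne        -- effective population size
    c_morgans -- recombination distance between the two loci, in Morgans
    """
    return 1.0 / (1.0 + 4.0 * ne * c_morgans)


if __name__ == "__main__":
    # Ne values of the two breeding populations considered in this study;
    # the distances (in cM) are assumptions chosen only for illustration.
    for ne in (11, 51):
        for cm in (0.25, 0.5, 1.0):
            r2 = expected_r2(ne, cm / 100.0)
            print(f"Ne={ne:3d}  distance={cm:4.2f} cM  E[r2]={r2:.3f}")
```

Under this approximation the expected LD at a fixed distance falls as Ne grows, which is the reason a small elite population can be served by a sparser marker panel.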
Experimental validations of GS in forest trees and perennial plants in general are critically needed to provide the necessary support for its recommendation in applied breeding. From the fundamental standpoint, it is also appealing to assess the power of this approach in capturing the genetic variance controlling quantitative traits while learning about the genome positions of the markers that optimize predictive accuracy of GS. In this work we extend our initial assessments of GS and describe a comprehensive proof-of-concept study with two large genetically unrelated Eucalyptus breeding populations with contrasting Ne. We demonstrate that GS not only captures large fractions of trait heritability but achieves selection accuracies as good as, or better than, those attainable by conventional phenotypic selection for all growth and wood quality traits evaluated. Finally, we show that in spite of a largely coincident genomic distribution of the loci controlling the same trait in the two populations, GS models have low predictive accuracies across populations, highlighting the fact that population- and environment-specific genome-enabled predictions will likely drive the application of GS in tree breeding.

Materials and Methods

Breeding populations and phenotypes

The study was carried out with two standard, genetically unrelated Eucalyptus elite breeding populations belonging to two Brazilian pulp and paper companies, CENIBRA (CEN) and FIBRIA (FIB) (Table 1). Height growth (HG) was measured by means of a Suunto PM-5 clinometer in CEN and a Haglöf Vertex Hypsometer in FIB. Circumference at breast height (CBH; c. 130 cm from the ground) over bark was measured with a diameter tape. In CEN wood specific gravity (WSG) and pulp yield (PY) were measured on felled trees. WSG was measured by the water displacement method using a 3- to 5-cm-thick wood disk sampled at breast height while PY was measured by batch kraft digestion of 150 g of wood chips at 15–18% effective alkali (NaOH) to obtain kappa of c. 17 per sample. In FIB, WSG was measured on the standing tree using the pin penetration of a 6 J Pilodyn (Proceq, Schwerzenbach, Switzerland) at breast height. A 12-mm-diameter core was taken from bark to bark through the centre of each tree at breast height and in a north–south direction. Each core was air-dried, ground to wood meal and used to indirectly measure pulp yield by taking a near infrared reflectance spectroscopy (NIRS) reading using a Foss NirSystem 5000 and applying calibration curves developed by Fibria using methods described previously (Raymond & Schimleck, 2002).

DArT marker genotyping

DArT genotyping was carried out at DArT Pty (Yarralumla, Australia) using a 7680 DArT-probe microarray as described previously (Sansaloni et al., 2010). DArT genotyping and marker evaluation were accomplished based on: reproducibility ≥ 95% as measured by the concordance of the genotype call between the two DArT clone replicates on the array; marker quality Q-value ≥ 70%, which measures between-cluster variance as a percentage of total variance in signal distribution among the genotyped samples; and marker call rate ≥ 80%, that is, the percentage of effective scores.

Genetic value predictions from field experiments and adjusted phenotypes

Analyses were carried out using the mixed model methodology (Lynch & Walsh, 1998), using SELEGEN-REML/BLUP software (Resende & Oliveira, 1997) under the following mixed linear model:

$$ y = X\beta + Za + Wb + e $$

where $y$ is the vector of the trait being analyzed; $\beta$ is a vector of fixed effects (i.e. general mean and experiments effects); $a$ is a vector of random additive genetic effects of individuals; $b$ is a vector of random incomplete block effects; $e$ is the random residual effect; and $X$, $Z$ and $W$ are incidence matrices for $\beta$, $a$ and $b$, respectively. The variance structure of the model was as follows:

$$ y \mid \beta, V \sim N(X\beta, V); \quad a \mid A, \sigma_a^2 \sim N(0, A\sigma_a^2); \quad b \mid \sigma_b^2 \sim N(0, I\sigma_b^2); \quad e \mid \sigma_e^2 \sim N(0, I\sigma_e^2); $$

$$ \mathrm{Cov}(a, b') = 0; \quad \mathrm{Cov}(a, e') = 0; \quad \mathrm{Cov}(b, e') = 0; $$

$$ V = ZA\sigma_a^2 Z' + WI\sigma_b^2 W' + I\sigma_e^2 = ZGZ' + WBW' + R; \quad G = A\sigma_a^2; \quad B = I\sigma_b^2; \quad R = I\sigma_e^2. $$

$A$, matrix of additive genetic relationships among individuals; $I$, identity matrix.
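As a rough numerical illustration of this model, the sketch below builds V from its components for a toy data set, obtains the generalized least squares estimate of the fixed effects and the BLUP of the additive effects from V, and cross-checks both against the equivalent Henderson mixed model equations with λ1 = σ²e/σ²a and λ2 = σ²e/σ²b. The dimensions, the relationship matrix (three unrelated full-sib families), the variance components and the phenotypes are all assumptions made for the example; this is not the SELEGEN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed): 12 records, 12 individuals, 3 incomplete blocks
n, q, nb = 12, 12, 3
X = np.ones((n, 1))                      # fixed effects: general mean only
Z = np.eye(q)                            # one record per individual
W = np.zeros((n, nb))
W[np.arange(n), np.arange(n) % nb] = 1.0  # block membership of each record

# Hypothetical additive relationship matrix: 3 unrelated full-sib families of 4
family = np.full((4, 4), 0.5) + 0.5 * np.eye(4)
A = np.kron(np.eye(3), family)

s2a, s2b, s2e = 0.5, 0.1, 0.4            # assumed variance components
G, B, R = A * s2a, np.eye(nb) * s2b, np.eye(n) * s2e
V = Z @ G @ Z.T + W @ B @ W.T + R         # V = ZGZ' + WBW' + R

y = rng.normal(size=n)                    # placeholder phenotypes

# Solutions based on V: GLS for beta, BLUP for a
Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
a_hat = G @ Z.T @ Vinv @ (y - X @ beta)

# Equivalent Henderson mixed model equations
lam1, lam2 = s2e / s2a, s2e / s2b
Ainv = np.linalg.inv(A)
C = np.block([
    [X.T @ X, X.T @ Z, X.T @ W],
    [Z.T @ X, Z.T @ Z + Ainv * lam1, Z.T @ W],
    [W.T @ X, W.T @ Z, W.T @ W + np.eye(nb) * lam2],
])
rhs = np.concatenate([X.T @ y, Z.T @ y, W.T @ y])
sol = np.linalg.solve(C, rhs)

print("max |difference| in a_hat:", np.max(np.abs(sol[1:1 + q] - a_hat)))
```

Solving the mixed model equations avoids inverting V, whose order equals the number of records, which is why that form is preferred for field trials with thousands of trees.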

Table 1 General attributes of the two breeding populations studied

| Attribute | CENIBRA population (CEN) | FIBRIA population (FIB) |
| Total no. of trees in progeny trial | c. 4900 | c. 9400 |
| Total no. of families in progeny trial | 43 | 232 |
| Average no. of trees per family | c. 115 | c. 40 |
| No. of parents crossed | 11 | 51 |
| Eucalyptus species of parents | E. grandis × E. urophylla F1 hybrids | E. grandis, E. urophylla and E. globulus and F1 hybrids of these species |
| Mating design | Incomplete diallel | Incomplete diallel |
| No. of experiments | 3 | 1 |
| Municipality(ies) | Sabinópolis 18°34′S / 42°58′W; Virginópolis 18°34′S / 42°31′W; Antonio Dias 19°22′S / 42°47′W | Aracruz 19°49′S / 40°05′W |
| Altitude (m) | 1012–1273 | 20 |
| Temperature range (°C) | 15–26 | 22 |
| Precipitation (mm) | 1152–1280 | 1200 |
| Experimental design | Randomized complete block with 36 reps; single-tree plots; spacing 3 × 2 m | Alpha-lattice with 40 reps; single-tree plots; spacing 3 × 2 m |
| Age of measurements (yr) | 3 for all traits | 3 for growth traits; 3.7 for wood traits |
| No. of families sampled for genomic selection (GS) | 43 | 75 |
| Effective population size (Ne) | 11 | 51 |
| No. of individuals / family sampled for GS | 15–23 | 10–15 |
| No. of individuals sampled for GS | 738 | 920 |

The mixed model equations for the best linear unbiased predictor (BLUP) were as follows:

$$ \begin{bmatrix} X'X & X'Z & X'W \\ Z'X & Z'Z + A^{-1}\lambda_1 & Z'W \\ W'X & W'Z & W'W + I\lambda_2 \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{a} \\ \hat{b} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \\ W'y \end{bmatrix}, $$

where:

$$ \lambda_1 = \frac{\sigma_e^2}{\sigma_a^2} = \frac{1 - h^2 - b^2}{h^2}; \quad \lambda_2 = \frac{\sigma_e^2}{\sigma_b^2} = \frac{1 - h^2 - b^2}{b^2} $$

$h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_b^2 + \sigma_e^2)$: individual narrow sense heritability across incomplete blocks. $b^2 = \sigma_b^2 / (\sigma_a^2 + \sigma_b^2 + \sigma_e^2)$: coefficient of determination of incomplete block effects. $\sigma_a^2$, additive genetic variance; $\sigma_b^2$, variance among blocks; $\sigma_e^2$, residual variance.

Variance components were estimated by restricted maximum likelihood (REML) (Lynch & Walsh, 1998). Predicted additive genetic values were deregressed and corrected for parents’ effects to obtain the adjusted phenotypic values to be used for genomic predictions. The deregressing approach employed (Garrick et al., 2009) corrects the raw phenotypic data for environmental and parent average effects, removing familial structure. Other nongenetic sources of variation were corrected by fitting environmental fixed and random effects in the mixed model.

Estimation of molecular marker effects and genome-enabled capture of heritability

Marker effects were estimated by fitting all the allelic effects simultaneously using the random regression best linear unbiased predictor (RR-BLUP) (Meuwissen et al., 2001) as described previously (Resende et al., 2008, 2012). Unlike in that previous study, however, dominant markers were used, so that the values of $Z_{ij}$ in the incidence matrix $Z$ are 0 or 1 for marker genotypes mm and MM or Mm, respectively. This prediction equation assumes a priori that all loci explain an equal amount of the genetic variation, as previously described by Meuwissen et al. (2001) and applied by others (Bernardo & Yu, 2007; Muir, 2007). Therefore, the genetic variation of each locus is given by $\sigma_a^2 / \eta$, where $\eta$ is related to the number $n_m$ of markers used in the prediction model and is given by $\eta = \sum_{i}^{n_m} p_i (1 - p_i)$ (Resende et al., 2008; Gianola et al., 2009). The total additive genetic variance $\sigma_a^2$ was estimated by restricted maximum likelihood (REML) based on phenotypes.

The recovered additive genetic variance and the corresponding trait heritability captured by the markers were estimated based on the proportion of the genetic variation explained by the markers using an approach described previously (Gianola et al., 2009; Resende et al., 2010). To assess the increase in captured heritability with increasing number of markers, this analysis was carried out for progressively larger sets of markers ranked by their absolute effect. To fit RR-BLUP, the genomics module of the software SELEGEN (Resende & Oliveira, 1997; Resende et al., 2008) was employed, together with a script written in R (M.F.R. Resende Jr, unpublished).

Cross-validation and accuracy of genomic selection

The estimated marker effects were validated using a ‘leave-one-out’ cross-validation scheme. Briefly, a single individual from the population was used as the validation set, and the remaining individuals as the estimation or training set. This process was repeated N times, each time holding out a different individual for validation, until all individuals had their phenotypes predicted and validated. This method maximized the estimation and validation population sizes, so that the training population had (N − 1) individuals in each set and the validation population consisted of the whole set of N individuals genotyped and phenotyped. Each individual tree had its GEBV predicted by multiplying the incidence matrix $Z$ for the markers by the vector of estimated marker effects and adding the estimated general mean according to the expression:

$$ \hat{y}_j = \hat{u} + \sum_i Z_{ij} \hat{m}_i $$

The accuracy of GS ($r_{\hat{q}q}$) to predict breeding values was calculated as the correlation of the GEBV with the adjusted phenotypes (deregressed predicted additive genetic values corrected for parents' effects) ($y$), divided by the square root of trait heritability. Theoretically the accuracy of GS depends on $r_{mq}^2$, the proportion of the genetic variation explained by the markers (degree of LD), which is a function of Ne and the chromosome segment length (cM) between markers as given generally by Sved’s equation (Sved, 1971); and on $r_{m\hat{m}}$, the accuracy of the prediction of the marker effects in LD with the QTL. The analyses were carried out using the information of all the markers that provided the maximum accuracy. The expected gain from genomic selection was compared with conventional phenotypic selection considering different reductions in the breeding cycle duration as a result of early GS as described previously (Grattapaglia & Resende, 2011).
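The procedure described above (simultaneous estimation of all marker effects under a common shrinkage, leave-one-out prediction of each tree, and accuracy obtained as predictive ability divided by the square root of heritability) can be sketched as follows. This is a minimal illustration on simulated data, assuming a plain ridge formulation with shrinkage σ²e / (σ²a / η); the function names `rr_blup` and `loo_gebv`, the sample sizes and the simulated marker effects are hypothetical, and the code is neither SELEGEN nor the R script used in the study.

```python
import numpy as np


def rr_blup(Z, y, lam):
    """Ridge fit returning (general mean, marker effects).

    All markers share the same shrinkage lam; the general mean is not shrunk.
    """
    n, p = Z.shape
    X = np.hstack([np.ones((n, 1)), Z])
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0
    sol = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return sol[0], sol[1:]


def loo_gebv(Z, y, lam):
    """Leave-one-out GEBVs: each tree is predicted by a model fitted without it."""
    n = len(y)
    gebv = np.empty(n)
    for j in range(n):
        train = np.arange(n) != j
        u_hat, m_hat = rr_blup(Z[train], y[train], lam)
        gebv[j] = u_hat + Z[j] @ m_hat
    return gebv


# Simulated toy data; every value below is an assumption for illustration
rng = np.random.default_rng(1)
n_trees, n_markers, h2 = 80, 150, 0.5
freq = rng.uniform(0.2, 0.8, n_markers)
Z = (rng.random((n_trees, n_markers)) < freq).astype(float)  # dominant 0/1 scores
g = Z @ rng.normal(0.0, 1.0, n_markers)
g = (g - g.mean()) / g.std()                      # genetic values, unit variance
y = g + rng.normal(0.0, np.sqrt((1 - h2) / h2), n_trees)

p = Z.mean(axis=0)
eta = np.sum(p * (1 - p))
lam = ((1 - h2) / h2) * eta                       # sigma_e^2 / (sigma_a^2 / eta)

gebv = loo_gebv(Z, y, lam)
predictive_ability = np.corrcoef(gebv, y)[0, 1]
accuracy = predictive_ability / np.sqrt(h2)
print(f"predictive ability = {predictive_ability:.2f}, accuracy = {accuracy:.2f}")
```

The simulated markers are independent, so the toy example carries no family structure or LD; it only demonstrates the mechanics of the fit, the hold-out loop and the accuracy calculation.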
Genome predictions across populations and genome positioning of GS fitted markers

Two comparative analyses of the GS results were carried out between the breeding populations. First, prediction accuracies of GS across populations were obtained for the four traits. In essence, we simply applied a predictive model developed for a particular trait in CEN to predict the same trait in individuals belonging to FIB and vice versa. Secondly, we compared the genome positions of the markers that were fitted into the genomic prediction models in the two populations for each trait separately. The corresponding DArT marker sequences (GenBank accession numbers HR865291–HR872186) were mapped on to the assembled Eucalyptus grandis genome (version 6.1 available in Phytozome) after partitioning it in 500 kb virtual bins corresponding to c. 1 cM based on a simple correspondence between the size of the assembled Eucalyptus genome (609 Mbp) and the best estimates of the size of the recombining Eucalyptus genome, estimated between 1200 and 1300 cM (Brondani et al., 2006).

Results

DArT markers provide suitable genotyping density and coverage for GS

Given the large sample sizes of the populations, DArT markers with a minimum allele frequency (MAF) ≥ 0.01 corresponding to a gene diversity (GD) ≥ 0.02 ($GD = 1 - \sum_{i=1}^{n} p_i^2$) were used in all subsequent analyses to allow the capture of rarer alleles at relevant loci. The total number of high quality and polymorphic markers scored was slightly higher for FIB (3564) than for CEN (3129), reflecting the more diverse genetic composition of the former. The distributions of percentage markers across the filtering criteria were similar between the two populations, with ≥ 80% of the markers with reproducibility ≥ 99%, call rate ≥ 90% and GD ≥ 0.25 in both populations (Supporting Information, Fig. S1). Over 87% of the genotyped markers (2720 markers for CEN and 3145 for FIB) could be physically mapped with high confidence on to the 11 main scaffolds of the current Eucalyptus genome assembly (608.5 Mbp) while the remaining markers (c. 13%) mapped to the additional unassembled scaffolds. A realized average density of one marker every 193–223 kb was achieved. At the physical level, using a conservative analysis based on 500 kbp bins that correspond, on average, to 1 cM recombination, 912 of the 1217 bins, that is, 75% of the genome, were tagged by the DArT markers (Fig. S2). These results indicate that in spite of the fact that the 7680 DArT probes in the array were selected based exclusively on polymorphism detection and high-quality signal-to-noise ratio (Sansaloni et al., 2010), they sample the genome in a relatively well distributed fashion, thus providing a good tool to carry out an experiment that requires a whole-genome analysis. From the recombination standpoint, assuming an average recombining genome size for Eucalyptus between 1200 and 1300 cM (Brondani et al., 2006), an expected average genotyping density ≥ two markers per cM was achieved in both populations. Such a density corresponds to an average intermarker distance equal to 0.5 cM, a value used to estimate the proportion of the genetic variation explained by the markers (see the Materials and Methods section).

Capture of trait heritability by genome-wide markers

For each trait, the heritability estimated from the phenotypes and pedigree data was regarded as the upper limit that could be explained by the GS model. Markers were ranked from the largest to the smallest effect in absolute value and used to estimate the captured fractions of the trait heritability (h2). In both populations for most traits, ≥ 80% of h2 could be captured with 200 markers of largest effect (Fig. 1). The increase in h2 captured was similar for CBH, HG and WSG, while for PY a plateau was reached earlier. For CBH, HG and WSG, the increase in the h2 captured was slower after 50 markers, reaching a plateau at 95–97% of captured h2 in CEN and 70–82% in FIB with 300–500 markers of largest effect, consistent with the smaller effective population size of CEN. For PY no increase in captured h2 was observed after considering 25 markers in CEN and 50 markers in FIB, and a slight drop was seen in FIB possibly as a result of overfitting for this trait for which a smaller training set was used to estimate marker effects.

[Figure 1: two panels (CENIBRA, FIBRIA); x-axis, Number of markers with largest effect (0–500); y-axis, Fraction of h2 captured (0.0–1.0); series, CBH, HG, WSG, PY.]

Fig. 1 Fractions of trait heritability (h2) captured for growth and wood quality traits in the two Eucalyptus breeding populations (CENIBRA and FIBRIA) by increasingly larger numbers of markers with largest absolute effect computed by a genome-wide estimation and prediction approach of the adjusted phenotypic records on marker genotypes (CBH, circumference at breast height; HG, height growth; WSG, wood specific gravity; PY, pulp yield).

Predictive abilities and accuracies of GS

Moderate to high prediction accuracies of GS were estimated from cross-validation for all four traits in both populations (Table 2). Higher estimates were obtained in CEN as expected, as a result of its smaller Ne that results in a more extensive LD. Accuracies of prediction models ranged between 0.74 and 0.88 in CEN, and 0.55 and 0.73 in FIB, depending on the trait. More markers were needed to maximize accuracy for CBH and HG in both populations, consistent with the expectation of more loci controlling these traits when compared with WSG and PY. The number of markers that maximized prediction accuracy ranged from 564 for PY to 1543 for CBH in FIB. Genomic predictions were generally unbiased as revealed by regression coefficients close to one, indicating that the variability between predicted and true genetic values was similar. In both populations, PY was measured in a smaller set of individuals because of the higher cost of measuring this trait. The smaller training and validation set for

Table 2 Experimental results of genomic selection (GS) for growth and wood quality traits in two Eucalyptus breeding populations (CENIBRA and FIBRIA)

Columns 2–5: CENIBRA (Ne = 11); columns 6–9: FIBRIA (Ne = 51).

| Parameter | CBH | HG | WSG | PY | CBH | HG | WSG | PY |
| Trait heritability (estimated from field experiments) | 0.53 | 0.42 | 0.59 | 0.38 | 0.56 | 0.48 | 0.42 | 0.47 |
| Number of individuals N in the training (N − 1) and validation (N) populations (1) | 780 | 780 | 820 | 594 | 920 | 920 | 920 | 650 |
| Number of markers that maximized accuracy of the GS model | 1429 | 1177 | 1455 | 777 | 1543 | 1174 | 926 | 564 |
| Predictive ability (2) | 0.54 | 0.51 | 0.60 | 0.54 | 0.55 | 0.46 | 0.42 | 0.38 |
| Accuracy of GS (experimental) (3) | 0.74 | 0.79 | 0.78 | 0.88 | 0.73 | 0.66 | 0.65 | 0.55 |
| Accuracy of GS (simulations) (4) | 0.71 | 0.71 | 0.72 | 0.69 | 0.58 | 0.59 | 0.60 | 0.63 |
| Regression coefficient of observed on predicted phenotypes | 0.96 | 0.99 | 0.99 | 1.00 | 1.01 | 0.99 | 0.99 | 0.98 |
| Accuracy of conventional phenotypic BLUP selection (5) | 0.80 | 0.76 | 0.83 | 0.73 | 0.82 | 0.79 | 0.77 | 0.74 |
| Proportion of the accuracy of phenotypic BLUP selection recovered by GS | 0.93 | 1.04 | 0.94 | 1.20 | 0.90 | 0.84 | 0.84 | 0.75 |
| Estimated number of QTL controlling trait variation (6) | 233 | 189 | 263 | 142 | 276 | 211 | 164 | 97 |

Ne, effective population size; CBH, circumference at breast height; HG, height growth; WSG, wood specific gravity; PY, pulp yield.
(1) Cross-validation was carried out by a ‘leave-one-out’ jackknife procedure so that the size of training population was N − 1 while the size of the validation population was N.
(2) Correlation between the observed and predicted breeding values obtained by cross-validation (Fig. S3).
(3) Calculated as the ratio between the predictive ability and the square root of trait heritability (Dekkers, 2007).
(4) Estimated as described previously (Grattapaglia & Resende, 2011).
(5) Estimated by restricted maximum likelihood (REML) / best linear unbiased predictor (BLUP) based on phenotypes and pedigree information (Lynch & Walsh, 1998).
(6) Estimated from the effective number of chromosome segments (Goddard, 2009; Resende et al., 2010).
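The relationship stated in footnote 3 of Table 2 can be checked directly against the tabulated values. The short sketch below recomputes the experimental accuracy as predictive ability divided by the square root of heritability, and the proportion of the phenotypic BLUP accuracy recovered by GS, for every trait and population. The dictionary simply transcribes Table 2; the names `TABLE2` and `gs_accuracy` are illustrative, and small discrepancies from the published figures are expected because the inputs are already rounded to two decimals.

```python
import math

# (heritability, predictive ability, reported GS accuracy,
#  accuracy of phenotypic BLUP selection, reported proportion recovered)
TABLE2 = {
    ("CEN", "CBH"): (0.53, 0.54, 0.74, 0.80, 0.93),
    ("CEN", "HG"): (0.42, 0.51, 0.79, 0.76, 1.04),
    ("CEN", "WSG"): (0.59, 0.60, 0.78, 0.83, 0.94),
    ("CEN", "PY"): (0.38, 0.54, 0.88, 0.73, 1.20),
    ("FIB", "CBH"): (0.56, 0.55, 0.73, 0.82, 0.90),
    ("FIB", "HG"): (0.48, 0.46, 0.66, 0.79, 0.84),
    ("FIB", "WSG"): (0.42, 0.42, 0.65, 0.77, 0.84),
    ("FIB", "PY"): (0.47, 0.38, 0.55, 0.74, 0.75),
}


def gs_accuracy(predictive_ability, h2):
    """Accuracy of GS as predictive ability over the square root of heritability."""
    return predictive_ability / math.sqrt(h2)


for (pop, trait), (h2, ability, reported, blup, proportion) in TABLE2.items():
    acc = gs_accuracy(ability, h2)
    print(f"{pop} {trait:3s}  accuracy={acc:.3f} (reported {reported:.2f})  "
          f"recovered={acc / blup:.3f} (reported {proportion:.2f})")
```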

this trait possibly contributed to a smaller number of markers in the model when compared with the other traits. However, while in FIB this fact might explain the smaller than expected accuracy, in CEN a higher accuracy was observed. This discrepancy could arise as a result of the fact that PY in CEN was measured by batch kraft digestion, a higher-precision method than NIRS. Accuracies of GS estimated from this proof-of-concept experiment were consistent with those expected from simulations, although slightly larger for most traits. The recent hybrid origin of both populations is probably contributing to a longer-range LD than the one expected from theory, resulting in an improved ability of capturing genetic variation given a fixed genotyping density. Accuracies of GS matched or surpassed those provided by phenotypic BLUP selection in CEN and recovered between 75 and 90% of such accuracies in FIB. The estimated number of QTLs controlling the trait variation ranged between 142 and 263 for CEN and between 97 and 276 for FIB. These numbers were consistent with the theoretical expectations given the effective population size and size of the recombining genome in Morgans (Hayes & Goddard, 2001), although in the case of PY it is probably underestimated as a result of the smaller training population.

Increase of selection efficiency by genomic-enabled predictions

The value of GS over phenotypic selection materializes when the breeding cycle is reduced by performing early selection for yet-to-be observed phenotypes. The expected increase of selection efficiency of GS in the two experimental populations was modeled for increasing reductions in the time necessary to complete a breeding cycle (Fig. 2). These increased efficiencies from experimental data matched those predicted from earlier simulations (Grattapaglia & Resende, 2011). The intersection at zero values on the X and Y axes corresponds to the efficiency of traditional BLUP-based phenotypic selection. Positive values of Y correspond to gains in efficiency of selection by adopting GS. If a modest reduction of only 50% in the length of a breeding cycle is achieved, efficiency gains between 50% for FIB and 100% for CEN are expected for all traits. If reductions of 75% are biologically feasible, a remarkable increase of 300% for CEN and 200% for FIB for all traits could be achieved.

Suitability of GS prediction models across populations

A pertinent question is whether a GS model fitted to one population is suitable to predict phenotypes in an unrelated population. When a model developed for CEN was used to predict phenotypes in FIB or vice versa, the accuracies declined drastically (Table 3). The percentage of markers that could be used to carry out these predictions varied between 60 and 79%. For example, out of the 1177 DArT markers that maximized the accuracy of the GS model for HG in CEN, only 899 (76%) were also genotyped in FIB; the remainder were either not polymorphic or did not pass the quality filtering criteria. Therefore the prediction in FIB was carried out using these 899 markers only. However, irrespective of the proportion of markers shared, accuracies were close to zero, suggesting that this was not an important element to explain the reduction in accuracies. Different size effects associated with the markers, variable patterns of LD and genotype × environment interaction are the likely factors explaining the reduction (see the Discussion section).

Comparative genomic distribution of markers fitted into the GS models

A comparative analysis of the physical distribution of the markers fitted in the GS models in the two populations was carried out

New Phytologist

122 Research CBH

700 600 500 400 300 200 100 0

–100

0

CEN Experim.

25 50 75 Reduc on of the breeding cycle (%) FIB Experim.

HG

800 Increase inselec on efficiency (%)

Increase inselec on efficiency (%)

800

CEN Simul.

100

700 600 500 400 300 200 100 0

–100

FIB Simul.

0

CEN Experim.

25 50 75 Reduc on of the breeding cycle (%) FIB Experim.

WSG 800 Increase in selecc on efficiency (%)

Increase in selecc on efficiency (%)

FIB Simul.

PY

800 700 600 500 400 300 200 100 0 –100

CEN Simul.

100

0

CEN Experim.

25 50 75 Reduc on of the breeding cycle (%) FIB Experim.

CEN Simul.

100

FIB Simul.

700 600 500 400 300 200 100 0

–100

0

CEN Experim.

25 50 75 Reduc on of the breeding cycle (%) FIB Experim.

CEN Simul.

100

FIB Simul.

Fig. 2 Increase in selection efficiency of genomic selection (GS) over traditional best linear unbiased predictor (BLUP)-based phenotypic selection with increasing percentage reduction in the length of a breeding cycle for Eucalyptus CENIBRA (CEN) and FIBRIA (FIB) populations. Estimates were obtained based on the GS accuracies from the experimental data (Experim., solid lines) and plotted together with the predicted GS accuracies from simulated data (Simul., dashed lines) generated as described previously (Grattapaglia & Resende, 2011). CBH, circumference at breast height; HG, height growth; WSG, wood specific gravity; PY, pulp yield. Table 3 Prediction accuracies of genomic selection (GS) across the two breeding populations: CENIBRA (CEN) and FIBRIA (FIB) GS model estimated in CEN predicting in FIB

GS model estimated in FIB predicting in CEN

Trait

CBH

HG

WSG

PY

CBH

HG

WSG

PY

No. of DArT markers used in the GS model developed in the estimation population No. of DArT markers of the GS model of the estimation population also genotyped in the prediction population % DArT markers of the GS model that could be used for prediction Predictive ability Accuracy of GS

1429

1177

1455

777

1543

1174

926

564

1132

899

1129

594

1010

744

560

381

78 ) 0.07 ) 0.11

76 ) 0.09 ) 0.14

79 0.04 0.05

76 0.14 0.21

65 0.14 0.19

63 0.04 0.06

60 0.01 0.02

68 ) 0.03 ) 0.05

CBH, circumference at breast height; HG, height growth; WSG, wood specific gravity; PY, pulp yield.

based on the presence or absence of at least one significant marker–trait effect in the bin (Fig. 3). The reasons for using this more conservative approach instead of the total number of significant markers per bin is the relatively long-range LD in these populations, so that when several trait-associated markers are found in the same 500 kb bin they are likely linked to the same effect. Furthermore, the number of markers per bin varies with the sequence polymorphism of that particular genome segment so that a test based on the total number of significant markers per bin would be biased. All tests at the whole-genome level (Pearson New Phytologist (2012) 194: 116–128 www.newphytologist.com

chi-squared) were highly significant (P < 0.001); at the individual chromosome level, all 11 Fisher exact tests were significant for CBH, 10 for HG, eight for WSG and eight for PY (P < 0.01). These results indicate that loci underlying trait variation in CEN and FIB overlap significantly more frequently than expected as a result of chance alone for all four traits. In other words, the distribution of the genomic segment maximizing selection accuracy was highly coincident between the two populations in spite of the low suitability of the prediction models between populations.  2012 The Authors New Phytologist  2012 New Phytologist Trust
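The per-chromosome overlap test can be illustrated with a minimal sketch. It assumes each population is summarized by the set of 500 kbp bins holding at least one significant marker–trait effect, builds the 2 × 2 presence/absence table and computes a one-sided Fisher exact P-value from the hypergeometric distribution. The function names, the one-sided alternative and the bin indices are illustrative assumptions, not code or data from the study.

```python
from math import comb


def overlap_table(bins_a, bins_b, n_bins):
    """Build the 2x2 presence/absence table for two sets of bin indices."""
    both = len(bins_a & bins_b)
    only_a = len(bins_a - bins_b)
    only_b = len(bins_b - bins_a)
    neither = n_bins - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_exact_greater(both, only_a, only_b, neither):
    """One-sided Fisher exact P-value: chance of at least `both` shared bins."""
    n = both + only_a + only_b + neither
    row_a = both + only_a          # bins flagged in population A
    col_b = both + only_b          # bins flagged in population B
    total = comb(n, col_b)
    p = 0.0
    for k in range(both, min(row_a, col_b) + 1):
        p += comb(row_a, k) * comb(n - row_a, col_b - k) / total
    return p


# Hypothetical chromosome split into 100 bins (made-up bin indices).
cen_bins = set(range(0, 30))       # 30 bins with an effect in one population
fib_bins = set(range(10, 40))      # 30 bins in the other, 20 of them shared
table = overlap_table(cen_bins, fib_bins, n_bins=100)
p_value = fisher_exact_greater(*table)
print(table, p_value)
```

Counting bins rather than markers, as the text argues, keeps several linked markers in one bin from being counted as independent evidence of overlap.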

Fig. 3 Comparative distribution of the genomic regions (in 500 kbp bins) containing markers that maximized accuracy of genomic selection (GS) in the two Eucalyptus breeding populations. DArT markers associated with trait variation were mapped on to the 11 chromosomes of the 608.5 Mbp Eucalyptus genome assembly partitioned in 500 kbp bins. White bins, no significant marker–trait effect detected; green bins, one or more significant marker–trait effects detected in both populations; yellow and blue bins, effects for only one population or the other. CBH, circumference at breast height; HG, height growth; WSG, wood specific gravity; PY, pulp yield.

Discussion

We have presented the first experimental results of genome-enabled prediction accuracies for growth and wood quality traits in Eucalyptus and among the first ones in trees in general. More importantly, however, these results were generated for two genuine tree breeding program settings with regard to the effective population size, genetic composition, sizes of training and validation populations and phenotyping procedures, providing real-life data on the prospects of GS for accelerating selection for complex traits in forest trees. Accuracies of GS varied between 0.55 and 0.88 depending on the trait and effective population size, closely matching the accuracies achieved by conventional phenotypic selection and corroborating the encouraging results of our earlier simulations. Substantial proportions (74–97%) of the trait heritability could be captured by fitting all of the genome-wide markers simultaneously, confirming that the GS approach brings a new perspective to the understanding of quantitative trait variation in forest trees and a revolutionary tool for applied tree breeding.

Genome-wide approaches capture large fractions of trait heritabilities

Trait heritabilities estimated by the resemblance between relatives in the two populations fell within the ranges typically estimated for these traits in Eucalyptus (Apiolaza et al., 2005; Rosado et al., 2009). Large proportions (74–97%) of these heritabilities were captured in both populations for all four traits, leaving a relatively small fraction of the estimated heritability unexplained or ‘missing’ (Fig. 1). These results are consistent with the expectations of modeling all genome-wide markers concurrently instead of the one-marker-at-a-time hypothesis testing of association genetics (Meuwissen et al., 2001). Such a whole-genome prediction approach has been recently shown to successfully capture similarly large fractions of the heritability for human height (Yang et al., 2010; Makowsky et al., 2011).

Although our estimates of captured heritability in tree growth closely match those obtained for human height, a key difference exists between the experiments. The human study dealt with a much larger effective population size, which evidently demanded a significantly higher genotyping density, with over 40 000 single nucleotide polymorphisms (SNPs) needed to capture 90% of the heritability. In our populations the effective population sizes were much smaller, consistent with the sizes employed in advanced breeding of forest trees. With a much more extensive LD, only a few hundred markers were necessary to capture similar fractions of the heritability for height growth. Notwithstanding all other differences in the biology of the organisms involved, these results confirm that a genome-wide approach with genotyping density calibrated to the degree of LD in the population effectively explains a large fraction of the genetic variation of complex traits. Recently, Hamblin et al. (2011) discussed the differences in population genetics properties, trait architecture and mode of reproduction that led to more successful AG results in crops than in humans. In trees, however, the recent domestication and large effective population size of natural populations used in AG experiments more closely resemble humans, making the ‘missing heritability’ dilemma fully relevant to forest tree populations in natural conditions. Experimental results have confirmed this expectation, with very small proportions of genetic variance explained by the associations found in forest trees (Gonzalez-Martinez et al., 2007; Thumma et al., 2009; Wegrzyn et al., 2010).

Our results demonstrate, however, that the genome-wide approach of GS, applied to tree breeding populations that have undergone some amount of domestication bottleneck and directional selection, can capture large fractions of the genetic variation underlying quantitative traits. Interestingly, our results also show that, given the extensive LD of these populations, c. 200 markers of largest effect already capture over 80% of the heritability, although the individual effects of these markers hardly surpass 1%, and thus are comparable to the size effects found for SNPs in AG experiments of forest trees. Nevertheless, while in AG markers have been selected a priori in candidate genes aiming at allele mining in low-LD populations, in GS markers are selected a posteriori from a large number of genome-wide markers in high-LD breeding populations, therefore allowing a better opportunity to capture relevant alleles at multiple loci in disequilibria along the genome. Although GS makes no progress in discovering genes or functional polymorphisms underlying complex traits, larger effect markers fitted in the GS predictive models may provide unbiased genomic positions from which allele mining efforts could be proposed.
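The link between effective population size and the extent of LD can be illustrated with Sved's (1971) expectation, E[r²] = 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans. The sketch below evaluates it for the two effective population sizes of this study (Ne = 11 and 51); the function name and the distances are arbitrary illustrative choices, and the printed values are theoretical expectations, not LD estimates from the data.

```python
def expected_r2(ne, c):
    """Sved (1971) expected LD (r^2) for effective size `ne` and distance `c` in Morgans."""
    return 1.0 / (1.0 + 4.0 * ne * c)


# Compare the two effective population sizes at a few illustrative distances.
for ne in (11, 51):
    for c in (0.001, 0.01, 0.05):
        print(f"Ne={ne:>2}  c={c:.3f} M  E[r2]={expected_r2(ne, c):.2f}")
```

At any given distance the smaller population is expected to retain more LD, which is why fewer markers can tag the same fraction of the genome.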
Accuracy of GS matches that of conventional BLUP phenotypic selection

In line with earlier simulation-based predictions for forest tree breeding scenarios (Grattapaglia & Resende, 2011; Iwata et al., 2011), the experimental accuracies estimated in this study confirm that genome-enabled predictions have true potential to radically improve the efficiency and speed of tree breeding (Table 2). The accuracies of our GS models matched or even exceeded those calculated for conventional pedigree-based phenotypic selection for all traits in both populations, with a slight advantage when the effective population size was smaller. The reduction of effective population size together with directional selection not only extended LD but possibly contributed by shifting average allele frequencies toward intermediate values, eliminating rare alleles so that QTLs tended to segregate at frequencies more similar to the frequency of marker alleles used for prediction (Hamblin et al., 2011). Given the clear, although modest, DArT marker divergence between Eucalyptus species (Steane et al., 2011), the recent hybridization that occurred in both breeding populations might also have contributed to an increased extent of LD with a positive impact on the estimated accuracies of GS.

Three elements should be highlighted regarding these estimates that make them useful benchmarks for assessing the prospects of GS in Eucalyptus and tree breeding in general. First, the use of deregressed phenotypes makes these GS models free of familial relatedness (Garrick et al., 2009), capturing only the relevant marker–trait LD in a way that accuracies should hold in independent samples and future generations with adequate model updating. Secondly, the leave-one-out validation maximized the training and validation population sizes with no cross-contamination of estimates, providing the best possible sampling of allelic effects (given the afforded training/validation set) and thus optimizing the power to estimate genomic breeding values. Thirdly, our experimental assessment of GS was carried out for several traits with variable heritability in two different populations and the accuracies were always close to those anticipated by simulations. Furthermore, our results are in line with those obtained for growth traits in loblolly pine (Resende et al., 2012) and the growing body of experimental reports of GS in annual crops (Crossa et al., 2010; Albrecht et al., 2011). Finally, our GS results, similar to the few others in plants, have been better than those reported for domestic animals (Hayes et al., 2009a,b; Moser et al., 2010). Reasons for this include not only those that have made AG results in plants more rewarding than in humans (Hamblin et al., 2011), but also the opportunity of controlling the Ne of the target population, together with the possibility of assembling large training populations and phenotyping them at high precision.
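The leave-one-out validation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a ridge-type marker regression (marker effects shrunk by a penalty `lam`, in the spirit of random regression BLUP), toy noise-free data chosen so that the expected result is obvious, and the common convention that accuracy equals predictive ability divided by the square root of heritability. All names, data and the heritability value are hypothetical.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_effects(x, y, lam):
    """Marker effects from (X'X + lam*I) b = X'y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def loo_predictive_ability(x, y, lam):
    """Predict each individual from a model trained on all the others."""
    preds = []
    for i in range(len(y)):
        effects = ridge_effects(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        preds.append(sum(g * b for g, b in zip(x[i], effects)))
    return pearson(y, preds)


def accuracy_from_ability(ability, h2):
    """Assumed convention: accuracy = predictive ability / sqrt(heritability)."""
    return ability / sqrt(h2)


# Toy, noise-free data (made up): 8 individuals, 3 markers coded -1/0/1.
markers = [[1, 0, -1], [0, 1, 1], [-1, 1, 0], [1, 1, 0],
           [0, -1, 1], [-1, 0, 1], [1, -1, -1], [0, 0, -1]]
true_effects = [2.0, -1.0, 0.5]
phenotypes = [sum(g * b for g, b in zip(row, true_effects)) for row in markers]

ability = loo_predictive_ability(markers, phenotypes, lam=1e-6)
print(ability)                                   # close to 1 for noise-free data
print(accuracy_from_ability(0.4, 0.25))          # hypothetical ability and h2
```

Because every individual is predicted from a model that never saw it, the correlation is not inflated by fitting and predicting on the same records.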

GS will likely require population-specific predictive models

The use of GS models across populations is an important issue for the operational use of GS by a program that sustains forest plantations with distinct breeding populations tailored to different environments. In plants, no study to date has looked into this issue across genetically unrelated populations. In domestic animals, the few studies to date showed that a GS model fitted to one population cannot predict phenotypes in an unrelated population unless genotyping density is increased (Hayes et al., 2009a,b). In our study we show that the predictive models had no appreciable accuracy for any trait when going from CEN to FIB or vice versa (Table 3). Our results indicate that GS prediction models will likely be population-specific. Multipopulation GS models might be feasible with increased genotyping density so that marker–QTL linkage phase would persist across populations. However, the genotype × environment interaction might supersede the persistence of LD relationships and eventually cause equally unacceptable accuracies. In fact, a recent GS study of a cloned loblolly pine population planted in different sites showed that genotype × environment interaction may severely affect the transferability of GS models across breeding zones (Resende et al., 2012). A safer and more accurate GS applicable across a range of environments is likely to come from fitting models taking into account the genotype × environment interaction effects.

Genomic regions underlying trait variation significantly overlap between populations

Genomic selection presupposes that phases of LD between markers and QTLs are the same in the selection candidates and the training population. Poor performance of GS across populations is therefore consistent with the genetics of outbred forest trees in linkage equilibrium. The availability of a Eucalyptus genome sequence allowed us to compare these results with the genome-wide positioning of the GS fitted markers in each population. This contrast revealed an interesting result: while GS accuracies across populations were poor, a highly significant level of coincidence across populations was observed regarding the location of the relevant genomic segments underlying complex traits (Fig. 3). These apparently incompatible results are in fact to be expected: although there is a significant coincidence of the genome position of the loci that explain trait variation, the allelic effects vary across populations, making predictions inaccurate. This variation has two major components that make phenotype predictions ineffective: when markers capture the same loci across populations they probably tag different alleles as a result of inconsistent coupling–repulsion relationships between marker and QTL alleles; when the same alleles are tagged by the same markers, that is, the linkage phase is consistent across populations, the relative allelic effect varies as a result of the surrounding genetic background or of genotype × environment interaction. Our experimental results at the genome-wide level essentially corroborate the theoretical predictions of poor transferability of marker–QTL associations across populations of forest trees (Strauss et al., 1992).

Breeding by GS in tropical Eucalyptus

We had previously outlined the opportunities and challenges of GS in forest trees based on deterministic simulations (Grattapaglia & Resende, 2011). The experimental results presented here substantiate those views and confirm that the assumptions underlying those simulations, that is, a strict infinitesimal model and build-up of LD following Sved’s equation (Sved, 1971), were generally valid. Gains expected from GS in these Eucalyptus populations will largely derive from the ability to shorten the breeding cycle by ultra-early selection for yet-to-be observed phenotypes at the seedling stage. Gains in selection efficiency should reach 50% if the breeding cycle time is halved and can exceed 100–200% if more aggressive tactics are taken (Fig. 2). GS in current tropical Eucalyptus breeding would reduce the breeding cycle by at least 50% by eliminating the progeny trial and the primary clonal trial where a large number of trees selected only for growth traits are typically tested as clones. The net effect is increasing selection intensity not only for growth but for all wood properties simultaneously at the progeny level and anticipating the deployment of elite clonal material by 9 yr (Fig. 4). Using conservative calculations, for a progeny trial of 20 000 seedlings genotyped at $50–100 each, assuming that the GS-selected clones would provide an increase of 1% in pulp yield 9 yr earlier than expected by conventional breeding, an economic return of at least 20 times on the investment made in GS is expected for a 500 000 ton yr−1 pulp mill operation. Breeding for multiple traits simultaneously by GS should be no more complicated than the process of multiple-trait selection in conventional breeding. GEBV for all traits would be aggregated into a single selection index and different multiple-trait selection methods applied on the genomic data.
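The efficiency gains quoted for shortened breeding cycles follow from simple arithmetic if selection efficiency is taken to be proportional to accuracy divided by cycle length, an assumption made here for illustration. Under that assumption, a GS-to-phenotypic accuracy ratio of 1.0 reproduces the 100% and 300% gains reported for CEN at 50% and 75% cycle reductions, and a ratio of 0.75 reproduces the 50% and 200% gains reported for FIB. The function name and the two ratios are illustrative choices rather than estimates taken from the study's tables.

```python
def efficiency_gain(accuracy_ratio, cycle_reduction):
    """Percentage gain in selection efficiency of GS over phenotypic selection.

    accuracy_ratio: GS accuracy divided by phenotypic-selection accuracy.
    cycle_reduction: fraction by which the breeding cycle is shortened (0 <= x < 1).
    Assumes efficiency is proportional to accuracy / cycle length.
    """
    if not 0.0 <= cycle_reduction < 1.0:
        raise ValueError("cycle_reduction must be in [0, 1)")
    return (accuracy_ratio / (1.0 - cycle_reduction) - 1.0) * 100.0


for label, ratio in (("ratio 1.00", 1.0), ("ratio 0.75", 0.75)):
    for reduction in (0.0, 0.5, 0.75):
        gain = efficiency_gain(ratio, reduction)
        print(f"{label}  cycle reduced by {reduction:.0%}: gain {gain:.0f}%")
```

By the same kind of arithmetic, genotyping 20 000 seedlings at $50–100 each costs $1–2 million, so a return of at least 20 times corresponds to $20–40 million or more.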
The number of fitted markers per trait also indicates that the use of low-density marker panels could be a possible alternative to reduce genotyping costs (Habier et al., 2009). Current trends in genotyping-by-sequencing technologies (Elshire et al., 2011), however, point to the possibility of increasing marker densities at the same cost per sample so that considerations about single data-point costs might no longer be an issue. Higher marker densities will also mitigate the decay of marker–trait LD as a result of recombination, providing satisfactory accuracies even when the training and selection populations are several generations apart (Meuwissen & Goddard, 2010) or when multipopulation predictive models are pursued (Hayes et al., 2009a,b).

[Fig. 4 diagram]
CONVENTIONAL Eucalyptus BREEDING (18 yr): MATING (flowering and crosses: 3 yr) → PROGENY TRIAL (field establishment and phenotyping: 5 yr) → 1st CLONAL TRIAL (rescue of selected trees, cloning and phenotyping: 5 yr) → 2nd CLONAL TRIAL (rescue of re-selected trees, cloning and phenotyping: 5 yr) → ELITE CLONES
GS-based Eucalyptus BREEDING (9 yr): MATING (flowering and crosses: 3 yr) → DNA (1 yr) → CLONAL TRIAL (cloning from juvenile seedlings and phenotype verification: 5 yr) → ELITE CLONES
TIME SAVINGS: 9 yr

Fig. 4 Eucalyptus breeding by genomic selection (GS) compared with conventional phenotypic selection. The avoidance of the progeny testing phase and the primary clonal trial reduces the current breeding cycle length by 9 yr. Highest ranked individual seedlings by a multitrait GS index would be immediately vegetatively propagated and established as clones in a verification clonal trial before recommendation for commercial deployment or use as parents for the next breeding cycle.

In conclusion, our experimental genomic-enabled predictions in two Eucalyptus breeding populations, together with those recently reported for loblolly pine (Resende et al., 2012), are very promising and should encourage additional assessments of GS in forest trees. We reiterate, however, our cautiously optimistic outlook about GS, given that some issues remain to be examined, such as the accuracy of GS on selection candidates several generations ahead of the training set and the possibility of reduced long-term genetic gains as a result of a potentially faster build-up of inbreeding with GS when compared with conventional phenotypic selection (Grattapaglia & Resende, 2011). We expect, nevertheless, that reports on realized accuracies of operational GS in Eucalyptus breeding will follow in the next few years. While most technical aspects of GS in tree improvement will soon be covered and genotyping costs will become increasingly affordable, its adoption by breeders will depend to a large extent on a detailed financial evaluation of the achievable gains of deploying improved clones or seeds several years earlier, against the costs of genotyping and data handling of this new breeding technology. As with all innovations, this will only happen if one is willing to consider a new attitude, take some degree of risk and recognize that genomics may finally make its way into applied tree breeding.

Acknowledgements

This work was supported by CNPq grant 577047/2008-6, PRONEX-FAP-DF grant ‘NEXTREE’ 2009/00106-8 and EMBRAPA Macroprogram 2 grant 02.07.01.004. C. P. S., C. D. P. and M. F. R. R. Jr were sponsored by graduate fellowships from CAPES, and D. A. F. was sponsored by a postdoctoral fellowship from CNPq; M. D. V. R., G. J. P. Jr and D. G. have been awarded research fellowships from CNPq. We acknowledge the team at DArT Pty for their outstanding support to the genotyping work developed by C. P. S. and C. D. P. while there as part of their graduate training.

References

Albrecht T, Wimmer V, Auinger HJ, Erbe M, Knaak C, Ouzunova M, Simianer H, Schon CC. 2011. Genome-based prediction of testcross values in maize. Theoretical and Applied Genetics 123: 339–350.
Apiolaza LA, Raymond CA, Yeo BJ. 2005. Genetic variation of physical and chemical wood properties of Eucalyptus globulus. Silvae Genetica 54: 160–166.
Bernardo R, Yu JM. 2007. Prospects for genomewide selection for quantitative traits in maize. Crop Science 47: 1082–1090.
Brondani RP, Williams ER, Brondani C, Grattapaglia D. 2006. A microsatellite-based consensus linkage map for species of Eucalyptus and a novel set of 230 microsatellite markers for the genus. BMC Plant Biology 6: 20.
Crossa J, de los Campos G, Perez P, Gianola D, Burgueno J, Araus JL, Makumbi D, Singh RP, Dreisigacker S, Yan JB et al. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186: 713–724.
Daetwyler HD, Hickey JM, Henshall JM, Dominik S, Gredler B, van der Werf JHJ, Hayes BJ. 2010. Accuracy of estimated genomic breeding values for wool and meat traits in a multi-breed sheep population. Animal Production Science 50: 1004–1010.
Dekkers JCM. 2007. Prediction of response to marker-assisted and genomic selection using selection index theory. Journal of Animal Breeding and Genetics 124: 331–341.
Dillen S, Storme V, Marron N, Bastien C, Neyrinck S, Steenackers M, Ceulemans R, Boerjan W. 2009. Genomic regions involved in productivity of two interspecific poplar families in Europe. 1. Stem height, circumference and volume. Tree Genetics & Genomes 5: 147–164.
Eathington SR, Crosbie TM, Edwards MD, Reiter R, Bull JK. 2007. Molecular markers in a commercial breeding program. Crop Science 47: S154–S163.
Eckert AJ, van Heerwaarden J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, Gonzalez-Martinez SC, Neale DB. 2010. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics 185: 969–982.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE. 2011. A robust, simple genotyping-by-sequencing (GbS) approach for high diversity species. PLoS ONE 6: e19379.
Garrick DJ, Taylor JF, Fernando RL. 2009. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genetics Selection Evolution 41: 55.

Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. 2009. Additive genetic variability and the Bayesian alphabet. Genetics 183: 347–363.
Goddard M. 2009. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136: 245–257.
Goddard ME, Hayes BJ. 2009. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Reviews Genetics 10: 381–391.
Gonzalez-Martinez SC, Wheeler NC, Ersoz E, Nelson CD, Neale DB. 2007. Association genetics in Pinus taeda L. I. Wood property traits. Genetics 175: 399–409.
Grattapaglia D, Bertolucci FL, Sederoff RR. 1995. Genetic mapping of QTLs controlling vegetative propagation in Eucalyptus grandis and E. urophylla using a pseudo-testcross strategy and RAPD markers. Theoretical and Applied Genetics 90: 933–947.
Grattapaglia D, Kirst M. 2008. Eucalyptus applied genomics: from gene sequences to breeding tools. New Phytologist 179: 911–929.
Grattapaglia D, Plomion C, Kirst M, Sederoff RR. 2009. Genomics of growth traits in forest trees. Current Opinion in Plant Biology 12: 148–156.
Grattapaglia D, Resende MDV. 2011. Genomic selection in forest tree breeding. Tree Genetics & Genomes 7: 241–255.
Habier D, Fernando RL, Dekkers JCM. 2009. Genomic selection using low-density marker panels. Genetics 182: 343–353.
Hamblin MT, Buckler ES, Jannink JL. 2011. Population genetics of genomics-based crop improvement methods. Trends in Genetics 27: 98–106.
Hayes B, Goddard ME. 2001. The distribution of the effects of genes affecting quantitative traits in livestock. Genetics Selection Evolution 33: 209–229.
Hayes B, Goddard M. 2010. Genome-wide association and genomic selection in animal breeding. Genome 53: 876–883.
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. 2009b. Invited review: genomic selection in dairy cattle: progress and challenges. Journal of Dairy Science 92: 433–443.
Hayes BJ, Bowman PJ, Chamberlain AC, Verbyla K, Goddard ME. 2009a. Accuracy of genomic breeding values in multi-breed dairy cattle populations. Genetics Selection Evolution 41: 51.
Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME. 2010. Plant breeding with genomic selection: gain per unit time and cost. Crop Science 50: 1681–1690.
Heffner EL, Sorrells ME, Jannink JL. 2009. Genomic selection for crop improvement. Crop Science 49: 1–12.
Iwata H, Hayashi T, Tsumura Y. 2011. Prospects for genomic selection in conifer breeding: a simulation study of Cryptomeria japonica. Tree Genetics & Genomes 7: 747–758.
Jannink JL, Lorenz AJ, Iwata H. 2010. Genomic selection in plant breeding: from theory to practice. Briefings in Functional Genomics 9: 166–177.
Jannink JL, Zhong SQ, Dekkers JCM, Fernando RL. 2009. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics 182: 355–364.
Lee SH, van der Werf JHJ, Hayes BJ, Goddard ME, Visscher PM. 2008. Predicting unobserved phenotypes for complex traits from whole-genome SNP data. PLoS Genetics 4: e1000231.
Legarra A, Robert-Granie C, Manfredi E, Elsen JM. 2008. Performance of genomic selection in mice. Genetics 180: 611–618.
Lorenzana RE, Bernardo R. 2009. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theoretical and Applied Genetics 120: 151–161.
Luan T, Woolliams JA, Lien S, Kent M, Svendsen M, Meuwissen THE. 2009. The accuracy of genomic selection in Norwegian red cattle assessed by cross-validation. Genetics 183: 1119–1126.
Lynch M, Walsh B. 1998. Genetics and analysis of quantitative traits. Sunderland, MA, USA: Sinauer Associates, Inc.
Makowsky R, Pajewski NM, Klimentidis YC, Vazquez AI, Duarte CW, Allison DB, de los Campos G. 2011. Beyond missing heritability: prediction of complex traits. PLoS Genetics 7: e1002051.
Mamani EMC, Bueno NW, Faria DA, Guimaraes LMS, Lau D, Alfenas AC, Grattapaglia D. 2010. Positioning of the major locus for Puccinia psidii rust resistance (Ppr1) on the Eucalyptus reference map and its validation across unrelated pedigrees. Tree Genetics & Genomes 6: 953–962.

Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A et al. 2009. Finding the missing heritability of complex diseases. Nature 461: 747–753.
McKeand SE, Bridgwater FE. 1998. A strategy for the third breeding cycle of loblolly pine in the Southeastern US. Silvae Genetica 47: 223–234.
Meuwissen T, Goddard M. 2010. Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics 185: 623–631.
Meuwissen TH, Hayes BJ, Goddard ME. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
Moser G, Khatkar MS, Hayes BJ, Raadsma HW. 2010. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genetics Selection Evolution 42: 37.
Muir WM. 2007. Comparison of genomic and traditional BLUP-estimated breeding value accuracy and selection response under alternative trait and genomic parameters. Journal of Animal Breeding and Genetics 124: 342–355.
Namkoong G, Kang HC, Brouard JS. 1988. Tree breeding: principles and strategies. New York, NY, USA: Springer Verlag.
Neale DB, Kremer A. 2011. Forest tree genomics: growing resources and applications. Nature Reviews Genetics 12: 111–122.
Neale DB, Savolainen O. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9: 325–330.
Rae A, Pinel M, Bastien C, Sabatti M, Street N, Tucker J, Dixon C, Marron N, Dillen S, Taylor G. 2008. QTL for yield in bioenergy Populus: identifying G×E interactions from growth at three contrasting sites. Tree Genetics & Genomes 4: 97–112.
Raymond CA, Schimleck LR. 2002. Development of near infrared reflectance analysis calibrations for estimating genetic parameters for cellulose content in Eucalyptus globulus. Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere 32: 170–176.
Resende MDV, de Assis TF. 2008. Seleção recorrente recíproca entre populações sintéticas multi-espécies (SRR-PSME) de eucalipto. Pesquisa Florestal Brasileira 57: 57–60.
Resende MDV, Lopes PS, Silva RL, Pires IL. 2008. Seleção genômica ampla (GWS) e maximização da eficiência do melhoramento genético. Pesquisa Florestal Brasileira 56: 63–77.
Resende MFRJ, Muñoz P, Acosta JJ, Peter GF, Davis JM, Grattapaglia D, Resende MDV, Kirst M. 2012. Accelerating the domestication of trees using genomic selection: accuracy of prediction models across ages and environments. New Phytologist 193: 617–624.
Resende MDV, Oliveira EB. 1997. Sistema Selegen – Seleção genética computadorizada para o melhoramento de espécies perenes. Pesquisa Agropecuaria Brasileira 32: 931–939.
Resende MDV, Resende MFRJ, Aguiar AM, Abad JIM, Missiaggia AA, Sansaloni CP, Petroli CD, Grattapaglia D. 2010. Computação da Seleção Genômica Ampla (GWS). Documentos EMBRAPA 210. Brasilia, Brazil: EMBRAPA.
Rosado AM, Rosado TB, Resende MFR, Bhering LL, Cruz CD. 2009. Predicted genetic gains by various selection methods in Eucalyptus urophylla progenies. Pesquisa Agropecuaria Brasileira 44: 1653–1659.
Sansaloni CP, Petroli CD, Carling J, Hudson CJ, Steane DA, Myburg AA, Grattapaglia D, Vaillancourt RE, Kilian A. 2010. A high-density Diversity Arrays Technology (DArT) microarray for genome-wide genotyping in Eucalyptus. Plant Methods 6: 16.
Steane DA, Nicolle D, Sansaloni CP, Petroli CD, Carling J, Kilian A, Myburg AA, Grattapaglia D, Vaillancourt RE. 2011. Population genetic analysis and phylogeny reconstruction in Eucalyptus (Myrtaceae) using high-throughput, genome-wide genotyping. Molecular Phylogenetics and Evolution 59: 206–224.
Strauss SH, Lande R, Namkoong G. 1992. Limitations of molecular-marker-aided selection in forest tree breeding. Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere 22: 1050–1061.
Sved JA. 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theoretical Population Biology 2: 125–141.
Thumma BR, Matheson BA, Zhang DQ, Meeske C, Meder R, Downes GM, Southerton SG. 2009. Identification of a cis-acting regulatory polymorphism in a eucalypt COBRA-like gene affecting cellulose content. Genetics 183: 1153–1164.


Ukrainetz N, Ritland K, Mansfield S. 2008. Identification of quantitative trait loci for wood quality and growth across eight full-sib coastal Douglas-fir families. Tree Genetics & Genomes 4: 159–170.
VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, Schenkel FS. 2009. Invited review: reliability of genomic predictions for North American Holstein bulls. Journal of Dairy Science 92: 16–24.
Wegrzyn JL, Eckert AJ, Choi M, Lee JM, Stanton BJ, Sykes R, Davis MF, Tsai CJ, Neale DB. 2010. Association genetics of traits controlling lignin and cellulose biosynthesis in black cottonwood (Populus trichocarpa, Salicaceae) secondary xylem. New Phytologist 188: 515–532.
Wong CK, Bernardo R. 2008. Genomewide selection in oil palm: increasing selection gain per unit time and cost with small populations. Theoretical and Applied Genetics 116: 815–824.
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW et al. 2010. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42: 565–569.

Supporting Information

Additional supporting information may be found in the online version of this article.

Fig. S1 Properties of the genotyped DArT markers in the two breeding populations.

Fig. S2 Physical coverage of the genome achieved by the genotyped DArT markers in the two breeding populations.

Fig. S3 Scatter plots of the predictive ability of the GS models, that is, the correlation between observed and predicted breeding values obtained by cross-validation.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
