Comparative epigenomics reveals evolution of ... - Wiley Online Library

7 downloads 0 Views 2MB Size Report
Nov 27, 2017 - Keywords: comparative epigenomics, DNA methylation, duplicated genes, .... Here, we report the DNA methylome of potato at single- nucleotide ...... Transcriptome analysis of human tissues and cell lines reveals one domi-.
The Plant Journal (2018) 93, 460–471

doi: 10.1111/tpj.13790

Comparative epigenomics reveals evolution of duplicated genes in potato and tomato Lin Wang1, Jiahui Xie1, Jiantuan Hu1, Binyuan Lan1, Chenjiang You2,3, Fenglan Li4, Zhengjia Wang5 and Haifeng Wang1,* 1 State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China, 2 College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory for Plant Epigenetics, Shenzhen University, Shenzhen 518060, China, 3 Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside, CA, 92521, USA, 4 College of Life Sciences, Northeast Agricultural University, Harbin 150030, China, and 5 State Key Laboratory of Subtropical Silviculture, Zhejiang Agriculture and Forestry University, Hangzhou, Zhejiang 311300, China Received 11 August 2017; revised 30 October 2017; accepted 21 November 2017; published online 27 November 2017. *For correspondence (e-mail [email protected]). L.W., J.X. and J.H. contributed equally to this article.

SUMMARY The evolution of duplicated genes after polyploidization has been the subject of many evolutionary biology studies. Potato (Solanum tuberosum) and tomato (Solanum lycopersicum) are the first two sequenced genomes of asterids, and share a common polyploidization event. However, the epigenetic role of DNA methylation on the evolution of duplicated genes derived from polyploidization is not fully understood. Here, we explore the role of the DNA methylation in the evolution of duplicated genes in potato and tomato. The overall levels of DNA methylation are different, although patterns of DNA methylation are similar in potato and tomato. Different types of duplicated genes can display different methylation patterns in potato and tomato. In addition, we found that differences in the methylation levels between duplicated genes were associated with gene expression divergence. In particular, for the majority of duplicated gene pairs, one copy is always hyper- or hypo-methylated compared with the other copy across different tomato fruit ripening stages, and these genes are enriched for specific function related to transcription factor (TF) activity. Furthermore, transcription of hundreds of duplicated TFs was shown to be regulated by DNA methylation during fruit ripening stages in tomato, some of which are well-known fruit ripening TFs. Taken together, our results support the notion that DNA methylation may facilitate divergent evolution of duplicated genes and play roles in important biological processes such as tomato fruit ripening. Keywords: comparative epigenomics, DNA methylation, duplicated genes, polyploidization, potato, tomato.

INTRODUCTION DNA methylation is an epigenetic mark that is stably inherited and widespread among eukaryotic species. It has been extensively studied and reported to play important roles in regulating gene expression and transposon silencing (Finnegan et al., 2000; Gehring and Henikoff, 2007). In animals, methylated cytosines are found predominantly in the CG context, but in plants non-CG (CHG or CHH, where H is A, C or T) methylations are also present and are associated with repression of transposable elements (Suzuki and Bird, 2008). In the model plant Arabidopsis, due to its compact 460

genome and rapid life cycle, many studies have been successfully carried out to characterize DNA methyltransferases and other factors involved in the different DNA methylation pathways (Law and Jacobsen, 2010). To date, DNA methylome studies in plants have explored the roles of methylation in development, tissue specificity, hybrid vigor, etc. (Zemach et al., 2010b; Chodavarapu et al., 2012; He et al., 2013; Kawakatsu et al., 2016; Satge et al., 2016). On the other hand, because DNA methylation is highly conserved among many plant lineages, it has been © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd

DNA methylation roles on evolution of duplicates 461 suggested to play a possible role in the evolution of gene duplicates in cassava, soybean and rice (Wang et al., 2013, 2015; Kim et al., 2015). However, it is still unclear whether this role is specific in these species or could be rather general across higher plants. Gene duplication (GD) generates duplicated genes with initially redundant functions, and they might experience non-functionalization, sub-functionalization or neo-functionalization during subsequent evolution (Lynch and Conery, 2000). GD could often be the result of a whole-genome duplication (WGD, also referred to as polyploidization), which generates another copy of the entire set of chromosomes. Polyploidizations are widespread phenomena in plants, especially in angiosperms. Many polyploidization events have been found during the evolutionary history of many angiosperm lineages. For example, one polyploidization event occurred in the ancestor of core eudicots (Jiao et al., 2012). The ancestor of monocots also underwent one round of WGD (Tang et al., 2010; Jiao et al., 2014). Besides these early polyploidizations, many recent polyploidization events also have been identified in some plant lineages, such as Manihot esculenta, Gossypium raimondii, Malus domestica, Glycine max, Linum usitatissimum, Brassica rapa, Zea mays and Musa acuminate (Blanc and Wolfe, 2004; Schmutz et al., 2010; Velasco et al., 2010; Wang et al., 2011, 2012a,b, 2015; D’Hont et al., 2012; Vanneste et al., 2014; Van de Peer et al., 2017). After the polyploidization, the majority of duplicated genes are lost through non-functionalization. However, recent polyploidization of some species/lineages showed plenty of retained duplicated genes compared with ancient ones. For example, in soybean, ~75% of all genes were duplicated genes that resulted from two most recent polyploidizations. Most of them are CG-body methylated genes, and are often expressed higher than single-copy genes (Roulin et al., 2013; Kim et al., 2015). Moreover, it has been suggested that tissue-specific DNA methylation of duplicated genes correlated with tissue-specific expression from multiple human tissues, and DNA methylation was reduced along evolutionary time (Keller and Yi, 2014). A recent study in cassava showed that DNA methylation divergence of duplicated gene pairs was positively correlated with expression divergence, and the gene pairs with much divergence in both DNA methylation and expression are functionally enriched in specific categories (Wang et al., 2015). Nevertheless, these analyses from previous studies are either influenced by multiple rounds of different polyploidizations [e.g. soybean underwent two recent polyploidizations – one occurred at 13 MYA (million years ago) and the other one occurred at 53 MYA (Schmutz et al., 2010)] or too few number of duplicated genes retained for the study (only 1325 gene pairs were selected in human; Keller and Yi, 2014). Potato and tomato shared one recent polyploidization event at the ancestor of Solanum genus

(Potato Genome Sequencing et al., 2011; Tomato Genome, 2012). Although 67 million years have passed since this polyploidization in both species, many duplicated genes are still retained and can be easily identified in both genomes. Therefore, it presents an excellent opportunity to study the role of DNA methylation in the evolution of duplicated genes in close species without further influence from additional recent polyploidization(s). Here, we report the DNA methylome of potato at singlenucleotide resolution using whole-genome bisulfite sequencing. Comparison of DNA methylation patterns between duplicated genes of potato and tomato allowed us to further understand the evolution of duplicated genes in these two closely related species. DNA methylation divergence in both gene region and promoter was correlated with expression divergence between duplicated genes. Duplicated genes showing consistent methylation changes (increase or decrease) across different fruit ripening stages in tomato were enriched in specific functional classes, such as transcription factor (TF) activity, response to stress or stimulus. Furthermore, hundreds of duplicated TFs were also shown to be regulated by DNA methylation, many of which are well-studied TFs involved in fruit ripening in tomato. Collectively, these results suggest a critical role of DNA methylation in the evolution of duplicated genes, as well as in their transcriptional regulation during plant development. RESULTS Comparison of DNA methylation profiles between potato and tomato The methylome of tomato has been studied across different fruit ripening stages (Zhong et al., 2013), whereas the methylome of potato has not been reported so far. To this end, we performed whole-genome bisulfite sequencing (BS-seq) of potato in three biological replicates. In total, we generated 558 million 150-bp reads. We first tested the reproducibility of three replicates and found that the correlation coefficients between each pair are higher than 0.95, and the methylation pattern/level across gene/ transposon regions are indistinguishable from each other (Figure S1; Table S1), indicating very high reproducibility among them. Then, in order to obtain greater depth, we combined three replicates as in previous studies (Wang et al., 2015; Ausin et al., 2016), and the total coverage of the potato genome was 54-fold (Table S2), with approximately 80% of all cytosines covered by at least 4 reads (Figure S2). We compared our potato methylome with the methylomes of tomato of different fruit ripening stages from a published dataset (Zhong et al., 2013). The overall DNA methylation profiles of potato and tomato across 12 chromosomes are shown in Figure 1a. Consistent with the findings from other plant species (Zhong et al., 2013; Wang

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

462 Lin Wang et al.

(a)

(b)

(c)

(f)

(d)

(g)

(e)

(h)

Figure 1. Characteristics of DNA methylomes of potato and tomato. (a) Circle plot of 12 chromosomes of potato and tomato. From outer to inner: TE density, gene density, and methylation level of CG, CHG and CHH. Blue color indicates high methylation level and high gene/TE density. Red color indicates low methylation level and low gene/TE density. Red lines indicate inter-species homologous gene pairs. Green lines indicate homologous gene pairs in tomato. Blue lines indicate homologous gene pairs in potato. (b) Comparison of DNA methylation levels between potato and tomato. Leaf, 17 days post-anthesis (DPA), 39 DPA, 42 DPA and 52 DPA was selected to represent tomato. DNA methylation patterns across genes (c–e) and TEs (f–h) in potato and tomato. In all cases, 2 kb indicates the upstream 2000 bp of transcriptional start sites (TSS), and 2 kb indicates the downstream 2000 bp of transcriptional end sites (TES). Upstream, gene body and downstream regions were divided into 20 proportionally sized bins to draw the metaplot.

et al., 2015), DNA methylation levels were positively correlated with TE density and negatively correlated with gene density. As shown in Figure 1a and b, tomato has higher methylation levels at CG and CHG (H = C, A or T) contexts, while asymmetric methylation (CHH) levels are higher in potato. This is also consistent with the comparison between potato and another tomato cultivar Micro-Tom (Figure S3). We also observed differences among different tissues of tomato and different stages of tomato fruit ripening. Generally, leaf tissue shows higher methylation levels than fruits at both CG and CHG contexts, but lower methylation levels at asymmetric context (Figure 1b). During tomato fruit development, CG methylation levels in early developmental stages tend to be higher than in late stages (Figures 1b and S4; Zhong et al., 2013; Lang et al., 2017). Consistent with the global methylation patterns, metaplot analyses showed that CG methylation is higher in tomato than in potato in both genes and TEs (Figure 1c and f), but it was the opposite for CHH methylation (Figure 1e and h). For CHG methylation, there was no significant difference between the two species in gene regions (Figure 1d and g).

We also found a small amount of non-CG methylation within genic regions in potato and tomato, but this was reduced after excluding those protein-coding genes with intronic transposons (Figure S5). Taken together, our results indicate that TE density plays important roles on shaping the landscape of DNA methylome (Wang et al., 2015). Evolution of the DNA methylation of homologous genes between potato and tomato The common ancestor of potato and tomato underwent a polyploidization about 67 MYA (Potato Genome Sequencing et al., 2011). This resulted in thousands of duplicated genes retained. As shown in Figure 1a, we identified plenty of homologous gene pairs between potato and tomato. In order to investigate the role of DNA methylation after the Solanum sp. polyploidization event in the evolution of these homologous duplicated genes, we selected 5464 and 6040 duplicated genes in potato and tomato as Solanum duplicates, respectively. We classified the rest of the genes into species-specific duplicates and single-copy genes

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

DNA methylation roles on evolution of duplicates 463 shared between these two species. We calculated methylation divergence between pairs of orthologs and located them into the collinear regions. The relative DNA methylation divergence is evaluated by (Msl Mst)/(Msl + Mst), in which Msl indicates methylation of the gene in tomato and Mst indicates methylation of the ortholog in potato (Keller and Yi, 2014). We found that the relative methylation divergence of CG context is close to zero, which indicates DNA methylation levels in CG context (most of which is gene region methylation) are very similar for most orthologous duplicated genes (Figure 2h and i). However, non-CG methylation showed lower levels in tomato than in potato (Figures 2j and k, and S6b–d). Moreover, in contrast to CG and CHG, CHH methylation was much more divergent, when comparing each pair of duplicated genes between Potato

0.5 Tomato CG methylation

1

TES

–2 kb

TES

–2 kb

0.4

TSS

0.2

CHG methylation

–2 kb

–2 kb

0 TSS

TES

–2 kb

–2 kb

TSS

Species specific duplicates

Single copy

Potato CHH methylation 0.5 1

R = 0.29 Density

high

low

0

R = 0.41

0 0

–2 kb

(k)

Potato CHG methylation 0.5 1

1 0.5 0

Potato CG methylation

(j)

R = 0.67

TES

0 TES

Solanum WGD

Tomato

TSS

0.2

0.2

TSS

(g)

–2 kb

(i)

–2 kb

CHH methylation

(d)

0.4

CG methylation

0

0.4

CG methylation

0 0.4

–2 kb

0

0.2

TES

0

–1

TSS

(f)

–2 kb

0.1

Potato

0

Methylation divergence

1

CHH methylation

(h)

CHG methylation

–2 kb

(c)

Tomato

(e) 0.8

(b) 0.8

(a)

0.1

(Figure 2a). We observed that single-copy genes showed the highest methylation levels in all three sequence contexts in potato (Figure 2b–g), which is consistent with previous observations in B. rapa (Chen et al., 2015). Nevertheless, we did not observe it in tomato. While this phenomenon also exists in other closely related species such as soybean and common bean (Kim et al., 2015), methylation patterns of duplicates and singletons could differ even between very closely related species. To further understand methylation changes after polyploidization, we compared methylation status between orthologs in potato and tomato, as well as between paralogs in potato and tomato. Because potato and tomato diverged about 67 MYA (Figure S6A; Potato Genome Sequencing et al., 2011), there are many collinear regions

0

0.5 Tomato CHG methylation

1

0

0.5 Tomato CHH methylation

1

Figure 2. DNA methylation comparison of different gene categories. (a) Different types of duplicated genes in potato and tomato. DNA methylation patterns of different duplicated gene categories in potato (b–d) and tomato (e–g). Duplicated genes were classified into Solanum whole-genome duplication (WGD), species-specific duplicated genes and single-copy genes. (h) DNA methylation divergence of orthologous genes between potato and tomato across 12 chromosomes in CG context. Methylation divergence was calculated as (Sl St)/ (Sl + St), where Sl indicates methylation level of orthologs in tomato and St indicates methylation level of orthologs in potato. (i–k) DNA methylation correlation of orthologous gene pairs between potato and tomato in CG, CHG and CHH contexts.

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

464 Lin Wang et al. tomato and potato, and the majority of tomato copies showed lower methylation levels than potato copies (Figure S6c and d), which is consistent with the observation that non-CG methylation is not conserved over evolutionary time across different species (Seymour et al., 2014; Niederhuth et al., 2016; Takuno et al., 2016). This suggests that DNA methylation of orthologous genes might have evolved differently in different species after polyploidization. Meanwhile, we also found that the DNA methylation divergences between duplicated genes were similar, independent of the syntenic block they are located in (Figures 2h, and S6b and c). Previous studies have demonstrated that CG gene region methylation was conserved across orthologs in plant species (Takuno and Gaut, 2013; Kim et al., 2015), which is consistent with our study of orthologs of potato and tomato. In addition, we found that CG gene region methylation was also conserved between paralogous gene pairs in potato and tomato (Figure S7a and b), and divergence in DNA methylation was similar for all three cytosine contexts in potato and tomato (Figure S7c and d). Taken together, these results demonstrated that DNA methylation might play important roles in the evolution of both orthologous and paralogous duplicates after polyploidization in both potato and tomato. Differential DNA methylation associated with transcription divergence between duplicated genes It has been well established that DNA methylation can regulate gene expression. Promoter methylation is often

Gene region methylation

(b)

Promoter methylation

Gene region methylation

(d)

Promoter methylation

Figure 3. Association of DNA methylation divergence with expression divergence between duplicate genes. DNA methylation is separated into gene region methylation and promoter methylation. Pearson correlation coefficients are 0.09, 0.07, 0.16 and 0.17 of (a), (b), (c) and (d), respectively.

Tomato

(a)

associated with downstream gene repression, while CG gene region methylation is positively correlated with gene expression (Zemach et al., 2010a; Wang et al., 2015). To assess the correlation between DNA methylation and transcription between duplicated genes, we characterized the potato transcriptome by performing RNA-seq from the same tissues used for BS-seq, and downloaded RNA-seq datasets of tomato of different fruit ripening stages from the aforementioned tomato methylome study (Table S3; Zhong et al., 2013). Similar to the method for calculating relative DNA methylation divergence, we calculated the expression divergence between duplicates as (E1 E2)/ (E1 + E2). Then, we performed a correlation analysis of DNA methylation divergence and expression divergence between duplicated genes in tomato and potato, respectively. In this analysis, we classified DNA methylation into gene region and promoter methylation, as they have different biological functions. As shown in Figure 3, both gene region and promoter methylation showed a weakly negative correlation with expression divergence at the genomewide scale. For example, in the promoter regions of tomato leaf tissues, the Pearson correlation coefficient is 0.07, although it is statistically significant (P < 2.3 9 10 5). We found that most of the duplicated gene pairs showed similar methylation levels (methylation divergence close to zero) but, for other duplicated gene pairs, the hyper-methylated copy tends to be expressed at a lower level, while the hypo-methylated copy shows higher levels of expression. This pattern is consistent across potato and tomato fruit ripening stages (Figures 3c and d, and S8).

high

Density

Potato

(c)

low

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

DNA methylation roles on evolution of duplicates 465 negative correlations with methylation divergence (P < 0.01, Wilcoxon rank sum test), and no significant correlation was found for CG gene region, which is inconsistent with the observation in cassava (Figure 4c–e). Intriguingly, Figure 4a and c showed different results for CG methylated gene regions. High expression copies tend to be hypo-methylated in comparison with low expression copies in Figure 4a, but this correlation is not significant in Figure 4c. The difference between them is that one (Figure 4a) is done by calculating methylation levels in the gene region, and the other one (Figure 4c) is done by calculating average methylation levels. This suggests that while the role of CG gene region methylation is still elusive, transcription and CG gene region methylation are possibly associated with each other to a certain extent. Taken together, these results revealed species-specific roles of DNA methylation in divergent expression between duplicated genes.

To test the role of methylation changes in gene expression, we defined a highly expressed gene and a lowly expressed gene from the two genes in each duplicated pair in tomato using a twofold cutoff of the difference of expression between the ‘highly’ and ‘lowly’ expressed partners. We observed that lowly expressed genes were clearly hyper-methylated in CG (P < 0.05, Wilcoxon rank sum test), CHG (P < 0.01, Wilcoxon rank sum test) and CHH (P < 0.01, Wilcoxon rank sum test; Figure 4a). To confirm this finding, we did the analysis in the opposite direction. We first divided duplicated genes into a hypermethylated copy and a hypo-methylated copy when the absolute methylation difference of a given gene pair exceeded a certain cutoff (> 0.6 for CG, > 0.4 for CHG and > 0.1 for CHH), and then compared transcription levels. Consistent with the observation in Figure 4a, we found that hyper-methylated copies tend to be expressed at a lower level than hypo-methylated copies (Figure 4b). Our previous study in cassava showed that CG gene region methylation changes are positively correlated with gene expression divergence between duplicated genes, but not with other sequence contexts or regions (Wang et al., 2015). In order to test whether this phenomenon is also present in tomato, we performed a similar analysis; however, the results were very different. Non-CG methylation of both gene body and flanking regions showed

(b)

60

FPKM

0.4

**

20

40

0.8 0.6

Methylation level

0.2

**

**

**

80

1.0

*

DNA methylation plays an important role in the regulation of plant development (Gehring and Henikoff, 2007; Pikaard and Mittelsten Scheid, 2014). We set out to test if DNA methylation could have a possible role in the regulation of duplicated genes across different developmental stages.

100

(a)

DNA methylation of duplicated genes across developmental stages

0

0.0

** high

low

high

low

high

Highly expressed genes

hyper hypo hyper hypo hyper hypo

low

Lowly expressed genes

CHG

CHH

CG

0.1

0.2

CHH methylation

**

TES

2 kb

–2 kb

**

0

0.1

CHG methylation TSS

** **

**

**

0

0.4 0 –2 kb

(e)

*

** CG methylation

CHH

(d)

0.8

(c)

CHG

0.05

CG

TSS

TES

2 kb

–2 kb

TSS

TES

2 kb

Figure 4. Relationship between DNA methylation and expression of duplicated genes. (a) Differentially expressed duplicated genes showing different methylation. (b) Differentially methylated duplicated genes showing different expression. (c–e) Metaplots of average DNA methylation of duplicated protein-coding genes. Duplicated gene pairs were classified into a highly expressed gene and a lowly expressed gene if the expression difference between a given pair is more than twofold. Highly expressed genes were indicated by a blue line; lowly expressed genes were indicated by a red line. The P-value was calculated by Wilcoxon rank sum test. *P < 0.05; **P < 0.01.

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

466 Lin Wang et al. First, we explored DNA methylation differences between duplicated genes across tomato fruit developmental stages [17 days post-anthesis (DPA), 39 DPA, 42 DPA and 52 DPA] and leaf tissue (Zhong et al., 2013). Intriguingly, we found that most of the duplicated genes (3252 out of 4060, 80%) showed consistent CG methylation levels across different stages and tissues, in which one copy always stays hyper-/ hypo-methylated compared with the other copy across different developmental stages and tissues (Figure 5a). This can also be observed for non-CG DNA methylations (Figures 5b and c, and S9). Among three DNA methylation contexts, plenty of duplicated genes (49.5%) showed consistent CG and non-CG methylation levels (Figure 5d). Therefore, it is possible that duplicated genes were regulated by DNA methylation across developmental stages and tissues, and both CG and non-CG methylation play potentially important roles. Furthermore, functional categories from these duplicated genes with consistent CG methylation differences were examined by Gene Ontology (GO) term enrichment analysis. The most significant terms were functional categories involved in TF activity, chromatin/nucleosome assembly and response to chemical stimulus (Figure 5e; Table S4). Some of these categories were also enriched in duplicated genes with consistent DNA methylation divergence in the non-CG contexts (Figure S9b; Tables S5 and S6). These results suggest that duplicated genes involved in these specific functional categories might be preferentially retained after polyploidization, and have undergone different DNA methylation regulation across different tissues and developmental stages. Duplicated TFs involving fruit ripening were regulated by DNA methylation It has been previously reported that DNA methylation could contribute to the regulation of fruit development, such as fruit ripening and size (Manning et al., 2006; Zhong et al., 2013; Liu et al., 2015; Gallusci et al., 2016; Daccord et al., 2017). Zhong and colleagues found that methyltransferase inhibitor 5-azacytidine induced tomato fruit ripening prematurely (Manning et al., 2006; Zhong et al., 2013). Furthermore, it also has been shown that TFs, such as MADSbox, SPL and other types, play important roles during the fruit ripening process (Manning et al., 2006). These previous findings led us to investigate the possible role of DNA methylation in the regulation of TFs across fruit development. We first identified differentially expressed TF genes (DEGs), and found 941 out of 1846 TFs were differentially expressed across different developmental stages (Figure 6a). Among these differentially expressed TFs, several families can be identified, i.e. bHLH, MYB, ERF, bZIP, WRKY, etc. DNA methylation changes were not obvious across different stages even when all three methylation contexts were taken into account among these differentially

expressed TFs (Figure 6b). TFs of the same family are considered to be derived from the same ancestor by GD, through either large-scale duplication (polyploidization) or small-scale ones, such as single-GD. As a result, some TF families contain hundreds of members. To examine the specific roles of DNA methylation on the expression of duplicated TFs across different fruit ripening stages, we defined TF pairs as differentially methylated gene pairs (DMGs) when their absolute methylation difference was equal or higher than the cutoff (> 0.6 for CG, > 0.4 for CHG and > 0.1 for CHH) and compared their expression changes. Thirty-four TF pairs were identified as both DMGs and DEGs (Figure 6c), indicating that the expression of these TF pairs could be regulated by differential DNA methylation during the fruit development (Table S7). Among these TF pairs, several are known to play crucial roles in regulating tomato fruit ripening, such as RIN, NOR and MADS1 (Karlova et al., 2014). Other well-studied TFs such as CNR, HB-1, NAC4, AP2 and TAGL1 could also be recovered in this analysis by decreasing our DMG calling stringency. One example is RIPENING-INHIBITOR (RIN), a key player for tomato fruit ripening, and its partner (Solyc02 g089200). This pair of TF genes displayed both differential methylation and expression patterns across fruit development stages (Figure 6d and e). The expression of RIN is likely to be regulated by CG methylation, as shown in Figure 6d. RIN showed progressive increase of expression during fruit development, which was coincidental with a decrease in gene region methylation. On the other hand, Solyc02g089200 was stably expressed at low levels, and no significant changes in CG methylation were detected across developmental stages (Figure 6e). These data indicate that methylation divergence between duplicated TFs during fruit ripening might be correlated with expression difference and regulating tomato fruit ripening process. DISCUSSION In this study we compared the methylomes of potato and tomato, and found that the DNA methylation levels can vary greatly between these two closely related species at all three cytosine contexts, which is very similar to what has been reported when DNA methylations of soybean and common bean were compared (Kim et al., 2015; Niederhuth et al., 2016). Overall, the level of DNA methylation is lower in potato than in tomato. In particular, CG and CHG methylation levels of both potato genes and TEs were lower than that of tomato, but CHH methylation level was higher in potato. Moreover, because potato and tomato shared a common polyploidization before their divergence and there is no evidence of any more recent independent polyploidization in each lineage, it provides us with a good opportunity to understand the dynamics of DNA methylation divergence of duplicated genes resulting from this

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

DNA methylation roles on evolution of duplicates 467 (a)

–1

CG

le

(d)

(b)

Methylation difference 1

af PA PA PA PA D D D D 17 39 42 52

CG

567 (15.1%)

CHG –0.5>

Methylation difference >0.5

le

Methylation difference >0.2

CHH –0.2>

af PA PA PA PA D D D D 17 39 42 52

CHG 527 (14%)

(c)

le

af PA PA PA PA D D D D 17 39 42 52

(e) 1

2

3

4

>5

–log10 (P value)

90 (2.4%)

1856 (49.5%)

302 (8%)

91 (2.4%)

319 (8.5%)

CHH

0

20

40

60

80

100

Number of genes

Figure 5. Consistent patterns of DNA methylation in duplicated genes across developmental stages. (a–c) Patterns of DNA methylation differences across leaf and various fruit developmental stages in CG, CHG and CHH contexts. (d) Venn diagram showing the number of duplicated gene pairs with consistent methylation difference across stages/tissues in CG, CHG and CHH contexts. (e) Enrichment of Gene Ontology (GO) categories of duplicated genes with consistent methylation difference across stages/tissues in CG context. Methylation difference was measured by M1 M2, M1 indicates DNA methylation of one copy, and M2 indicates DNA methylation of the other copy.

polyploidization. We investigated the divergent patterns between Solanum-derived duplicated genes and speciesspecific duplicated genes in comparison with single-copy genes. Single-copy genes showed higher levels of methylation in gene regions in potato, but this phenomenon was not observed in tomato. When soybean and common bean were compared, single-copy genes were densely methylated at CG context in common bean, but not in soybean (Kim et al., 2015). It suggests that divergence in DNA methylation after GDs is different from species to species, and there is not a general pattern. Gene region methylation is largely conserved in many plants, such as rice, Brachypodium distachyon, soybean and common bean, and it has been shown to be conserved between orthologs (Takuno and Gaut, 2013; Niederhuth et al., 2016). Here, we also observed that gene region methylation is very much conserved in orthologs between

potato and tomato, independently of collinear specificity. Besides, paralogs were also conserved in gene region methylation, even though not to the same extent as orthologs. While methylation at promoter regions generally causes transcriptional gene silencing, in the case of gene region methylation its association with transcription is still not completely clear (Bewick et al., 2016; O’Malley et al., 2016; Lang et al., 2017). In fact, many previous studies showed that there is a positive correlation between gene region methylation and transcription, but this is not the case for every gene, and in some cases heavy gene region methylation is associated with low transcriptional levels. In addition, Eutrema salsugineum and Conringia planisiliqua were found to have lost gene region methylation and CMT3 orthologs, indicating that gene region methylation is dispensable for normal transcription (Bewick et al., 2016). Our data indeed support a negative association between

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

468 Lin Wang et al.

(a)

(b)

(c)

(e)

(d)

Figure 6. DNA methylation analysis of differentially expressed transcription factors (TFs). (a) Hierarchical clustering of 941 differentially expressed genes (DEGs) encoding TFs across different developmental stages. (b) Methylation clustering of 941 differentially expressed TF genes across different developmental stages. (c) Venn diagram of the number of duplicated TF pairs showing different expression and different methylation in four fruit development stages. (d and e) Methylation and expression patterns of RIN and its copy across different developmental stages.

gene region methylation and duplicated genes expression. When looking at duplicated genes, in several cases the gene showing higher expression is also the gene showing lower gene region methylation levels and vice versa. Based on these results, we hypothesize that differences in DNA methylation of duplicated genes could affect gene expression, and therefore might contribute to functional divergence. It is interesting to note that the majority of duplicated genes show consistent methylation patterns across different developmental stages, in which one copy

always displays higher methylation levels compared with the other copy. This is similar to previous studies showing that one alternative spliced isoform is always expressed at a higher level compared with the other isoforms across different tissues (Gonzalez-Porta et al., 2013; Vitulo et al., 2014; Thatcher et al., 2016). Both GD and alternative splicing (AS) could contribute to the diversity of the protein repertoire (Lynch and Conery, 2000; Graveley, 2001). Several previous studies have also explored the correlation between AS and GD (Kopelman et al., 2005; Talavera et al.,

© 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd, The Plant Journal, (2018), 93, 460–471

DNA methylation roles on evolution of duplicates 469 2007). Intriguingly, some cases have shown that duplicated genes in one species are very similar to alternative spliced isoforms in another species (Pacheco et al., 2004). Meanwhile, recent studies have shown that large gene families are lacking in splicing isoforms and vice versa (Su et al., 2006). Furthermore, there is evidence showing roles of epigenetic modification in AS (Gelfman et al., 2013; Lev Maor et al., 2015; Wang et al., 2016). All these suggest the complex regulatory network between AS and GD by epigenetic modification. Extensive efforts have been made to illustrate the mechanism of fruit ripening, including its regulation by TFs and hormone ethylene pathway (Barry and Giovannoni, 2006; Kevany et al., 2007). For example, TFs such as RIN, NOR, MADS1 and others have been shown to be required for normal fruit ripening (Karlova et al., 2014). Our results suggested a possible role of DNA methylation in the expression regulation of duplicated TFs during fruit ripening stages. This might reveal a new aspect of epigenetic modulation of the expression of TFs during fruit ripening stages. EXPERIMENTAL PROCEDURES Library construction and sequencing DNA was extracted from tuber tissue of potato (Solanum tuberosum cultivar pacific), and BS-seq libraries were prepared using the TruSeq Nano DNA LT kit (Illumina), as described previously (Wang et al., 2015). Three libraries were sequenced on a HiSeq 2500 system (Illumina) to obtain paired-end 150-bp reads per the manufacturer’s instructions. Total RNA was extracted from the same tissue as BS-seq libraries, and RNA-seq libraries were prepared using TruSeq preparation kit with polyA mRNA selection, as per the manufacturer’s instructions (Illumina). Three libraries were pooled and sequenced for paired-end 150-bp reads using an Illumina HiSeq 2500 system.

BS-seq analysis For tomato BS-seq data (AC cultivar), we used public data (SRA accession SRR046092) from leaf and fruit of different ripening stages, for example, 17 DPA (immature), 39 DPA (mature green), 42 DPA (breaker) and 52 DPA (red ripe). For tomato BS-seq data (Micro-Tom cultivar), we obtained data from GEO database (GSE94903). Analyses of both potato and tomato BS-seq libraries were performed by BSMAPv2.90 (Xi and Li, 2009), keeping only uniquely mapped reads, excluding redundant polymerase chain reaction products, and allowing 4 mismatches per 100-bp read length. Methylation levels were calculated as weighted methylation levels (Schultz et al., 2012). Gene region methylation levels were defined as the entire gene, including both exon and intron regions. Differentially methylated regions were defined as previously described (Ausin et al., 2016). Cutoffs of methylation difference for defining DMGs were >0.6 for CG, >0.4 for CHG and >0.1 for CHH in gene body, which is similar to a previous study (Song et al., 2015). Gene region methylation and promoter methylation in Figure 3 contain only methylations in CG context, because non-CG methylations were already well known to play repressive roles in transcription (Law and Jacobsen, 2010).

Defining gene region methylation Different to previous defining of gene body methylation (Takuno and Gaut, 2012; Niederhuth et al., 2016), where they classified gene body methylation (gbM) genes as those genes only methylated in CG context, but not in non-CG contexts, here, gene region methylation of each gene was qualified from the transcription start site to the transcription terminal site. We did not classify genes into gbM genes or non-gbM genes as we classified duplicated genes into high-expression and low-expression groups of genes, and studied the methylation divergence between them.

RNA-seq data analysis For tomato RNA-seq data, we used public data (SRR404309 and SRR404309) from leaf and different developmental stages of fruit ripening, which correspond to the BS-seq libraries from SRA046132. RNA-seq reads of tomato were aligned to tomato reference genome v2.5 by Tophat v2.1.0 (Trapnell et al., 2009), with default parameters and keeping only uniquely mapped alignment. Total clean reads of potato were aligned to potato reference genome v4.03 by Tophat v2.1.0, with the same parameters as those for tomato. Expression values were quantified by Cufflinks (Trapnell et al., 2012) with FPKM (fragments per kilobase per million mapped reads). DEGs were identified by Cuffdiff v2.2.1 with default parameters.

Identification of Solanum-lineage duplicated genes and species-specific duplicated genes We first identified homologs of Solanum lycopersicum, S. tuberosum, Mimulus guttatus, Arabidopsis thaliana and Prunus persica by performing all-against-all BLASTP with e-value cutoff as 10 5. Then, Inparanoid was employed to filter out matches exhibiting poor global similarity (lower than 50%) or gene coverage (