Backup in gene regulatory networks explains differences between binding and knockout results

Supplementary Information

Supplementary Results
Supplementary Methods
Supplementary References
Supplementary Tables XIV and XV (.xls files)

Supplementary Results

P-value threshold analysis in cleaned data set

The rank-based analysis described in the main text used the cleaned version of the binding and knockout data in which general KO genes were removed. The range of p-values for the significance of the overlap among the k top-ranked genes is shown in Supplementary Figure 1 for k from 1 to 1000. k = 56 interactions per gene yields the most significant overlap, which is close to the average number of significant interactions per gene (60.9 binding interactions and 49.2 knockout interactions) when using a p-value threshold of 0.005 to identify significant genes. We also used the top-ranked genes to define significant interactions in one dataset while using a p-value threshold of 0.005 in the other (Supplementary Figure 1). Although for the best choice of k this yielded a more significant overlap than when the rank-based method was applied to both datasets, applying a p-value threshold to both datasets still produced a much more significant overlap.

Supplementary Figure 1. Threshold-based versus rank-based selection of significant interactions. A comparison of the significance of the overlap when using p-value, rank, or a combination to select binding and knockout interactions as significant. For the threshold-based selection of significant targets, a p-value of 0.005 was used. The "Both threshold" method does not vary because it is not dependent on k.
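The two selection schemes compared above can be illustrated with a minimal Python sketch. It assumes each dataset is a mapping from (TF, gene) pairs to p-values and that ranking is done separately for each TF; this data layout, the function names, and the toy p-values are hypothetical and not taken from the study.

```python
from collections import defaultdict

def select_by_threshold(pvalues, threshold=0.005):
    """Keep every (TF, gene) pair whose p-value is at or below the threshold."""
    return {pair for pair, p in pvalues.items() if p <= threshold}

def select_by_rank(pvalues, k):
    """Keep the k lowest-p-value genes for each TF (assumed per-TF ranking)."""
    by_tf = defaultdict(list)
    for (tf, gene), p in pvalues.items():
        by_tf[tf].append((p, gene))
    selected = set()
    for tf, scored in by_tf.items():
        # sorted() orders by p-value first, so the slice holds the top-k genes
        for _, gene in sorted(scored)[:k]:
            selected.add((tf, gene))
    return selected

# Toy p-values, made up for illustration only.
binding = {
    ("TF1", "g1"): 0.0001, ("TF1", "g2"): 0.004, ("TF1", "g3"): 0.2,
    ("TF2", "g1"): 0.03, ("TF2", "g4"): 0.0009,
}
by_threshold = select_by_threshold(binding)
by_rank = select_by_rank(binding, k=1)
```

The threshold-based set grows or shrinks with the number of strong interactions each TF has, whereas the rank-based set always contains exactly k genes per TF, which is one way to see why the two methods can yield different overlaps.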


The alternate binding dataset, in which all interactions are supported by sequence conservation data, reported whether an interaction had a p-value less than 0.001 or less than 0.005 but did not provide the exact p-value. Therefore, it was not possible to analyze the effects of the p-value cutoff in this dataset (as in Figure 1 of the main text), so instead we compared the overlaps at the 0.001 and 0.005 p-value thresholds. We observed similar trends in the significance of the overlap for both p-values. However, the relaxed threshold yielded higher percent overlaps (Supplementary Table I).

Supplementary Table I. P-value cutoff of 0.005 versus 0.001. The overlap and its significance when using a p-value threshold of 0.005 or 0.001.

p-value threshold 0.005
Binding dataset                    p-value    Binding overlap (%)    Knockout overlap (%)
Original                           10^-114    3.87                   3.52
Remove general KO                  10^-124    3.44                   4.26
Conservation                       10^-122    7.18                   3.69
Conservation, remove general KO    10^-132    6.72                   4.48

p-value threshold 0.001
Binding dataset                    p-value    Binding overlap (%)    Knockout overlap (%)
Original                           10^-102    4.07                   2.96
Remove general KO                  10^-109    3.64                   3.83
Conservation                       10^-85     5.79                   3.33
Conservation, remove general KO    10^-91     5.37                   4.16
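As an illustration of how the quantities in such tables relate to one another, the sketch below computes the binding overlap (percentage of significant binding interactions with a significant knockout effect), the knockout overlap (the converse), and a p-value for the size of the intersection. The use of a hypergeometric upper-tail test is an assumption made here for illustration; the test actually used is described in the main text, not in this supplement. All names and the toy sets are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(shared, n_possible, n_binding, n_knockout):
    """P(X >= shared) when n_knockout pairs are drawn from n_possible pairs,
    of which n_binding are significant binding interactions (assumed test)."""
    total = comb(n_possible, n_knockout)
    upper = min(n_binding, n_knockout)
    favorable = sum(
        comb(n_binding, i) * comb(n_possible - n_binding, n_knockout - i)
        for i in range(shared, upper + 1)
    )
    return favorable / total

def overlap_stats(binding, knockout, n_possible):
    """Return (binding overlap %, knockout overlap %, overlap p-value)."""
    shared = len(binding & knockout)
    binding_overlap = 100.0 * shared / len(binding)
    knockout_overlap = 100.0 * shared / len(knockout)
    p_value = hypergeom_upper_tail(shared, n_possible, len(binding), len(knockout))
    return binding_overlap, knockout_overlap, p_value

# Toy sets of significant (TF, gene) pairs, made up for illustration.
binding = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g1"), ("TF2", "g3")}
knockout = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3"), ("TF2", "g4"), ("TF1", "g5")}
stats = overlap_stats(binding, knockout, n_possible=10)
```

With genome-scale counts the exact sum above becomes slow, so a log-space or library implementation would be preferable in practice.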

To test the hypothesis that there may be better overlaps among the most significant interactions for each gene, we identified the most significant TF-gene interaction for every gene (including the general KO genes). However, the overlap and its significance were much lower for this subset of interactions than they were for the full datasets (Supplementary Table II). Similarly, we identified the subset of genes that were only bound by a single TF. While such genes were thought to be more likely to be affected by the knockout of that TF, our results do not support this claim (Supplementary Table II).

Supplementary Table II. Most significant interaction per TF and genes affected by a single TF. The overlap and its significance when examining only subsets of interactions that were believed to be of higher quality.

Subset of the data used                  p-value     Binding overlap (%)    Knockout overlap (%)
Most significant TF-gene interaction     10^-13.4    1.08                   1.52
Affected by one TF, p-value = 0.005      10^-5.6     0.92                   1.21
Affected by one TF, p-value = 0.001      10^-4.3     0.85                   1.22
Affected by one TF, p-value = 0.0001     10^-6.2     1.60                   1.97

Non-parametric tests

We divided each dataset, using the versions without general KO genes, into two sets of TF-gene interactions according to whether the corresponding interaction is a significant target in the other dataset. For example, in the last row of Supplementary Table III we created one subset of 2955 binding interactions for which the corresponding TF-gene knockout effect has a p-value ≤ 0.0001 and another set of 1095717 binding interactions whose knockout counterpart is not significant at that p-value. The remaining binding interactions, whose gene targets are not present in the knockout experiments, are ignored. We used the non-parametric Mann-Whitney U test, also known as the Wilcoxon rank-sum test, to determine that the two sets of binding interactions do not come from identical distributions with equal medians (p-value 7.84E-25). As seen in Supplementary Table III, for a range of interaction p-value thresholds, the p-value of the U test shows that the two sets of interactions do not come from identical distributions. This confirms that there is a strong dependence between the significant binding interactions and knockout effects and that this relationship is more prevalent at stricter p-values.

Supplementary Table III. Mann-Whitney U test. Threshold dataset is the dataset to which the p-value threshold was applied to define a set of significant gene targets. The matching and non-matching targets are the number of targets in the other dataset that are or are not significant at that p-value. The U test p-value is the p-value for the rejection of the null hypothesis that the two sets of interactions are from identical distributions with equal medians.

Threshold    p-value      Matching targets    Non-matching targets    U test
dataset      threshold    in other dataset    in other dataset        p-value
Binding      0.01         16343               1082329                 3.85E-20
Binding      0.005        10857               1087815                 4.51E-26
Binding      0.001        5152                1093520                 2.20E-30
Binding      0.0001       2319                1096353                 5.58E-24
Knockout     0.01         12367               1086305                 1.07E-02
Knockout     0.005        9048                1089624                 3.42E-07
Knockout     0.001        5099                1093573                 8.51E-14
Knockout     0.0001       2955                1095717                 7.84E-25
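For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It uses the normal approximation without tie or continuity corrections, which is a simplification chosen for brevity and not necessarily the variant used in the study; a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead. The sample values are made up.

```python
from math import erfc, sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic of sample_a, two-sided p-value, normal approximation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, erfc(abs(z) / sqrt(2))

# Toy binding p-values split by whether the knockout counterpart is significant.
matching = [0.001, 0.002, 0.003]
non_matching = [0.2, 0.5, 0.7, 0.9]
u_stat, p_value = mann_whitney_u(matching, non_matching)
```

A small U for the matching set indicates that its binding p-values rank systematically lower than those of the non-matching set.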

Removal of head-to-head genes

In the ChIP-chip study, whenever a TF bound between two head-to-head (divergent) genes it was deemed to interact with both of them. In reality, the TF may only functionally affect one of the two genes, which lowers the binding overlap because the TF was assigned to bind the other gene as well, and that gene is unlikely to be affected by the knockout of the TF. To address this issue, we removed all ~2000 head-to-head genes from the data and calculated the overlap (Supplementary Table IV). While the binding overlap did show some improvement as expected (4.2%), it was not significant enough to justify ignoring more than one-third of all yeast genes (p-value of 10^-73 compared to an original p-value of 10^-114). As Supplementary Table IV shows, the trend in overlaps when only using non-head-to-head genes is in agreement with the results using all genes.


Supplementary Table IV. Overlap after head-to-head gene removal. A comparison of overlaps of the original set of genes and a reduced set where all head-to-head genes have been removed. In all cases, a p-value threshold of 0.005 was used to select significant interactions.

Binding dataset                    Gene set           p-value      Binding overlap (%)    Knockout overlap (%)
Original binding data              Normal             10^-113.7    3.87                   3.52
Original binding data              No head-to-head    10^-73.3     4.17                   3.37
Remove general KO                  Normal             10^-124.0    3.44                   4.26
Remove general KO                  No head-to-head    10^-79.9     3.58                   4.20
Conservation                       Normal             10^-122.2    7.18                   3.69
Conservation                       No head-to-head    10^-88.3     8.66                   3.48
Conservation, remove general KO    Normal             10^-132.6    6.72                   4.48
Conservation, remove general KO    No head-to-head    10^-96.4     8.02                   4.42
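The filtering step can be sketched as follows, using the definition given in the Supplementary Methods (divergently transcribed genes on opposite strands whose transcription start sites lie within 1000 base pairs of one another). The gene record layout, the function name, the inclusive distance boundary, and the toy coordinates are assumptions for illustration; the sketch also ignores the filtering of dubious, silent, and overlapping ORFs.

```python
def head_to_head_pairs(genes, max_distance=1000):
    """Find divergent gene pairs.

    genes maps a gene name to (chromosome, strand, tss). A pair is reported
    when a minus-strand gene has its TSS upstream of, and within max_distance
    of, the TSS of a plus-strand gene on the same chromosome.
    """
    pairs = []
    names = sorted(genes)
    for minus_gene in names:
        chrom_a, strand_a, tss_a = genes[minus_gene]
        if strand_a != "-":
            continue
        for plus_gene in names:
            chrom_b, strand_b, tss_b = genes[plus_gene]
            same_chrom = chrom_b == chrom_a
            if strand_b == "+" and same_chrom and 0 < tss_b - tss_a <= max_distance:
                pairs.append((minus_gene, plus_gene))
    return pairs

# Toy annotations, made up for illustration.
genes = {
    "gA": ("chrI", "-", 1000),
    "gB": ("chrI", "+", 1600),
    "gC": ("chrI", "+", 5000),
    "gD": ("chrII", "+", 1200),
}
pairs = head_to_head_pairs(genes)
head_to_head = {gene for pair in pairs for gene in pair}

# Drop every interaction whose target is a head-to-head gene.
interactions = {("TF1", "gA"), ("TF1", "gB"), ("TF1", "gC")}
kept = {(tf, gene) for tf, gene in interactions if gene not in head_to_head}
```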

Noise filtered binding data

Garten et al. (Garten et al, 2005) designed four methods to filter noise from the binding data and assert that a TF-gene interaction is reported correctly. We examined the overlap in two resulting datasets, one which required verification from any of the four methods (union) and one which required support from all methods (intersection). In both cases, the binding overlap increases as expected, but the knockout interactions are not as well explained so the knockout overlap decreases (Supplementary Table V). These effects are more exaggerated in the intersection dataset because it has higher confidence interactions but contains roughly 1/3 of the original interactions and 1/2 of the union dataset's interactions. The binding data to which the noise filtering techniques were applied was an earlier version of the binding data we use, and thus only contained interactions for 113 TFs of which 104 were also present in the knockout data. Although the binding overlap does improve in the noise-filtered dataset, the absolute number of interactions in the overlap is small and it is based on older binding data (Lee et al, 2002). Therefore, we did not conduct further analysis with this dataset.

Supplementary Table V. Noise filtered binding data overlap. The overlaps for the noise filtered binding data alongside the overlap in the original data for the same 104 TFs. The union and intersection datasets were binary and did not rely on p-values, but a p-value threshold of 0.005 was used for the knockout data and the original binding dataset.

Binding dataset                  p-value    Binding overlap (%)    Knockout overlap (%)
Original                         10^-86     4.57                   4.10
Noise filtered - union           10^-99     6.78                   3.01
Noise filtered - intersection    10^-62     9.61                   1.37


Synthetic lethal pairs

Synthetic lethal interactions may provide one explanation for why a TF that binds a gene may not cause a change in that gene's expression when knocked out. Even though a single knockout does not produce a change downstream, a double knockout may sever two redundant pathways. Double knockout experiments with two TFs are still relatively scarce, but we compiled a list of 33 TF-TF pairs (Pan et al, 2006; Reguly et al, 2006) that exhibited synthetic lethal interactions or synthetic growth defects. However, the subset of TFs involved in synthetic lethal interactions showed greater binding and knockout overlap than the set of all TFs (Supplementary Table VI). The percentage of binding interactions that had corresponding knockout effects was particularly high, the exact opposite of the expected effect. However, because of the shortage of TF-TF double knockout data, this result is not sufficient to conclude that synthetic lethality does not contribute to the low overlap in general. It is possible that the pairs of TFs selected in these initial studies are biased, and that as more experimental data is made available the trend will reverse.

Supplementary Table VI. TFs involved in synthetic lethal interactions. The overlaps of TFs involved in synthetic lethal interactions versus the overlaps of all TFs. Because there are fewer TFs, the overlaps of the subsets of TFs with a synthetic lethal partner are less significant even when they have larger binding and knockout overlap.

All TFs
Binding dataset                    p-value    Binding overlap (%)    Knockout overlap (%)
Original                           10^-114    3.87                   3.52
Remove general KO                  10^-124    3.44                   4.26
Conservation, remove general KO    10^-132    6.72                   4.48

TFs with synthetic lethal partner
Binding dataset                    p-value    Binding overlap (%)    Knockout overlap (%)
Original                           10^-70     7.52                   4.76
Remove general KO                  10^-79     7.22                   5.54
Conservation, remove general KO    10^-95     11.78                  6.01

Benefits of conserved binding dataset are not due to smaller set of TFs

Because many TFs do not have any binding targets supported by a conserved sequence motif, the overlaps calculated using the alternate binding dataset involve only about half of the original TFs (97 of 188). As a control, we verified that the improved overlap with the conserved binding targets is due to the higher quality of binding interactions rather than the change in TFs examined by calculating the overlap for these TFs using the original binding data (Supplementary Figure 2). Surprisingly, one of the TFs present in Harbison et al.'s conserved binding dataset, UME1, is not in their original binding dataset. Therefore, the overlap below is calculated from the 96 TFs included in both binding datasets.


Supplementary Figure 2. Overlap for the TFs with binding targets supported by conserved motifs. For the 96 TFs present in the original binding dataset, the binding dataset supported by conserved motifs, and the knockout data, the binding overlap and significance of the overlap is higher when using the conserved binding data. This indicates that the conserved binding targets are more likely to be functional than the set of all binding targets. Furthermore, it greatly reduces the possibility that the conserved binding data yields a more significant overlap due to some inherent properties of the smaller set of TFs.

[Bar chart: binding overlap and knockout overlap (0% to 10%) for the Original and Conservation binding datasets. Overlap significance labels shown in the figure: 10^-102 and 10^-122.]

Multiple levels of binding data conservation criteria

The filtered binding data that contained only interactions supported by sequence conservation was available for multiple levels of conservation strictness. We chose the strictest version, which required conservation in at least two other species of yeast, for the results reported in the main text. However, similar trends were observed in the least strict version, which did not explicitly enforce conservation in other species for all binding interactions but did require the presence of a sequence motif (Supplementary Figure 3).


Supplementary Figure 3. Improved overlap between binding and knockout experiments for different sequence conservation criteria. The same trends observed in Figure 2 hold when using the binding data with less strict conservation criteria. The significance of the overlap of the cleaned datasets is higher with this dataset than it is for the strictest binding data, but the binding overlap of TFs without a paralog is lower (10.4% compared to 12.6%).

[Bar charts: binding overlap and knockout overlap (%) for the Original, Remove general KO, Conservation, and Conservation, remove general KO datasets, and for the No paralog, E-3 paralog, E-10 paralog, and E-20 paralog groups, with the paralog groups also split by shared PPI < 20% and PPI ≥ 20%. Overlap significance labels shown in the figure: 10^-114, 10^-124, 10^-167, and 10^-172.]

Gene Ontology for condition-specific TF activity

One possible explanation for low binding overlap is that while a TF may bind a gene in a variety of conditions, it may only have a functional effect (and therefore cause a change in gene expression when knocked out) in specific non-YPD conditions. Because all of the knockout experiments were performed in YPD, it is plausible that TFs only active in YPD would have better overlap than TFs in general. We separated YPD binding interactions from non-YPD interactions by using Gene Ontology (GO) (Ashburner et al, 2000) to identify the conditions in which the TFs are active (see Supplementary Methods). Some TFs were found to be active only in YPD, others only in non-YPD conditions, and the rest in both YPD and non-YPD conditions (Supplementary Table VII). We compared the overlaps of TFs active only in YPD with TFs that can be active in non-YPD conditions and found that the YPD-only TFs did not have higher overlap. In fact, YPD-only TFs had a lower overlap than TFs sometimes active in non-YPD conditions (Supplementary Table VII).


Supplementary Table VII. Condition in which TFs are active. A comparison of overlaps for TFs active only in YPD, those active in YPD and non-YPD conditions, and those active only in non-YPD conditions.

GO properties               # TFs    p-value    Binding overlap (%)    Knockout overlap (%)
Only active in YPD          34       10^-32     6.10                   3.34
Can be active in non-YPD    62       10^-100    6.95                   5.08
Only active in non-YPD      1        1.0        0                      0
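The grouping of TFs into these three classes can be sketched as a simple set test over each TF's GO annotations, following the procedure outlined in the Supplementary Methods. The function name, the class labels, and the placeholder term identifiers are hypothetical; only GO:0042221 (response to chemical stimulus) is named in this supplement as indicative of a non-YPD process.

```python
def classify_tf(go_terms, ypd_terms, non_ypd_terms):
    """Assign a TF to a condition class from its annotated GO terms."""
    has_ypd = any(term in ypd_terms for term in go_terms)
    has_non_ypd = any(term in non_ypd_terms for term in go_terms)
    if has_ypd and has_non_ypd:
        return "YPD and non-YPD"
    if has_non_ypd:
        return "only non-YPD"
    if has_ypd:
        return "only YPD"
    return "unclassified"

# Manually annotated term sets. "YPD_TERM_1" and "YPD_TERM_2" are
# hypothetical placeholders, not real GO identifiers.
ypd_terms = {"YPD_TERM_1", "YPD_TERM_2"}
non_ypd_terms = {"GO:0042221"}

labels = {
    "TF_A": classify_tf({"YPD_TERM_1"}, ypd_terms, non_ypd_terms),
    "TF_B": classify_tf({"YPD_TERM_2", "GO:0042221"}, ypd_terms, non_ypd_terms),
    "TF_C": classify_tf({"GO:0042221"}, ypd_terms, non_ypd_terms),
}
```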

Overlap for TFs with no reported protein interactions

Our primary claim regarding shared PPIs is that TFs that share PPIs with their paralog are more likely to compensate for the loss of that paralog than those that do not interact with the same set of proteins as their paralog. This claim is supported by the binding and knockout data as we showed in the main text, but it is not the only indicator that a TF may have a redundant partner. We also looked at how well the compensation mechanism works for TFs with no reported protein-protein interactions (but with a strong paralog). There are 9 TFs with a very similar paralog (E-10 and E-20 sets) that do not have any reported protein interactions. The average binding overlap for these TFs, 1.08%, is less than both the average overlap of the entire set of TFs (6.72%) and the average overlap of TFs in these two paralog groups with a high percentage of shared interactions (2.37%). Thus, among pairs of TFs with strong sequence similarity there is evidence that compensation for a knockout can occur even when the deleted TF does not physically interact with other proteins.

Pfam-based paralogs show similar trend

The binding overlap of increasingly strict sets of paralogs defined through Pfam analysis followed the same trend as the BLASTP-defined paralogs (Supplementary Figure 4). Those TFs without a paralog had binding overlap approximately 2.5 times greater than that of the group of TFs with a paralog at the strictest threshold (E-value E-8). However, we found that at E-8 the TFs that shared a higher percentage of PPI with their paralog had a greater binding overlap than those with few shared PPI. Because there were only 2 TFs in this subset, this anomaly could be a property of those particular TFs rather than their percentage of shared PPI.

Because the Pfam analysis only examined the TF binding domain, we further split the strictest set of Pfam-defined paralogs based on the homology of their non-binding domains (Supplementary Methods). Only the strictest set of paralogs was examined because this set contained 47 paralogs and was therefore large enough to split into smaller groups that each contained a reasonably large number of TFs. As seen in Supplementary Figure 5, the paralogs with more homologous non-binding domains, indicated by BLASTP matches with a lower E-value, are more likely to be redundant and have low binding overlap. Similarly, paralogs with a greater number of homologous regions outside the binding domain are more likely to be redundant.


Supplementary Figure 4. Pfam-defined paralogs. The binding and knockout overlap for subsets of TFs determined to have a paralog at various Pfam E-value thresholds. The inset contains only the percentage of binding interactions that have a corresponding significant knockout effect (the binding overlap).

[Bar chart: binding overlap and knockout overlap (%) for the Pfam E-8 paralog, Pfam E-3 paralog, Pfam E-1 paralog, and No Pfam paralog groups. Inset: binding overlap for the E-8 and E-3 paralog groups, split by shared PPI < 20% and shared PPI ≥ 20%.]

Supplementary Figure 5. Pfam-based paralogs' homologous regions outside the binding domain. Dividing the Pfam E-8 paralogs by their homologous regions outside the binding domain shows that paralogs with more similar non-binding regions have lower binding overlap. Panel A splits the paralogs based on the strength of the non-binding domain homology (BLASTP E-value), whereas panel B divides the paralogs by the number of non-binding domain matches at a BLASTP E-value of E-10.

[Two bar charts with vertical axes from 0% to 6%. Panel A horizontal axis: threshold for match outside binding domain (E-40, E-10, E-5, None). Panel B horizontal axis: number of matches outside binding domain (2, 1, 0).]


It has been reported that paralogous genes that have dissimilar expression patterns are more likely to be functionally redundant (Kafri et al, 2008), so we also divided the groups of paralogous TF-TF pairs based on their mean expression similarity. However, we were unable to draw conclusions from this split due to the disproportionate sizes of the subsets (Supplementary Methods).

Choice of PPI dataset does not impact interaction network results

The original interaction network was constructed using literature-curated PPI (Reguly et al, 2006). However, similar results were obtained when using a larger set of protein-protein interactions from BioGRID (Stark et al, 2006). As seen in Supplementary Table VIII, BioGRID contains nearly all of the literature-curated data in addition to many high-throughput datasets. We initially selected only literature-curated interactions due to several studies that have shown high-throughput methods to be error prone (Bader and Hogue, 2002; Sprinzak et al, 2003; von Mering et al, 2002). Indeed, the presence of false positives in the BioGRID PPI leads to slightly less significant overlaps compared to the literature-curated PPI network even though the percentage of knockout effects explained is higher (Supplementary Figure 6). Nevertheless, the BioGRID-derived network still yields much better overlap than the protein-DNA binding data alone.

Supplementary Table VIII. Comparison of the literature-curated and BioGRID PPI datasets. Almost all of the literature-curated protein-protein interactions were deposited in the BioGRID database, which contains many additional interactions from other sources.

PPI dataset                Number of unique interactions
Literature-curated only    176
BioGRID only               30166
Both                       11158

Supplementary Figure 6. The BioGRID protein interaction network explains many knockout effects. The improvement in overlap between physical interactions and indirect knockout effects when using BioGRID PPI closely resembles that obtained with the literature-curated PPI. Both networks yield the most significant overlap with a maximum path length of 2.

[Bar chart: binding overlap and knockout overlap (0% to 50%) for No PPI, PPI 1, PPI 2, PPI 3, and PPI 4. Overlap significance labels shown in the figure: 10^-58, 10^-73, 10^-133, 10^-152, and 10^-180.]


Randomizing the PPI network

In order to confirm that the higher overlaps that result from supplementing the yeast binding and knockout data with a PPI network are significant, we randomized the PPI and the protein-DNA binding interactions. The PPI network is used to explain indirect knockout effects, and in all cases we found that the original data led to larger overlaps than the randomized data (Supplementary Table IX).

Supplementary Table IX. Yeast PPI randomization. A comparison of the actual overlap and the overlap with random PPI or binding interactions. Randomization results are reported as average ± one standard deviation. Protein-DNA binding interactions are included as directed edges in the PPI network.

Path length    Dataset randomized    Binding overlap (%)    Knockout overlap (%)
1              Neither               4.32                   8.45
1              PPI                   3.73 ± 0.13            6.37 ± 0.23
1              Binding and PPI       1.82 ± 0.26            2.28 ± 0.38
2              Neither               2.57                   21.79
2              PPI                   2.54 ± 0.16            18.00 ± 1.21
2              Binding and PPI       1.58 ± 0.18            10.74 ± 1.34
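A rough sketch of the two ingredients of this analysis is given below: finding the nodes that a TF can reach within a maximum path length when PPI edges are undirected and protein-DNA binding edges are directed, and generating a random PPI network of the same size. How paths are scored in the study and how the randomization was performed are not specified in this supplement, so the breadth-first search and the uniform random edge sampling shown here are assumptions; all names and the toy network are hypothetical.

```python
import random
from collections import deque

def build_adjacency(ppi_edges, binding_edges):
    """PPI edges are undirected; binding edges are directed TF -> gene."""
    adj = {}
    for a, b in ppi_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for tf, gene in binding_edges:
        adj.setdefault(tf, set()).add(gene)
    return adj

def reachable_within(source, adj, max_len):
    """Nodes reachable from source using at most max_len edges (BFS)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == max_len:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[source]
    return set(distance)

def randomize_ppi(ppi_edges, rng):
    """Sample the same number of distinct random edges over the same proteins.
    Assumes ppi_edges holds distinct undirected pairs (no duplicates)."""
    proteins = sorted({p for edge in ppi_edges for p in edge})
    random_edges = set()
    while len(random_edges) < len(ppi_edges):
        a, b = rng.sample(proteins, 2)
        random_edges.add((min(a, b), max(a, b)))
    return random_edges

# Toy network, made up for illustration.
ppi = [("TF1", "P1"), ("P1", "P2")]
binding = [("P1", "g1"), ("TF1", "g2"), ("P2", "g3")]
adj = build_adjacency(ppi, binding)
within_1 = reachable_within("TF1", adj, max_len=1)
within_2 = reachable_within("TF1", adj, max_len=2)
random_ppi = randomize_ppi(ppi, random.Random(0))
```

A knockout effect on a gene would count as explained, under these assumptions, when the gene falls in the reachable set of the deleted TF; longer path limits enlarge that set, which mirrors the higher knockout overlap and lower binding overlap seen at path length 2.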

Multiple levels of conservation criteria in the interaction network

The PPI network analysis with the less strict conservation criteria (Supplementary Figure 7) reveals the same pattern seen with the strictest conservation criteria (Figure 3). As expected, the less strict conservation criteria yield lower binding overlaps (1.92% versus 2.57% for a path length of 2), but the knockout overlaps are higher, which causes the overlap to be more significant (10^-241 versus 10^-211 for a path length of 2).


Supplementary Figure 7. Influence of physical interaction networks for different conservation criteria. As when using the strictest binding data conservation criteria, the interaction network with the less strict conservation criteria achieves the most significant overlap with a maximum path length of 2. Longer paths result in physical explanations for a larger percentage of knockout effects, but the lower percentage of physical interactions that are functional (i.e. explain a knockout effect) causes the significance to be worse for long paths.

[Bar chart: binding overlap and knockout overlap (0% to 70%) for No PPI, PPI 1, PPI 2, PPI 3, and PPI 4. Overlap significance labels shown in the figure: 10^-94, 10^-156, 10^-172, 10^-203, and 10^-241.]

Analysis of human p63 studies

The human p63 depletion study (Yang et al, 2006) reported multiple sets of significant knockout effects based on various false discovery rate (FDR) thresholds. The increase in overlap when using a PPI network reported in the main text for the set of knockout effects with an FDR threshold of 0.2 was also observed for the other FDR thresholds (Supplementary Figure 8), as were the PPI randomization results (Supplementary Table X). The effect of using limited motif-based binding data instead of genome-wide experimental binding data, which is not yet available for humans, is apparent when the binding interactions are randomized. While randomizing the PPI edges led to much smaller overlap compared to the real PPI network (Supplementary Table X), randomizing the binding data yielded inconclusive results because the dataset itself is incomplete. It should be noted that we examined a single TF (p63) in the human analysis as opposed to all TFs in yeast. The effects observed for the single human TF may not be consistent with the trends we would observe if a genome-wide human analysis were possible.


Supplementary Figure 8. Knockout and binding overlap in a human p63 depletion study. For both down- and up-regulated genes, four FDR thresholds were used to determine significant knockout effects. The binding and knockout overlaps are reported below both with and without the incorporation of the PPI network.

[Figure with four panels: A, B, C, and D.]


Supplementary Table X. Human p63 and PPI randomization. The binding and knockout overlap when using random PPI instead of the actual data. Randomization results are reported as average ± one standard deviation. In all cases, the PPI network includes protein-DNA binding edges and allows a maximum path length of 2. The knockout overlap significantly decreases when using randomized PPI data.

FDR     Regulation    Dataset randomized    Binding overlap (%)    Knockout overlap (%)
0.05    Down          None                  1.11                   72.12
0.05    Down          PPI                   1.41 ± 0.10            54.06 ± 7.09
0.05    Up            None                  2.02                   59.62
0.05    Up            PPI                   2.17 ± 0.48            26.32 ± 9.41
0.1     Down          None                  2.72                   69.36
0.1     Down          PPI                   3.14 ± 0.27            44.70 ± 6.10
0.1     Up            None                  3.02                   61.02
0.1     Up            PPI                   3.11 ± 0.51            33.15 ± 6.86
0.15    Down          None                  6.97                   66.11
0.15    Down          PPI                   8.02 ± 0.91            35.43 ± 11.93
0.15    Up            None                  3.97                   61.12
0.15    Up            PPI                   4.14 ± 0.20            28.82 ± 17.88
0.2     Down          None                  13.16                  63.27
0.2     Down          PPI                   14.19 ± 1.01           36.92 ± 16.62
0.2     Up            None                  5.28                   62.27
0.2     Up            PPI                   5.46 ± 0.39            33.58 ± 12.62

Double knockout experiments support redundancy as a cause for low binding overlap

We collected data and carried out new experiments for four TF pairs with high BLASTP similarity scores and percentages of shared PPI (Supplementary Table XI).

Supplementary Table XI. BLASTP E-values and shared PPI of putative paralogs examined in double knockout experiments. Strong sequence similarity and a high percentage of shared PPI were used to identify putatively redundant TF pairs that were likely to compensate for each other's deletion. Fkh1 and Fkh2 do not have any PPI in common, but we consider them because of their sequence similarity and previously reported evidence of redundancy (Hollenhorst et al, 2001). The BLASTP E-value is not symmetric, and the lower of the two values is reported here.

TF1     TF2     BLASTP E-value    TF1 PPI shared by TF2 (%)    TF2 PPI shared by TF1 (%)
Pdr1    Pdr3    9E-139            25                           25
Fkh1    Fkh2    6E-115            0                            0
Ace2    Swi5    2E-66             36                           21
Yhp1    Yox1    4E-47             100                          33
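The two shared-PPI columns differ because each percentage is taken relative to a different TF's own set of interaction partners. A minimal sketch of that calculation follows; the function name and the partner sets are hypothetical and chosen only to reproduce an asymmetric 100% versus 33% pattern like the one in the last row, not taken from the actual PPI data.

```python
def shared_ppi_percent(partners_a, partners_b):
    """Percentage of A's interaction partners that also interact with B."""
    if not partners_a:
        return 0.0  # a TF with no reported PPI shares nothing
    return 100.0 * len(partners_a & partners_b) / len(partners_a)

# Hypothetical partner sets for two paralogous TFs.
tf1_partners = {"P1"}
tf2_partners = {"P1", "P2", "P3"}

tf1_shared_by_tf2 = shared_ppi_percent(tf1_partners, tf2_partners)
tf2_shared_by_tf1 = shared_ppi_percent(tf2_partners, tf1_partners)
```

Here the single partner of the first TF is shared, giving 100%, while only one of the second TF's three partners is shared, giving roughly 33%.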

We performed a new Pdr1/Pdr3 double knockout experiment and obtained double knockout expression data for the other pairs from previous studies (Pramila et al, 2002; Voth et al, 2007; Zhu et al, 2000). Comparison to the binding data was done without the conservation filtering criteria (original Harbison et al. results) since some of these factors bound very few genes when using these criteria. In all cases, the overlap was more significant (lower p-value) for the double KO than for the single KO. In fact, with the exception of Ace2 and Swi5, we found that deleted TFs with a strong putative paralog do not affect the expression levels of a significant number of genes they bind due to their partner's compensation. In sharp contrast, when both a TF and its paralog are deleted, the backup mechanism is eliminated and a significant number of bound genes are differentially expressed (Supplementary Table XII). Even for Ace2 and Swi5, which significantly affect their binding targets when knocked out individually, the effects of a double knockout are more pronounced.

Supplementary Table XII. Double knockouts of paralogous TFs significantly affect bound genes. Functional redundancy explains the apparent lack of response that results from a single knockout of a TF with a strong paralog. When the paralog is deleted concurrently, bound genes are significantly affected. The original binding dataset was used for the results below. See the Supplementary Methods for further description of the datasets used and the methods applied to identify differentially expressed genes.

Genes       TF(s) deleted    p-value     Binding        Knockout       Bound    Deletion-affected
bound by                                 overlap (%)    overlap (%)    genes    genes
Pdr1        Pdr1 and Pdr3    10^-5.4     18.71          5.75           139      452
Pdr3        Pdr1 and Pdr3    0.85        4.26           0.44           47       452
Pdr1        Pdr1             0.85        0.72           1.18           139      85
Pdr3        Pdr3             1.0         0.00           0.00           47       13
Fkh1        Fkh1 and Fkh2    10^-4.5     5.02           15.38          239      78
Fkh2        Fkh1 and Fkh2    10^-8.5     8.43           19.23          178      78
Fkh1        Fkh1             1.0         0.00           0.00           239      21
Fkh2        Fkh2             0.04        1.12           18.18          178      11
Ace2        Ace2 and Swi5    10^-7.4     10.08          14.12          119      85
Swi5        Ace2 and Swi5    10^-11.5    10.84          21.18          166      85
Ace2        Ace2             10^-7.1     6.72           25.00          119      32
Swi5        Swi5             10^-2.7     1.81           30.00          166      10
Yhp1        Yhp1 and Yox1    10^-4.1     7.55           14.29          53       28
Yox1        Yhp1 and Yox1    10^-7.5     8.86           25.00          79       28
Yhp1        Yhp1             1.0         0.00           0.00           53       42
Yox1        Yox1             0.32        1.27           3.33           79       30


Expression analysis of yeast TFs

We investigated whether the expression level of the paralog of a TF increased after the deletion of the TF. Increased expression would support our belief that the paralog compensates for the knocked out TF. As seen in Supplementary Table XIII, we found this to be the case for Ace2, Gzf3, and Sok2 after their respective paralogs Swi5, Gat1, and Phd1 were knocked out. In addition, the expression increase of Swi6 after its paralog Swi4 is deleted is just below the significance threshold. In contrast, no TF's expression level significantly decreased when its paralog was deleted.

Supplementary Table XIII. Paralog expression changed after partner knockout. The X-score is related to expression fold change (Hu et al, 2007). A negative X-score corresponds to down-regulation. An X-score of about 2.8 in either direction corresponds to a p-value of 0.005 and was considered to be significant. The knockout data was not available for the black boxes.

TF1 ORF name    TF2 ORF name    TF1 standard name    TF2 standard name
YBR049C         YDR026C         Reb1                 YDR026C
YBR182C         YPL089C         Smp1                 Rlm1
YDL056W         YER111C         Mbp1                 Swi4
YDL056W         YLR182W         Mbp1                 Swi6
YDR146C         YLR131C         Swi5                 Ace2
YER040W         YJL110C         Gln3                 Gzf3
YER040W         YKR034W         Gln3                 Dal80
YER111C         YLR182W         Swi4                 Swi6
YER169W         YMR037C         Rph1                 Msn2
YFL021W         YER040W         Gat1                 Gln3
YFL021W         YJL110C         Gat1                 Gzf3
YFL021W         YKR034W         Gat1                 Dal80
YGL013C         YBL005W         Pdr1                 Pdr3
YGL071W         YPL202C         Rcs1                 Aft2
YGL073W         YHR206W         Hsf1                 Skn7
YIR018W         YOL028C         Yap5                 Yap7
YJL056C         YHL027W         Zap1                 Rim101
YJL056C         YMR037C         Zap1                 Msn2
YKL062W         YJL056C         Msn4                 Zap1
YKL062W         YMR037C         Msn4                 Msn2
YKR034W         YJL110C         Dal80                Gzf3
YML027W         YDR451C         Yox1                 Yhp1
YMR016C         YKL043W         Sok2                 Phd1
YNL068C         YIL131C         Fkh2                 Fkh1
YOR028C         YDR259C         Cin5                 Yap6
YOR113W         YJL056C         Azf1                 Zap1
YOR113W         YMR037C         Azf1                 Msn2
YPL038W         YDR253C         Met31                Met32

Effect of TF1 KO on TF2 (X-score)    Effect of TF2 KO on TF1 (X-score)

0.46 -0.07 0.66 -0.29 0.87 1.21 -1.68 -0.68 -0.61 0.19 0.10 -0.47 -0.69 0.34 -0.06 -0.48 0.37 -0.57 -0.51 0.58 5.90 -0.01 -0.31 0.29 -1.63 -0.14

-0.45 0.17 0.24 3.25 -0.26 -0.69 2.74 -0.03 -0.99 4.46 0.18 -0.92 -1.03 -0.48 -0.03 0.18 -0.08 0.37 0.43 -1.26 -0.98 -0.88 0.42 0.25 0.41 0.58
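The stated correspondence between an X-score of about 2.8 and a p-value of 0.005 can be checked with a short sketch, under the assumption that the X-score behaves like a standard normal statistic tested in both directions. That distributional assumption is made here for illustration only and is not stated in this supplement; the function names are hypothetical.

```python
from math import erfc, sqrt

def x_score_to_p(x_score):
    """Two-sided p-value, ASSUMING the X-score is a standard normal deviate."""
    return erfc(abs(x_score) / sqrt(2))

def is_significant(x_score, cutoff=2.8):
    """Apply the X-score cutoff of about 2.8 in either direction."""
    return abs(x_score) >= cutoff

# X-scores 2.74 and 3.25 both appear in Supplementary Table XIII.
p_at_cutoff = x_score_to_p(2.8)
calls = {x: is_significant(x) for x in (2.74, 3.25)}
```

Under this assumption the cutoff of 2.8 maps to a p-value very close to 0.005, and an X-score of 2.74 falls just short of the cutoff while 3.25 exceeds it.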

Supplementary Table XIV (Excel file). BLASTP-defined paralog sets. This table contains the assignment of TFs to paralog sets defined using the BLASTP E-value and percentage of shared PPI alone. In addition, it provides the binding and knockout overlaps for each TF. Only TFs for which we have conservation-supported binding data and knockout data are included. For TFs without a putative paralog, we calculated shared PPI using approximate paralogs (see Methods).


Supplementary Methods

Head-to-head gene removal

The head-to-head gene pairs were defined in the same manner as the divergent genes in (Tsai et al, 2007), and a list of 1202 divergent gene pairs was obtained from the authors of that paper. In short, head-to-head genes are divergently transcribed on opposite strands of DNA and have transcription start sites within 1000 base pairs of one another. SGD (http://www.yeastgenome.org/) was used to obtain the gene sequence and annotations from which the 1202 pairs were identified after filtering dubious, silent, and overlapping ORFs.

GO analysis

The GO terms used to determine if a TF is active in YPD or non-YPD conditions consist of the grandchild terms of the GO term biological process (accessed January 24, 2008). We manually annotated these terms as corresponding to a process that is only active in YPD or not (Supplementary Table XV – Excel file). Next, we downloaded the GO terms assigned to each yeast TF from SGD and used OboEdit (Day-Richter et al, 2007) to map the TFs to the annotated GO terms. Each TF was determined to be associated only with GO terms corresponding to YPD processes or with one or more GO terms indicative of a non-YPD process, such as GO term 0042221 – response to chemical stimulus.

Redundant paralogs

Kafri et al. (Kafri et al, 2005; Kafri et al, 2008) generated a list of 2224 protein-protein paralogous pairs by using BLASTP with an E-value threshold of E-20 and discarding pairs for which the length of the protein sequence of the longer TF is more than 1.33 times greater than that of the shorter TF. The subset of these pairs for which both proteins are TFs was used to define our E-20 paralog set. Note that TF-TF pairs that have a BLASTP E-value less than E-20 but do not meet the length ratio requirement are placed in the next less strict paralog set instead of the E-20 set.
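The E-20 paralog set definition above can be illustrated with a minimal Python sketch. The function names, the input layout (BLASTP hits as tuples, a dictionary of sequence lengths), and the protein identifiers in the usage note are hypothetical. The sketch also assumes that the length criterion means the longer sequence may be at most 1.33 times the length of the shorter one, and that the E-value comparison is strict; neither detail is confirmed by the text.

```python
E_VALUE_MAX = 1e-20      # BLASTP E-value threshold quoted in the text
LENGTH_RATIO_MAX = 1.33  # length ratio quoted in the text


def passes_length_ratio(len_a, len_b, max_ratio=LENGTH_RATIO_MAX):
    """Assumed reading: longer sequence is at most max_ratio times the shorter."""
    longer, shorter = max(len_a, len_b), min(len_a, len_b)
    return longer <= max_ratio * shorter


def e20_tf_pairs(hits, lengths, tfs):
    """Return unordered TF-TF pairs that satisfy both criteria.

    hits    -- iterable of (protein_a, protein_b, e_value) BLASTP hits
    lengths -- dict mapping protein name to sequence length
    tfs     -- set of protein names that are transcription factors
    """
    pairs = set()
    for a, b, e_value in hits:
        if a == b or a not in tfs or b not in tfs:
            continue  # keep only pairs of two distinct TFs
        if e_value >= E_VALUE_MAX:
            continue  # strict comparison is an assumption
        if passes_length_ratio(lengths[a], lengths[b]):
            pairs.add(tuple(sorted((a, b))))
    return pairs
```

For example, with hypothetical hits `[("A", "B", 1e-30), ("A", "C", 1e-30)]`, lengths `{"A": 100, "B": 120, "C": 200}`, and all three proteins marked as TFs, only the pair A-B is retained, because the A-C length ratio is 2.0.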
The authors created a second list of 112 protein-protein pairs with literature support of their redundancy. However, of these pairs, Fkh1-Fkh2 was the only TF-TF pair, so we could not use this list in our paralog analysis.

The literature-curated protein interactions (Reguly et al, 2006) were used to compute the percentage of shared PPI for the BLASTP-defined putative paralogs. Using the methods described by Kafri et al., we also calculated the mean expression similarity for all pairs of paralogous TFs to test whether within each of the four BLASTP-defined groups of paralogs, those TFs that had one or more paralogous partners with mean expression similarity ≤ 0.4 would be more likely to be redundant. However, we found that across all four sets of paralogs, only two TFs had mean expression similarity > 0.4. Thus, we were unable to draw any general conclusions from this analysis.

When working with the Pfam-derived paralog sets, similarities in the DNA non-binding regions of the different TFs were detected with BLAST when the E-value of their alignment was smaller than E-5. To search for homology, we used a database composed of the 188 TFs in which the DNA-binding regions of the different TFs, according to the previous Pfam assignment, were masked. The Pfam analysis was conducted independently from the BLASTP-based analysis and used a different set of PPI. These PPI were extracted from the DIP (release 2008014) (Salwinski et al, 2004), MINT (release 2007-12-20) (Chatr-aryamontri et al, 2007), and MPACT (release April 2007) (Guldener et al, 2006) databases. The results were merged via UniProt (Boeckmann et al, 2003) codes, protein sequence, and protein species.

Protein-protein interaction network construction and randomization

In the PPI network, PPI were treated as undirected edges whereas TF-gene binding interactions were directed. To find the set of TF-gene physical interactions supplemented by the PPI network, we performed a breadth-first search starting at the source TF to obtain the set of TFs connected to the source TF. Path length was always limited so that the shortest distance from the source TF to the set of connected TFs did not exceed some fixed maximum. We then found the set of genes that are significantly bound by the connected TFs and took these genes as the set that are physically connected with the source TF. The overlap calculation was conducted as before except now the TF-gene physical connections were used in place of the TF-gene binding interactions.

The interaction network based on the literature-curated PPI dataset (Reguly et al, 2006) made use of all unique interactions reported. However, the BioGRID dataset (version 2.0.48) (Stark et al, 2006) included non-physical interactions. Therefore, we removed all genetic interactions as well as those inferred from co-localization. Furthermore, only interactions between two yeast proteins were added to our interaction network. The PPI randomization results we report are the summary statistics from a series of 50 individual randomization tests.
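The depth-limited breadth-first search described in this section can be sketched as follows. This is a minimal illustration rather than the original implementation: the function names and data layout are hypothetical, and the sketch assumes that the source TF itself (distance 0) counts as one of the connected TFs.

```python
from collections import deque


def build_adjacency(ppi, binding=None):
    """Build an adjacency map from undirected PPI and optional directed binding edges."""
    adj = {}
    for a, b in ppi:
        adj.setdefault(a, set()).add(b)  # PPI edges are undirected,
        adj.setdefault(b, set()).add(a)  # so both directions are stored
    if binding:
        for tf, targets in binding.items():
            adj.setdefault(tf, set()).update(targets)  # TF -> gene only
    return adj


def tfs_within(source, adj, tfs, max_dist):
    """Return the TFs whose shortest distance from source is at most max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the distance limit
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n in dist if n in tfs}


def physically_connected_genes(source, adj, tfs, binding, max_dist):
    """Union of the genes significantly bound by any TF connected to source."""
    genes = set()
    for tf in tfs_within(source, adj, tfs, max_dist):
        genes |= binding.get(tf, set())
    return genes
```

Here `binding` is assumed to map each TF to the set of genes it binds significantly. Passing `binding` to `build_adjacency` lets binding edges act as intermediate edges in the search, while omitting it restricts the search to PPI edges.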
In order to preserve global properties such as the degree of each node in the network, a random set of PPI was created by repeatedly randomly selecting two interactions and swapping a partner from the first interaction with a partner from the second. PPI are undirected so that either protein in the first randomly selected interaction can be swapped with either protein from the other interaction. We disallow a random swap if it would create an undirected edge that is already present in the PPI network, since this would reduce the number of distinct interactions in our random dataset. The number of swapping iterations performed is equal to 25 times the number of PPI, and because two interactions are selected at each iteration the average number of swaps per interaction is 50. The Poisson distribution can be used to find the probability of a particular interaction not being swapped

$$f(k,\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad f(0, 2 \cdot 25) = \frac{50^{0} e^{-50}}{0!} = 1.93 \times 10^{-22}$$

where k is the exact number of swaps per interaction of interest and λ is the expected number of swaps per interaction. Because the probability a given interaction will not be swapped is very low, we are confident the swapping procedure yields an effectively random set of interactions.

The directed TF-gene binding interactions are randomized in a different manner because the interaction source and target cannot be interchanged. Rather than swapping edges, we find the number of significant gene targets at a given p-value threshold for each TF and randomly select that many gene targets from the set of all genes in the binding dataset. If binding edges are allowed in the network, the same set of randomized binding interactions will be used for the intermediate edges in the PPI network and to link TFs connected to the source TF to significantly bound genes.

The randomization results reported in Supplementary Table IX were calculated by finding the significance and overlap for each of the 50 individual randomization tests and then computing the summary statistics for these values. This yields different results than calculating the average number of overlapping interactions, binding interactions, and knockout interactions and then calculating the significance and overlap of these values. Cytoscape was used for the protein-protein interaction network visualization (Shannon et al, 2003).

Double knockout analysis

LIMMA (Smyth, 2004, 2005) was used to normalize and process the Pdr1/Pdr3 double knockout microarray data. We chose not to apply background correction, gave 0 weight to spots with a GenePix flag less than or equal to -50, and used print-tip group loess within-array normalization. The set of significantly affected genes was obtained by adjusting the p-values for multiple hypothesis testing and taking genes with an adjusted p-value < 0.05, A > 9, and B > 1.5 (A is the average log2 expression level and B is the log odds of differential expression). The Fkh1/Fkh2, Ace2/Swi5, and Yhp1/Yox1 double mutant studies did not report p-values, so different methods were used for each dataset to identify genes significantly affected by the perturbation.

The Fkh1 and Fkh2 double knockout data (Zhu et al, 2000) was subsequently analyzed by Bar-Joseph et al. to identify differentially expressed genes (Bar-Joseph et al, 2003). They found 56 cell-cycle-regulated genes and 22 noncycling genes to be significantly affected by the double knockout, and we use these 78 genes for our analysis. For Fkh1 and Fkh2 single deletions, we apply a p-value threshold of 0.005 to the Hu et al. data to determine which genes are affected (Hu et al, 2007).
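The swap-based PPI randomization described under "Protein-protein interaction network construction and randomization" can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the original code: the function names are hypothetical, rejected swaps are counted as iterations, and swaps that would create a self-interaction are rejected, which the text does not specify.

```python
import math
import random


def swap_randomize(edges, swaps_per_edge=25, seed=None):
    """Degree-preserving randomization of an undirected edge list."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    present = set(edge_list)
    n = len(edge_list)
    if n < 2:
        return edge_list
    for _ in range(swaps_per_edge * n):
        i, j = rng.sample(range(n), 2)  # pick two distinct interactions
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # either partner of the second edge may be swapped
        if a == d or c == b:
            continue  # assumption: do not create self-interactions
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 == new2 or new1 in present or new2 in present:
            continue  # would reduce the number of distinct interactions
        present.difference_update((edge_list[i], edge_list[j]))
        present.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def prob_never_selected(expected_selections):
    """Poisson probability of zero events, f(0, lambda) = exp(-lambda)."""
    return math.exp(-expected_selections)
```

Calling `prob_never_selected(2 * 25)` evaluates the Poisson term from the text, exp(-50), which is about 1.93e-22. Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every node keeps its original degree.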
The expression analysis of Voth et al. included an Ace2/Swi5 double knockout as well as single deletions of each of these TFs (Voth et al, 2007). In both the single and double knockouts, Voth et al. considered a gene to be affected if it experienced a 2-fold or greater decrease in expression. We relax this criterion by allowing a 2-fold change in expression level in either direction.

Pramila et al. identified 28 cell-cycle-regulated genes that were derepressed in a Yhp1/Yox1 double mutant (Pramila et al, 2002). When using the hypergeometric distribution to calculate the significance of the binding and knockout overlap, the complete set of possible gene targets must be known. Because the full Yhp1/Yox1 double knockout dataset was not available, the number of genes on the microarray was not known. Thus, to estimate the p-value of the overlap we used the set of 6275 ORFs from the array used in our Pdr1/Pdr3 double knockout. Pramila et al. did not perform Yhp1 or Yox1 single knockouts, so as with Fkh1/Fkh2 we apply a p-value threshold of 0.005 to the Hu et al. data (Hu et al, 2007) to determine which genes are affected by a single deletion.
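To show why the size of the gene universe matters for the hypergeometric calculation, the following Python sketch computes an upper-tail overlap p-value. It assumes the significance is the probability of observing at least the given overlap by chance, which is the usual convention but is not spelled out in the text. The function name is hypothetical.

```python
from math import comb


def overlap_pvalue(n_universe, n_bound, n_knockout, n_overlap):
    """P(X >= n_overlap) for a hypergeometric X.

    n_universe -- number of possible gene targets (e.g. ORFs on the array)
    n_bound    -- number of genes significantly bound by the TF
    n_knockout -- number of genes affected by the knockout
    n_overlap  -- number of genes in both sets
    """
    upper = min(n_bound, n_knockout)
    total = comb(n_universe, n_knockout)
    tail = sum(
        comb(n_bound, k) * comb(n_universe - n_bound, n_knockout - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For instance, `overlap_pvalue(6275, 50, 28, 5)` uses the 6275 ORFs and 28 affected genes mentioned above together with made-up values of 50 bound genes and an overlap of 5. Changing the first argument changes the p-value, which is why an estimate of the array size was needed.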


Human data sources and interaction network

The motif-based human TF-gene binding data (Xie et al, 2005) from the Molecular Signatures Database (Subramanian et al, 2005) was provided as a mapping from TRANSFAC matrices (http://www.biobase-international.com/index.php?id=transfac, accessed April 19, 2008) (Matys et al, 2006) to genes, and it was therefore necessary to map the TRANSFAC matrices to unique TFs. It is not possible to directly translate the TRANSFAC matrices to Entrez gene identifiers (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene) (Maglott et al, 2005), our target identifier. We therefore integrated several data sources to map from TRANSFAC matrices to TRANSFAC TF identifiers to UniProt accession numbers (http://www.uniprot.org, data downloaded April 20, 2008) (Boeckmann et al, 2003) to Entrez gene identifiers. The mapping from TRANSFAC matrices to Entrez gene identifiers was not 1-to-1 because at each of the intermediate steps it was possible that a single identifier mapped to zero, one, or many other identifiers. Nevertheless, the overlaps we calculate are similar enough to those reported by Yang et al. (Yang et al, 2006) for us to conclude that any loss of information at these intermediate mapping steps is negligible.

The p63 knockout data was reported as UniGene (http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene) (Pontius et al, 2003) and GenBank accession numbers (http://www.ncbi.nlm.nih.gov/Genbank, accessed April 17, 2008) (Benson et al, 2008), which we mapped to Entrez gene identifiers. The p63 binding data was reported as the chromosomal position which was bound, and as described by Yang et al. (Yang et al, 2006) we used the UCSC Genome Browser (http://genome.ucsc.edu/, accessed April 22, 2008) (Karolchik et al, 2008) to obtain RefSeq (Pruitt et al, 2007) identifiers for the binding sites. The RefSeq identifiers were then mapped to Entrez gene identifiers.
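The chained identifier mapping described above, in which each step may map one identifier to zero, one, or many others, can be sketched as a composition of one-to-many lookups. The function name and all identifiers in the usage note are placeholders, not real TRANSFAC, UniProt, or Entrez entries, and the sketch makes no claim about how the original mapping tables were stored.

```python
def compose_mappings(*mappings):
    """Chain one-to-many identifier mappings into a single resolver.

    Each mapping is a dict from an identifier to an iterable of identifiers
    in the next namespace. Identifiers missing from a mapping map to nothing.
    """
    def resolve(identifier):
        current = {identifier}
        for mapping in mappings:
            following = set()
            for ident in current:
                following.update(mapping.get(ident, ()))
            current = following  # may shrink to empty or grow at each step
        return current
    return resolve
```

As a usage example with placeholder tables, `compose_mappings({"MATRIX_A": ["TF_ID_1", "TF_ID_2"]}, {"TF_ID_1": ["ACC_1"]}, {"ACC_1": ["GENE_1", "GENE_2"]})("MATRIX_A")` yields the set containing `GENE_1` and `GENE_2`. The branch through `TF_ID_2` is lost because it has no entry at the second step, which is the kind of information loss the text refers to.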
The motif-based binding, p63 binding, and p63 knockout datasets were then integrated with PPI from HPRD (http://www.hprd.org, HPRD_Release_7_09012997) (Mishra et al, 2006). The human p63 binding and knockout overlaps with and without the PPI network were calculated in the same way as the yeast TF overlaps; however, the process used to randomize the PPI varied slightly. Because of the large number of human PPI in the HPRD database, we performed 5 random swaps per interaction instead of the 25 used with yeast to reduce the computational demands of the randomization.

Supplementary Table XV (Excel file). YPD and non-YPD GO terms. This table contains the list of GO terms that were used to assess whether a TF is active only in YPD or if it may be active in non-YPD conditions as well.


Supplementary References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25: 25-29.
Bader GD, Hogue CWV (2002) Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotech 20: 991-997.
Bar-Joseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS (2003) Comparing the continuous representation of time-series expression profiles to identify differentially expressed genes. Proceedings of the National Academy of Sciences of the United States of America 100: 10146-10151.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2008) GenBank. Nucl Acids Res 36: D25-30.
Boeckmann B, Bairoch A, Apweiler R, Blatter M-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucl Acids Res 31: 365-370.
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G (2007) MINT: the Molecular INTeraction database. Nucl Acids Res 35: D572-574.
Day-Richter J, Harris MA, Haendel M, The Gene Ontology OBOEWG, Lewis S (2007) OBO-Edit: an ontology editor for biologists. Bioinformatics 23: 2198-2200.
Garten Y, Kaplan S, Pilpel Y (2005) Extraction of transcription regulatory signals from genome-wide DNA-protein interaction data. Nucl Acids Res 33: 605-615.
Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes H-W, Stumpflen V (2006) MPact: the MIPS protein interaction resource on yeast. Nucl Acids Res 34: D436-441.
Hollenhorst PC, Pietz G, Fox CA (2001) Mechanisms controlling differential promoter-occupancy by the yeast forkhead proteins Fkh1p and Fkh2p: implications for regulating the cell cycle and differentiation. Genes Dev 15: 2445-2456.
Hu Z, Killion PJ, Iyer VR (2007) Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet 39: 683-687.
Kafri R, Bar-Even A, Pilpel Y (2005) Transcription control reprogramming in genetic backup circuits. Nat Genet 37: 295-299.
Kafri R, Dahan O, Levy J, Pilpel Y (2008) Preferential protection of protein interaction network hubs in yeast: Evolved functionality of genetic redundancy. Proceedings of the National Academy of Sciences 105: 1243-1248.

Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ (2008) The UCSC Genome Browser Database: 2008 update. Nucl Acids Res 36: D773-779.
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne J-B, Volkert TL, Fraenkel E, Gifford DK, Young RA (2002) Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science 298: 799-804.
Maglott D, Ostell J, Pruitt KD, Tatusova T (2005) Entrez Gene: gene-centered information at NCBI. Nucl Acids Res 33: D54-58.
Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E (2006) TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes. Nucl Acids Res 34: D108-110.
Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, Menon S, Hanumanthu G, Gupta M, Upendran S, Gupta S, Mahesh M, Jacob B, Mathew P, Chatterjee P, Arun KS, Sharma S, Chandrika KN, Deshpande N, Palvankar K, Raghavnath R, Krishnakanth R, Karathia H, Rekha B, Nayak R, Vishnupriya G, Kumar HGM, Nagini M, Kumar GSS, Jose R, Deepthi P, Mohan SS, Gandhi TKB, Harsha HC, Deshpande KS, Sarker M, Prasad TSK, Pandey A (2006) Human protein reference database-2006 update. Nucl Acids Res 34: D411-414.
Pan X, Ye P, Yuan DS, Wang X, Bader JS, Boeke JD (2006) A DNA Integrity Network in the Yeast Saccharomyces cerevisiae. Cell 124: 1069-1081.
Pontius JU, Wagner L, Schuler GD (2003) UniGene: a unified view of the transcriptome. The NCBI Handbook 1.
Pramila T, Miles S, GuhaThakurta D, Jemiolo D, Breeden LL (2002) Conserved homeodomain proteins interact with MADS box protein Mcm1 to restrict ECB-dependent transcription to the M/G1 phase of the cell cycle. Genes & Development 16: 3034-3045.
Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated nonredundant sequence database of genomes, transcripts and proteins. Nucl Acids Res 35: D61-65.
Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon G, Myers C, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya O, Ideker T, Dolinski K, Batada N, Tyers M (2006) Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. Journal of Biology 5: 11.


Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The Database of Interacting Proteins: 2004 update. Nucl Acids Res 32: D449-451.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research 13: 2498-2504.
Smyth G (2004) Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Statistical Applications in Genetics and Molecular Biology 3: 1027.
Smyth G (2005) Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and bioconductor, Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (eds), pp 397-420. New York: Springer-Verlag New York, Inc.
Sprinzak E, Sattath S, Margalit H (2003) How Reliable are Experimental Protein–Protein Interaction Data? Journal of Molecular Biology 327: 919-923.
Stark C, Breitkreutz B-J, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: a general repository for interaction datasets. Nucl Acids Res 34: D535-539.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102: 15545-15550.
Tsai H-K, Su C, Lu M-Y, Shih C-H, Wang D (2007) Co-expression of adjacent genes in yeast cannot be simply attributed to shared regulatory system. BMC Genomics 8: 352.
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417: 399-403.
Voth W, Yu Y, Takahata S, Kretschmann K, Lieb J, Parker R, Milash B, Stillman D (2007) Forkhead proteins control the outcome of transcription factor binding by antiactivation. The EMBO Journal 26: 4324-4334.
Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338-345.
Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K (2006) Relationships between p63 Binding, DNA Sequence, Transcription Activity, and Biological Function in Human Cells. Molecular Cell 24: 593-602.
Zhu G, Spellman PT, Volpe T, Brown PO, Botstein D, Davis TN, Futcher B (2000) Two yeast forkhead genes regulate the cell cycle and pseudohyphal growth. Nature 406: 90-94.