A comparative review of computational methods for

0 downloads 0 Views 4MB Size Report
rsc.li/molecular-biosystems. Molecular. BioSystems. REVIEW. Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.
Molecular BioSystems View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

REVIEW

Cite this: DOI: 10.1039/c7mb00170c

View Journal

A comparative review of computational methods for pathway perturbation analysis: dynamical and topological perspectives† Q. Vanhaelen,

* A. M. Aliper and A. Zhavoronkov

Stem cells offer great promise within the field of regenerative medicine but despite encouraging results, the large scale use of stem cells for therapeutic applications still faces challenges when it comes to controlling signaling pathway responses with respect to environmental perturbations. This step is critical for the elaboration of stable and reproducible differentiation protocols, and computational modeling can be helpful to overcome these difficulties. This article is a comparative review of the mechanism-based methods used for hypothesis-driven approaches and data-driven methods which are two types of computational approaches commonly used for analysing the dynamics of pathways involved in stem cell regulation. We firstly review works based on kinetic modelling. We emphasize the relationships between the dynamics of these pathways and their topological features, and illustrative examples are described to show how the analysis of these relationships can contribute to a more detailed and formal Received 20th March 2017, Accepted 15th June 2017 DOI: 10.1039/c7mb00170c

understanding of the signaling dynamics. This discussion is followed by a review of the recent datadriven pathway analysis methods. Based on a simplified description of the pathways, these methods are able to handle high dimensionality data, and topological features of the pathways taken into account in the latest methods improve both accuracy and predictive power. Nevertheless, progress is still needed

rsc.li/molecular-biosystems

to clarify the biological meaning of the topological decompositions used by these methods.

Introduction Insilico Medicine Inc., Johns Hopkins University, ETC, B301, MD 21218, USA. E-mail: [email protected] † Electronic supplementary information (ESI) available. See DOI: 10.1039/c7mb00170c

Quentin Vanhaelen earned his Bachelor’s and Master’s in Physics (first class honor) from Universite´ libre de Bruxelles (ULB) in 2006. He earned his DEA in Sciences from ULB in 2007. He earned his PhD in Theoretical Physics (Grant FNRS–FRIA) from ULB in 2010. During his PhD studies, he developed a big interest in questions related to regenerative medicine. He was a post doc fellow at Ottawa Hospital Research InstiQ. Vanhaelen tute (OHRI), Ottawa, Canada, from 2011 to 2013. He is now at Insilico medicine Inc. engaged in research on modeling and analysis of signaling pathways.

This journal is © The Royal Society of Chemistry 2017

Experimental modeling of human disorders is used to identify the molecular mechanisms behind the onset of diseases, a primordial step for the development of appropriate therapies. Traditionally, these research studies are performed by using animal models although they differ from humans at epigenetic, genetic and metabolic levels. Consequently, drugs that were demonstrated to be successful on animal models might fail when moved into clinical trials. With their ability to differentiate into any cell type, human pluripotent stem cells (PSCs) can be useful to overcome the limitations of animal models for certain disorders. PSCs are now considered as a fundamental aspect of regenerative therapy which includes the repair, replacement, or regeneration of cells, tissues or organs.1 PSCs have also attracted interest from pharmacological companies regarding applications such as disease modelling, drug discovery and drug testing.2 PSCs are also used for therapeutic screening and intervention as recently emphasized.2,3 It is now understood that differentiation of PSCs is controlled by the cellular environment. Interconnected signal transduction pathways transmit signals received from the environment to gene regulatory networks4–6 (GRNs) which regulate PSC fate decision accordingly.7–9 The third type of regulatory components

Mol. BioSyst.

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Review

are the metabolic pathways which are constituted of biochemical reactions, mostly catalyzed by enzymes. They regulate metabolism whose aim is to produce energy and maintain cellular activity. Signal transduction pathways, GRNs and metabolic pathways interact with each other through components such as the mTOR pathway. MTOR is known for being involved in the control of metabolism, cellular growth and proliferation. It acts as a sensor for these processes, and is connected to many upstream and downstream signaling cascades. Despite the research undertaken to better understand the relationships between signalling pathway activation, gene regulation and stem cell fate commitment, the use of PSCs is still restricted by several unknowns. Consequently, the elaboration of effective lineage-specific differentiation protocols remains complicated. Often, the protocols are unable to produce the large amounts of cells required for therapeutic use. Furthermore, the techniques currently used for cultivating PSCs tend to generate heterogeneous population of self-renewing and partially differentiated cells. As described in Fig. 1, there are three categories of modeling methods for studying the various types of networks described above: interaction-based, constraint-based

Molecular BioSystems

and mechanism-based.10 Each of these methods has been extensively reviewed elsewhere11,12 and although it is possible to apply them to all types of networks, some methods are more adapted for specific types of pathways. For example, metabolic pathways aim at optimizing the amount of biomass produced while maintaining a steady-state behavior.13 Thus when modeling such systems, one applies constraint-based approaches which rely on optimization techniques to identify a single state, within the space of allowed states of the system, which corresponds to the actual flux distribution under a defined set of nutrient conditions. Applying constraint-based methods to metabolic pathways is an active area of research with various improved methods suggested. A method for determining optimal metabolic pathways in terms of the level of concentrations of the enzymes catalyzing reactions in a metabolic network was proposed.14 Another improvement is based on observations that the steady-state activity of metabolic pathways incorporates feedback inhibition of the enzymes. To take this feature into account, feedback inhibition was incorporated in a flux balance analysis (FBA) based method for modelling the optimal activity of metabolic pathways. The examples showed that this

Fig. 1 (a) Modeling complex biological systems can be done using three main approaches: in the interaction based one, only static interactions are taken into account and represented usually as a (un)directed graph. Molecular details of the interactions are ignored. For the constraint based one, molecular details of the interactions are included through the stoichiometric matrix. The mechanism based methods include kinetic parameters which allow us to take into account the temporal dynamics of the system. (b) The dimensionality of the system that can be taken into account decreases as the level of details of the description increases. Interaction based approaches are appropriate for modeling large systems but their structure and topology can only be analyzed through generic properties of the associated network whereas mechanism based methods are more adapted for smaller systems but the properties of the networks and it associated topology can better be connected to specific dynamical behavior. (c) Interaction based methods are usually favored when implementing data-driven approaches which involve large sets of data and associated variables which need to be analyzed at a very large scale usually by statistical means whereas mechanism based methods are suitable for hypothesis based approaches which aim at analyzing very specific dynamical properties of a restricted set of components.

Mol. BioSyst.

This journal is © The Royal Society of Chemistry 2017

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Molecular BioSystems

algorithm demonstrates superior performance compared with methods such as the extreme pathway analysis.15 Both interaction and mechanism based methods have been applied to decipher the complexity of the signalling pathway dynamics such as mechanisms behind receptor activation and signal transduction.16,17 Mechanism-based modeling methods (also called kinetic modeling methods) are favored when developing hypothesis-driven approaches which analyse functionalities of biological systems by focusing on their dynamical and topological properties.18,19 These approaches use mathematical models to firstly formulate hypotheses on potential regulatory mechanisms and then evaluate their consistency with experimental data and suggest experimental procedures to identify them.20 These models can also include crosstalks between different pathways and even GRNs. These methods are suitable for relatively small systems because they require a large amount of quantitative information not always available. On the other hand, the relationship between pathway activation and transcriptomic response has led to the elaboration of pathway maps built using an interaction-based scheme. The representation in terms of interaction maps allowed the development of rather complex data-driven approaches which are essentially statistical-based methods. The pathway maps are used by statistical techniques to analyze changes in the expression of genes that are known to belong to common pathways or are involved in similar biological processes. These methods can be attractive for two reasons. Firstly, it is possible to incorporate the transcriptomic data without the need of external quantitative information. Secondly, as they work in terms of pathway activity levels and not in terms of individual genes, they reduce the genomic complexity from tens of thousands of features to measurements on only dozens of pathways. Data-driven methods can be classified into four generations.21 The first generation methods, overrepresentation approach (ORA),22 focus on the number of differentially expressed (DE) genes observed in a pathway by comparing DE genes expected to hit the pathway by chance. If this number differed significantly from that expected by chance, the pathway was considered as significant. The second generation methods, functional class scoring (FCS) methods, such as GSEA-based pathway analysis,23,24 make use of the correlation between the pathway genes and the class of the samples to detect coordinated changes in the expression of genes in the same pathway. These methods use prior knowledge pathway databases in a minimal way as the pathways are considered as unstructured lists of genes. Thus, the pathway topologies as well as the knowledge about the gene interactions are not considered.22 This leads us to the third generation of methods, called pathway-topology-based algorithms which take into account the topology of the pathways. The elaboration of these methods is possible with the development of biological annotations which systematically include a description of gene interactions in the form of gene signalling networks. Finally, the methods of the fourth generation optimize the use of the topological information embedded within the pathways by applying mathematical procedures to identify sub-structures within pathways.

This journal is © The Royal Society of Chemistry 2017

Review

The aim of this work is twofold. Firstly, it completes the analysis of hypothesis-based approaches presented earlier20 and the recent review of the first two classes of data-based methods.21 We review several achievements obtained using mechanismbased methods, the approach of choice when developing hypothesis-based study. We emphasize the relationships between pathway topology, dynamics and biological behaviors. We pursue by reviewing data-driven methods of the third and fourth classes. We discuss how the inclusion of more detailed topological descriptions can contribute to improve accuracy and predictive power. Nevertheless, it is emphasized that the simplifications resulting from the representation of pathways using interactionbased methods make the definition of biologically relevant topological structures challenging.

Kinetic modeling to study the interplay between dynamical, topological and biological properties Case study: mechanism-based approach for modeling lineage specification When developing mechanism based models, one is concerned with the temporal evolution of compounds reacting together through a set of chemical reactions described using an appropriate mathematical formalism. As described in Fig. 2, assembling such models is a multi-step process. Mechanism-based models are well-adapted for describing biological mechanisms in terms of the dynamical and topological properties of the pathways. For example, differentiation can be analysed in terms of the concept of multi-stationarity,25,26 a dynamical property depending on the presence of positive feedback loops.27 On the other hand, pluripotency maintenance and dynamical phenomena such as oscillatory behaviours can be associated with negative feedback loops, a topological structure which also contributes to the maintenance of homeostasis.27,28 Robustness29 is another example of dynamical property linked to the pathway topology as robustness has been shown to increase with hierarchy.30 Here we illustrate these general concepts by describing the key points of a study previously published.31 The purpose of that study was to build a mechanism based model of a small GRN of transcription factors (TFs) regulating pluripotency and differentiation in order to provide computational evidence supporting the hypothesis that these GRNs are able to elicit different steady states depending on the kind of external perturbations received. As summarized in Fig. 3, the study was done as follows. A set of master TFs was firstly selected. They either act as markers for the pluripotency state (characterized by high levels of OCT4, SOX2 and NANOG) or indicate a differentiated state corresponding to trophectoderm (low levels of OCT4, SOX2 and NANOG and high levels of CDX2 and GATA6) or endoderm (high levels of OCT4 and GATA6) lineages. To build the GRN, ChIP-on-chip and microarray data were analyzed to identify binding affinities and whether the binding of a given TF had a positive or negative effect on gene regulation. As a result, one obtained a digraph of TFs.

Mol. BioSyst.

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Review

Molecular BioSystems

Fig. 2 Assembling a kinetic model is a two step process. (a) Molecular species to be included must be selected. Then, a qualitative model describing how these species are known or supposed to interact together must be established. This mechanistic description can imply to distinguish between different cellular compartments. For example, this can include the membrane, the cytoplasm and the nucleus. (b) To obtain a dynamical model, the qualitative mechanistic description must be completed with information about the species concentration and kinetic rates. Various approaches can be used to that end. For a large model, computational methods are favoured although they usually require time series data for most of the variables of the system to perform correctly and as system dimensionality increases, several issues such as parameter uncertainty and identifiability must be addressed. Another point to consider is that the behaviour of the components observed in experimental data depends also on processes not necessarily included in the model. Consequently, the estimated values of the parameters should guarantee that the model is able to reproduce the dynamics experimentally observed but not that these computed values correspond to the actual rates as if they were obtained by experiment. The kind of kinetic values needed also depends on the mathematical formulation chosen to simulate the dynamics of the system.

Among the features included, there is positive feedback acting on OCT4, SOX2 and NANOG through the binding of the OCT4SOX2 and OCT4-SOX2-NANOG heterodimers. The OCT4-CDX2 heterodimers repress both OCT4 and CDX2 whereas a positive feedback loop allows CDX2 to regulate its own expression. GCNF is activated by CDX2 or GATA6 and can suppress OCT4, by binding to it as an inhibitor. To obtain a mechanism-based model, it is necessary to develop the description of the interactions between the different components. Here, as the model is a simple GRN, the stoichiometry describes either gene repression or activation. Then, the dynamics is introduced by developing a set of ODEs which are dependent on kinetic parameters. The values of these kinetic parameters were obtained by assuming a thermodynamic model of gene regulation where the transcriptional rate of a gene is proportional to the occupancy, which can be computed through computing equilibrium values of TFs which are bound to the promoters of the genes being transcribed. The results obtained with the initial version of the model have led the authors to formulate two plausible assumptions. Firstly, NANOG could bind to GATA6 as a repressor and OCT4 could activate GATA6, and secondly, the heterodimer OCT4-GATA6 could repress NANOG. Both assumptions were successfully validated with experimental results. Next, the authors confirmed several hypotheses behind our mechanistic understanding of stem cell fate decision. For example, the dynamics of the model demonstrated that many genes controlling pluripotency maintenance as well as differentiation are actually regulated in a biphasic manner with respect to OCT4 concentration levels.

Mol. BioSyst.

Simulation showed that from the stem cell state, it is possible to drive the system into either trophectoderm or endoderm by decreasing or increasing OCT4 levels. OCT4 levels are essentially controlled by the external perturbations received. This is in agreement with the fact that the external perturbations are a key factor for stem cell fate decision. To conclude, it is worth emphasizing that, although simplified, the model presented here elicits multi-stationarity as there exist different steady states (characterized by a different set of concentrations for the TFs) corresponding either to the pluripotent state or to one of the differentiated states. This property is linked to the existence of several positive feedback loops incorporated in the model. This work illustrates how biological features can be modelled and characterized using well established dynamical and topological properties using even simple mechanistic models. Kinetic modeling of the signaling pathway regulating stem cell fate decision The model presented above was simplified from various points of view. Firstly, the dynamics of the signaling pathways transmitting the external perturbations was ignored. Secondly, the protein space was also neglected and the dynamics was restricted to the transcriptomic level. Mechanism-based approaches have also been applied to elaborate more realistic and biologically relevant kinetic models.32–35 Among them, several kinetic models of signaling pathways involved in stem cell fate decision were developed. Their analysis in combination with other standard mathematical tools provided quantitative characterization of their dynamical properties.

This journal is © The Royal Society of Chemistry 2017

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Molecular BioSystems

Review

Fig. 3 (a) For building a reduced model of TFs involved in stem cell fate decision, master regulators are identified and associated experimental data about their interactions are gathered. (b) Using these data, a first interaction-based model is assembled. (c) To build a dynamical model, regulation of gene expression is modeled using a set of ODEs made of two terms: a regulatory function taking into account the activators and inhibitors acting on the gene and a term modeling the degradation of the gene. Numerical values for the various kinetic coefficients are extracted from data or inferred using biophysical models. (d) The obtained mechanism-based model can be tested and in silico results are analyzed, compared to experimental data. The model can be used to test and validate hypothesis regarding the existing interactions between the genes.

For example, to investigate the dynamics of the TGF-beta pathway, sensitivity analysis was used for identifying the parameters whose perturbation the pathway response is most sensitive to or most robust against, and steady state analysis showed that the regulation of the signal is strongly influenced by the balance between clathrin dependent endocytosis and nonclathrin mediated endocytosis.36 The conditions under which the pathway elicits transient or sustained response as well as its different sensitivities to ligand doses at different time scales were also investigated37 as well as the dynamics of Smad2 accumulation within the nucleus upon ligand addition was analyzed.38 Here also, sensitivity analysis provides an overview of the susceptibility of the peak concentration of Smad2/ Smad4 complexes to changes in model parameters. The dynamics of the EGF receptor transduction cascade is also the subject of many modeling works. The signaling responses were shown to be stable over a 100-fold range of ligand concentration and the most important parameter in determining signal efficacy was the initial velocity of receptor activation.39 Comparisons of the signaling responses of various RTK receptors such as EGFR and FGFR were also performed.40 In another work,41 EGFR and IR pathways were combined into one model which includes

This journal is © The Royal Society of Chemistry 2017

feedback loops between the two pathways and again, the dose dependence of responses to EGF and insulin addition was studied. The effect of different concentrations of ROCK on feedback loops involved in EGF signaling was studied.42 Another model was developed to analyze how the response is controlled by the relative levels of proteins and to determine under what conditions the EGF pathway elicits a transient or sustained response.43 Furthermore, a sensitivity analysis demonstrated that although EGFR signalling appears to be robust with respect to change of many rate constants, there is a contextual dependency of the parameter sensitivity to the experimental conditions.44 Another study focusing on the MAPK cascade identified a negative feedback loop controlling the desensitization of signals, that is, the ability of the downstream signals to culminate even while receptor–ligand binding continues to increase.45 The effects of negative feedback loops on the robustness of the MAPK signalling are also of importance and it was suggested that a single negative feedback between ERK and RAF could be responsible for the robustness of MAPK.46 The relationship between negative feedback loops and the existence of oscillatory dynamics within the MAPK system is also established and properties such as robustness,

Mol. BioSyst.

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Review

shape and width and time period of the oscillation of ERK were further analyzed.47 The nuclear factor kB (NF-kB) pathway and its associated receptor, the tumor necrosis factor (TNF), are known for regulating, through NF-kB transcription factors, cellular stress response and immune response to infections.48 The activation of these pathways induces a specific oscillatory behavior of nuclear NF-kB concentration whose properties and origins have been widely studied.49 Another model, in which short pulses of TNF were considered, predicted how negative feedback loops regulate both the resetting of the system and cellular heterogeneity.50 The results demonstrated that the oscillation frequency has a functional role as patterns of NF-kB dependent gene expression depend on the stimulation properties, a hypothesis corroborated elsewhere.51 Studies identified a delayed negative feedback loop involving the transcription of IkB, an inhibitor of NF-kB, as responsible for the oscillations in NF-kB translocation52 and this model was also used to study the relative importance of IkB degradation using sensitivity analysis.53 The apoptosis pathway has also attracted attention for showing oscillatory behaviors in response to ionizing radiation. These oscillations are thought to be controlled by a negative feedback loop involving p53 and MDM2.54,55 Another study combining numerical and analytical computations has clarified the role played by another positive feedback loop between p53 and AKT in the all-or-none switching behavior between a prosurvival state and a pro-apoptotic state. The role played by BAX and BCl-2 in the sustained oscillations of p53 concentration and the robustness of caspase-3 activity with respect to BCL2 and BAX interactions were also studied and strategies to prevent Caspase activation were suggested.56 BAX activation was also found to be involved in topological structures controlling a robust bistability switch.57 Positive feedback loops are also found in crosstalks. For example, a positive feedback loop between WNT and ERK pathways was identified as responsible for the maintenance of high activities of WNT and ERK even without a persistent extracellular signal.58 The authors of this model suggested that mutational changes in proteins can cause functional changes spreading to different pathways through crosstalks which could be involved in the mechanism of carcinogenesis. The WNT pathway is also well-known for being involved in stem cell fate decision and for that reason kinetic models have been specifically designed to study its dynamics. A model focusing on the analysis of the effects of prolonged and transient WNT stimulation on beta-catenin and AXIN turnover was firstly assembled59 and an extension of this model was found to elicit oscillatory behaviors controlled by negative feedback loops.60 Standard sensitivity analysis demonstrated that the WNT pathway is robust with respect to perturbations of parameters. The mathematical structure of this model was also studied from a more formal point of view and a dimensionality reduction allowed identification of key regulators of the pathway.61 As mentioned above, MTOR is a pathway connected to many upstream and downstream signalling cascades and as a result, its dynamics is better understood with larger models.

Mol. BioSyst.

Molecular BioSystems

For example an extended model was used to analyze the dynamical relationships between insulin regulation and gene transcription.62 A sensitivity analysis was performed to assess the robustness of the system and the effects of several large feedback loops. Other models focused on more specific aspects such as the amino-acid dependent regulation of mTORC1 activation.63 Here, it was possible to identify feedback loops that control and maintain a robust insulin response and the insulin resistant state was interpreted as the result of the distortion of the balance between positive and negative feedback. Finally, different hypotheses to explain the activation of mTORC2 via a TSC1-PI3K channel were analysed using mathematical modeling and comparison with experiments.64 To provide predictions, mechanism-based models must comply not only with a mechanistic understanding of the biological processes but also with constraints regarding the kinetic model whose parameters are assumed or inferred. Concretely, information on the concentrations of compounds and reaction rates is required. Sometimes, these values are obtained from experimental data. However, reaction rates not only depend on the compounds and tissues, but also on the physiological state of the cell. Thus, the use of values from different sources can lead to dynamical patterns which do not correspond to any observed behavior. Alternatively, reaction rates can be obtained or constrained by biochemical experiments which provide reaction times and other physico-chemical quantities. If parameters cannot be measured or if one wants to extend an existing kinetic model, parameters can be derived from the data of steady-state concentrations with fluxes used as a constraint. Other parameters can be fixed such that the results of the model are in agreement with known data. For larger models, computational approaches can be used to build constrained kinetic models. For example, for the large model of mTOR,62 an experimentally constrained population of parameters was estimated using multi-objective optimization. Model parameters were estimated, starting from an initial best fit parameter set based on data taken from the literature and the residual between model simulations and each of the experimental constraints was simultaneously minimized using the multi-objective POETs algorithm.65 The parameters can also be estimated by minimizing an objective function measuring the deviation between the experimental time courses and the computed model trajectories.44 When computational or measured time course data are available, parameters can be obtained using the parameter estimation tool SBMLPET-MPI.37,66 Although these constraint-based methods can be helpful, they require time series data for most of the variables and for that reason it can be necessary to firstly perform a dimensionality reduction of the model, for example, using a genetic approach.49 These steps explain why such theoretical efforts have had limited impact within the stem cell community, who still favor powerful qualitative tools to construct logical and formal models of pathway structures. Nevertheless, as the understanding of the molecular organization of these pathways improves, these approaches should attract more interest. For example, recent results establish quantitative relationships between measured

This journal is © The Royal Society of Chemistry 2017

View Article Online

Molecular BioSystems

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

mRNA levels, mRNA and protein concentrations. These relationships can be used to estimate kinetic values.67–70 More generally, quantitative modeling is already a useful tool in various areas of system biology with, for example, the elaboration of complete models of organisms,71 descriptions of subsystems of the human body such as the ones involved in metabolism72,73 and methods to simulate population dynamics.74,75

Data-driven pathway analysis: from signal transduction to transcriptomic responses In this section, algorithms of the third and fourth generation are described. A schematic representation of the main steps of these methods is depicted in Fig. 4.

Review

Third-generation methods PLAGE76 computes pairwise comparisons of pathway activity levels between two different conditions. This approach employs singular value decomposition (SVD) to identify for each pathway a specific gene, called metagene, whose expression level is proportional to the pathway activity level. The computation of the activity level involves several steps. Firstly, for each condition, a matrix containing all normalized gene expression levels for all samples of one condition is created. Secondly, the pathway activity level is obtained as a weighted sum of the standardized expression levels of the individual genes pondered by the biggest singular value of the SVD of the expression matrix. The vector of weights is given by the singular vector corresponding to the biggest singular value of the matrix of expressions. Finally, a t-statistic is computed to identify the pathways with an activity level significantly different between the two conditions. Although the combination of activity

Fig. 4 (a) Datasets are normalized and rescaled using standard procedures. As several algorithms assess the perturbation of a pathway by comparing perturbed and non-perturbed conditions, both conditions need to be provided as input. (b) The topology of the pathways is extracted from various databases and undergoes several adaptations in order to transform it into a set of directed graphs for interactions between genes known as being downstream targets of the pathways. Each pathway is defined by its topology, represented as a directed graph where nodes are genes or set of genes and edges describe interactions of different types that are classified as being either activation or inhibition of downstream nodes. Additional simplifications can be required for certain methods. (c) Before being applied for prediction, the algorithm undergoes a validation phase. This phase can include performance comparison with other methods using gold standard datasets. For the validation and prediction phases, pathway scoring is done using criteria combining topology, statistics and estimation of DEGs within each pathway and its substructures when taken into account. (d) The statistical significance of the pathways selected is assessed using various statistical approaches, most of them based on the computation of associated p-values and appropriate statistical tests.

This journal is © The Royal Society of Chemistry 2017

Mol. BioSyst.

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Review

levels and weights specifies an optimal approximation for the matrix of expression, this definition of activity level could be too restrictive and its biological relevance is not clearly established. SPIA77 combines DEGs of two experimental conditions provided as an input with a set of predefined pathways of genes and proteins. The method computes a perturbation probability for each pathway as a product of two independent probabilities. The first one is the significance of a given pathway as provided by an over-representation analysis of the number of DE genes observed on the pathway. The second one is related to the perturbation strength measured in each pathway, that is, the probability to measure a total accumulated perturbation of the pathway greater than a given threshold. This threshold is a function of the net perturbation accumulation at the level of each gene obtained for each gene by subtracting the observed log fold-change from its perturbation factor. The perturbation factor itself is defined for each gene as a linear function of the perturbation factors of all genes in a given pathway. It is a function of the signed normalized measured expression change of the gene and the perturbation factors of its upstream genes normalized by the number of its downstream genes. In practice, the computation of the accumulated perturbation is done by inversion of the adjacency matrix associated with each pathway and thus requires pathways with non-zero determinant matrix. The combined probability is then used to rank the pathways and test whether the pathway is significantly perturbed. SPIA demonstrates significant improvements compared to an earlier similar method.78 Firstly, evidence obtained from the pathway topology is now independent from the overrepresentation evidence and false positive rates are better. Secondly, SPIA shows increased sensitivity when compared with GSEA, as well as an improved specificity and better pathway ranking when compared with ORA. Nevertheless, the perturbation factor for a given gene is still defined as a linear function of all perturbation factors of all genes in a pathway. This makes it difficult to include non-linear signalling events like negative feedback loops. Secondly, matrix inversion must be performed by the algorithm and manual interventions are required to regularize the pathway at the cost of modifying the biological relevance of the interactions. DART79 also infers the activity of pathways with respect to a perturbation using a set of DEGs. The novelty is that it includes a preliminary step to verify the coherence of the prior pathway information and topology (the prior information consisting of genes which are up-regulated or down-regulated in response to pathway activation). The evaluation of the prior information is a two-step process. Firstly, an expression correlation network of all genes in a pathway is assembled by computing Pearson correlations between each pair of genes and p-values are computed. Gene pairs that passed the p-value threshold are assigned an edge in the network. The second step is to evaluate the consistency of the network with the prior information by assigning a binary weight to the edge of the network that is compared with the weight prediction made from the prior information. An edge is consistent if the sign is the same as that of the model prediction. A consistency score for the

Mol. BioSyst.

Molecular BioSystems

observed network is obtained as the fraction of consistent edges. Once the prior pathway information is denoised, the pathway activity is inferred using two activation metrics. Both are single-gene based metrics. The first one is an average over the genes in the network that does not consider the underlying topology whereas the second one includes this information by weighting each gene by the number of neighbours. The results show that the evaluation of the prior information obtained from pathway databases significantly improves the inference of pathway activity. The most robust results are obtained when using the denoised correlation network to estimate activity with weighted average of the expression values of the genes where the weights are directly given by the degrees of the nodes in the maximally connected component of the network. However, when comparing the performances, it must be considered that DART differs from previous methods such as SPIA as it is an unsupervised method for inferring a subset of pathway genes that represent pathway activity. DART was compared with the supervised method called Condition Responsive Genes (CORG) and was found to perform similarly. The main difference observed is that CORG yielded very small gene subsets compared to the larger gene sub networks inferred when using DART. According to the authors, although small discriminatory gene sets may be advantageous from the viewpoint of experimental cost, the biological interpretation is less clear. Fourth-generation methods Clipper80 is an improvement of a previous method.81 Firstly, the dynamic perturbation of pathways is tested by statistically testing equality of concentration matrices and mean vectors between different conditions. To that end, the data of the two conditions are modelled using graphical Gaussian models with the same undirected graph G. G is obtained after transforming the network obtained from the graphite R package first into a di-acyclic graph (DAG) and then into its moral graph. In this configuration, the cliques, i.e. complete sub-graphs having all their vertices joined by an edge, can be used to identify relevant sub-paths as described below. Two tests are performed, one comparing the strength of the links between genes in the two conditions (testing whether they share the same covariance matrix) and another one testing the differential expression of the pathway (testing whether they share the same mean). Thus, to perform these tests, parameters of the model are estimated from the data. The Iterative Proportional Scaling (IPS) algorithm is used to compute the maximum-likelihood and the concentration matrix. A necessary condition for the existence of the maximum-likelihood estimate from IPS is that the number of samples must be greater than the cardinality, i.e. the number of nodes of the largest clique (a setting that is missed in the case of gene expression data), as a consequence, the covariance matrix must be computed using a shrinking procedure. To compare portions of the pathways, with the aim of identifying subgroups of genes which appear to drive differences of the entire structure, these two tests are performed on each single clique. Secondly, for each pathway, the structure of the junction tree is used as a backbone to identify the portions of the tree

This journal is © The Royal Society of Chemistry 2017

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Molecular BioSystems

associated to the phenotype. This requires the use of paths connecting the root clique with a leaf clique. For each clique along the paths the p-value of the test on whether the two conditions share the same covariance matrix provides a weight for the clique. On each path, the portions of the path composed by consecutive meaningful cliques containing at most one nonmeaningful clique are selected, using a cut-off. Such portions define the so-called sub-paths. Finally, the relevance of the subpaths is computed and clipper results consist of a number of relevant signal paths. An important hypothesis in this method is that pathways are considered as acyclic. According to the authors, the number of pathways is quite small and as a result, self-loops are eliminated from the graph and cycles are solved removing the weakest edge of the cycle based on expression data. When tested on various types of biological problems, this method provided interesting results strongly coherent with experimental findings available in the literature. Pathiways82 combines the pathway topology with topological structures defined within each pathway. The method identifies the perturbed pathways by computing the probability of signal transmission along the pathways. Signal transmission is computed in two main steps. Firstly, the Dijkstra algorithm computes all circuits connecting a given input node to all output nodes. Signal input nodes are any nodes with no incoming interactions and signal output nodes are any nodes with outgoing interaction. Secondly, a signal transmission probability from an input to any output is derived from the probability of activation of all proteins included in the circuits. The activation of the proteins is a function of the presence of the transcript of the corresponding gene that is assumed to be either active or inactive. The activation state is inferred from the expression value across samples. Finally, the probability of signal transmission along a circuit is obtained from the probability of having the nodes connecting the input to the output in an active state and having all the nodes that are inhibitors of nodes in the pathway in an activation state compatible with the transmission of the signal. A change in pathway activity is reported when the probability of having the circuit activated during signal transmission is significantly higher in at least one of the conditions. The method was compared to ORA,16 GSEA6 and SPIA. It is worth mentioning that the behavior of SPIA is generally closer to that of GSEA, because both methods return a global pathway score and also because Pathiways tests a different aspect of the activity of a pathway than SPIA and GSEA. No incoherence was found among the significant values reported by these methods and Pathiways which finds circuits activated in pathways which are detected as significant by SPIA and GSEA. Interestingly, Pathiways demonstrates a better sensitivity because it detected activations of specific circuits in pathways which were not considered by SPIA and GSEA. The sub-SPIA algorithm21 is an improved version of SPIA. Sub-SPIA includes a sub-pathway analysis, based on the identification of the Minimal Spanning Tree (MST) of the pathways that focuses on a local region in a pathway. The methodology can be summarized as follows. Firstly, the graphite R package is used to reconstruct the gene network from the pathway maps and DEGs

This journal is © The Royal Society of Chemistry 2017

Review

are mapped on the gene network. Secondly, the sub-pathways are identified. This is done by identifying all node sets in a pathway which include closely connected signature nodes. One ends up with a connected sub-gene-network containing as few non-signature nodes as possible. The Kruskal algorithm is then used to construct the MST and suppress redundant nonsignature nodes. In that way, the MST structure overcomes the disadvantage of the k-clique method, which generally results in many overlapped sub-pathways. Furthermore, the use of MST structure provides the flexibility required for finding cross-talks when applied on connected pathway networks. Finally, the pathway significance is computed by following the same procedure as the SPIA algorithm. Compared to SPIA, sub-SPIA identifies more relevant pathways. Furthermore, subSPIA improves the identification resolution because it focuses on a local region in a pathway. When compared to Clipper and Pathiway, sub-SPIA performs more stably with respect to the type of datasets and the biological relevance of the predictions is also improved. IPANDA83 computes pathway activation scores (PAS) using a combination of two terms: the contribution of the individual genes and the contribution of gene modules. Modules are computed using a database of downstream genes controlled by human sequence-specific transcription factors that were intersected with genes extracted from COEXPRESdb that were themselves further clustered using the Euclidian distance matrix. Clusters of genes were identified using DBScan and hierarchical clustering with average linkage criteria. The contributions of modules and individual genes to PAS are functions of the logarithmic fold changes of gene expression levels between reference sample and perturbed sample modulated by a statistical term and a topological term. The statistical term is computed using a set of p-values obtained from a t-test for reference and perturbed samples. To obtain the topology term for a given gene of a pathway, all walks through that gene are computed using genes which have zero outgoing edges as final nodes. Then, the number of walks of each gene for each pathway is divided by the maximum number of walks that has been obtained from all genes for a given pathway. PAS relevance is assessed using p-values computed using weighted Fisher’s combined probability test and modulated using the statistical and topology terms defined above. To estimate the robustness of the prediction, the Common Marker Pathway (CMP) index is introduced. The larger values of the CMP index correspond to the robust PAS prediction across the datasets. When compared to other methods (SPIA, GSEA, PLAGE and DART), iPANDA showed better performance especially for the noise reduction test. This makes it suitable for noise reduction in transcriptomic data analysis.

Conclusion The incorporation of pathway maps into knowledge-based pathway analysis algorithms improves their predictive power. The overview of the current algorithms suggests some improvements.

Mol. BioSyst.

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Review

Molecular BioSystems

Fig. 5 Hypothesis-based methods such as kinetic modelling are based on highly detailed molecular models of the biological process under study. These approaches are thus adapted to analyse in detail dynamical properties of systems of moderate size. Depending on the mathematical formulation used, well-defined dynamical properties can be characterized in terms of topological characteristics of the system. Data-driven pathway analysis methods are statistical-based methods which are suitable for analysing large sets of interactions between species and are of great help when all kinetic details are not known. They provide ranked lists of pathways undergoing strong changes between reference and perturbed conditions. Although their optimal uses are not as well defined as in the kinetic models, the topological structures present in the pathway maps can improve the accuracy of the predictions, assuming that their biological relevance is established.

Besides the denoising steps suggested to check the coherence of the pathway maps, it could be advantageous to have interactive pathway maps built in a way that the gene interactions are adaptable for various developmental stages or environmental conditions. From a more conceptual point of view, as pathways were originally defined for historical reasons, a different approach could be required. This should include the use of topological structures defined in terms of dynamical behavior rather than specific historical circumstances. Regarding the incorporation of sub-pathways and topological substructures of the pathway maps, the situation faced by data-driven pathway analysis methods differs from the studies done using kinetic models. Indeed, as summarized in Fig. 5, studies based on kinetic modeling address specific questions which only involve a subset of elements whose behavior appears to be critical with respect to well-defined functionalities of interest. This allows the elaboration of ad hoc models and the effective use of specific dynamical and topological

Mol. BioSyst.

concepts. However, data-driven algorithms are designed for investigating questions often disconnected from specific dynamical behaviour and involving multiple experimental conditions and perturbations. This makes the identification of relevant topological structures a complex process and approaches implemented use topological decompositions whose biological relevance and connections to the pathway dynamics are not clearly established. Consequently, these methods might fail to capture dynamical features that could be identified by analyzing more relevant topological features of the pathways, these features being ignored by the design of the methods themselves. Data-driven pathway analysis methods demonstrate interesting capabilities for reducing both noise and dimension of the pathways which could be used to generate hypothesis or to select subsets of pathways relevant to a specific situation. Mechanistic modeling of these pathways could be performed to answer questions related to the dynamics for which datadriven methods are not adapted.

This journal is © The Royal Society of Chemistry 2017

View Article Online

Molecular BioSystems

Conflict of interest All authors are employed by Insilico Medicine, Inc., a company involved in the development and use of data-driven algorithms for pathway perturbation analysis.

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Acknowledgements The authors thank the referees for their comments which contributed to improve this work.

References 1 C. Mason and P. Dunnill, A brief definition of regenerative medicine, Regener. Med., 2008, 3(1), 1–5. 2 S. P. Atkinson, M. Lako and L. Armstrong, Potential for pharmacological manipulation of human embryonic stem cells, Br. J. Pharmacol., 2013, 169(2), 269–289. 3 Y. Avior, I. Sagi and N. Benvenisty, Pluripotent stem cells in disease modelling and drug discovery, Nat. Rev. Mol. Cell Biol., 2016, 17(3), 170–182. 4 M. Thomson, S. J. Liu, L.-N. Zou, Z. Smith, A. Meissner and S. Ramanathan, Pluripotency factors in embryonic stem cells regulate differentiation into germ layers, Cell, 2011, 145(6), 875–889. 5 E. Walker, M. Ohishi, R. E. Davey, W. Zhang, P. A. Cassar and T. S. Tanaka, et al., Prediction and testing of novel transcriptional networks regulating embryonic stem cell self-renewal and commitment, Cell Stem Cell, 2007, 1(1), 71–86. 6 D. Tantin, Oct transcription factors in development and stem cells: insights and mechanisms, Development, 2013, 140(14), 2857–2866. 7 R. Iglesias-Bartolome and J. S. Gutkind, Signaling circuitries controlling stem cell fate: to be or not to be, Curr. Opin. Cell Biol., 2011, 23(6), 716–723. 8 H.-H. Ng and M. A. Surani, The transcriptional and signalling networks of pluripotency, Nat. Cell Biol., 2011, 13(5), 490–496. 9 S. Dalton, Signaling networks in human pluripotent stem cells, Curr. Opin. Cell Biol., 2013, 25(2), 241–246. 10 J. Stelling, Mathematical models in microbial systems biology, Curr. Opin. Microbiol., 2004, 7, 513–518, DOI: 10.1016/j.mib.2004.08.004. 11 M. A. Aon and S. Cortassa, Systems Biology of the Fluxome. Processes, 2015, 3, 607–618, DOI: 10.3390/pr3030607. 12 N. Tomar and R. K. De, Comparing methods for metabolic network analysis and an application to metabolic engineering, Gene, 2013, 521, 1–14. 13 N. Tomar and R. K. De, A comprehensive view on metabolic pathway analysis methodologies, Curr. Bioinf., 2014, 9(3), 295–305. 14 R. K. De, M. Mouli Das and S. Mukhopadhyay, Incorporation of enzyme concentrations into FBA and identification of optimal metabolic pathways, BMC Syst. Biol., 2008, 2, 65, DOI: 10.1186/1752-0509-2-65.

This journal is © The Royal Society of Chemistry 2017

Review

15 R. K. De and N. Tomar, Modeling the optimal central carbon metabolic pathways under feedback inhibition using flux balance analysis, J. Bioinf. Comput. Biol., 2012, 10(6), 1250019, DOI: 10.1142/S0219720012500199. 16 Q. Bian and P. Cahan, Computational Tools for Stem Cell Biology, Trends Biotechnol., 2016, 34(12), 993–1009. 17 J. Zou, M.-W. Zheng, G. Li and Z.-G. Su, Advanced systems biology methods in drug discovery and translational biomedicine, BioMed Res. Int., 2013, 742835. 18 A.-L. Barabsi and Z. N. Oltvai, Network biology: understanding the cells functional organization, Nat. Rev. Genet., 2004, 5(2), 101–113. 19 F. J. Bruggeman and H. V. Westerhoff, The nature of systems biology, Trends Microbiol., 2007, 15(1), 45–50. 20 M. Herberg and I. Roeder, Computational modelling of embryonic stem-cell fate control, Development, 2015, 142(13), 2250–2260. 21 X. Li, L. Shen, X. Shang and W. Liu, Subpathway Analysis based on Signaling-Pathway Impact Analysis of Signaling Pathway, PLoS One, 2015, 10(7), e0132813. 22 P. Khatri, M. Sirota and A. J. Butte, Ten years of pathway analysis: current approaches and outstanding challenges, PLoS Comput Biol., 2012, 8(2), e1002375. 23 A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert and M. A. Gillette, et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U. A. S., 2005, 102(43), 15545–15550. 24 V. K. Mootha, C. M. Lindgren, K.-F. Eriksson, A. Subramanian, S. Sihag and J. Lehar, et al., PGC-1alpha responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes, Nat. Genet., 2003, 34(3), 267–273. 25 R. Thomas and M. Kaufman, Frontier diagrams: partition of phase space according to the signs of eigenvalues or sign patterns of the circuits, Int. J. Bifurcation Chaos Appl. Sci. Eng., 2005, 15(10), 3051–3074. 26 M. Kaufman, C. Soul and R. Thomas, A new necessary condition on interaction graphs for multistationarity, J. Theor. Biol., 2007, 248(4), 675–685. 27 R. Thomas and M. Kaufman, Multistationarity, the basis of cell differentiation and memory. II. Logical analysis of regulatory networks in terms of feedback circuits, Chaos, 2001, 11(1), 180–195. 28 O. Cinquin and J. Demongeot, Positive and negative feedback: striking a balance between necessary antagonists, J. Theor. Biol., 2002, 216(2), 229–241. 29 D. A. Pechenick, J. L. Payne and J. H. Moore, Phenotypic robustness and the assortativity signature of human transcription factor networks, PLoS Comput. Biol., 2014, 10(8), e1003780. 30 C. Smith, R. S. Puzio and A. Bergman, Hierarchical Network Structure Promotes Dynamical Robustness, arXiv:1412.0709v2[q-bio.PE], 2015. 31 V. Chickarmane and C. Peterson, A Computational Model for Understanding Stem Cell, Trophectoderm and Endoderm Lineage Determination, PLoS One, 2008, 3(10), e3478, DOI: 10.1371/journal.pone.0003478.

Mol. BioSyst.

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Review

32 D. Thieffry, Dynamical roles of biological regulatory circuits, Briefings Bioinf., 2007, 8(4), 220–225. 33 D. Angeli, J. E. Ferrell and E. D. Sontag, Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems, Proc. Natl. Acad. Sci. U. A. S., 2004, 101(7), 1822–1827. 34 S. Pigolotti, S. Krishna and M. H. Jensen, Oscillation patterns in negative feedback loops, Proc. Natl. Acad. Sci. U. A. S., 2007, 104(16), 6533–6537. 35 J. Wang, L. Xu, E. Wang and S. Huang, The potential landscape of genetic circuits imposes the arrow of time in stem cell differentiation, J. Biophys., 2010, 99(1), 29–39. 36 Z. Zi and E. Klipp, Constraint-based modeling and kinetic analysis of the Smad dependent TGF-beta signaling pathway, PLoS One, 2007, 2(9), e936. 37 Z. Zi, Z. Feng, D. A. Chapnick, M. Dahl, D. Deng and E. Klipp, et al., Quantitative analysis of transient and sustained transforming growth factor-signaling dynamics, Mol. Syst. Biol., 2011, 7, 492. 38 B. Schmierer, A. L. Tournier, P. A. Bates and C. S. Hill, Mathematical modeling identifies Smad nucleocytoplasmic shuttling as a dynamic signal-interpreting system, Proc. Natl. Acad. Sci. U. A. S., 2008, 105(18), 6608–6613. 39 B. Schoeberl, C. Eichler-Jonsson, E. D. Gilles and G. Mller, Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors, Nat. Biotechnol., 2002, 20(4), 370–375. 40 S. Yamada, T. Taketomi and A. Yoshimura, Model analysis of difference between EGF pathway and FGF pathway, Biochem. Biophys. Res. Commun., 2004, 314(4), 1113–1120. 41 N. Borisov, E. Aksamitiene, A. Kiyatkin, S. Legewie, J. Berkhout and T. Maiwald, et al., Systems-level interactions between insulin-EGF networks amplify mitogenic signaling, Mol. Syst. Biol., 2009, 5, 256. 42 C. Y. Ung, H. Li, X. H. Ma, J. Jia, B. W. Li and B. C. Low, et al., Simulation of the regulation of EGFR endocytosis and EGFR-ERK signaling by endophilin-mediated RhoA-EGFR crosstalk, FEBS Lett., 2008, 582(15), 2283–2290. 43 B. N. Kholodenko, O. V. Demin, G. Moehren and J. B. Hoek, Quantification of short term signaling by the epidermal growth factor receptor, J. Biol. Chem., 1999, 274(42), 30169–30181. 44 W. W. Chen, B. Schoeberl, P. J. Jasper, M. Niepel, U. B. Nielsen and D. A. Lauffenburger, et al., Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data, Mol. Syst. Biol., 2009, 5, 239. 45 A. R. Asthagiri and D. A. Lauffenburger, A computational study of feedback effects on signal dynamics in a mitogen activated protein kinase (MAPK) pathway model, Biotechnol. Prog., 2001, 17(2), 227–239. 46 R. Fritsche-Guenther, F. Witzel, A. Sieber, R. Herr, N. Schmidt and S. Braun, et al., Strong negative feedback from Erk to Raf confers robustness to MAPK signalling, Mol. Syst. Biol., 2011, 7, 489. 47 H. Shankaran, D. L. Ippolito, W. B. Chrisler, H. Resat, N. Bollinger and L. K. Opresko, et al., Rapid and sustained

Mol. BioSyst.

Molecular BioSystems

48

49

50

51

52

53

54

55

56 57 58

59

60

61

62

63

nuclear-cytoplasmic ERK oscillations induced by epidermal growth factor, Mol. Syst. Biol., 2009, 5, 332. K.-H. Cho, S.-Y. Shin, H.-W. Lee and O. Wolkenhauer, Investigations into the analysis and modeling of the TNF alpha-mediated NF-kappa B-signaling pathway, Genome Res., 2003, 13(11), 2413–2422. R. Cheong, A. Hoffmann and A. Levchenko, Understanding NFkappaB signaling via mathematical modeling, Mol. Syst. Biol., 2008, 4, 192. L. Ashall, C. A. Horton, D. E. Nelson, P. Paszek, C. V. Harper and K. Sillitoe, et al., Pulsatile stimulation determines timing and specificity of NF-kappaB-dependent transcription, Science, 2009, 324(5924), 242–246. Z. Wang, D. A. Potoyan and P. G. Wolynes, Molecular stripping, targets and decoys as modulators of oscillations in the NF-kB/IkBa/DNA genetic network, J. R. Soc., Interface, 2016, 13(122), DOI: 10.1098/rsif.2016.0606. D. E. Nelson, A. E. C. Ihekwaba, M. Elliott, J. R. Johnson, C. A. Gibney and B. E. Foreman, et al., Oscillations in NFkappaB signaling control the dynamics of gene expression, Science, 2004, 306(5696), 704–708. E. L. ODea, D. Barken, R. Q. Peralta, K. T. Tran, S. L. Werner and J. D. Kearns, et al., A homeostatic model of IkappaB metabolism to control constitutive NF-kappaB activity, Mol. Syst. Biol., 2007, 3, 111. K. B. Wee, U. Surana and B. D. Aguda, Oscillations of the p53-Akt network: implications on cell survival and death, PLoS One, 2009, 4(2), e4407. D. Bhatt, Z. Oltvai and I. Bahar, Stochastic modeling of p53 regulated apoptosis upon radiation damage, arXiv:1109.0743v1 [physics.bio-ph], 2011. J. Varner, M. Fussenegger and J. E. Bailey, Nat. Biotechnol., 2000, 18, 768–774, DOI: 10.1038/77589. T. Sun, X. Lin, Y. Wei, Y. Xu and P. Shen, Evaluating bistability of Bax activation switch, FEBS Lett., 2010, 584(5), 954–960. D. Kim, O. Rath, W. Kolch and K.-H. Cho, A hidden oncogenic positive feedback loop caused by crosstalk between Wnt and ERK pathways, Oncogene, 2007, 26(31), 4571–4579. E. Lee, A. Salic, R. Krger, R. Heinrich and M. W. Kirschner, The roles of APC and Axin derived from experimental and theoretical analysis of the Wnt pathway, PLoS Biol., 2003, 1(1), E10. C. Wawra, M. Khl and H. A. Kestler, Extended analyses of the Wnt/beta-catenin pathway: robustness and oscillatory behaviour, FEBS Lett., 2007, 581(21), 4043–4048. G. R. Mirams, H. M. Byrne and J. R. King, A multiple timescale analysis of a mathematical model of the Wnt/ betacatenin signalling pathway, J. Math. Biol., 2010, 60(1), 131–160. J. Lequieu, A. Chakrabarti, S. Nayak and J. D. Varner, Computational modeling and analysis of insulin induced eukaryotic translation initiation, PLoS Comput. Biol., 2011, 7(11), e1002263. P. K. U. Vinod and K. V. Venkatesh, Quantification of the effect of amino acids on an integrated mTOR and insulin signaling pathway, Mol. BioSyst., 2009, 5(10), 1163–1173.

This journal is © The Royal Society of Chemistry 2017

View Article Online

Published on 17 July 2017. Downloaded by Cornell University Library on 17/07/2017 13:04:07.

Molecular BioSystems

64 P. Dalle Pezze, A. G. Sonntag, A. Thien, M. T. Prentzell, M. Gdel and S. Fischer, et al., A dynamic network model of mTOR signaling reveals TSC-independent mTORC2 regulation, Sci. Signaling, 2012, 5(217), ra25. 65 S. O. Song, A. Chakrabarti and J. D. Varner, Ensembles of signal transduction models using Pareto Optimal Ensemble Techniques (POETs), J. Biotechnol., 2010, 5(7), 768–780. 66 Z. Zi, SBML-PET-MPI: a parallel parameter estimation tool for Systems Biology Markup Language based models, Bioinformatics, 2011, 27(7), 1028–1029. 67 R. Jansen, D. Greenbaum and M. Gerstein, Relating wholegenome expression data with protein-protein interactions, Genome Res., 2002, 12(1), 37–46. 68 B. Schwanhusser, D. Busse, N. Li, G. Dittmar, J. Schuchhardt and J. Wolf, et al., Global quantification of mammalian gene expression control, Nature, 2011, 473(7347), 337–342. 69 J. J. Li, P. J. Bickel and M. D. Biggin, System wide analyses have underestimated protein abundances and the importance of transcription in mammals, PeerJ, 2014, 2, e270. 70 C. Vogel and E. M. Marcotte, Insights into the regulation of protein abundance from proteomic and transcriptomic analyses, Nat. Rev. Genet., 2012, 13(4), 227–232. 71 I. Thiele, N. Swainston, R. M. T. Fleming, A. Hoppe, S. Sahoo and M. K. Aurich, et al., A community-driven global reconstruction of human metabolism, Nat. Biotechnol., 2013, 31(5), 419–425. 72 J. R. Karr, J. C. Sanghvi, D. N. Macklin, M. V. Gutschow, J. M. Jacobs and B. Bolival Jr, et al., A whole-cell computational model predicts phenotype from genotype, Cell, 2012, 150(2), 389–401. 73 N. C. Duarte, S. A. Becker, N. Jamshidi, I. Thiele, M. L. Mo and T. D. Vo, et al., Global reconstruction of the human metabolic network based on genomic and bibliomic data, Proc. Natl. Acad. Sci. U. S. A., 2007, 104(6), 1777–1782.

This journal is © The Royal Society of Chemistry 2017

Review

74 L. You, R. S. Cox rd, R. Weiss and F. H. Arnold, Programmed population control by cell-cell communication and regulated killing, Nature, 2004, 428(6985), 868–871. 75 D. A. Charlebois and M. Krn, An Accelerated Method for Simulating Population Dynamics, Comput. Phys. Commun., 2013, 14(02), 461–476. 76 J. Tomfohr, J. Lu and T. B. Kepler, BMC Bioinf., 2005, 9, 225. Available from: http://bmcbioinformatics.biomedcentral. com/articles/10. 1186/1471-2105-6-225. 77 A. L. Tarca, S. Draghici, P. Khatri, S. S. Hassan, P. Mittal and J.-S. Kim, et al., A novel signaling pathway impact analysis, Bioinformatics, 2009, 25(1), 75–82. 78 S. Draghici, P. Khatri, A. L. Tarca, K. Amin, A. Done and C. Voichita, et al., A systems biology approach for pathway level analysis, Genome Res., 2007, 17(10), 1537–1545. 79 Y. Jiao, K. Lawler, G. S. Patel, A. Purushotham, A. F. Jones and A. Grigoriadis, et al., DART: Denoising Algorithm based on Relevance network Topology improves molecular pathway activity inference, BMC Bioinf., 2011, 12, 403. 80 P. Martini, G. Sales, M. S. Massa, M. Chiogna and C. Romualdi, Along signal paths: an empirical gene set approach exploiting pathway topology, Nucleic Acids Res., 2013, 41(1), e19. 81 M. S. Massa, M. Chiogna and C. Romualdi, Gene set analysis exploiting the topology of a pathway, BMC Syst. Biol., 2010, 4, 121. 82 P. Sebastian-Leon, E. Vidal, P. Minguez, A. Conesa, S. Tarazona and A. Amadoz, et al., Understanding disease mechanisms with models of signaling pathway activities, BMC Syst. Biol., 2014, 8, 121. 83 I. V. Ozerov, K. V. Lezhnina, E. Izumchenko, A. V. Artemov, S. Medintsev and Q. Vanhaelen, et al., In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development, Nat. Commun., 2016, 7, 13427.

Mol. BioSyst.