Using Fuzzy Logic and a Hybrid Genetic Algorithm for Metabolic Modeling

John Yen and Bogju Lee
Center for Fuzzy Logic and Intelligent Systems Research
Department of Computer Science
Texas A&M University
College Station, TX 77843-3112

James C. Liao
Department of Chemical Engineering
Texas A&M University
College Station, TX 77843-3122

Abstract

The identification of metabolic systems, such as metabolic pathways, enzyme actions, and gene regulation, is a complex task due to the complexity of the systems and limited knowledge about the model. Mathematical equations and ODEs have been used to capture the structure of the model, and conventional optimization techniques have been used to identify its parameters. In general, however, a purely mathematical formulation of the model is difficult due to parametric uncertainty and incomplete knowledge of mechanisms. In this paper, we propose a modeling approach that (1) uses a fuzzy rule-based model to augment algebraic enzyme models that are incomplete, and (2) uses a hybrid genetic algorithm (GA) to identify uncertain parameters in the model.

1. Introduction

Very often, chemical reactions happen as a series of steps rather than as a single basic action. A chemical research problem has therefore been to capture or describe this series of steps, called the pathway of a chemical reaction. Figure 1 shows the pathway of the glucose metabolic model. Each node describes a metabolite participating in the pathway, while each reaction is shown as an arrow labeled by a variable V denoting the rate of the reaction. Several attempts have been reported to simulate or predict system behavior based on individual component models. For example, enzyme kinetic equations have been derived and assembled to model metabolic pathways [1, 5].

Figure 1. Pathway of glucose metabolic model (metabolite nodes: GLU, G6P, F6P, FDP, DHAP, GAP, P13G, P3G, P2G, PEP, PYR, ACCOA, CIT, ISOC, α-KETO, SUCCOA, SUCCINATE, FUMARATE, MALATE, OAA; reaction-rate labels: Vpts, Vpgi, Vpfk, Vald, Vtpi, Vgap, Vpgk, Vpgm, Veno, Vpyk, Vace, Vglt, Vaco, Vicd, Vakg, Vsucd, Vsucdh, Vfum, Vmdh)

As a consequence of the fast progress of molecular biology, mechanisms at the molecular level are reasonably well established. These molecular mechanisms are combined to explain system behavior, most often in an intuitive manner. This intuitive approach has been successful as a first approximation, but it rapidly becomes unsatisfactory when one demands a detailed explanation of system behavior. Furthermore, when an explanation based on intuitive synthesis of molecular mechanisms fails, it is difficult to determine whether the observation is a manifestation of a novel molecular mechanism or a complex interaction of known mechanisms.

In general, complete mechanistic models are rare because of parametric uncertainty and incomplete knowledge of mechanisms. We thus have to rely on descriptive and qualitative information to model the aspects of enzyme reactions that are not characterized mechanistically. Fuzzy logic-based modeling, which allows the integration of mechanistic models with descriptive concepts, appears to be ideal for our purpose. It has been demonstrated that fuzzy modeling can be used to model complex systems that are not well understood [7, 6].

In this paper, we focus on modeling component-level behavior. We propose a modeling approach that (1) uses a fuzzy rule-based model to augment algebraic enzyme models that are incomplete, and (2) uses a hybrid GA to identify uncertain parameters in the model. We have applied this approach to modeling the rates of two enzyme reactions in E. coli central metabolism. A hybrid genetic algorithm (GA) [8] that integrates the GA and the simplex method is used to identify the parameters. The hybrid GA turned out to speed up the GA's convergence rate while avoiding entrapment at a local optimum.

2. Background

2.1. Fuzzy Logic-based Modeling

It has been demonstrated that fuzzy modeling can be used to model complex systems that are not well understood [7, 6]. The main contribution of fuzzy logic to system modeling is to introduce a new paradigm of modeling through three fundamental, closely related concepts: fuzzy partition, fuzzy rules, and interpolative reasoning. A fuzzy partition divides an input space into partially overlapping regions using fuzzy sets. Each subregion is associated with a local model for the region through a fuzzy rule. In areas where subregions partially overlap, the corresponding local models are combined to form a global model through a process (called interpolative reasoning or fuzzy inference) that is analogous to linear interpolation.

A fuzzy partition generalizes a classical partition, which divides a space into a collection of disjoint subspaces, so as to allow smooth transitions from one subspace into a neighboring one. This is accomplished using fuzzy sets, which were developed by Lotfi A. Zadeh to allow objects to take partial membership in a vague concept (i.e., a concept without sharp boundaries) [9]. The degree to which an object belongs to a fuzzy set, which is a real number between 0 and 1, is called the membership value in the set. The meaning of a fuzzy set is thus characterized by a membership function that maps elements in a universe of discourse (i.e., the domain of interest) to their corresponding membership values. Figure 2 shows the membership functions of the fuzzy sets for modeling enzyme PPC. CoA represents acetyl-CoA, and µ denotes the membership value in the fuzzy sets.

Figure 2. Fuzzy sets for Vppc modeling (membership value µ, from 0 to 1, plotted against CoA and FDP; fuzzy set labels include VERY-LOW, LOW, MEDIUM, and HIGH)

Based on fuzzy set theory, fuzzy logic generalizes modus ponens in classical logic to allow a conclusion to be drawn from a fuzzy if-then rule even when the rule's condition is only partially satisfied [10]. The strength of the conclusion is calculated based on the degree to which the antecedent is satisfied by the input data. Conclusions from multiple fuzzy rules are then combined to form a global conclusion. This is the essence of interpolative reasoning.

There are two kinds of fuzzy rule. The first kind, referred to as the Sugeno-Takagi-Kang model in the literature, uses a linear equation to describe a rule's local model. An example of this type of rule is shown below for a system with two input variables ($x$, $y$) and one output variable ($z$):

If $x$ is $A$ and $y$ is $B$ then $z = a_0 + a_1 x + a_2 y$

where $A$ and $B$ denote fuzzy sets and $a_0$, $a_1$, and $a_2$ denote constants. Let $w_i$ denote the degree to which the input to the model matches the condition of the i-th rule, and let $y_i$ denote the conclusion of the i-th rule. The formula below combines the conclusions of all rules in a Sugeno-Takagi-Kang model through interpolative reasoning:

$$
y = \frac{\sum_i w_i y_i}{\sum_i w_i}
$$
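The weighted-average combination for Sugeno-Takagi-Kang rules can be illustrated with a minimal Python sketch. The function `tsk_output`, the membership functions `low` and `high`, and the two example rules are hypothetical and are not taken from the paper; the sketch also assumes that the "and" in a rule's condition is evaluated with the minimum operator, which is one common choice among several.

```python
def tsk_output(x, y, rules):
    """Combine rule conclusions as sum(w_i * y_i) / sum(w_i)."""
    num = 0.0
    den = 0.0
    for mu_a, mu_b, (a0, a1, a2) in rules:
        # Degree to which (x, y) matches "x is A and y is B".
        # Assumption: "and" is taken as the minimum of the two memberships.
        w = min(mu_a(x), mu_b(y))
        num += w * (a0 + a1 * x + a2 * y)  # local linear model of this rule
        den += w
    if den == 0.0:
        raise ValueError("no rule matches this input")
    return num / den


# Hypothetical membership functions on the interval [0, 1].
def low(v):
    return max(0.0, min(1.0, 1.0 - v))


def high(v):
    return max(0.0, min(1.0, v))


# Two illustrative rules: (fuzzy set for x, fuzzy set for y, (a0, a1, a2)).
rules = [
    (low, low, (0.0, 1.0, 0.0)),    # if x is LOW and y is LOW then z = x
    (high, high, (1.0, 0.0, 1.0)),  # if x is HIGH and y is HIGH then z = 1 + y
]

print(tsk_output(0.5, 0.5, rules))  # 1.0
```

At the midpoint both rules match to degree 0.5, so the output is the average of their local conclusions (0.5 and 1.5), which shows how overlapping regions interpolate between local models.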

The second type of fuzzy rule maps a fuzzy subregion to a fuzzy conclusion, as shown below:

If $x$ is $A$ and $y$ is $B$ then $z$ is $C$

The interpolative reasoning process for this kind of rule is analogous to that of the Sugeno-Takagi-Kang fuzzy model. The degree of matching in the premise of a rule is propagated to the consequent to form inferred fuzzy subsets. These fuzzy subsets are combined and, if necessary, defuzzified. Both types of fuzzy rule are used in the proposed modeling approach. Compared to other approximation techniques (e.g., piecewise linear approximation, splines), a fuzzy model is simpler to develop, easier to understand, and more flexible in providing a smooth approximation to a complex nonlinear relationship.


2.2. Genetic Algorithms

Genetic algorithms are global search and optimization techniques modeled on natural genetics; they explore the search space by maintaining a set of candidate solutions in parallel [4]. A genetic algorithm (GA) maintains a population of candidate solutions, where each solution is usually coded as a binary string called a chromosome. A chromosome (also referred to as a genotype) encodes a parameter set (i.e., a candidate solution) for a set of variables being optimized. Each encoded parameter in a chromosome is called a gene. A decoded parameter set is called a phenotype. A set of chromosomes forms a population, which is evaluated and ranked by a fitness evaluation function.

The evolution from one generation to the next involves three main steps. First, the current population is evaluated using the fitness evaluation function and then ranked based on the fitness values. Second, the GA stochastically selects "parents" from the current population with a bias such that better chromosomes are more likely to be selected. This is accomplished using a selection probability that is determined by the fitness value or the ranking of a chromosome. Third, the GA reproduces "children" from the selected "parents" using two genetic operations: crossover and mutation. This cycle of evaluation, selection, and reproduction terminates when an acceptable solution is found, when a convergence criterion is met, or when a predetermined limit on the number of iterations is reached.

The GA has been shown to be an effective search technique on a wide range of difficult optimization problems [2, 4]. The randomness and parallelism of the GA often enable it to find a global optimum without being trapped in a local optimum. However, the computational cost of finding a global optimum with a GA is typically very high. That is, it usually requires a large number of generations before it converges to an acceptable solution. This issue is especially important when applying a GA to the parameter identification of metabolic and physiological systems, due to the high computational cost of the fitness evaluation function. To reduce the computational cost of GA-based approaches to the identification of parameters for metabolic systems, we have developed a hybrid approach that integrates the GA and the simplex method to speed up the rate of convergence while avoiding being easily entrapped at a local optimum [8].

Figure 3. Four typical behaviors in chemical reactions (panels (a) to (d) plot the reaction rate V against the metabolite amount X, with Vm and Km marked)
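The cycle of evaluation, selection, and reproduction described in this section can be sketched in a few lines of Python. This is a plain GA only, not the hybrid GA of [8]: the simplex step is not shown because its details are not given here. Several choices are assumptions made for illustration: genes are real-valued rather than binary, selection is rank-based, the best chromosome is copied unchanged into the next generation, and lower fitness is better. The function name `run_ga` and all default settings are hypothetical.

```python
import random


def run_ga(fitness, n_genes, bounds, pop_size=30, generations=50,
           mutation_rate=0.1, seed=0):
    """Minimize `fitness` with a simple real-coded GA (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)]
           for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # Step 1: evaluate and rank the population (lower fitness is better).
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        # Step 2: rank-based selection; better ranks get larger weights.
        weights = [pop_size - rank for rank in range(pop_size)]
        children = [pop[0][:]]  # assumption: keep the best chromosome as-is
        while len(children) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            # Step 3a: one-point crossover.
            cut = rng.randint(1, n_genes - 1) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Step 3b: mutation as a small Gaussian step, clipped to bounds.
            for i in range(n_genes):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        pop = children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history


# Toy fitness function used only to exercise the loop.
def sphere(chromosome):
    return sum(g * g for g in chromosome)


best, history = run_ga(sphere, n_genes=3, bounds=(-5.0, 5.0))
print(history[0], history[-1])
```

Because the best chromosome is carried over unchanged, the recorded best fitness never gets worse from one generation to the next; without that choice a GA can temporarily lose its best solution.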

3. Metabolic Modeling

3.1. Mathematical Modeling of Enzyme Kinetics

The typical behaviors of chemical reactions can often be characterized by one of the four curves in Figure 3: (a) hyperbolic, (b) sigmoidal, and (c), (d) inhibition. In metabolic modeling, the horizontal axis represents the amount of a metabolite ($X$), while the vertical axis represents the rate of reaction ($V$). In hyperbolic behavior, the reaction rate increases sharply at small amounts of the metabolite but saturates beyond some amount. In sigmoidal behavior, the reaction rate increases slowly at low concentrations of $X$, increases rapidly in some middle region, and eventually saturates as in hyperbolic behavior. The inhibition behavior (c) shows a hyperbolic shape in the initial metabolite region, but the reaction rate starts to decrease at some point due to an inhibition effect of $X$ on the reaction. In the inhibition behavior (d), the inhibition effect is significant even at low concentrations. These behaviors are captured by the following mathematical equations, where $V_m$ represents the saturation point of the reaction and $K_m$ represents the amount of the metabolite that results in a reaction rate that is half of the saturation point.

$$
\begin{aligned}
V &= \frac{V_m X}{K_m + X} && \text{(a)} \\
V &= \frac{V_m X^n}{K_m^n + X^n} && \text{(b)} \\
V &= \frac{V_m X}{K_m + X + (X/K_a)^n} && \text{(c)} \\
V &= \frac{V_m X_m}{K_m + X} && \text{(d)}
\end{aligned}
$$

Figure 4. PPC reaction (Vppc versus PEP; curves for CoA=0.4 with FDP=2.0, CoA=0.4, FDP=2.0, and no activator)

Figure 5. Fuzzy sets for Vppc modeling (membership value µ versus CoA and versus FDP; fuzzy sets LOW and HIGH)
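As a quick check on the rate laws, the hyperbolic, sigmoidal, and inhibited forms (a) to (c) can be written as small Python functions. The function names and the sample parameter values are hypothetical and chosen only for illustration; form (d) is left out of this sketch.

```python
def hyperbolic(x, vm, km):
    """Form (a): V = Vm * X / (Km + X)."""
    return vm * x / (km + x)


def sigmoidal(x, vm, km, n):
    """Form (b): V = Vm * X^n / (Km^n + X^n)."""
    return vm * x**n / (km**n + x**n)


def substrate_inhibited(x, vm, km, ka, n):
    """Form (c): V = Vm * X / (Km + X + (X / Ka)^n)."""
    return vm * x / (km + x + (x / ka)**n)


# Illustrative values only: at X = Km, forms (a) and (b) give Vm / 2.
vm, km = 2.0, 3.0
print(hyperbolic(km, vm, km))       # 1.0
print(sigmoidal(km, vm, km, n=4))   # 1.0
```

The two printed values confirm the reading of $K_m$ given in the text: it is the metabolite amount at which the rate reaches half of the saturation value $V_m$. The extra term in form (c) grows with $X$, so the rate falls below the hyperbolic curve and eventually declines.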

3.2. Integrating Fuzzy Logic with Mechanistic Modeling

The kinetics of enzyme reactions have been studied extensively, and they are modeled mechanistically when the mechanisms are available. For enzymes with incomplete mechanisms, fuzzy models are incorporated to make up for the deficiency of the incomplete mechanistic model. We describe two types of fuzzy logic application for augmenting models based on algebraic equations.

Type 1: Converting a Constant to a Context-dependent Parameter

Fuzzy logic can be used to convert a constant in a mathematical model into a "context-dependent parameter". An example of this type is the PPC reaction (Figure 4), for which the following observations are made: (1) without any activator, the reaction proceeds at a very low rate; (2) acetyl-CoA is a very powerful activator; (3) FDP exhibits no activation alone; (4) FDP produces a strong synergistic activation with acetyl-CoA. This reaction shows the hyperbolic shape no matter what activator is used. However, the saturation point $V_m$ depends on the activators. Therefore, the reaction is modeled with the following mechanistic equation modified by a fuzzy logic factor ($\alpha$), which is determined by a set of fuzzy if-then rules.

$$
V_{ppc} = \alpha \, \frac{V_{max} \, PEP}{K_m + PEP}
$$

The fuzzy factor $\alpha$ is modeled by the following four fuzzy rules:

If CoA is LOW and FDP is LOW then $\alpha = c_1$
If CoA is LOW and FDP is HIGH then $\alpha = c_2$
If CoA is HIGH and FDP is LOW then $\alpha = c_3$
If CoA is HIGH and FDP is HIGH then $\alpha = c_4$

where $c_1$, $c_2$, $c_3$, and $c_4$ are constants in the consequents of the Takagi-Sugeno fuzzy model. The membership functions of these fuzzy sets are shown in Figure 5; they were constructed based on the experimental data in Figure 4. There are therefore six parameters in total to be identified ($V_m$, written $V_{max}$ in the equation, and $K_m$ in the mathematical model, and $c_1$, $c_2$, $c_3$, $c_4$ in the fuzzy model). As mentioned, we use a hybrid GA to identify all six parameters. The GA's chromosome consists of six genes, each representing one parameter. Since each chromosome is associated with a model, the fitness of a candidate chromosome is the error between the model's outputs and the training data (chemical experimental data). The lower the fitness, the better the chromosome.

Figure 6. PYK reaction (Vpyk versus PEP; curves for FDP=0.1 and FDP=0.0)
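The Type 1 model can be sketched in Python as four constant-consequent rules that set a fuzzy factor scaling the hyperbolic rate law. The sketch rests on several assumptions that do not come from the paper: the membership functions are linear ramps that reach full HIGH membership at the hypothetical breakpoints `coa_full` and `fdp_full`, LOW is the complement of HIGH, "and" is evaluated as a product, and the constants in `c` are made-up illustrative values rather than identified parameters.

```python
def ramp_high(x, full):
    """Membership in HIGH: 0 at x = 0, rising linearly to 1 at x = full."""
    return min(1.0, max(0.0, x / full))


def fuzzy_factor(coa, fdp, c, coa_full=0.4, fdp_full=2.0):
    """Weighted average of the four rule constants c = (c1, c2, c3, c4)."""
    coa_hi = ramp_high(coa, coa_full)
    fdp_hi = ramp_high(fdp, fdp_full)
    coa_lo = 1.0 - coa_hi  # assumption: LOW is the complement of HIGH
    fdp_lo = 1.0 - fdp_hi
    # Rule order follows the text: (LOW, LOW), (LOW, HIGH),
    # (HIGH, LOW), (HIGH, HIGH). Assumption: "and" is a product.
    weights = [coa_lo * fdp_lo, coa_lo * fdp_hi,
               coa_hi * fdp_lo, coa_hi * fdp_hi]
    return sum(w * ci for w, ci in zip(weights, c)) / sum(weights)


def vppc(pep, coa, fdp, vmax, km, c):
    """Hyperbolic rate law scaled by the fuzzy factor."""
    return fuzzy_factor(coa, fdp, c) * vmax * pep / (km + pep)


# Made-up constants for illustration only.
c = (0.05, 0.05, 0.6, 1.0)
print(vppc(pep=1.0, coa=0.4, fdp=2.0, vmax=2.0, km=1.0, c=c))  # 1.0
```

With complementary memberships and a product "and", the four rule weights always sum to one, so the factor moves smoothly between the four constants as CoA and FDP change. A chromosome for this model would hold the six values `vmax`, `km`, and the four entries of `c`.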

Type 2: Changing Qualitative Behavior

Fuzzy logic can also be used to capture a model whose qualitative behavior depends on certain functions. An example of this type is the PYK reaction, which is activated by FDP (Figure 6). With low (or no) FDP, the reaction shows a sigmoidal shape, while with high FDP it shows a hyperbolic shape. The two cases differ not only in their $V_m$ and $K_m$ values but also in the qualitative nature of the relationship between reaction rate and concentration. This reaction can be modeled with the following mathematical equation, with an exponent $n$ that is determined by fuzzy if-then rules. Besides the exponent $n$ (shape), $V_m$ and $K_m$ should also be determined by two other sets of fuzzy rules.

$$
V_{pyk} = \frac{V_m \, PEP^n}{K_m + PEP^n}
$$

If FDP is LOW then $n = n_1$
If FDP is HIGH then $n = n_2$
If FDP is LOW then $V_m = V_1$
If FDP is HIGH then $V_m = V_2$
If FDP is LOW then $K_m = K_1$
If FDP is HIGH then $K_m = K_2$

The membership functions of the LOW and HIGH fuzzy sets for the FDP variable are shown in Figure 7. These six parameters ($n_1$, $n_2$, $V_1$, $V_2$, $K_1$, and $K_2$) are again identified using the hybrid GA.
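The Type 2 model can be sketched in Python by letting the FDP level blend the exponent, $V_m$, and $K_m$ between their LOW and HIGH values. The assumptions here are not from the paper: HIGH is a linear ramp that saturates at the hypothetical breakpoint `fdp_full`, LOW is its complement, and all parameter values in the usage lines are made up for illustration.

```python
def high(fdp, fdp_full=1.0):
    """Membership of FDP in HIGH: linear ramp from 0 up to 1 at fdp_full."""
    return min(1.0, max(0.0, fdp / fdp_full))


def vpyk(pep, fdp, n1, n2, v1, v2, k1, k2):
    """PYK rate with n, Vm, and Km each set by a LOW/HIGH pair of rules."""
    w_hi = high(fdp)
    w_lo = 1.0 - w_hi  # assumption: LOW is the complement of HIGH
    # Weighted average of the two rule conclusions for each parameter;
    # the weights sum to one, so no further normalization is needed.
    n = w_lo * n1 + w_hi * n2
    vm = w_lo * v1 + w_hi * v2
    km = w_lo * k1 + w_hi * k2
    return vm * pep**n / (km + pep**n)


# Made-up parameters: sigmoidal (n = 2) at low FDP, hyperbolic (n = 1) at high.
params = dict(n1=2.0, n2=1.0, v1=10.0, v2=20.0, k1=1.0, k2=3.0)
print(vpyk(pep=1.0, fdp=0.0, **params))  # 5.0
print(vpyk(pep=3.0, fdp=1.0, **params))  # 10.0
```

At intermediate FDP levels the exponent lies between `n1` and `n2`, so the curve shape itself changes gradually from sigmoidal to hyperbolic rather than switching abruptly.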

4. Results

We applied the hybrid genetic algorithm to identify the parameters in the proposed model. The fitness of a candidate parameter set is the root mean square error between the experimental data reported in the literature and the output of the candidate model proposed by the GA. Figure 9 plots the fitness versus trials for modeling the reaction rate $V_{ppc}$. The behavior of the identified model is shown in Figure 8. The figure shows a good fit between the dots, representing real experimental data, and the lines, representing the prediction of the identified model. Similarly, the behavior of the identified model for $V_{pyk}$ and the corresponding experimental data are shown in Figures 10, 11, and 12.
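The fitness measure described above, a root mean square error between model output and data, can be sketched as follows. The function name `rmse`, the toy model, and the data points are hypothetical and serve only to show the calculation; they are not experimental values.

```python
import math


def rmse(model, data):
    """Root mean square error of `model` over (input, observed) pairs."""
    squared = [(model(x) - observed) ** 2 for x, observed in data]
    return math.sqrt(sum(squared) / len(squared))


# Made-up points for illustration: a model that fits exactly scores 0.
toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(rmse(lambda x: 2.0 * x, toy_data))  # 0.0
```

In a GA run, `model` would be the rate law built from the parameters decoded from one chromosome, and a lower value would mark a better chromosome.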

5. Summary

In this paper, we have proposed a novel methodology that integrates fuzzy logic techniques with mathematical modeling methods to deal with incomplete knowledge about the process being modeled. We have applied the technique to model the component-level structures of metabolic systems. We also use a hybrid genetic algorithm to identify the key parameters of the model. This strategy allows one to easily incorporate incomplete information and qualitative descriptions into a mathematical formulation of the model.

Figure 7. Fuzzy sets for FDP in PYK reaction (membership value µ versus FDP; fuzzy sets LOW and HIGH)

Figure 8. Data and model prediction for the reaction rate of PPC with activators (Vppc versus PEP; data and predictions for CoA=0.4 with FDP=2.0, CoA=0.4, FDP=2.0, and CoA=FDP=0)

Figure 9. Performance of the hybrid GA on modeling Vppc (fitness versus trials, 0 to 5000; curves for the pure GA and our hybrid GA)

Figure 10. Data and model prediction for FDP activation in PYK reaction (Vpyk versus PEP at ATP=CoA=0; data and predictions for FDP=1.0 and FDP=0)

One of the most important issues remaining to be addressed in our future research is the development of a scalable approach for dealing with the large search space at the system level, because the number of system parameters that may need to be adjusted to fit experimental data is typically very large. We are currently developing a supervisory architecture for dynamically selecting the parameters to be optimized based on heuristics, insights about the model, and sensitivity analysis.

Acknowledgements

This research is currently supported by NSF Award BES-9511737 and was partially supported by NSF Young Investigator Awards IRI-9257293 and BCS-9257351.

Figure 11. Data and model prediction for PEP activation in PYK reaction (Vpyk versus FDP at ATP=CoA=0; data and predictions for PEP=2.0, PEP=0.4, and PEP=0.1)

Figure 12. Data and model prediction for ATP and CoA inhibition in PYK reaction (Vpyk versus PEP at FDP=1.0; data and predictions for ATP=CoA=0, ATP=2.0, CoA=2.0, and ATP=2.0 with CoA=2.0)

References

[1] M. J. Achs and D. Garfinkel. Computer simulation of rat heart metabolism after adding glucose to the perfusate. Am. J. Physiol., 232:175–184, 1977.
[2] K. A. De Jong. Analysis of the behavior of a class of genetic adaptive systems. PhD thesis, Department of Computer and Communication Sciences, University of Michigan, 1975.
[3] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc., 1989.
[4] J. H. Holland. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press, 1975.
[5] J. C. Liao, E. N. Lightfoot, S. O. Jolly, and G. K. Jacobson. Application of characteristic reaction paths: Rate-limiting capacity of phosphofructokinase in yeast fermentation. Biotech. Bioeng., 31:855–868, 1988.
[6] M. Sugeno and G. T. Kang. Structure identification of fuzzy model. Fuzzy Sets and Systems, 28:315–334, 1988.
[7] T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, 15(1):116–132, 1985.
[8] J. Yen, J. C. Liao, D. Randolph, and B. Lee. A hybrid approach to modeling metabolic systems using genetic algorithm and simplex method. In Proceedings of the 11th IEEE Conference on Artificial Intelligence for Applications (CAIA95), pages 277–283, Los Angeles, CA, February 1995.
[9] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.
[10] L. A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man, and Cybernetics, 3:28–44, 1973.