Neurocomputing 81 (2012) 1–11


A hybrid of multiobjective Evolutionary Algorithm and HMM-Fuzzy model for time series prediction

Md. Rafiul Hassan a,*, Baikunth Nath b, Michael Kirley b, Joarder Kamruzzaman c

a Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
b Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
c Gippsland School of IT, Monash University, Churchill, VIC 3842, Australia

Article history: Received 14 April 2011; received in revised form 27 July 2011; accepted 7 September 2011; communicated by W.S. Hong; available online 11 November 2011.

Abstract: In this paper, we introduce a new hybrid of the Hidden Markov Model (HMM), Fuzzy Logic and a multiobjective Evolutionary Algorithm (EA) for building a fuzzy model to predict non-linear time series data. In this hybrid approach, the HMM's log-likelihood score for each data pattern is used to rank the data, and fuzzy rules are generated from the ranked data. We use a multiobjective EA to find a range of trade-off solutions between the number of fuzzy rules and the prediction accuracy. The model is tested on a number of benchmark and more recent financial time series datasets. The experimental results clearly demonstrate that our model is able to generate a reduced number of fuzzy rules with similar (and in some cases better) performance compared with typical data-driven fuzzy models reported in the literature. © 2011 Elsevier B.V. All rights reserved.

Keywords: Fuzzy logic; Hidden Markov model; Time series; Prediction methods

1. Introduction

The inherent volatility and non-linearity present in many time series (particularly financial time series) makes the task of modeling the data complex and difficult. Fuzzy modeling [1] is a well-known soft computing paradigm that has been used for analyzing and modeling non-linear time series. The structure of a fuzzy model may be generated in two ways: expert knowledge is incorporated into the model if it is available, or a data-driven approach is taken in which rules are generated from the available numeric data.

The generation of appropriate fuzzy rules from the available data is often challenging. This may be attributed to the fact that, even for a small number of data features, it is possible to generate a large number of rules that map a set of input-output relations [2]. There exist a number of methods [3-5] that attempt to reduce the complexity of the fuzzy model by using an optimal number of fuzzy rules. A common characteristic of such methods is to divide the input space either by clustering or by using the grid-partitioning concept [6]. In clustering, the number of possible clusters in the dataset must be known a priori, while in grid partitioning the number of possible rules extracted for an n-dimensional dataset is n^n. Moreover, when generating rules using clustering, Euclidean distance is typically used to compute the distance between data patterns.

* Corresponding author. Tel.: +96638607330; fax: +96638602174. E-mail address: [email protected] (Md. Rafiul Hassan).

0925-2312/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.neucom.2011.09.012

However, Euclidean distance is not a suitable measure for differentiating time series data patterns that contain a linear drift [7]. For example, the two time series data vectors D1: ⟨0 1 2 3 4 5 6 7 8⟩ and D2: ⟨5 6 7 8 9 10 11 12 13⟩ (shown in bold in Fig. 1) have similar trends, although they are dissimilar in terms of Euclidean distance. For a time series application, since these two data vectors exhibit a similar pattern, they should belong to the same rule. Therefore, a suitable technique is required to identify similar trends in the patterns. An HMM can be applied to find such similarities [8].

Recently we developed the HMM-Fuzzy model [9,10], a data-driven fuzzy rule generation technique in which the HMM's data pattern identification ability is used to rank the data vectors. The ranked data vectors are grouped according to their ranking, and a fuzzy rule is generated for each group. The HMM models the system using the available training data and thereby assigns relatively high probability to representative data vectors compared with less representative ones [9]. This motivates the use of an HMM for ranking the data patterns and grouping them accordingly. In the HMM-Fuzzy model, rules are generated incrementally by monitoring the performance of the model on the training dataset. However, the performance of the model is not guaranteed to converge as the number of fuzzy rules increases. Furthermore, the generated fuzzy model might be overfitted. Hence, a suitable criterion to stop generating fuzzy rules is required.

Example 1.1. For the Mackey–Glass time series dataset, the error on the training dataset decreases as the total number of fuzzy rules (generated using the HMM-Fuzzy approach) increases, up to five rules (as shown in Fig. 2).


Fig. 1. Two similar data patterns with different Euclidean distance (ED). Here ED between D1 and D2 is 15.

Fig. 2. The error plot (training dataset) for Mackey–Glass time series dataset while generating fuzzy rules using HMM-Fuzzy model [38,9].

The error then starts increasing; after rule 6 the model again converges towards a minimum error (a mean square error (MSE) very close to 0) at rule 11, and the error increases dramatically when the total number of fuzzy rules grows from 16 to 17.

The above example highlights the trade-off that exists between the total number of fuzzy rules used in a model and the performance of the model. In this paper we propose a hybrid of an Evolutionary Algorithm (EA) and the HMM-Fuzzy model, in which we employ a multiobjective EA to stop the HMM-Fuzzy model from generating too many fuzzy rules. The multiobjective EA finds a range of compromise solutions between a suitable number of rules and the prediction accuracy, while the log-likelihood value generated by the HMM for each data pattern is used to rank and group the data vectors. This approach also overcomes the "over-fitting problem" associated with keeping a large number of fuzzy rules.

The remainder of the paper is organized as follows. Section 2 briefly reviews fuzzy rule generation approaches, HMM and the existing literature.

In Section 3, we describe our hybrid of the multiobjective EA and the HMM-Fuzzy approach, which generates an appropriate and compact data-driven fuzzy model. Section 4 presents experimental results using benchmark datasets and recent financial time series data. Finally, we conclude the paper with a discussion of the model and its implications in Section 5.

2. Background and related work

In this section we briefly introduce fuzzy rules and fuzzy inference systems, and then review existing studies on the efficient generation of fuzzy rules. Since our proposed approach uses an HMM to rank the data vectors, we briefly present the HMM at the end of this section.

2.1. Fuzzy rule and fuzzy inference system

2.1.1. Fuzzy rule

A fuzzy rule is a linguistic representation of the relationship between a set of independent features and a set of dependent features.


If x1 is μ1 and x2 is μ2 then output is y.

This rule maps the inputs x1 and x2 to the output y. Here μ1 and μ2 represent the degree of involvement of the respective input in the rule. These membership functions are defined as

μ1 : x1 → [0, 1]

and

μ2 : x2 → [0, 1]

Here the relationship between the inputs/variables is represented using fuzzy connectives. A standard syntax for a fuzzy rule is

If ⟨antecedent proposition⟩ Then ⟨consequent proposition⟩

2.1.2. Fuzzy inference system

Fuzzy inference is the process that combines the fuzzy rules and generates a solution. Fuzzy inference models may be divided into two types, the Mamdani [11] or the Takagi–Sugeno (TS) [12] fuzzy model, depending on how the fuzzy inference is calculated and represented in the consequent part. In the Mamdani fuzzy model, the consequent part is a fuzzy proposition, while the TS model produces a crisp (numeric) value from the antecedent variables. In the TS model, it is also possible to further tune the parameters of the fuzzy rules using a gradient descent method [cf. [13]]. We have used the TS model in this study.

2.2. Evolutionary Algorithm

The Evolutionary Algorithm (EA) is a widely used technique for finding the most feasible solution to many optimization problems. Its adaptation mechanism, based on the survival-of-the-fittest criterion, allows effective sampling of large search spaces and makes it significantly different from other search techniques. An EA operates with the help of three operators: selection, crossover and mutation. It starts with an initial population of randomly generated candidate solutions on which the operators work. The population consists of a set of strings, each with its own fitness value. The fitness value determines the probability with which each string in the current generation is selected as a parent. The crossover operator produces child offspring from two selected strings by crossing them over at some point, and mutation periodically introduces variation into the population by changing a few bit values in a string. These processes are executed iteratively, creating new generations until a stopping criterion is reached.

In an EA the objective is to either minimize or maximize the fitness value, following the biological inspiration of survival of the fittest. A problem that optimizes (i.e., minimizes or maximizes) a single fitness value can be handled by a single-objective EA. However, in the real world there exist problems where more than one value needs to be optimized in parallel. For example, consider a traveler who wants to reach a destination within the shortest time while spending the least money. The problem can be portrayed on a multidimensional graph where each axis represents an individual objective; for the traveler problem there are two objectives to optimize, (1) time and (2) cost, so the graph consists of two dimensions. Such a multiobjective optimization problem is referred to as a Pareto-optimization problem. The EA has proved to be a potent approach for solving Pareto-optimization problems. In this case, the EA operates with the standard operators defined above (selection, crossover and mutation), but since there is more than one objective function to be optimized, the Pareto-optimality criterion is used as the fitness value: a solution is Pareto-optimal if no other solution exists that is superior in at least one objective function value and equal or superior with respect to the other objective function values. Details on EAs applied to multiobjective optimization can be found in [14].

2.3. Data driven fuzzy model

When building a fuzzy model to solve a problem, the generation of fuzzy rules is a challenging task. There are two ways of generating the fuzzy rules: (1) by using available expert knowledge about the solution space, or (2) by analyzing the available dataset (known as the data-driven fuzzy model). A number of methods have been proposed and developed to generate fuzzy rules from a dataset. We briefly discuss these methods below.

The first method considered is clustering-based rule generation. Here the number of clusters must be identified prior to partitioning the dataset. However, choosing the number of clusters is difficult without knowing the structure of the dataset, and the performance of the fuzzy model depends on an appropriate choice [2], because each cluster of data is used to generate a fuzzy rule. Popular clustering techniques used for generating fuzzy rules are the fuzzy c-means algorithm [15] and the subtractive clustering algorithm [16].

A second technique commonly used to generate fuzzy rules is grid partitioning [6]. In this method, each input feature is used to generate an independent membership function (MF). Consequently, this process suffers from exponential rule explosion as the number of input features increases. Moreover, this method may generate rules in regions where no data instances appear. As noted in [17], an example of a rule extraction method using this approach is learning from examples [6] and its modifications [18].

A third approach is the tree partitioning technique [13]. In this method, more than one MF is used for each input feature, and thus a large number of rules can be generated. A fuzzy model with a huge number of fuzzy rules makes the system computationally complex, and its performance might degrade. Furthermore, the problem of producing rules for some empty regions remains.

A fuzzy-entropy-based rule generation technique was developed by Wang et al. [19]. In this technique, fuzzy rules are generated from examples in fuzzy representation by generalizing the concept of the crisp extension matrix. The approach uses extended information entropy based on probabilistic models as the fuzzy entropy, and the value of the fuzzy entropy is used to generate the fuzzy rules; this has been named the minimum fuzzy entropy criterion. A heuristic algorithm is then applied to extract the minimum number of fuzzy rules based on this criterion. Chen and Lee [20] have also generated fuzzy rules from numerical values using the fuzzy subsethood values between the decisions to be made and the terms of the attributes, using a level threshold value α and an applicability threshold value β, where α ∈ [0, 1] and β ∈ [0, 1].

In order to generate a compact number of fuzzy rules without degrading system performance, genetic algorithm (GA) based fuzzy rule generation techniques have also been used. Angelov and Buswell [5] proposed a method where the rules that make up the fuzzy model are encoded so as to reduce the size of the chromosome significantly, making the interpretability of the system more transparent by employing an optimal number of fuzzy rules. The objective of the GA was to minimize the errors of the generated fuzzy system. This approach was successfully used to control an HVAC (Heating, Ventilation and Air-Conditioning) system. Ishibuchi et al. [21] proposed a technique to generate a small number of linguistically interpretable fuzzy rules from numerical data for high-dimensional pattern classification problems using a GA. There were three objectives considered in this method: maximize classification accuracy, minimize the number of fuzzy rules, and minimize the total number of antecedent conditions.



The large number of rules initially generated was reduced to a small number of linguistic fuzzy rules, and the GA was used to search for the non-dominated rule sets with respect to the objectives mentioned above. Chang and Lilly [22] proposed an evolutionary approach for generating a fuzzy classification system with the least number of rules from data, where no prior knowledge of or assumptions about the distribution of the data are required. The variable input spread inference training algorithm (VISIT) [23] was used to tune the spread of the membership functions so that they cover the universe of discourse efficiently; fuzzy rules were generated from data via VISIT as a multiobjective optimization problem. Ishibuchi and Yamamoto [24] have also used a multiobjective GA to select a small number of fuzzy rules for pattern classification problems with continuous attribute values. In this case, the three objectives considered were: maximize the classification accuracy, minimize the number of selected rules, and minimize the total rule length. The reported experimental results indicated that reasonable classification accuracy was achieved with a compact number of fuzzy rules. Abonyi et al. [25] generated compact, accurate, and linguistically sound fuzzy classifiers based on a decision tree initialization and a multiobjective GA. The initial set of fuzzy rules, generated using a binary decision tree, was further reduced by means of similarity-driven rule reduction, and the multiobjective GA was used to eliminate redundancy in the rules and improve classification accuracy.

Although a number of studies (as mentioned above) have attempted to generate fuzzy rules from available data, probably the most widely used techniques are the following. Kasabov and Song [26] proposed a model called the dynamic evolving neural fuzzy inference system (DENFIS). The DENFIS model is built on the idea that, depending on the position of the input vector in the input space, a fuzzy inference system for calculating the output is formed dynamically, based on m fuzzy rules created during the past learning process. Gaweda and Zurada [17] developed a new approach to fuzzy rule-based modeling of non-linear systems from numerical data. The approach introduces interpretable relational antecedents that incorporate local linear interactions between the input variables into the inference process, which aids in improved approximation quality and allows the number of rules in the fuzzy model to be limited. Kim et al. [27] developed a fuzzy modeling algorithm that takes into consideration the correlation among components of the sample data through the method of principal components; this has been referred to as the simply identified (SI) model. Wang and Langari [28] proposed and developed a new approach to building Sugeno-type fuzzy models in which the premise identification and the consequence identification are done separately, with an orthogonal estimator used to identify the consequence. The experimental results on the well-known gas furnace data of Box and Jenkins demonstrate the ability of the approach to identify an appropriate structure for the fuzzy model. Paul and Kumar [29] proposed a subsethood-based method (SuPFuNIS) for rule generation. The rules are generated from a trained network, while the network structure is built by fuzzifying the numerical inputs; the connections in the network are represented by Gaussian fuzzy sets.

2.4. Hidden Markov model

An HMM is a finite automaton with two layers of probabilities. This doubly probabilistic approach to combining the observations/data of a process into states [30] makes it a useful tool for handling noisy data efficiently. The variations among the data features are represented by the transitions between states [31]. The state transitions in an HMM are assumed to follow a standard Markov process, that is, a stochastic process in which the next event depends solely on the current event [32]. Since an HMM is based on a Markov process, it takes into account the dependencies between subsequent variables in the data vector and can be useful for finding the relationship between the predictor data vector and the predicted variable. Details about the HMM are given in the Appendix.
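To make the ranking idea used later in Section 3 concrete, the following is a minimal sketch of scoring data vectors with a single trained HMM. It assumes the third-party hmmlearn library and synthetic data; neither is part of the paper, and the model sizes are illustrative only.

```python
# A minimal sketch, assuming the third-party hmmlearn library (not used in
# the paper): each k-dimensional data vector is treated as a short
# observation sequence and scored by one shared HMM.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # 200 toy data vectors, k = 4 features

# Train a single HMM on all vectors (each vector = one length-4 sequence).
hmm = GaussianHMM(n_components=3, n_iter=50, random_state=0)
hmm.fit(X.reshape(-1, 1), lengths=[4] * 200)

# The scalar log-likelihood of each vector ranks it against the others.
scores = np.array([hmm.score(x.reshape(-1, 1)) for x in X])
ranking = np.argsort(scores)[::-1]       # highest-ranked vectors first
```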

3. Hybrid of multiobjective EA and HMM-Fuzzy model

The proposed hybrid fuzzy system combines a multiobjective EA with the HMM-Fuzzy model generation approach. The multiobjective EA is used to select the optimal parameter set for which a compact fuzzy model is obtained. Initially, a number of HMM-Fuzzy models are generated, where rule generation is stopped based on the objective values provided through the initial chromosomes of the multiobjective EA. First we introduce the HMM-based fuzzy rule generation tool, and then describe how the multiobjective EA is hybridized with the HMM-Fuzzy model. In the HMM-Fuzzy model, an HMM is used to sort the data vectors in the multivariate dataset and to divide the input space into a number of subspaces from which fuzzy rules are formed. Our proposed hybrid model consists of three phases:

Phase 1: The HMM is used to rank and partition the input dataset based on the ordering of the calculated HMM log-likelihood values.

Phase 2: An iterative top-down (divide and conquer) algorithm is used to generate the minimum number of fuzzy rules that meets the pre-defined mean square error (MSE) for the training dataset.

Phase 3: We refer to the model developed in phases 1 and 2 as the base model. A multiobjective EA is applied to the base model to choose a suitable MSE for the dataset.

We describe the proposed hybrid system in the sequel.

3.1. HMM-Fuzzy model generation

In the process of HMM-Fuzzy model generation, the HMM log-likelihood values are used to rank and partition the input space for generating the fuzzy rule base. There are three steps: (1) ranking the data vectors; (2) fuzzy rule inference; (3) fuzzy rule generation.

3.1.1. Ranking the data vectors

In the HMM-Fuzzy model, a single HMM is used to rank the available training data vectors based on their HMM log-likelihood values. According to Rabiner [8], after training, the HMM acts as a reference point against which it is possible to compute the probability that a given vector was produced by the model; this probability indicates how well the model matches the vector. Therefore, any vector can be transformed into a scalar log-likelihood value, which can then be used to rank the data vector with the HMM as a reference model.

Example 3.1. Consider the log-likelihood values for the three data vectors D1, D2 and D3 to be l1, l2 and l3 respectively, where l1 > l2 > l3. According to the log-likelihood values of the individual data vectors, the rank of D1 is higher than that of D2 and D3. Similarly, the rank of D2 is higher than that of D3.

In the model, each data vector is formed from a number of distinct variables. For instance, for a dataset consisting of k explanatory variables x1, x2, …, xk, the input data vector is ⟨x1, x2, …, xk⟩.

The log-likelihood of generating such a data vector given the trained HMM is computed following the description in Appendix B.1. The log-likelihood scores of the data vectors in the training dataset are thus used to rank the respective data vectors, as explained in the above example.

3.1.2. Bucketing to group similar data vectors

The range of log-likelihood scores (l1 to lm, where li is the log-likelihood value produced for the ith data vector and m is the total number of data vectors) is split into equal-sized buckets. The data vectors in each bucket produce similar log-likelihood values. The size of the bucket, θ, is a parameter of the model that is used to guide the rule extraction process. A bucket has a start point and an end point corresponding to log-likelihood values: the start point is the log-likelihood value immediately following the end point of the previous bucket, and the end point is the log-likelihood value at a distance θ from the start point. These buckets are generated so that they can be used to generate fuzzy rules in a later phase.

Fig. 3. The flowchart of the hybrid model for generating fuzzy rules.

3.2. Fuzzy rule generation

In this phase of the model, we divide the dataset using the buckets and employ a divide and conquer approach to generate an appropriate number of fuzzy rules. To begin with, we create only one fuzzy rule that represents the entire input space of the training dataset. At this point, all log-likelihood values contained in the individual buckets may be perceived as belonging to one global bucket. In the process of rule generation, we calculate the mean μ_{x_i} and standard deviation σ_{x_i} to define the membership function M_{x_i} for each feature x_i in the dataset as follows:

M_{x_i} = exp(−(1/2)((x_i − μ_{x_i}) / σ_{x_i})²)   (1)
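As a sketch under stated assumptions (hypothetical helper names, synthetic log-likelihood scores), the bucketing of Section 3.1.2 and the membership function of Eq. (1) might look as follows.

```python
import numpy as np

def make_buckets(scores, theta):
    """Split the log-likelihood range into equal-width buckets of size
    theta (Section 3.1.2); return the bucket index of each data vector."""
    return np.floor((scores - scores.min()) / theta).astype(int)

def gaussian_mf(x, mu, sigma):
    """Membership value of Eq. (1) for one feature value x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def fit_rule(X_group):
    """Per-feature mean and standard deviation defining one rule's MFs."""
    return X_group.mean(axis=0), X_group.std(axis=0) + 1e-9

rng = np.random.default_rng(0)
scores = rng.normal(size=100)            # stand-in log-likelihood values
bucket_ids = make_buckets(scores, theta=0.5)
```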

The prediction error for the training data vectors is calculated using the generated fuzzy rule, and a mean square error (MSE) is used to quantify the performance of the developed model on the training dataset. If the prediction error for the training dataset is less than or equal to a threshold value ξ, the algorithm terminates and no further rules are extracted. On the other hand, if the prediction error is greater than ξ, the input space is split into two parts with the help of the buckets produced in the previous section. The input space is split by dividing the total set of buckets into two equal parts; the data in the respective parts constitute the split input spaces. An individual rule is created for each split partition, so the total number of rules increases by one. The prediction error for the training dataset is then recalculated using the extracted rule set. Should the error threshold ξ not be reached, the buckets containing the data responsible for the left part of the rule are divided into two rules, and the process is iterated. Again, if the error threshold ξ is still not met, the right part of the rule is partitioned and the process is repeated. This cycle continues until either the error threshold ξ is met or the number of rules equals the number of buckets.

3.3. Multiobjective EA applied to the HMM-Fuzzy model

The HMM-Fuzzy model described so far uses an iterative divide-and-conquer method to divide the input data space into a number of data subspaces, each of which is used to generate a fuzzy rule. We use the desired MSE, ξ, for the training dataset, given the generated fuzzy rules, as the stopping criterion of the iterative process. This approach may suffer from the "over-fitting problem" when the desired MSE is set to a very small value: the maximum number of rules may be generated for the given MSE, and once the fuzzy model is built it may not perform well on the unknown test dataset because of the large number of fuzzy rules. A good/reasonable desired MSE keeps the number of generated rules reasonable and avoids over-fitting as well. Therefore, the problem of generating an efficient and compact fuzzy model can be expressed as a multiobjective optimization problem, where there is always a trade-off between the generated number of fuzzy rules and the MSE. In this case, the two objectives are

1. Prediction/classification accuracy: high values are required, but the model should not suffer from the over-fitting problem.
2. Minimum number of fuzzy rules: a compact fuzzy model is required.

A sketch of the resulting Pareto-dominance relation is given below.
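The following small sketch expresses the trade-off with the accuracy objective written as an MSE to be minimized; the sample numbers are taken from Table 1 purely for illustration.

```python
def dominates(a, b):
    """True if a = (n_rules, mse) is no worse than b in both (minimized)
    objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(solutions):
    """Keep only the non-dominated (n_rules, mse) pairs."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

print(pareto_front([(13, 0.0048), (31, 0.0055), (6, 0.009), (16, 0.0074)]))
# -> [(13, 0.0048), (6, 0.009)]
```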

Fig. 4. The range of suitable solutions obtained using the multiobjective EA. Here the cost function (prediction error, MSE) is scaled to the range 0–0.1.

In this study, we use a multiobjective EA [33,34] to find a range of alternative solutions for the given problem. The EA is used to generate a diverse set of points distributed along the non-dominated (or Pareto) front. These solutions are sub-optimal in the wider sense that no other solution in the search space is superior to them when all the above-mentioned objectives are considered. Fig. 4 shows one such range of solutions obtained using the multiobjective EA with the two above-mentioned objectives. An outline of the hybrid process is presented below.

1. Divide the training dataset into two parts: training and validation.
2. Generate the initial population P with N individuals (each individual contains the desired MSE for the validation dataset).
3. Generate N fuzzy models using our HMM-Fuzzy approach and the training dataset. Then calculate the corresponding MSE values for each individual in the population.
4. Sort the N fuzzy models according to their MSE value and the number of fuzzy rules generated for each of the N models. That is, use a non-dominated sort based on the given objectives.
5. Select individuals from the non-dominated set as parents, then apply the standard genetic operators (crossover and mutation).
6. Repeat steps 3–5 until a predefined time limit Tstop is reached.
7. Finally, consider the individuals on the Pareto front as the range of suitable solutions for the problem.

The flowchart in Fig. 3 shows how the hybrid system works, and a compact sketch of this loop follows.
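In this sketch, build_hmm_fuzzy is a hypothetical stand-in for the HMM-Fuzzy generator of Sections 3.1-3.2 (here a toy placeholder so the code runs), and selection, crossover and mutation are reduced to their simplest forms.

```python
import random

def build_hmm_fuzzy(mse_target):
    # Placeholder for phases 1-2: pretend more rules appear as the target
    # MSE shrinks, and the achieved validation MSE lands near the target.
    n_rules = max(1, int(0.01 / mse_target))
    return n_rules, mse_target * random.uniform(0.9, 1.5)

def non_dominated(pop):
    def dom(a, b):
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])
    return [p for p in pop if not any(dom(q, p) for q in pop if q is not p)]

random.seed(0)
pop = [random.uniform(1e-4, 1e-1) for _ in range(10)]     # step 2
for generation in range(20):                              # step 6
    evaluated = [(m,) + build_hmm_fuzzy(m) for m in pop]  # step 3
    parents = non_dominated(evaluated)                    # step 4
    # Step 5: Gaussian "mutation" of parent MSE targets plus fresh randoms.
    pop = [max(1e-5, p[0] * random.gauss(1.0, 0.2)) for p in parents]
    pop += [random.uniform(1e-4, 1e-1) for _ in range(10 - len(pop))]
print(non_dominated(evaluated))                           # step 7
```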

4. Experimental results

To investigate the performance of our proposed hybrid HMM-Fuzzy with EA model, a number of experiments were conducted. Initially, three well-known benchmark time series datasets, the Mackey–Glass dataset [35], the Box–Jenkins gas furnace dataset [36] and the Gas-Mileage dataset [37], are used in the experiments. We then investigate the effectiveness and forecast accuracy of our model using a number of stock prices listed on the US and Australian stock exchanges. Details about the datasets are provided below.

4.1. Mackey–Glass data

The Mackey–Glass dataset is a well-known benchmark time series introduced by Mackey and Glass [35]. It has been used in a number of studies to evaluate the performance of forecasting methods. The values of the time series are generated using the following differential equation:

dx(t)/dt = 0.2x(t − τ) / (1 + x¹⁰(t − τ)) − 0.1x(t)   (2)


Table 2
Comparison of prediction accuracies for the Box–Jenkins gas furnace data.

Model | Number of rules | MSE
HMM-Fuzzy with EA | 2 | 0.0454
SI model with transformed inputs [40] | 2 | 0.048
Data-driven linguistic modeling using relational fuzzy rules [2] | 2 | 0.055
SI model [27] | 2 | 0.055
Sugeno's model with OLS [28]ᵃ | 2 | 0.066
Sugeno's model [46]ᵃ | 2 | 0.068
Sugeno and Yasukawa [47] | 6 | 0.190
Chiu's fuzzy model [48] | 3 | 0.072
ARMA [36]ᵃ | N/A | 0.202

ᵃ Collected from [2].

The values of the attributes x(t−18), x(t−12), x(t−6) and x(t) are used as predictors, while x(t+6) is the dependent variable. The initial conditions for generating the dataset are x(0) = 1.2 and x(t−τ) = 0 for 0 ≤ t ≤ τ; the series is obtained using the Runge–Kutta method. The first 500 data vectors of the 1000 generated data are used for training the forecasting method, and the remaining 500 are kept to test it. Within the training dataset, the first 450 data vectors are used for training the model and the next 50 for validation. In an earlier study [38] we tested our base HMM-Fuzzy model on the same dataset, but did not use validation data during training. Table 1 shows the results of some well-known prediction methods tested on the Mackey–Glass data. Using the base model, in study [39] 31 rules were found, corresponding to an NRMSE (Normalized Root Mean Square Error) of 0.0055 on the test data. When the multiobjective EA is used with validation data, a significant improvement in performance is achieved: Table 1 shows improvements in both accuracy and model complexity.
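A minimal sketch of the data generation, assuming a simple Euler integration of Eq. (2) in place of the Runge–Kutta scheme mentioned above; the lag choices follow the text.

```python
import numpy as np

def mackey_glass(n, tau=17, dt=1.0, x0=1.2):
    """Integrate Eq. (2) with a simple Euler scheme (the paper uses
    Runge-Kutta); the history is initialized to 0 as in Section 4.1."""
    hist = int(tau / dt)
    x = np.zeros(n + hist)
    x[hist] = x0
    for t in range(hist, n + hist - 1):
        x_tau = x[t - hist]
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1 + x_tau ** 10) - 0.1 * x[t])
    return x[hist:]

series = mackey_glass(1000)
# Inputs x(t-18), x(t-12), x(t-6), x(t); target x(t+6).
lags, horizon = [18, 12, 6, 0], 6
rows = range(18, len(series) - horizon)
X = np.array([[series[t - l] for l in lags] for t in rows])
y = np.array([series[t + horizon] for t in rows])
```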


Table 3
Comparison of prediction accuracies for the automobile gas mileage data.

Model | Number of rules | RMSE
HMM-Fuzzy with EA | 2 | 2.97
HMM-Fuzzy with EA | 3 | 2.82
Neuro computing and soft computing approach [13] | 4 | 2.97
Data-driven linguistic modeling using relational fuzzy rules [2] | 3 | 2.85

4.2. Box–Jenkins gas furnace data

The Box–Jenkins gas furnace data [36] is one of the most widely used benchmarks for testing the efficiency of a prediction method. In this gas furnace, air and methane were combined to obtain a mixture of gases containing CO2 (carbon dioxide). The methane gas feed rate forms the input series u(k), and the gas outflow y(k), the concentration of CO2 at the outlet, is the output series. Two hundred and ninety-six successive pairs of observations were collected from the continuous records at 9-s intervals to form the entire gas furnace dataset. In this experiment, to be consistent with existing studies [2,27], we choose the following input variables to the fuzzy model:

u(k−1), u(k−2), u(k−3), y(k−1), y(k−2), y(k−3) for the output variable y(k)

To be consistent with other studies, in this experiment both training and testing were done on all 296 data vectors. Table 2 compares the performance of the proposed HMM-Fuzzy with EA model against some of the prediction methods reported in recent studies.
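A small sketch of the input construction for this experiment, with random stand-ins for the 296 observations.

```python
import numpy as np

def lagged_design(u, y):
    """Build input vectors [u(k-1), u(k-2), u(k-3), y(k-1), y(k-2), y(k-3)]
    with target y(k), as in Section 4.2."""
    X, t = [], []
    for k in range(3, len(y)):
        X.append([u[k - 1], u[k - 2], u[k - 3],
                  y[k - 1], y[k - 2], y[k - 3]])
        t.append(y[k])
    return np.array(X), np.array(t)

u = np.random.rand(296)     # stand-in for the methane feed rate
y = np.random.rand(296)     # stand-in for the CO2 concentration
X, t = lagged_design(u, y)  # X.shape == (293, 6)
```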

Table 1
Comparison of prediction accuracies for the Mackey–Glass data.

Model | Number of rules | NRMSE
HMM-Fuzzy with EA | 13 | 0.0048
HMM-Fuzzy [38] | 31 | 0.0055
ANFIS [44] | 16 | 0.0074
Data-driven linguistic modeling using relational fuzzy rules [2] | 6 | 0.009
SuPFuNIS [29] | 15 | 0.014
DENFIS (offline) [26] | – | 0.016
EPNet [45] | – | 0.02
Rule extraction using subtractive clustering followed by optimization using gradient descent [16] | 16 | 0.0084

4.3. Automobile gas mileage data

The automobile gas mileage dataset, which contains six numerical attributes describing each automobile (number of cylinders, displacement, horsepower, weight, acceleration and model year), has been used to model the fuel consumption of a variety of vehicles. This is a non-linear regression problem that relates fuel consumption to the respective vehicle characteristics, and the dataset has been used in a number of studies [2,13]. To maintain consistency with previous studies, in our experiment we also remove the six samples with missing values from the 398 samples; the remaining 392 samples are divided randomly into two equal parts. In predicting the fuel consumption value, we use the two predictors, weight and model year, identified in [13]. Table 3 compares the performance of our model with other methods reported in previous studies on the same dataset.

4.4. Stock price data

In this experiment, 20 different stocks, drawn from the New York Stock Exchange (NYSE) and the Australian Securities Exchange (ASX), are used. For each company, the daily closing prices form the time series, and a small window size of 4 is chosen to predict the next day's stock price. Table 4 lists the lengths of the training and test datasets for the 20 stocks considered, of which seven are drawn from the automobile and telecommunication sectors of the NYSE and the remaining 13 are selected from the ASX volume-leader stocks on 3 August 2007. Table 5 presents the experimental results for each of the stocks and compares the performance of our model with some other well-known data-driven fuzzy methods. A sketch of the windowing step follows.
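A minimal sketch of the window construction, using a toy random-walk series in place of real closing prices.

```python
import numpy as np

def sliding_windows(close, window=4):
    """Window size 4 as in Section 4.4: the previous four closing prices
    form the input vector; the next day's close is the target."""
    X = np.array([close[i:i + window] for i in range(len(close) - window)])
    y = close[window:]
    return X, y

close = np.cumsum(np.random.randn(250)) + 100.0  # toy daily closes
X, y = sliding_windows(close)
```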

5. Discussion and conclusion

In this paper, we have developed an efficient data-driven fuzzy rule generation method for predicting future values of non-linear time series.


Table 4
Training and test datasets.

Stock name | Training data (from – to) | Test data (from – to)
Alaska Communications Systems Group, Inc. | 18 Nov 1999 – 14 Sep 2004 | 15 Sep 2004 – 10 Jul 2007
Centurytel Inc. | 5 Nov 1987 – 25 Sep 1997 | 26 Sep 1997 – 10 Jul 2007
Citizens Communications | 26 Mar 1990 – 26 Mar 2002 | 27 Mar 2002 – 10 Jul 2007
Shenandoah Telecommunications Co. | 23 Jun 2000 – 11 Sep 2003 | 12 Sep 2003 – 10 Jul 2007
DaimlerChrysler AG (USA) | 26 Oct 1998 – 30 Dec 2005 | 3 Jan 2006 – 10 Jul 2007
Ford Motor Company | 3 Jan 1977 – 4 Apr 1997 | 7 Apr 1997 – 10 Jul 2007
Investa Property Group | 19 Oct 2005 – 23 Feb 2007 | 26 Feb 2007 – 3 Aug 2007
Kimberley Diamond Company NL | 4 Jan 2000 – 1 Mar 2005 | 2 Mar 2005 – 3 Aug 2007
Telstra Corporation Ltd. | 4 Jan 2000 – 1 Mar 2005 | 2 Mar 2005 – 3 Aug 2007
International Concert Attractions Ltd. | 22 Jan 2001 – 20 Mar 2003 | 21 Mar 2003 – 3 Aug 2007
Cullen Resources Ltd. | 4 Jan 2000 – 1 Mar 2005 | 2 Mar 2005 – 3 Aug 2007
Macquarie Infrastructure Group | 13 Aug 2002 – 16 Aug 2006 | 17 Aug 2006 – 3 Aug 2007
BHP Billiton Ltd. | 4 Jan 2000 – 6 Dec 2005 | 7 Dec 2005 – 3 Aug 2007
ConnectEast Group | 4 Nov 2005 – 29 Dec 2006 | 2 Jan 2007 – 31 May 2007
Multiplex Group | 2 Dec 2004 – 26 Mar 2007 | 27 Mar 2007 – 3 Aug 2007
GPT Group | 1 Jun 2006 – 24 Apr 2007 | 26 Apr 2007 – 3 Aug 2007
Downer EDI Ltd. | 5 Jun 2001 – 3 Jan 2006 | 4 Jan 2006 – 3 Aug 2007
Sundance Resources Limited | 16 Dec 2003 – 6 Jun 2007 | 7 Jun 2007 – 3 Aug 2007
Oxiana Ltd. | 4 Jan 2000 – 6 Dec 2005 | 7 Dec 2005 – 3 Aug 2007
Qantas Airways Ltd. | 4 Jan 2000 – 6 Dec 2005 | 7 Dec 2005 – 3 Aug 2007

Table 5
Comparison of forecast accuracies for the stock (closing) prices. Each method is reported as "No. of rules / NRMSE".

Stock name | HMM-Fuzzy with EA | Subtractive clustering based fuzzy model | DENFIS
Alaska Communications Systems Group, Inc. | 2 / 0.0303 | 4 / 0.0337 | 7 / 0.0324
Centurytel Inc. | 2 / 0.0118 | 3 / 0.0118 | 13 / 0.0540
Citizens Communications | 2 / 0.0485 | 4 / 0.0485 | 12 / 0.0484
Shenandoah Telecommunications Co. | 2 / 0.0159 | 6 / 0.0254 | 11 / 0.0183
DaimlerChrysler AG (USA) | 2 / 0.0060 | 3 / 0.0060 | 6 / 0.0060
Ford Motor Company | 2 / 0.0026 | 3 / 0.0026 | 11 / 0.0051
Investa Property Group | 2 / 1.4971 | 7 / 3.9277 | 6 / 6.7350
Kimberley Diamond Company NL | 2 / 0.4272 | 5 / 0.4320 | 7 / 0.4310
Telstra Corporation Ltd. | 2 / 0.1960 | 3 / 0.1963 | 6 / 0.2435
International Concert Attractions Ltd. | 2 / 7.5250 | 5 / 7.5657 | 6 / 7.4856
Cullen Resources Ltd. | 2 / 15.1704 | 2 / 15.469 | 10 / 16.25
Macquarie Infrastructure Group | 2 / 0.7993 | 4 / 0.8105 | 11 / 0.8093
BHP Billiton Ltd. | 2 / 0.5639 | 5 / 0.0458 | 9 / 0.4148
ConnectEast Group | 2 / 3.6852 | 2 / 3.6920 | 7 / 12.6550
Multiplex Group | 2 / 2.2011 | 3 / 2.3066 | 7 / 2.8336
GPT Group | 2 / 1.8299 | 5 / 1.8558 | 6 / 1.8498
Downer EDI Ltd. | 2 / 0.3055 | 2 / 0.3064 | 12 / 0.6027
Sundance Resources Limited | 2 / 12.9820 | 3 / 11.7214 | 5 / 19.4725
Oxiana Ltd. | 2 / 0.3168 | 3 / 0.3171 | 6 / 4.6851
Qantas Airways Ltd. | 2 / 0.1074 | 3 / 0.0855 | 9 / 0.4842

The proposed method generates a minimum number of fuzzy rules and provides better prediction accuracy compared with other fuzzy models. Many of the existing data-driven fuzzy models are constrained by user-defined parameters (e.g., the number of clusters or the radius of the clusters) or by the increased computational complexity caused by generating a large number of improper fuzzy rules. Our proposed method is not constrained by user parameters and can generate a fuzzy model with a reasonable number of proper fuzzy rules.

In the proposed hybrid model, we use an HMM to find similar data patterns in the training dataset. The HMM groups similar data patterns based on the fluctuations among the predictor variables in the training dataset. A key advantage of the approach is that it is not necessary to analyze the training dataset before using the model. Once the data patterns have been grouped into a number of small buckets/bins, we proceed to the generation of fuzzy rules. The divide and conquer technique for generating fuzzy rules continues until the desired MSE is obtained for the validation dataset using the generated fuzzy model.

This approach helps to keep the number of fuzzy rules as small as possible while meeting the desired MSE. However, it can lead to the generation of a large number of rules if an impractical/unsuitable MSE is chosen for the dataset, and over-fitting may then occur, that is, a large number of rules are used in the fuzzy model. To overcome this problem we have used a multiobjective EA to find a range of compromise solutions between the MSE and the number of fuzzy rules in the model. By choosing a suitable MSE value using the multiobjective EA, the prediction accuracy for the Mackey–Glass dataset (see Table 1) was improved by 12.73% compared with that of the base HMM-Fuzzy model [38,9]. In addition, the number of required rules was significantly reduced, from 31 to 13. Compared with the studies reported in the literature, this performance is the best so far for the same dataset with the minimum number of fuzzy rules.
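For reference, a small sketch of the error metric and the relative-improvement arithmetic quoted above; the normalization convention is an assumption, since the paper does not spell it out.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized root mean square error (normalized here by the standard
    deviation of the target; an assumed convention)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)

# Relative improvement quoted in the text for the Mackey-Glass data:
base, hybrid = 0.0055, 0.0048
print(round(100 * (base - hybrid) / base, 2))   # -> 12.73
```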


The performance of the hybrid HMM-Fuzzy with EA model was further tested on a number of benchmark time series datasets. As shown in Table 2, for the Box–Jenkins gas furnace dataset our model, using two fuzzy rules as in other studies [40,2], achieved at least 5% higher prediction accuracy than the best reported performance among the existing fuzzy systems. On the automobile gas mileage prediction problem, our model performed better than the other models when using two rules (Table 3). For this dataset, the performance of our HMM-Fuzzy model with EA was as good as that reported for a neuro-computing and soft computing approach [13], but our model achieved this accuracy using only two rules while the neuro-fuzzy model used four. Our model achieved its best performance (RMSE = 2.82) using three rules, which is better than the data-driven linguistic fuzzy model [2] (RMSE = 2.85).

Table 5 lists the performance of the existing fuzzy models along with our hybrid HMM-Fuzzy-EA model on financial time series data. For these datasets our model performed as well as the fuzzy model generated using subtractive clustering, and better than DENFIS, a fuzzy rule generation technique based on an evolving clustering method [26]. Indeed, our model achieved this performance using only two rules for all the stocks, while subtractive clustering used two to seven rules and DENFIS used seven to 13 rules, depending on the dataset. For at least 16 of the 20 stocks, the performance of our model is better than that of the other two fuzzy models. Among these 16 stocks, our model achieved at least 61.88% better prediction accuracy (in terms of NRMSE) for Investa Property Group compared with the better of the other two fuzzy models. For Shenandoah Telecommunications Co., our model achieved 13.11% better performance than DENFIS and 27.40% better than the fuzzy model generated using subtractive clustering, and a 6.50% accuracy improvement was achieved for Alaska Communications Systems Group Inc. For the other stocks our model performs as well as the other fuzzy models with a reduced number of rules.

The Markov process underlying the HMM can efficiently identify the similarities among the data vectors while considering the variations in the variables, so compact groups of similar data vectors are obtained and, as a consequence, a compact number of fuzzy rules is generated. The multiobjective EA also helps to find a suitable MSE for the specific problem, such that the minimum number of fuzzy rules is used to achieve better performance. In general, by choosing an appropriate MSE level using the multiobjective EA, the HMM-Fuzzy model performed significantly better than state-of-the-art fuzzy models for all the time series benchmark datasets considered, and performed at least as well as the other two data-driven fuzzy models for the 20 stock prices. In most data-driven fuzzy rule generation methods, either the datasets are analyzed extensively to determine the appropriate number of fuzzy rules or an existing clustering method is employed to divide the input data space; however, none of these methods takes into account the relationships among the data features, which have a strong influence on the output feature. The HMM-Fuzzy approach followed by the multiobjective EA is able to obtain a minimum number of fuzzy rules and generate better time series predictions. In future work we plan to embed combinations of different types of membership functions (e.g., triangular or trapezoidal) in place of only the Gaussian membership function in the HMM-Fuzzy with multiobjective EA approach.


Appendix A. Takagi–Sugeno fuzzy inference

In the TS model, the fuzzy implication between the membership functions is used as the weight of the respective rule, and the output from each rule is aggregated using a weighted average equation (discussed below). For a given fuzzy rule, the fuzzy implication among the membership functions (e.g. μ1 and μ2) is typically computed using one of the following two operations:

• Minimum: μ1,2(x1, x2) = min(μ1(x1), μ2(x2)).
• Product: μ1,2(x1, x2) = μ1(x1) μ2(x2).

Here μ1(x1) and μ2(x2) represent the degrees of membership of the attributes x1 and x2, respectively, for the fuzzy rule. In this paper we use the product implication, since this operation fits better with gradient-based optimization. The result of the fuzzy implication operation for each rule is taken as the weight of the respective rule. In the TS model, the output y of each rule is computed using a first-order polynomial equation as follows:

y = a0 + a1 x1 + a2 x2 + ⋯ + ak xk   (A.1)

where a0, a1, …, ak are constants. With regard to fuzzy models, the process of aggregating the outputs from all the rules in the model is known as 'defuzzification' or consequence. In the TS model the consequent part, or defuzzification, is computed using Eq. (A.2):

y = (Σ_{i=1}^{n} w_i y_i) / (Σ_{i=1}^{n} w_i)   (A.2)

Here the weights w_i represent the implications of the respective rules, computed as described above.
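A minimal sketch of TS inference per Eqs. (A.1)-(A.2), with hypothetical Gaussian membership parameters; it is an illustration, not the authors' implementation.

```python
import numpy as np

def ts_inference(x, rules):
    """First-order TS inference: product implication for the rule weight,
    weighted average for defuzzification, per Eqs. (A.1)-(A.2)."""
    weights, outputs = [], []
    for mu, sigma, a in rules:
        # Product of Gaussian memberships over all inputs (implication).
        w = np.prod(np.exp(-0.5 * ((x - mu) / sigma) ** 2))
        # First-order consequent: y = a0 + a1*x1 + ... + ak*xk  (A.1)
        y = a[0] + np.dot(a[1:], x)
        weights.append(w)
        outputs.append(y)
    weights, outputs = np.array(weights), np.array(outputs)
    return np.sum(weights * outputs) / np.sum(weights)      # (A.2)

# Two toy rules over two inputs: (mu, sigma, consequent coefficients).
rules = [(np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([0.5, 1.0, -1.0])),
         (np.array([2.0, 2.0]), np.array([1.0, 1.0]), np.array([1.0, 0.2, 0.3]))]
print(ts_inference(np.array([1.0, 1.5]), rules))
```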

Appendix B. Hidden Markov model

Formally, according to Rabiner [8], an HMM is characterized by the following attributes:

1. N, the total number of states. In most applications these states are hidden, though they may carry some significant physical meaning attached to the problem.
2. M, the number of distinct feature values per state.
3. A, the state transition probability matrix, where a_ij ∈ A is the probability of changing from state i to state j at time t, with 1 ≤ i, j ≤ N.
4. B, the observation emission probability matrix, where b_j(k) ∈ B is the probability of emitting feature symbol x_k at time t from state j, with 1 ≤ j ≤ N and 1 ≤ k ≤ M.
5. π, the prior probability matrix, representing the initial state probabilities, where π_i ∈ π is the probability of starting in state i at time 1, with 1 ≤ i ≤ N.

An HMM is generally referred to as a set λ = (A, B, π), where the three parameters describe the HMM for a specific problem. The three key problems for an HMM are


1. Given the HMM λ = (A, B, π), the computation of the probability of the k-dimensional data vector X, i.e., Pr(X|λ), where x_i ∈ X, 1 ≤ i ≤ k.
2. Given X and the model λ, the selection of the best state sequence S that will most likely generate the data vector X.
3. Given X and an initial model λ with parameter values A, B, π, the re-estimation of the parameter values so that the model λ best explains X with the modified parameter values.


The first problem concerns evaluation: given a model and the data, computing the probability that X was produced by the model. The third problem concerns optimizing the model parameters so that the occurrence of a given data vector X can be best described. The second problem is of little interest for the models proposed in this study and is therefore omitted; details about it are described in Onur and Fatih [41] and Onur et al. [42].

B.1. Computation of the data vector probability given an HMM

The probability of a data vector X given an HMM λ, i.e., P(X|λ), is computed as follows:

P(X|λ) = Σ_Q P(X|Q, λ) P(Q|λ)   (B.1)

Here Q is a state sequence q1, q2, …, qk (for a k-state HMM), λ the HMM model, and X the input data vector x1, x2, x3, …, xk (observation sequence). The values of P(X|Q, λ) and P(Q|λ) used in the above equation are calculated using the following equations [8]:

P(X|Q, λ) = ∏_{i=1}^{k} P(x_i|q_i, λ) = b1(x1) b2(x2) ⋯ bk(xk)   (B.2)

Here b_i(x_i) is the emission probability of the feature x_i from state i.

P(Q|λ) = π1 · a_{1,2} · a_{2,3} ⋯ a_{k−1,k}   (B.3)

Here π is the prior probability matrix and a_{i,j} the transition probability from state i to state j. The logarithm of P(X|λ) is known as the log-likelihood value.
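The sum in Eq. (B.1) runs over all state sequences and is exponential if computed directly; in practice it is evaluated with the forward variables α_t(i) that also appear below in Eqs. (B.5)-(B.7). A minimal sketch for a discrete-symbol HMM, with toy parameters:

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Forward-algorithm evaluation of P(X|lambda): the alpha_t(i)
    recursion avoids the exponential sum over state sequences in Eq. (B.1).
    B[j, k] is b_j(x_k) for discrete symbols; obs is a list of symbol
    indices."""
    alpha = pi * B[:, obs[0]]            # initialization with the priors
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]    # propagate and emit
    return np.log(alpha.sum())

# Toy 2-state, 2-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_loglik(pi, A, B, [0, 1, 0]))
```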

B.2. Re-estimation of HMM parameter values

The most involved problem in HMM is to adjust/re-estimate the model parameters (A, B, π) so that the probability of generating X is maximized [32]. No analytical solution exists for this maximum likelihood criterion; instead, the parameters are optimized using an iterative algorithm or a gradient technique. The well-known iterative algorithm for optimizing the HMM parameters is the Baum–Welch algorithm [43], explained below.

Initially, an HMM λ is built by choosing random numbers as parameter values. Under this HMM, the probabilities for the training data are produced, and based on these current probabilities we can estimate what better parameters for the model would be [32]. This idea is implemented by initially choosing some non-zero parameter values to form an initial model λ. The probability of the given training data X is calculated assuming the initial model λ best fits X. The a posteriori probability γ_t(i, j) of a transition from state i to state j is calculated using Eq. (B.6):

γ_t(i, j) = Pr(S_t = i, S_{t+1} = j | X, λ)   (B.4)

γ_t(i, j) = α_t(i) a_ij b_j(x_{t+1}) β_{t+1}(j) / Pr(X|λ)   (B.5)

γ_t(i, j) = α_t(i) a_ij b_j(x_{t+1}) β_{t+1}(j) / Σ_i α_T(i)   (B.6)

Here γ_t(i, j) is the probability of a transition from state i at time t to state j at time t+1, for a given λ; α_t(i) is the forward variable representing the probability of all paths to state i at time t; β_{t+1}(j) is the backward variable representing the probability of all paths from time t+1 to time T; a_ij is the probability of changing from state i to state j; and b_j(x_{t+1}) is the probability of emitting x_{t+1} from state j. The value of the variable γ_t(i), representing the a posteriori probability of state i at time t for the training data and the model λ, is calculated using the following equation:

γ_t(i) = Pr(S_t = i | X, λ) = α_t(i) β_t(i) / Σ_i α_T(i)   (B.7)

The newly calculated values of γ_t(i) at time t = 1 are usually taken as the new estimates π̄_i of the initial state probabilities π_i. Thus the probability matrix of choosing each state at time t = 1 becomes the new initial state probability matrix π̄, as shown in Eq. (B.8):

π̄_i = γ1(i)   (B.8)

The new estimate of the probability of data feature x_k from state j is given by Eq. (B.9):

b̄_j(x_k) = Σ_{t: O_t = x_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)   (B.9)

To obtain the re-estimated value of the probability of a transition from state i to state j, a_ij, we calculate the ratio of the expected number of transitions from state i to state j to the expected total number of transitions from state i. Thus the new estimate ā_ij is as follows:

ā_ij = Σ_{t=1}^{T−1} γ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i) = Σ_{t=1}^{T−1} γ_t(i, j) / Σ_{t=1}^{T−1} Σ_j γ_t(i, j)   (B.10)

During the re-estimation phase, at some point the re-estimated model λ̄ may reproduce, within a tolerance/acceptance level, the model λ from before the re-estimation. If this happens for a number of subsequent iterations, the re-estimation process is stopped, on the assumption that the final HMM has been found. The re-estimation is well behaved because the condition Pr(X|λ) ≤ Pr(X|λ̄) is satisfied at every step. After a number of iterations of the re-estimation phase, the final λ is obtained, which generates X with maximum probability. To obtain the re-estimated parameters at each iteration, Eqs. (B.4) through (B.10) are used.

References

[1] L.A. Zadeh, Fuzzy logic, neural networks, and soft computing, Commun. ACM 37 (3) (1994) 77–84.
[2] A.E. Gaweda, J.M. Zurada, Data-driven linguistic modeling using relational fuzzy rules, IEEE Trans. Fuzzy Syst. 11 (2003) 121–134.
[3] L.G. Sison, E. Chong, Fuzzy modelling by induction and pruning of decision trees, in: Proceedings of the IEEE International Symposium on Intelligent Control, 1994, pp. 16–18.
[4] C.H. Kim, J.J. Lee, Adaptive network based fuzzy inference system with pruning, in: Proceedings of the SICE Annual Conference in Fukui, vol. 10, 2003, pp. 140–143.
[5] P.P. Angelov, R.A. Buswell, Automatic generation of fuzzy rule-based models from data by genetic algorithms, Inf. Sci. (2003) 17–31.
[6] L.X. Wang, J. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern. 22 (1992) 1414–1427.
[7] Similarity Search and Outlier Detection in Time Series, http://www.latest-science-articles.com/IT/Similarity-Search-and-Outlier-Detection-in-Time-Series-4480.html.
[8] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989) 257–286.
[9] M.R. Hassan, A combination of hidden Markov model and fuzzy model for stock market forecasting, Neurocomputing 72 (2009) 3439–3446.
[10] M.R. Hassan, M.M. Hossain, R.K. Begg, Y. Morsi, R. Kotagiri, Breast-cancer identification using HMM-Fuzzy model, Comput. Biol. Med. 72 (2010) 3439–3446.
[11] H. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man–Mach. Stud. 7 (1975) 1–13.
[12] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. 1 (1985) 116–132.
[13] J.S.R. Jang, C.T. Sun, E. Mizutani, Neuro-Fuzzy and Soft-Computing: A Computational Approach to Learning and Machine Intelligence, Prentice-Hall, Upper Saddle River, NJ, 1997.
[14] E. Zitzler, M. Laumanns, S. Bleuler, A tutorial on evolutionary multiobjective optimization, in: Metaheuristics for Multiobjective Optimisation, Springer-Verlag, 2003, pp. 3–38.


[15] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic Press, 1981.
[16] S.L. Chiu, An efficient method for extracting fuzzy classification rules from high dimensional data, J. Adv. Comput. Intell. 1 (1997) 1–7.
[17] J. Zurada, Optimal Data Driven Rule Extraction Using Adaptive Fuzzy-Neural Models, PhD Dissertation, University of Louisville, 2002.
[18] J. Zurada, A. Lozowski, Generating linguistic rules from data using neuro-fuzzy framework, in: Proceedings of the Fourth International Conference on Soft Computing, 1996, pp. 618–621.
[19] X.Z. Wang, Y.D. Wang, X.F. Xu, W.D. Ling, D.S. Yeung, A new approach to fuzzy rule generation: fuzzy extension matrix, Fuzzy Sets Syst. (2001) 291–306.
[20] S.M. Chen, S.H. Lee, A new method for generating fuzzy rules from numerical data for handling classification problems, Appl. Artif. Intell. (2001) 645–664.
[21] H. Ishibuchi, T. Nakashima, T. Murata, Three-objective genetics-based machine learning for linguistic rule extraction, Inf. Sci. (2001) 109–133.
[22] X. Chang, J.H. Lilly, Evolutionary design of a fuzzy classifier from data, IEEE Trans. Syst. Man Cybern.—Part B: Cybern. (2004) 1894–1906.
[23] J.S. Branson, J.H. Lilly, Obtaining fuzzy systems from data—the VISIT algorithm, in: Proceedings of the Southeastern Symposium on System Theory, Auburn, 1999.
[24] H. Ishibuchi, T. Yamamoto, Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining, Fuzzy Sets Syst. (2004) 59–88.
[25] J. Abonyi, J.A. Roubos, F. Szeifert, Data-driven generation of compact, accurate, and linguistically-sound fuzzy classifiers based on a decision tree initialization, Int. J. Approximate Reasoning (2003) 1–21.
[26] N.K. Kasabov, Q. Song, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time series prediction, IEEE Trans. Fuzzy Syst. 10 (2002) 144–154.
[27] E. Kim, M. Park, S. Ji, M. Park, A new approach to fuzzy modeling, IEEE Trans. Fuzzy Syst. 5 (3) (1997) 328–337.
[28] L. Wang, R. Langari, Building Sugeno-type models using fuzzy discretization and orthogonal parameter estimation techniques, IEEE Trans. Fuzzy Syst. (1995) 454–458.
[29] S. Paul, S. Kumar, Subsethood-product fuzzy neural inference system (SuPFuNIS), IEEE Trans. Neural Networks 13 (2002) 578–599.
[30] R.C. Vasko Jr., A. El-Jaroudi, J.R. Boston, An algorithm to determine hidden Markov model topology, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996, pp. 3577–3580.
[31] J. Makhoul, S. Roucos, H. Gish, Vector quantization in speech coding, Proc. IEEE 73 (11) (1985) 1551–1585.
[32] X. Huang, Y. Ariki, M. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, 1990.
[33] C.C. Coello, D.V. Veldhuizen, G. Lamont, EA for Solving Multi-Objective Problems, Kluwer, 2002.
[34] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, John Wiley and Sons, Ltd., Chichester, England, 2001.
[35] M. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287–289.
[36] G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1970, 1976 (third edition published in 1994).
[37] R. Quinlan, Combining instance-based and model-based learning, in: Proceedings of the Tenth International Conference on Machine Learning, 1993, pp. 236–243.
[38] M.R. Hassan, B. Nath, M. Kirley, HMM based fuzzy model for time series forecasting, in: Proceedings of the World Congress on Computational Intelligence (WCCI 2006), 2006, pp. 9963–9968.
[39] M.R. Hassan, B. Nath, Stock market forecasting using hidden Markov model: a new approach, in: Proceedings of the 5th International Conference on Intelligent Systems Design and Applications, 2005, pp. 192–196.
[40] E. Kim, M. Park, S. Ji, M. Park, A transformed input-domain approach to fuzzy modeling, IEEE Trans. Fuzzy Syst. (1998) 546–604.
[41] H.F. Onur, K. Fatih, Examining the link between carbon dioxide emissions and the share of industry in GDP: modeling and testing for the G-7 countries, Energy Policy 39 (6) (2011) 3612–3620.
[42] F.O. Hocaoglu, Ö.N. Gerek, M. Kurban, A novel wind speed modeling approach using atmospheric pressure observations and hidden Markov models, J. Wind Eng. Ind. Aerodyn. 98 (8) (2010) 472–481.
[43] L.E. Baum, T. Petrie, G. Soules, N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat. 41 (1970) 164–171.
[44] J.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (1993) 51–63.
[45] X. Yao, Y. Lin, A new evolutionary system for evolving artificial neural networks, IEEE Trans. Neural Networks 8 (1997) 694–713.
[46] M. Sugeno, M. Tanaka, Successive identification of a fuzzy model and its application to prediction of a complex system, Fuzzy Sets Syst. (1991) 315–334.
[47] M. Sugeno, T. Yasukawa, A fuzzy logic based approach to qualitative modeling, IEEE Trans. Fuzzy Syst. 1 (1) (1993) 1–7.


[48] S. Chiu, Selecting input variables in fuzzy models, J. Intell. Fuzzy Syst. 4 (1996) 243–256.

Md. Rafiul Hassan received a PhD in 2007 from the University of Melbourne. His research interests include neural networks, fuzzy logic, Evolutionary Algorithms, Hidden Markov Models and support vector machines, with a particular focus on developing new data mining and machine learning techniques for the analysis and classification of biomedical data. He is currently involved in several research and development projects for effective prognosis and diagnosis of breast cancer from gene expression microarray data. He is the author of around 25 papers in recognized international journals and conferences. He is a member of the Melbourne University Breast Cancer Research Group, the Australian Society of Operations Research (ASOR), and the IEEE Computer Society, and is involved in several program committees of international conferences. He also serves as a reviewer for several renowned journals, such as BMC Breast Cancer, IEEE Transactions on Fuzzy Systems, Neurocomputing, Knowledge and Information Systems, Current Bioinformatics, Information Science, Digital Signal Processing, IEEE Transactions on Industrial Electronics and Computer Communications.

Baikunth Nath received the MA degree from Punjab University, Chandigarh, India and the PhD degree from the University of Queensland, Brisbane, Australia. He was with Monash University for more than 25 years in various senior positions including the director of research in the Gippsland School of IT. In 2001, he joined the Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Australia, as an Associate Professor and the director of postgraduate studies. His research interests include image processing, intrusion detection, scheduling, optimization, data mining, evolutionary computing, neural networks, financial forecasting, and operations research. He is the author of numerous research publications in various well-reputed international journals and conference proceedings.

Michael Kirley received a BEd in Education from Deakin University and PhD in Computer Science from Charles Sturt University Australia in 1988 and 2003 respectively. Currently, he is a faculty member in the Department of Computer Science and Software Engineering, The University of Melbourne, Australia. His research interest includes the theory and application of evolutionary computation, multi-agent systems, and complex systems science. Recently, he has worked on projects examining the implications of diversity mechanisms and changes in connectivity within a natural computation framework. More generally, he is interested in using techniques from statistical physics and non-linear dynamics to model natural systems and to engineer artificial systems. He has published a number of peer-reviewed publications which include journal papers, conference proceedings and book chapters.

Joarder Kamruzzaman received a BSc and MSc in electrical engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh in 1986 and 1989 respectively, and a PhD in information system engineering from Muroran Institute of Technology, Japan, in 1993. Currently, he is a faculty member in the Faculty of Information Technology, Monash University, Australia. His research interest includes computer networks, computational intelligence, and bioinformatics. He has published over 150 peer-reviewed publications which include 40 journal papers and six book chapters, and edited two reference books on computational intelligence theory and applications. He is currently serving as a program committee member of a number of international conferences.