
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 11, NOVEMBER 1997

Self-Evolving Neural Networks for Rule-Based Data Processing

Saman K. Halgamuge, Member, IEEE

Abstract—Two training algorithms for self-evolving neural networks are discussed for rule-based data analysis. Efficient classification is achieved with fewer automatically added clusters, and application data are analyzed by interpreting the trained neural network as a fuzzy rule-based system. The learning vector quantization algorithm has been modified, acquiring the self-evolvement character in the prototype neuron layer based on sub-Bayesian decision making. The number of required prototypes representing fuzzy rules is automatically determined by the application data set. This method, compared with others, shows better classification results for data sets with high noise or overlapping classification boundaries. The classifying radial basis function networks are generalized into multiple shape basis function networks. The learning algorithm discussed is capable of dynamically adding new neurons representing self-evolving clusters of different shapes and sizes. This shows a clear reduction in the number of neurons, or the number of fuzzy rules generated, and the classification accuracy is increased significantly. This improvement is highly relevant in developing neural networks that are functionally equivalent to fuzzy classifiers since the transparency is strongly related to the compactness of the system.

Index Terms—Clustering, fuzzy rules, learning vector quantization, multiple shape basis functions, radial basis function networks.

I. INTRODUCTION

IN THE traditional classification of neural networks, they are either categorized according to the direction of signal propagation (feedforward networks and recurrent networks) or according to the nature of the learning algorithm (unsupervised, supervised, and reinforcement). Another way is the structural classification, differentiating neural networks with a dynamically evolving network structure from those with a fixed network structure [1]. Examples of the latter are backpropagation, learning vector quantization (LVQ) [2], [3], and the classical radial basis function network with least-mean-square-error learning. Among the reported neural algorithms with self-evolving network structure are cascade correlation [4], restricted coulomb energy [5], adaptive resonance theory (ART) [6], and an extended form of LVQ known as dynamic vector quantization (DVQ) [7]. The major focus of this paper is on neural network algorithms with the capability of self-evolving structure development, in particular, on improved extensions of DVQ, restricted coulomb energy (RCE), and radial basis function networks (RBFN). Section II describes the DVQ, RBFN, and RCE models and their relationship to classifier-type fuzzy systems, and Section III describes the data sets used for simulation. After a summary of the author's recent work on DVQ networks in Section IV, Section V presents a new and more efficient concept for classification based on extensions of RBFN. Section VI concludes the paper by comparing the algorithms and the results.

Manuscript received June 20, 1997. The associate editor coordinating the review of this paper and approving it for publication was Prof. Jenq-Neng Hwang. The author is with the Department of Mechanical and Manufacturing Engineering, the University of Melbourne, Melbourne, Australia. He is also a member of the Cooperative Research Center for Sensor Signal and Information Processing, Data and Information Fusion Programme, Adelaide, Australia. Publisher Item Identifier S 1053-587X(97)08063-X.

II. TRANSPARENCY OF NEURAL NETWORKS

The well-known "neuro-fuzzy" systems can be seen as trainable neural networks that can be interpreted as fuzzy systems [8]–[12]. The major challenge in those trainable neural networks is the application-dependent structure adjustment (hidden layer) of the neural network that corresponds to the fuzzy rule base. This is therefore equivalent to building the fuzzy rule base completely from data, which does not rule out the possibility of including existing a priori knowledge in the system. Consider a classifier-type fuzzy system with two classes c1, c2 and a number of rules over inputs x1, x2:

IF x1 is low AND x2 is high THEN c1
IF x1 is medium AND x2 is low THEN c1
IF x1 is high AND x2 is low THEN c2
.......

Here c is a class attribute, and x is an input. The rule strength is calculated for each rule, as shown in the left of Fig. 1, which shows the calculation of the first two rules of class c1. The T-norm, or the conjunction operator, can be defined in many ways; commonly used examples are the minimum and the product. The consequence of the fuzzy rules for an output class is represented by selecting the maximum rule strength as the output strength. There can be many rules, where each rule contains an antecedent part (or IF part) with a strength of belonging to the same class, but no rule is assigned to more than one class. Classifier-type fuzzy systems can be easily mapped into neural networks, as shown in the right of Fig. 1 [7], [13]. The neural network can be either a classifying-type RBFN, if the hidden neurons are RBF's, or a nearest prototype network (e.g., LVQ), if the hidden neurons represent the nearest prototypes.

1053–587X/97$10.00  1997 IEEE


Fig. 1. Classifier type fuzzy systems and equivalent neural networks.

Assuming an RBF-type neural network in the right of Fig. 1, the output strength of class c is

o_c = max_j ∏_i exp( −(x_i − w_ji)² / (2σ_ji²) )    (1)

All these methods, except for Fuzzy ART, have been developed in the last few years by researchers at Ph.D. or higher levels in Europe.

III. APPLICATION EXAMPLES

Equation (1) describes the fuzzy classifier in the left of Fig. 1 if the T-norm is considered to be the product. Further, by restricting the standard deviation σ in (1), the nearest prototype vector quantization (e.g., LVQ) neural network equivalent to the same fuzzy system can be obtained [7]. The functional equivalence between RBF networks for function approximation and a restricted version of the Takagi–Sugeno–Kang fuzzy systems is also shown in [14]. However, the existence of efficient self-evolving learning algorithms for those neural networks is crucial when claiming the practical advantages of the functional equivalence to a fuzzy system. The neural network algorithms with self-evolving hidden neurons are good candidates for generation of the architecture of the network; therefore, the generation of the system structure also includes the rules of the fuzzy system. In such neural networks, the distance between an input vector and all the reference vectors is calculated to decide on the class membership of an input vector. The prerequisite is the selection of a suitable distance measure. The distance measure used in competitive learning can be defined more generally by the Minkowski metric [15]

d(x, w) = ( Σ_i |x_i − w_i|^λ )^(1/λ)    (2)

The most commonly used measures, the Euclidean distance (λ = 2), the city block distance (λ = 1), and the maximum distance (λ → ∞), can be derived from this general form. The fact that the classifier-type fuzzy system is the most natural approximation for data clustering methods motivated many researchers to select it for neuro-fuzzy systems:

• NEFCLASS [11];
• Fuzzy Rule Net [16];
• Fuzzy Self-Organizing Map [10];
• Fuzzy ART Map [6];
• DVQ Variations [7];
• Modified Restricted Coulomb Energy Learning [13].
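As a quick sketch, the Minkowski metric (2) and its common special cases can be written as:

```python
def minkowski(x, w, lam):
    """Minkowski distance (2) of order lam between an input vector x
    and a reference vector w."""
    if lam == float("inf"):                      # maximum (Chebyshev) distance
        return max(abs(a - b) for a, b in zip(x, w))
    return sum(abs(a - b) ** lam for a, b in zip(x, w)) ** (1.0 / lam)
```

For the vectors (0, 0) and (3, 4), lam = 2 gives the Euclidean distance 5, lam = 1 the city block distance 7, and lam = ∞ the maximum distance 4.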

Algorithms presented in this paper are strongly motivated by the requirements of application examples. Therefore, a summary of the benchmarks and applications is given in this section. Fig. 2 shows two benchmark data sets used to evaluate and compare the algorithms.

The artificially generated Island data consists of a training and a recall set of two-dimensional (2-D) data with 1100 vectors each. This data set represents a two-class classification problem in which class 1 is separated into two disjoint areas by class 2 vectors. All vectors are uniformly distributed. These artificially created data mainly serve to enable graphical interpretation of results. The data set is often used to test the ability of a system to classify the island of class 1, which lies inside the region of class 2 (closer to the top right-hand corner of Fig. 2) and accounts for about 2% of the data set.

The Kitchen data set consists of 200 data vectors in the training set and 1000 data vectors in the test set. This 2-D data set represents three classes (similar to a floor plan of a kitchen) and, as shown in the right side of Fig. 2, has class 1 data at the bottom left corner and the top middle, class 2 data spreading from the bottom middle to the left middle and on the right, and class 3 data concentrated in the remaining areas. It contains more noise than the previous data set.

Another artificially generated data set, Overlapping data, is used to test the generalizing capability of learning algorithms. The data set, with two inputs and two output classes, contains highly overlapping areas, as shown in Fig. 3. Only algorithms with high generalizing capabilities can show better performance than a 50% classification rate for this noisy data set.

The Iris data set [17], with four inputs and three output classes, is a real-world data set that can also be considered a benchmark due to its use in many studies. The data set is divided into training and test sets, each of them having 75 vectors.
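An Island-style benchmark of this kind is easy to reproduce in spirit. In the sketch below, all region boundaries and proportions are illustrative assumptions, not the paper's actual generator:

```python
import random

def make_island_data(n=1100, seed=1):
    """Sketch of an Island-style benchmark: uniformly distributed 2-D
    vectors, with class 1 split into a main region and a small 'island'
    (roughly 2% of the data) inside class-2 territory."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        in_island = 0.72 < x < 0.86 and 0.72 < y < 0.86  # ~2% of the unit square
        in_main = x < 0.30                               # main class-1 region
        data.append(((x, y), 1 if (in_island or in_main) else 2))
    return data
```

A set generated this way lets the clustering result be plotted directly, which is the stated purpose of the artificial benchmarks.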
The Solder data set is a subset of data from a real-world fault identification problem described in [18], consisting of 12 inputs and classifying solder joints into two classes: “good”


Fig. 2. Island data in the left and kitchen data in the right.

or "bad." A training set with 180 vectors and a recall set with 80 vectors are selected.

Another real-world data set (Digit, which is from optical digit recognition [19], containing 36 preprocessed inputs and ten outputs) is also tested with the proposed methods. Numerical characters from several writers are preprocessed, and 36 input features are selected. The extracted input features can be categorized as

• pixel-oriented features, considering either the location of pixels or the relation among them, or
• line element features, extracting the edges of the image.

IV. BAYES RULE IN GENERATING NEW NEURONS

Learning vector quantization (LVQ) is one of the well-known nearest prototype learning algorithms [3]. It has the ability to place a constant number of reference vectors in the input space. Since LVQ can also be considered to be a supervised clustering algorithm, each weight vector can be interpreted as a cluster center. The LVQ algorithm was later extended, introducing LVQ2, LVQ3 [3], [20], generalized LVQ (GLVQ), and fuzzy LVQ (FLVQ) [2]. A major handicap in LVQ-type supervised clustering networks is that the number of reference vectors has to be set by the user. This may be too small for some applications and too large for others. Poirier and Ferrieux [21] proposed a method to generate new prototypes dynamically, adding a new prototype whenever an error in classification occurs and the distance to the closest prototype of the same class is greater than a function of the variance. This algorithm, known as DVQ, terminates when no more new prototypes are generated and the existing prototypes stabilize. This method, however, lacks generalizing capability, resulting in the generation of many prototype neurons for applications with noisy data. It can be shown that DVQ3, which is described in the following section, is specifically suited for data sets with strong overlapping, where the Bayes criterion is appropriate.

A. Sub-Bayesian Approximation—DVQ3

The LVQ algorithm modifies the weight vector of the nearest prototype with a parameter α that converges with

Fig. 3. Overlapping data.

increasing time to zero

w(t + 1) = w(t) ± α(t)[x − w(t)]    (3)
α(t + 1) = θ α(t)    (4)

where the sign is positive when the winning prototype belongs to the correct class and negative otherwise. The initialization of α is set to 0.5 at the very beginning for the first neuron and then increased to 1 for other generated neurons. The speed of reduction in α is determined by θ, which was found to be ideally evaluated, according to the simulations performed, by

(5)

where N is the number of training data vectors available, and c is the number of class labels.
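A minimal sketch of this prototype update, assuming the standard LVQ1 form for (3) and (4):

```python
def lvq_step(w, x, same_class, alpha):
    """One LVQ update (3): move the winning prototype w toward the input x
    if it carries the correct class label, and away from x otherwise."""
    sign = 1.0 if same_class else -1.0
    return [wi + sign * alpha * (xi - wi) for wi, xi in zip(w, x)]

def decay(alpha, theta):
    """Learning-parameter decay (4): alpha shrinks toward zero over time.
    theta would be set from the training-set size and the number of class
    labels as in (5), whose exact form is given in the paper."""
    return theta * alpha
```

For example, a correct-class winner at the origin moves halfway toward the input (1, 1) when alpha = 0.5, and moves away by the same amount on a wrong-class match.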


Considering any jth prototype neuron, σ_j, which is the variance of a winning neuron for all dimensions, is updated as


TABLE I NUMBER OF GENERATED PROTOTYPES FOR DVQ VARIATIONS

(6)

Initially, the variance is set to zero. The probabilities of generated neurons are updated with

1) a winner, if it is in the correct class

(7)

2) otherwise

(8)

The best initialization for the probability is
The simple heuristic considered in DVQ3 is that a certain amount of misclassification is allowed in order to maintain generalization. Whenever a misclassification occurs during the training (the correct class of the input vector is not the same as the class assigned by the network), the following two cases are examined:

1) misclassification is to be allowed if

TABLE II CLASSIFICATION PERFORMANCE FOR DVQ VARIATIONS

(9)

2) misclassification is to be avoided if

(10)

where D(·) denotes the decision function of the respective class. In the first case, no new neurons are added, and in the second case, a new prototype neuron with the input vector as its weight is added to the network. The existence of enough training data is the major prerequisite for the use of sub-Bayesian decision making in the dynamic generation of new prototypes. The PDF, which is usually unknown, can be estimated using training data. A local approach is to calculate the variance for each prototype neuron, assuming a Gaussian distribution. The Gaussian distribution is the one-dimensional (1-D) normal distribution with the mean and the covariance matrix. When a misclassification occurs during the training, the conditional probability can be roughly estimated, after the network is stabilized, as

(11)

Assume that the output classes are equiprobable, i.e.,

(14)

Since this is a constant, it can be defined as

(15)

the decision function for the class. Assuming the learning patterns are statistically independent, the covariance matrix is reduced to the vector of standard deviations σ, and the mean is the weight vector w. Considering only two neurons for simplicity, and applying (11) in (12), (12) in (13), and (13) and (14) in (15), we have

(16)

Similarly, the decision function for an input of one class belonging to the wrong class can be calculated as

(17)

The probability that an input of one class may belong to the generated neuron of another class is calculated as

(12)
(13)
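The decision-function machinery of (11)–(17) can be sketched under the stated assumptions (diagonal covariance, equiprobable classes). The function names and the exact comparison in the add-neuron heuristic are assumptions, not the paper's notation:

```python
import math

def decision(x, w, sigma, prior=1.0):
    """Decision function of a prototype neuron: class prior times a product
    of 1-D Gaussians (diagonal covariance), following the derivation above."""
    g = prior
    for xi, wi, si in zip(x, w, sigma):
        g *= math.exp(-((xi - wi) ** 2) / (2.0 * si ** 2))
    return g

def should_add_neuron(x, correct, wrong):
    """DVQ3-style test on a misclassification. Each argument is a
    (weights, sigmas, prior) tuple for the winning prototype of the correct
    and the wrong class. The misclassification is tolerated when the wrong
    class genuinely dominates in the Bayes sense; otherwise a new prototype
    is inserted at x."""
    return decision(x, *correct) >= decision(x, *wrong)
```

If the correct class's decision value at the misclassified input exceeds the wrong winner's, the error is structural rather than Bayes-optimal, and a new prototype is warranted.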

B. Analysis on Simulation Results In Tables I and II, the results given by the methods using dynamically generated neurons can be compared.


Five different data sets are used to evaluate and compare the performance. The artificial data set Overlapping data is specially created to test DVQ3 with the Bayes decision possibility. This data set with two classes and two inputs contains extremely overlapping areas. The best result obtained for this data set is 67%, with DVQ3, as shown in Table II. With appropriate parameter settings, the same result is obtained with a backpropagation algorithm. Therefore, DVQ3 has the best generalization capability among the networks described. DVQ and DVQ3 improve on LVQ by adding the self-evolvement character to the algorithm. Ideally, the self-evolving neural network should be capable of generating neurons of different sizes and different shapes as they might fit into the distribution of data. The following section describes a group of algorithms capable of generalizing the shape and size of the self-evolving clusters.

V. MULTIPLE-SHAPE BASIS FUNCTION NETWORKS (MSBFN'S)

Reilly et al. describe in [5] the restricted Coulomb energy (RCE) algorithm for fast training of a special type of RBF network designed for classification tasks. Before learning, the number of input neurons and output classes has to be defined, whereas RBF neurons are dynamically created. If a newly created RBF neuron causes classification errors, its influence on the classification result is decreased by reducing the radius of its attraction region. To reduce computational effort for learning large training sets and to facilitate hardware solutions with high parallelism, a variation of the original RCE algorithm has been implemented [22]. In this case, the region of attraction of each RBF neuron is a hyper box (an orthogonal object of higher dimensions with unequal edges) instead of a hyper circle. Therefore, the neurons are called cubic basis functions (CBF's), and the improved learning algorithm is called modified RCE (MRCE) [13].
The new method proposed in this paper exploits the Minkowski metric described in (2) to generalize the RCE algorithm by considering distances with variable λ values, allowing variable-shaped regions of attraction for RBF-type neurons. Since they are no longer limited in shape or size, we call them multiple shape basis function (MSBF) neurons. In the case of higher dimensions, we consider the attraction regions as hyper circles (λ = 2), as hyper boxes (λ → ∞), or, in the generalized case (where λ can take any value), as a hyper body. The generalized RCE (GRCE) algorithm is described in the next section.

A. Description of GRCE Algorithm

Let us assume an MSBF neuron is described by its synaptic weight vector w (the center of the hyper body) and parameters representing hyper body extensions corresponding to the input axes. Considering the analogy to a hyper circle, we call it the vector of radial parameters r. Fig. 4 shows a 2-D example with several values of λ. Some values of λ, as shown in the figure, are not useful, and the highest value is limited to λ = 20 due to reasons of computing

Fig. 4. Clusters for multiple shape.

[23]; the choice of a value greater than 20 would not make a huge difference for many applications, as can be seen by comparing the cluster with λ = 20, shown in Fig. 4, with the cluster when λ tends to infinity [the box (0, 1), (10, 1), (10, 7), (0, 7)]. To calculate whether an input vector is falling into a neuron's region of attraction, we have

d = ( Σ_i ( |x_i − w_i| / r_i )^λ )^(1/λ)    (18)

An input data vector x is

on the boundary line of the neuron's hyper body   if d = 1
inside the body                                   if d < 1
outside the body                                  if d > 1    (19)
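A membership test of this form can be sketched as follows; the normalized-Minkowski reading of (18), with each axis scaled by its radial parameter r_i, is an assumption based on the surrounding text:

```python
def msbf_distance(x, w, r, lam):
    """Normalized Minkowski distance of input x to an MSBF neuron with
    center w and per-axis radial parameters r (cf. (18))."""
    if lam == float("inf"):
        return max(abs(a - b) / ri for a, b, ri in zip(x, w, r))
    return sum((abs(a - b) / ri) ** lam for a, b, ri in zip(x, w, r)) ** (1.0 / lam)

def region(x, w, r, lam):
    """Boundary / inside / outside test of (19)."""
    d = msbf_distance(x, w, r, lam)
    return "boundary" if d == 1.0 else ("inside" if d < 1.0 else "outside")
```

With lam = 2 the region of attraction is an ellipse-like hyper body; as lam grows toward the limit of 20, it approaches the axis-aligned box.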

There are two methods of reducing or expanding the region of attraction in order to include or exclude an input vector. The first method is updating a radial parameter r_s from the vector r using

(20)

The second method is to update λ

(21)

The constant was empirically found to be 0.5 in the case of contraction and 3 in the case of expansion. This can be justified since, in contraction, the factor should be reduced at a moderate speed, whereas in case an expansion is needed, λ has to be increased rapidly.
The GRCE learning algorithm is described as follows.
1) If the classification upon presentation of an input vector is correct, i.e., the correct output neuron is active, then no network changes occur. The weight vector of the corresponding MSBF neuron is modified according to (3) and (4). The factor α converges to zero with the number of winning inputs.
2) If output neurons of wrong classes become active on presentation of the input (misclassification), the reference


Fig. 5. RBFN (left), CBFN (middle), and MSBFN (right) for Island data.

neurons, which cause the misclassification, must be modified so that they become inactive. This can be done by two methods, depending on the location of the input vector.
• If the input vector is within the hyper boxes of the neurons, then the radial parameters of the hyper bodies must be reduced just so much that they do not contribute to activating an output (the input has to lie outside their regions of attraction).
• Otherwise, there is a possibility of reducing λ.
In the first case, only a single component of r of an MSBF neuron, namely, the component r_s, has to be modified, for which

(22)

holds. The selected element r_s is the one defining the hyper body radial parameter along the axis where the distance between the input and the center is maximal. The component has to be modified according to (20). In the second case, λ can be reduced according to (21).
3) If an input vector belonging to a class is presented to the network producing no active output neuron, none of the hyper bodies of the MSBF neurons of that class is active. If this happens at the early stages of convergence, then the closest hyper body of the same class is extended to incorporate the new vector. Again, there are two methods, depending on the location of the input vector.
• If the input vector is within the maximum hyper body of the neurons, then λ is increased.
• Otherwise, the radial parameters of a hyper body must be increased just so much that it contributes to the activation of the output neuron (the input has to lie inside the region of attraction).
In the first case, λ can be modified according to (21). For the second case, the selected element r_s of the closest boundary is expanded according to (20). A number of iterations can be needed to activate the output neuron

TABLE III NUMBER OF GENERATED PROTOTYPES FOR RBFN VARIATIONS

since the input might not be included inside the region of attraction by extending only a single radial parameter. If the same input vector does not activate an output neuron after the adequate number of iterations, the algorithm decides to generate a new MSBF neuron: a neuron with the input vector as its weight and a hyper body extension vector of predefined initial values has to be inserted. The newly created hidden neuron is connected to the corresponding output neuron.
4) The algorithm terminates after learning the training data set 100% or reaching the maximum number of steps.

B. Simulation Results

Similar to the RCE and MRCE, GRCE converges after a number of sweeps through the training set. The resulting input space segmentations with RBF and CBF neurons are compared with the new MSBFN's applied to two benchmark data sets in Figs. 5 and 6. In Fig. 5, the dark and dotted lines represent the clusters of classes 1 and 2 of the Island data, respectively, whereas in Fig. 6, dark, very dark, and dotted lines represent the clusters of classes 1, 2, and 3 of the Kitchen data, respectively. The summary of results is shown in Tables III and IV for three benchmarks and an application data set. Table III shows that more compact networks are created with GRCE and MSBFN in comparison with the existing RCE and RBFN. Table IV describes the performance on the test data sets obtained by applying the networks trained using the training data sets.


Fig. 6. RBFN (left), CBFN (middle), and MSBFN (right) for Kitchen data.

TABLE IV CLASSIFICATION PERFORMANCE FOR RBFN VARIATIONS

TABLE V SELF-EVOLVING NEURAL ALGORITHMS COMPARISON

VI. CONCLUSIONS

The paper describes self-evolving-type neural network algorithms, which are equivalent to classifier-type fuzzy systems, and their performance on benchmarks as well as on real-world data sets.

A. Comparison of Algorithms

Table V shows a comparison of the algorithms discussed in the paper. A basic restriction of the LVQ/DVQ methods, considering their interpretation as fuzzy classifiers, is the constant variance of the membership functions for all the dimensions. This is due to the fixed size of the generated prototype clusters. It leads to the generation of more fuzzy rules than necessary since the shape of all the membership functions remains constant, whereas the effective placement of them is the major focus of the method. It is also not avoidable that each prototype interpreted as a fuzzy rule contains membership functions for each input dimension. In comparison with the neuro-fuzzy system FuNe I [19], which is based on off-line gradient descent techniques and in which the position and the shape of the sigmoid membership functions can be tuned, only positional tuning is allowed in this case. Another restriction of LVQ/DVQ and DVQ3, which is also common to the systems proposed in [24] and [25], is the loss of transparency in the case of a very large number of generated rules (or prototype neurons). However, in contrast with the methods proposed in [24] and [25], this system needs neither a predetermined rule structure formed from a priori knowledge nor expensive initialization before training.

B. Comparison of Results

Comparison of the performance of RCE, MRCE, and GRCE described in Figs. 5 and 6 and Tables III and IV clearly shows that MSBFN's with GRCE training generate fewer hidden MSBF neurons, and the classification accuracy is superior to that of DVQ3. In the case of the real-world Solder data set, the number of hidden neurons created has been lowest in the case of CBF. However, this is to be explained by the unsatisfactory performance reported. Comparing with the results obtained for DVQ and DVQ3, it is apparent that MSBFN with the GRCE method is a generalized algorithm suitable for applications requiring a local or global optimization learning strategy, whereas the DVQ variations are specialized for either of those strategies. For example, MSBFN achieved good results for the benchmark Iris (similar to DVQ3), equivalent results for the real data set Solder (similar to DVQ), and the best results for the benchmark Island data. Good results could be obtained in comparison with other neural networks that are equivalent to fuzzy systems, such as 98.67% for Iris with Fuzzy Min-Max [26], 99% for Solder with Fuzzy Rule Net [16], and 66% for Overlapping data with FuNe I [19]. It was reported that 97.3% performance was obtained with NEFCLASS [11] for the Iris data set. However, methods such as FuNe I need many neurons and higher training time than the MSBFN method.

ACKNOWLEDGMENT

The author wishes to thank Dr. W. Poechmueller and Prof. M. Glesner for their support at the early stages of this research,


Dr. L. Kuncheva for the kitchen data set, and the reviewers for their useful comments.

REFERENCES

[1] S. K. Halgamuge, Advanced Methods for Fusion of Fuzzy Systems and Neural Networks in Intelligent Data Processing. Düsseldorf, Germany: VDI Verlag, 1996.
[2] J. C. Bezdek, "A review of probabilistic, fuzzy, and neural models for pattern recognition," J. Intell. Fuzzy Syst., vol. 1, 1993.
[3] T. Kohonen, J. Kangas, J. Laaksonen, and K. Torkkola, "LVQ-PAK: The learning vector quantization program package," Tech. Rep., Helsinki Univ. Technol., Espoo, Finland, Jan. 1992.
[4] M. Hoehfeld and S. E. Fahlman, "Learning with limited numerical precision using the cascade correlation algorithm," IEEE Trans. Neural Networks, vol. 3, 1992.
[5] D. L. Reilly, L. N. Cooper, and C. Elbaum, "A neural model for category learning," Biol. Cybern., vol. 45, pp. 35–41, 1982.
[6] G. Carpenter, S. Grossberg, and D. Rosen, "Fuzzy ART: An adaptive resonance algorithm for rapid, stable classification of analog patterns," in Proc. Int. Joint Conf. Neural Networks, Seattle, WA, 1991.
[7] S. K. Halgamuge and M. Glesner, "Fuzzy neural networks: Between functional equivalence and applicability," Int. J. Neural Syst., vol. 6, no. 2, pp. 185–196, June 1995.
[8] H. Takagi, N. Suzuki, T. Kouda, and Y. Kojima, "Neural-networks designed on approximate reasoning architecture and its applications," IEEE Trans. Neural Networks, vol. 3, pp. 752–760, 1992.
[9] C. T. Lin and C. S. G. Lee, "Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems," in Proc. Second IEEE Int. Conf. Fuzzy Syst., San Francisco, CA, Mar. 1993.
[10] P. Vuorimaa, "Fuzzy self-organizing map," Int. J. Fuzzy Sets Syst., 1994.
[11] D. Nauck and R. Kruse, "NEFCLASS—A neuro-fuzzy approach for the classification of data," in Proc. ACM Symp. Applied Comput., Nashville, TN, Feb. 1995.
[12] S. K. Halgamuge and M. Glesner, "A trainable transparent universal approximator for defuzzification in Mamdani-type neuro-fuzzy controllers," to be published.
[13] S. K. Halgamuge, W. Pöchmüller, and M. Glesner, "An alternative approach for generation of membership functions and fuzzy rules based on radial and cubic basis function networks," Int. J. Approx. Reasoning, vol. 12, no. 3/4, pp. 279–298, Apr./May 1995.
[14] J. S. R. Jang and C. T. Sun, "Functional equivalence between radial basis function networks and fuzzy inference systems," IEEE Trans. Neural Networks, vol. 4, 1993.
[15] T. Kohonen, Self-Organization and Associative Memory. Berlin, Germany: Springer-Verlag, 1989.
[16] N. Tschichold-Gürman, "Generation and improvement of fuzzy classifiers with incremental learning using fuzzy rule-net," in Proc. ACM Symp. Applied Comput., Nashville, TN, Feb. 1995.
[17] E. Anderson, "The irises of the Gaspé peninsula," Bull. Amer. Iris Soc., vol. 59, no. 1, pp. 2–5, 1935.


[18] S. K. Halgamuge, W. Pöchmüller, and M. Glesner, "A rule-based prototype system for automatic classification in industrial quality control," in Proc. IEEE Int. Conf. Neural Networks, San Francisco, CA, Mar. 1993, pp. 238–243.
[19] S. K. Halgamuge and M. Glesner, "Neural networks in designing fuzzy systems for real world applications," Int. J. Fuzzy Sets Syst., vol. 65, no. 1, pp. 1–12, 1994.
[20] D. DeSieno, "Adding a conscience to competitive learning," in Proc. Second Annu. IEEE Int. Conf. Neural Networks, 1988, vol. 1.
[21] F. Poirier and A. Ferrieux, "DVQ: Dynamic vector quantization—An incremental LVQ," in Proc. Int. Conf. Artif. Neural Networks, 1991, pp. 1333–1336.
[22] T. Hollstein, S. K. Halgamuge, and M. Glesner, "Computer aided fuzzy system design based on generic VHDL specifications," IEEE Trans. Fuzzy Syst., Nov. 1996.
[23] S. K. Halgamuge, "Multiple shape basis function networks for rule based analysis of data," in Proc. Australian New Zealand Conf. Intell. Inform. Syst., Adelaide, Australia, Nov. 1996.
[24] L. X. Wang and J. M. Mendel, "Backpropagation fuzzy system as nonlinear dynamic system identifiers," in Proc. IEEE Int. Conf. Fuzzy Syst., San Diego, CA, 1992, pp. 1409–1418.
[25] ——, "Fuzzy basis functions, universal approximation, and orthogonal least squares learning," IEEE Trans. Neural Networks, vol. 3, pp. 807–814, 1992.
[26] P. K. Simpson, "Fuzzy min-max neural networks—Part 1: Classification," IEEE Trans. Neural Networks, vol. 3, Sept. 1992.

Saman K. Halgamuge (M’85) received the B.Sc. degree in electronic and telecommunication engineering from the University of Moratuwa, Sri Lanka, in 1985 and the Dipl.-Ing. and Dr.-Ing. degrees in computer engineering from Darmstadt University of Technology, Darmstadt, Germany, in 1990 and 1995, respectively. In 1985, he worked as an Engineer at the Ceylon Electricity Board, Sri Lanka. From 1990 to 1995, he was a research associate at Darmstadt University of Technology. After lecturing in computer systems engineering and being associated with the Institute for Telecommunications Research and the School of Physics and Electronic Engineering Systems, University of South Australia, from 1996 to July 1997, he joined the Department of Mechanical and Manufacturing Engineering, University of Melbourne, Australia, as a Senior Lecturer in Mechatronics. Since 1996, he has been a Member of the Cooperative Research Center for Sensor Signal and Information Processing of Australia. He has published about 60 conference/journal papers and contributed to books in the areas of data analysis, mechatronics, neural networks, genetic algorithms, and fuzzy systems. His research interest also includes data and information fusion, communication networks, and modeling and simulation in manufacturing systems.