IADIS International Conference Applied Computing 2007

A NEW HYBRID MODEL USING CASE-BASED REASONING AND DECISION TREE METHODS FOR IMPROVING SPEEDUP AND ACCURACY

Asgarali Bouyer, Azad University of Miyandoab, Miyandoab, Iran
Bahman Arasteh, Azad University of Ahar, Ahar, Iran
Ali Movaghar, Sharif University of Technology, Tehran, Iran

ABSTRACT
Case-Based Reasoning (CBR) is one of the preferred problem-solving strategies and machine learning techniques in complex and dynamically changing situations [1]. In addition, the Decision Tree is one of the most popular and frequently used data mining methods for extracting predictive information. In this paper, we present a new hybrid method that uses Case-Based Reasoning and Decision Tree induction to construct an efficient new approach over centralized data. We then compare our method with a standard CBR implementation in terms of accuracy and speedup. Our experimental results show that by applying CBR to the desired sample within its own class in the Decision Tree, we obtain better speedup with a reduced error rate (higher confidence). The approach is well suited to high-performance settings that deal with huge data and complex computation, and it can handle large data sets while using a sequential implementation of the Decision Tree.
KEYWORDS
Case-Based Reasoning, Decision Tree, Data mining, Classification.

1. INTRODUCTION
Data mining is defined as the persistent and interactive process of discovering valid, useful, and understandable models in data sets [10]. Data classification is an important data mining task [6] that tries to identify common characteristics in a set of N records (examples) contained in a database and to categorize them into different classes. The best-known technique for data classification is the Decision Tree. CBR is used in learning systems to solve new problems; it is discussed in Section 2. The primary focus of this paper is the application of Case-Based Reasoning over the classes produced by the Decision Tree technique. We explain how CBR can be used in machine learning systems that employ the Decision Tree technique. Finally, we present some initial empirical results from testing our ideas, and draw some figures based on those results.


ISBN: 978-972-8924-30-0 © 2007 IADIS

2. CASE-BASED REASONING (CBR)
Case-Based Reasoning (CBR) is a problem-solving strategy and machine learning technique that has developed rapidly in recent years in Artificial Intelligence [10]. It solves problems by evaluating and relating previously solved problems or experiences to a current, unsolved problem in a way that facilitates the search for an acceptable solution [1]. However, CBR has its limitations: it has difficulty expressing concepts that can be easily understood by people, it is susceptible to noise, and it lacks a good case-adaptation mechanism. In most cases, the CBR method cannot ensure good system performance on its own and needs other techniques as a supplement. Combining CBR with different learning methods helps ensure the best performance of the system [3].
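The retrieve step at the heart of CBR can be sketched as a nearest-neighbour search over stored cases. The case base, feature weights, and similarity measure below are illustrative assumptions, not part of this paper:

```python
# Minimal CBR retrieval sketch: find the stored case most similar to a new
# problem and reuse its solution. The cases, weights, and similarity measure
# are invented for illustration.

def similarity(case_a, case_b, weights):
    """Weighted similarity in [0, 1] over features scaled to [0, 1]."""
    total = 0.0
    for feature, w in weights.items():
        total += w * (1.0 - abs(case_a[feature] - case_b[feature]))
    return total / sum(weights.values())

def retrieve(case_base, new_problem, weights):
    """Return the (case, solution) pair most similar to the new problem."""
    return max(case_base, key=lambda cs: similarity(cs[0], new_problem, weights))

# Each stored case: (feature dict with values scaled to [0, 1], solution label).
case_base = [
    ({"temp": 0.9, "load": 0.2}, "reduce_load"),
    ({"temp": 0.1, "load": 0.8}, "add_capacity"),
]
weights = {"temp": 2.0, "load": 1.0}

best_case, solution = retrieve(case_base, {"temp": 0.8, "load": 0.3}, weights)
print(solution)  # prints "reduce_load": the solution of the most similar case
```

A full CBR cycle would follow retrieval with reuse, revision, and retention of the adapted case; this sketch covers only the retrieval step the section describes.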

3. DECISION TREE INDUCTION
Given a training set of records (examples) tagged with a class label, a Decision Tree model can predict the class label of unlabeled future examples with high accuracy. The tree is composed of nodes, where each internal node contains a test on an attribute, each branch from a node corresponds to a possible outcome of the test, and each leaf contains a class prediction. A Decision Tree is built in two phases: a growth phase and a pruning phase. The tree is grown by recursively replacing leaves with test nodes, starting at the root [5,9]. Since the goal of classification is to accurately predict new cases, the pruning phase generalizes the tree by removing sub-trees that correspond to statistical noise or variation particular to the training data [8].
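The node/branch/leaf structure described above can be sketched as a small hand-built tree; the attributes and thresholds here are invented for illustration, not taken from the paper's data:

```python
# Sketch of the decision-tree structure described above: each internal node
# tests one attribute, each branch is one outcome of the test, and each leaf
# holds a class prediction. Attributes and thresholds are invented examples.

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, attribute, threshold, left, right):
        self.attribute = attribute   # attribute tested at this node
        self.threshold = threshold   # go left if value <= threshold, else right
        self.left = left
        self.right = right

def predict(tree, record):
    """Walk from the root to a leaf and return its class prediction."""
    while isinstance(tree, Node):
        if record[tree.attribute] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.label

# A two-level tree: test "age" at the root, then "income" on one branch.
tree = Node("age", 30,
            Leaf("class_A"),
            Node("income", 50_000, Leaf("class_B"), Leaf("class_C")))

print(predict(tree, {"age": 45, "income": 70_000}))  # prints "class_C"
```

Growing such a tree means recursively replacing a leaf with a test node chosen on the training data, and pruning means collapsing sub-trees back into leaves, exactly as the growth and pruning phases above describe.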

4. PRESENTING THE HYBRID METHOD
This method tries to improve the operation of CBR. As noted previously, there are some problems that reduce the efficiency and confidence of the result. The newly introduced method consists of two phases:
Phase 1: primary processing of the information to build the Decision Tree and assign each existing record to its related class. The Decision Tree, of course, needs to be built and preserved only once.
Phase 2: final processing and prediction of the situation of a record, using neighboring records.
In the first phase, we process the existing database to identify its main, effective parameters and to clarify their effect on the final result. We then classify the information into different classes using the Decision Tree classifier. In the second phase, we first place the desired record into its class according to the classification previously produced by the Decision Tree. Then, given the desired number of neighbors, we select the existing records most similar to the desired record and perform the prediction operation. In short, the primary system parameters are identified and integrated first; then the main required parameters are computed and classified, and the final result is obtained by performing the final processing between the desired record and its neighbors (in the same class).
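Assuming the Decision Tree has already assigned every record to a class (Phase 1), the two phases above might be sketched as follows; the records, the Euclidean distance measure, the stand-in classifier, and the choice of k are all illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of the hybrid method: Phase 1 buckets records by the class the
# classifier assigns them (done once and preserved); Phase 2 predicts a
# target value for a new record from its k nearest neighbours within that
# class only. Data, distance measure, classifier, and k are assumptions.
from collections import defaultdict
import math

def phase1_group_by_class(records, classify):
    """Partition records into per-class buckets (built once, then reused)."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[classify(rec)].append(rec)
    return buckets

def phase2_predict(buckets, classify, new_rec, k=3):
    """Place the new record in its class, then average the target values of
    its k nearest neighbours within that class."""
    candidates = buckets[classify(new_rec)]
    def dist(rec):
        return math.dist(rec["features"], new_rec["features"])
    neighbours = sorted(candidates, key=dist)[:k]
    return sum(r["target"] for r in neighbours) / len(neighbours)

# Toy classifier standing in for the Decision Tree: class by first feature.
classify = lambda rec: "pos" if rec["features"][0] >= 0 else "neg"

records = [
    {"features": (1.0, 2.0), "target": 10.0},
    {"features": (2.0, 1.0), "target": 12.0},
    {"features": (-1.0, 0.5), "target": 3.0},
]
buckets = phase1_group_by_class(records, classify)
print(phase2_predict(buckets, classify, {"features": (1.5, 1.5)}, k=2))  # 11.0
```

The point of the restriction to `buckets[classify(new_rec)]` is exactly the speedup the paper claims: the neighbour search touches only one class's records instead of the whole case base.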

5. EVALUATING NEW METHOD
When selecting records from the training set (or testing set) according to the nearest neighbors, the suggested method automatically selects, from among the classes, the records that are similar to the desired record. This is done by considering the important parameters of the problem, and it increases the rate of confidence, because in many cases all the records thus selected from the training set are similar to each other. As a further advantage, the system only processes the records in a class that have the most similarity with the desired record, which is determined by comparing some main parameters. If there is no similarity between the desired record and an existing record involved in the estimation, we would face weak system performance and a low confidence ratio in the results; the method therefore avoids computing such records.

Figure 1. Comparing the new method and classic CBR for 50 samplings from neighbors

Figures 1 to 3 illustrate the performance of the new method against classic CBR. Note that we designed a classic CBR algorithm and a new algorithm for our hybrid method, and implemented both over a data set (a text file).

Figure 2. Comparing the new method and classic CBR for 1,000 records (x-axis: number of neighbors compared with the new desired sample; y-axis: reduced error ratio against an increasing number of sampled neighbors)

Figure 3. Comparing the new method and classic CBR for 50 samplings from neighbors (x-axis: number of records in the database; y-axis: execution time; series: new hybrid method vs. classic CBR)

6. CONCLUSION
This article suggests a hybrid recommendation mechanism based on the Decision Tree classifier and CBR, aimed at enriching the recommended information. We showed that the new hybrid method achieves a high degree of speedup and accuracy in comparison to classic CBR.

REFERENCES
Bouyer, A., 2006. "Distributed Data Mining Technique on Grid Environment". M.Sc. thesis, Azad University of Arak.
Halikul, B. et al., 2005. "Applying Case-Based Reasoning (CBR) in Networked Appliances (NAs)".
Hunt, E.B. et al., 1966. Experiments in Induction. Academic Press.
Jürgen, H. and Brezany, P., 2004. "Distributed Decision Tree Induction within the Grid Data Mining Framework GridMinerCore". TR2004-04.
Kantardzic, M. et al., 2003. Data Mining: Concepts, Methods and Algorithms.
Pi-Sheng Deng, 1994. "Using Case-Based Reasoning for Decision Support". In Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences, pp. 552-561.
Quinlan, J.R., 1986. "Induction of Decision Trees". Machine Learning, vol. 1, pp. 81-106.
Rastogi, R. and Shim, K., 2000. "PUBLIC: A Decision Tree Classifier That Integrates Building and Pruning". Data Mining and Knowledge Discovery, vol. 4, no. 4, pp. 315-344.
Shu-Tzu Tsai and Chao-Tung Yang, 2005. "Decision Tree Construction for Data Mining on Grid Computing". Tunghai University, IEEE.
Zhi-Wei Ni, Shan-Lin Yang, Long-Shu Li and Rui-Yu Jia, 2003. "Integrated Case-Based Reasoning". In Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, pp. 1845-1849.
