A Study on Classification Learning Algorithms to Predict Crime Status

Somayeh Shojaee, Aida Mustapha, Fatimah Sidi, Marzanah A. Jabar
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia
{somayeh_shojaee, aida, fatimahcd, marzanah}@fsktm.upm.edu.my

Abstract

In the recent past there has been a large increase in the crime rate, hence the significance of the task to predict, prevent, or solve crimes. In this paper, we conducted an experiment to identify better supervised classification learning algorithms for predicting crime status, using two different feature selection methods tested on a real dataset. Comparisons in terms of Area Under Curve (AUC) show that Naïve Bayesian (0.898), k-Nearest Neighbor (k-NN) (0.895), and Neural Networks (MultilayerPerceptron) (0.892) are better classifiers than Decision Tree (J48) (0.727) and Support Vector Machine (SVM) (0.678). Furthermore, the performance of the mining results is improved by using the Chi-square feature selection technique.

Keywords: Classification, AUC, Naïve Bayesian, Neural Networks, k-Nearest Neighbor, Decision Tree, Support Vector Machine, Chi-square

1. Introduction

Due to the increasing amount of data, a need has emerged to develop technologies to analyze data in different fields such as business, medicine, and education [1]. Data mining methods have therefore become the main tools to analyze data and to discover knowledge from it [2]. Here, data mining refers to an integration of multiple methods such as classification, clustering, evaluation, and data visualization [3]. Crime data is one kind of data that requires data mining techniques to discover and predict underlying patterns [4].

The high number of crimes in different countries has forced governments to use modern technologies and methods to control and prevent crime. Data mining techniques are able to identify patterns rapidly for detecting future criminal actions [5]; manual interpretation of crime data, by contrast, is limited by the size of the data as well as the complexity of relationships among different crime attributes. Data mining methods accelerate crime analytics, provide better analysis, and produce real-time solutions that save considerable resources and time [6].

Today, a high number of crimes are causing many problems in different countries. Scientists spend time studying crime and criminal behavior in order to understand the characteristics of crime and to discover crime patterns. Criminals are known to follow repetitive behavior patterns, so analyzing their behavior can help to capture relations among events from past crimes [7]. In this research, crime research studies are integrated with data mining techniques to identify patterns and to achieve more accurate results. To analyze crimes, several characteristics can be considered, such as the different races in a society, income groups, age groups, family structure (single, divorced, married), level of education, the locality where people live, the number of police officers allocated to a locality, and the numbers of employed and unemployed people, among others [8].
Dealing with crime data is very challenging because the size of crime data grows very fast, which causes storage and analysis problems. In particular, issues arise as to how to choose accurate techniques for analyzing the data, given its inconsistency and inadequacy. These issues motivate scientists to conduct research to enhance crime data analysis. The objective of this evaluation is twofold. First, it determines whether a feature selection technique is useful for achieving better classification accuracy and performance. Second, it compares different classifiers in terms of AUC in order to choose more accurate algorithms for classifying crime status in the United States of America and to obtain a deeper insight into crime. In this study, a real crime dataset from the UCI Machine Learning Repository is used for data mining. Five different classification algorithms are used to classify the dataset based on a binomial class, the crime status. The examined classifiers are Naïve Bayesian, Decision Tree (J48), Support Vector Machine (SVM), Neural Networks (MultilayerPerceptron), and k-Nearest Neighbor. By experiment, the results of

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 7, Number 9, May 2013. doi:10.4156/jdcta.vol7.issue9.43



the five algorithms on the two feature sets are studied and compared, and the more efficient algorithms for predicting the goal class (crime status) are then identified. There are many tools available for data mining. For this study, RapidMiner was chosen as it is freely available from www.rapidminer.com. This paper is organized as follows. Section 2 presents the related work, Section 3 discusses the dataset and pre-processing, Section 4 describes the selected classifiers, Section 5 presents the results and discussion, and Section 6 concludes.

2. Related Work

Existing research has shown that data mining techniques aid the process of crime detection. Examples of data mining techniques used to analyze crime data are classification and machine learning algorithms. Yu et al. [9] employ an ensemble of data mining classification techniques for crime forecasting; the classification methods included in their study are One Nearest Neighbor (1NN), Decision Tree (J48), Support Vector Machine (SVM), Neural Network with a 2-layer network, and Naïve Bayesian. Clustering has been used to detect patterns of serial criminal behavior and to map crime activity geographically [6], both for pattern recognition and prediction. Nath [10] applies clustering with a geographical approach that shows regional crimes on a map and clusters crimes according to their types, using a combination of the K-means clustering algorithm and a weighting algorithm. Clustering and graph representations are also used to find similar crimes and group classes of criminals, as well as to visualize the results. Clustering features such as shape, size, and distribution help in understanding more details about relevant crimes [11, 12], including a clustering analysis on a US state database [13]. Association mining is one of the accepted methods for discovering novel underlying patterns in large volumes of crime data [6]. Other techniques, such as semantic analysis and text mining, are used for entity extraction from FBI bulletins [14, 15, 16]. In [6], a fuzzy association rule mining application for community crime pattern discovery is proposed; the application produced interesting and meaningful rules at regional and national levels, and a relative support metric is defined to extract novel rules. In 2007, Ng et al. [17] proposed Incremental Temporal Association Rules (ITAR), an incremental mining algorithm to discover crime patterns.

ITAR avoids rescanning the database, which is the main bottleneck in Apriori-based association rule mining, and it employs temporal association rule mining as the amount of data grows. Bruin et al. [14] present a new distance measure that evaluates individual criminals using their profiles in order to cluster them and enable recognition of classes of criminals. That research also presents a particular distance measure combining profile differences with crime frequency and the change of criminal behavior over time. Brown [18] created the Regional Crime Analysis Program (ReCAP), which provides crime analysts with both data fusion and data mining to aid Virginia law enforcement in capturing professional criminals in their region. Coplink is one of several research systems that have helped to enhance criminal intelligence analysis by using a co-occurrence concept space [16]. Redmond and Baveja [19] propose a Crime Similarity System (CSS) to assist police departments in developing a strategic decision-making viewpoint; the system builds a list of communities using the cities' enforcement, crime, and socioeconomic profiles to obtain knowledge from past experience. The same authors in [20] conducted a study of several algorithms for numeric prediction using case-based reasoning (CBR). Their emphasis is on case quality: an attempt to filter out cases that may be noisy or idiosyncratic and are therefore not good for future prediction. The results showed a significant increase in the percentage of correct predictions, at the expense of an increased risk of poor predictions in less common cases. Chen et al. [21] developed a crime data mining framework based on the Coplink project experience. Abraham and de Vel [5] propose a method to understand criminals' behavior using computer log files, seeking relationships among the data and producing profiles that are used to understand the behavior.



3. Dataset Description

The "Communities and Crime Unnormalized" dataset, available from the UCI Machine Learning Repository, was employed for this study. The dataset focuses on American communities and combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 Law Enforcement Management and Administrative Statistics survey, and crime data from the 1995 US FBI Uniform Crime Report [22]. The dataset was compiled by Buczak [6]. It consists of 2215 instances and 147 attributes per community: 125 predictive, 4 non-predictive, and 18 potential goal attributes. The instances belong to different states in the USA; each state is represented by a number. The attributes cover a variety of crime-related facets, ranging from the percentage of officers assigned to drug units, to population density, the percentage considered urban, and median household income. Also included are measures of crimes considered violent: murder, rape, robbery, and assault. The complete details of all 147 attributes can be obtained from the UCI Machine Learning Repository website.
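The raw UCI file is comma-separated with no header row, and missing values are encoded as "?". A minimal pandas sketch of the parsing step, using a tiny made-up sample in place of the real 2215-row file:

```python
import io

import pandas as pd

# A tiny stand-in for the UCI "Communities and Crime Unnormalized" file:
# comma-separated, no header row, missing values encoded as "?".
sample = io.StringIO(
    "8,Lakewoodcity,1,30,5,?,0.19\n"
    "53,Tukwilacity,1,?,35,0.12,0.31\n"
)
df = pd.read_csv(sample, header=None, na_values="?")

# The "?" cells come back as NaN, ready for the cleaning step below.
print(df.shape)                 # 2 rows, 7 columns in this toy sample
print(df.isna().sum().sum())    # 2 missing cells parsed as NaN
```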

3.1 Pre-processing

A few techniques are employed in practice for data preprocessing: data cleaning, discretization and data transformation, and feature selection. Preprocessing is intended to reduce noise and incomplete or inconsistent data; its result is then fed to the data mining algorithm. In this study, pre-processing is carried out in two steps, data cleaning and data transformation, based on Buczak's work [6]. In the first step, the goal of data cleaning is to decrease noise and handle missing values. There are a number of methods for treating records that contain missing values, such as omitting the incorrect field(s), omitting the entire record that contains the incorrect field(s), automatically entering or correcting the data with default values, deriving a model to enter or correct the data, replacing all values with a global constant, and using imputation to predict missing values. In this study, some communities were removed because of significant missing or known-incorrect crime statistics. Certain attributes contain a significant number of missing values (more than 80%) because the data was unavailable or not recorded for particular communities; these attributes, such as pctPolicWhite and pctPolicBlack, were removed. All crime attributes (potential goal attributes) with missing values were removed, because only the total number of violent crimes per 100K population (violentPerPop) is considered as the class. Since violentPerPop was chosen as the goal attribute, all 221 instances with missing violentPerPop values were removed, leaving 1994 instances. In the second step, we performed data normalization, discretization, and data type transformation.
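The two cleaning rules above, dropping attributes with more than 80% missing values and then dropping instances that lack the goal attribute, can be sketched in pandas. The column names mirror the paper's examples, but the values are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame standing in for the crime data.
df = pd.DataFrame({
    "pctPolicWhite": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "medIncome":     [30000, 42000, np.nan, 51000, 28000],
    "violentPerPop": [120.0, np.nan, 300.0, 80.0, 450.0],
})

# 1) Drop attributes where more than 80% of the values are missing.
keep = df.columns[df.isna().mean() <= 0.80]
df = df[keep]

# 2) Drop instances missing the chosen goal attribute (violentPerPop).
df = df.dropna(subset=["violentPerPop"])

print(list(df.columns), len(df))
```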
For all attributes except state, min-max normalization to [0, 1] is used to avoid the large-value issue; it has the advantage of preserving all relationships in the data exactly and preventing any bias injection [7]. Next, we discretized the selected class, the total number of violent crimes per 100K population (violentPerPop), into a binomial class, crime status (CrimeStatus); in order to perform prediction, the goal class should be nominal in nature. After discretization, we also performed data type transformation, since the class has two values, "critical" and "non-critical": if the value of violentPerPop is less than forty percent, CrimeStatus is set to "non-critical", otherwise to "critical". The states were converted from nominal to numeric, whereby every number represents the respective American state.
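A sketch of the two transformations on made-up numbers: min-max scaling to [0, 1], then discretizing violentPerPop into the binary CrimeStatus class. Applying the forty-percent cut-off to the normalized value is our assumption, since the paper does not state the reference scale:

```python
import pandas as pd

# Made-up violentPerPop values for four communities.
df = pd.DataFrame({"violentPerPop": [100.0, 250.0, 400.0, 1000.0]})

# Min-max normalization to [0, 1].
col = df["violentPerPop"]
normalized = (col - col.min()) / (col.max() - col.min())

# Discretize into the binomial CrimeStatus class (cut-off is an assumption).
df["CrimeStatus"] = normalized.apply(
    lambda v: "non-critical" if v < 0.40 else "critical")

print(list(df["CrimeStatus"]))
```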

3.2 Feature Selection

Relevance analysis, or feature selection, is used to remove irrelevant or redundant attributes. Feature selection has several objectives, such as enhancing model performance by avoiding overfitting in the case of supervised classification. In this study, as mentioned above, crime status (CrimeStatus) was chosen among the eighteen potential goal attributes as the dependent variable. Two mechanisms were used to select the final sets of attributes:







- The Golden Standard, or manual selection of attributes based on human understanding and intellect. This selection followed the previous study [6]; its methodology yielded 44 attributes, the majority of which are percentages or are computed per 100K population. Similar attributes were removed; for example, of the number of people under the poverty level (NumUnderPov) and the percentage of people under the poverty level (PctPopUnderPov), only the percentage was kept. Attributes with a large number of missing values were also removed.
- The Chi-square test, used to detect correlated attributes, since high dependency among attributes causes redundancy and subsequently inaccurate classification results. Chi-square is one of the most effective feature selection methods for classification. With Chi-square feature selection, 94 attributes were selected.
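Chi-square feature selection of this kind can be sketched with scikit-learn's SelectKBest (the paper uses RapidMiner; the data here is synthetic, and k = 94 matches the number of attributes the paper reports keeping):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the crime data: 120 features, binary class.
X, y = make_classification(n_samples=500, n_features=120,
                           n_informative=20, random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative features

# Score every feature against the class with the Chi-square statistic
# and keep the 94 highest-scoring ones.
selector = SelectKBest(score_func=chi2, k=94)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)  # (500, 94)
```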

4. Selected Classifiers

After the preprocessing and feature selection phases, the number of attributes was substantially reduced and is now more suitable for building the data mining models. Many data mining methods can be used to predict crime status quantitatively; in this study, a classification task is applied for prediction. Classification, a well-known supervised data mining learning technique, is used to extract meaningful information from large datasets and can be used effectively to predict unknown classes [23]. There are various classification algorithms, such as Support Vector Machines (SVM), k-Nearest Neighbor (k-NN), Decision Tree, Weighted Voting, and Artificial Neural Networks. All of these techniques can be applied to a dataset to discover sets of models for forecasting unknown class labels [2]. The data classification process in this study has two steps: a training step to build a model, and the use of that model for classification. The predictive accuracy of the classifier is measured using the training set, and the accuracy of a classifier on a given test set is the percentage of test set tuples that are classified correctly. If the accuracy is acceptable, the classifier can be used on future data tuples for which the class label is unknown [3]. Based on the classification algorithms used in [9], five classification algorithms, namely Naïve Bayesian, Decision Tree (J48), Support Vector Machine (SVM), Neural Network (MultilayerPerceptron), and k-Nearest Neighbor (k-NN), were chosen to perform classification on the dataset. In the following sections, the results of the algorithms are compared and the more efficient algorithms for crime status prediction are determined through AUC comparison.

Naïve Bayesian classifiers, adopting a supervised learning approach, can predict the probability that a given tuple belongs to a specific class. This classifier is very simple to construct and can easily be applied to huge datasets [3]. Decision Tree, another supervised learning approach, builds a tree where each node represents a test on an attribute value, the leaves represent classes or class distributions, and the branches represent the conjunctions of features that lead to those classes. Decision tree algorithms treat the whole dataset as a single large set and then recursively split it; the tree is built top-down until some stopping criterion is met, and measures such as information gain (entropy) are used to choose the splitting nodes [24]. Support Vector Machines (SVM) are a group of supervised learning methods that can be employed for classification or regression [25, 26, 27]. In a two-class learning task, the goal of SVM is to discover the best classification function to differentiate between members of the two classes in the training data. For that purpose, SVM constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space to separate the data, finding the best function by maximizing the margin between the two classes. Neural Networks, another classification method, are nonlinear models able to capture complex real-world relationships. Neural networks can estimate posterior probabilities, which provide a basis for setting up classification rules and conducting statistical analysis [28].
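The paper builds these models in RapidMiner; as an illustrative stand-in, the classifier families just described can be instantiated with their scikit-learn analogues and fitted on synthetic data (the scores here are training-set accuracies on made-up data, not the paper's results):

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# scikit-learn analogues of the five classifiers studied in the paper.
classifiers = {
    "Naive Bayesian": GaussianNB(),
    "Decision Tree (J48-like)": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "MultilayerPerceptron": MLPClassifier(max_iter=1000, random_state=0),
    "k-NN (k=10)": KNeighborsClassifier(n_neighbors=10),
}

# Synthetic binary-class data in place of the crime dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

accs = {}
for name, clf in classifiers.items():
    accs[name] = clf.fit(X, y).score(X, y)
    print(name, round(accs[name], 2))
```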



k-Nearest Neighbor (k-NN) classifiers are based on learning by comparing a given test tuple with training tuples. For an unknown tuple, a k-NN classifier seeks the group of k objects in the training set that are closest to it, and labels the unknown tuple according to the predominant class in this neighborhood [3]. One key element of this classifier is the value of k, which is set to 10 in this study as it yielded the highest accuracy.
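The paper does not detail how k = 10 was chosen beyond its accuracy; one plausible sketch scores candidate k values by cross-validated accuracy and keeps the best (synthetic data, so the winning k here need not be 10):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the crime data.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Score a few candidate k values by mean 10-fold cross-validated accuracy.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=10).mean()
          for k in (1, 5, 10, 15, 20)}

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```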

5. Results and Discussion

In this experiment, 10-fold cross-validation was used: the dataset is randomly divided into 10 separate blocks of objects, the data mining algorithm is trained on 9 blocks while the remaining block is used to test its performance, the process is repeated 10 times, and the results are finally averaged [2]. The five selected classification algorithms were evaluated on the two different sets of features by comparing precision, recall, accuracy, and AUC. Precision is the proportion of data classified into a class that actually belongs to that class. Recall is the percentage of instances relevant to a class that are correctly classified. Accuracy is the percentage of instances classified correctly by a classifier. Table 1 shows the precision and recall for both feature sets.

Table 1. Precision and Recall

                                            Precision (%)            Recall (%)
Method                                  Set 1: 44   Set 2: 94   Set 1: 44   Set 2: 94
                                        attributes  attributes  attributes  attributes
Naïve Bayesian                             86.7        87.5        84.6        84.4
Decision Tree (J48)                        84.6        85.7        85.0        86.3
Support Vector Machine (SVM)               85.0        86.1        85.6        86.5
Neural Network (MultilayerPerceptron)      85.1        86.8        85.3        87.1
k-Nearest Neighbor (k=10)                  86.9        87.3        87.5        88.0
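The evaluation protocol behind Table 1, 10-fold cross-validation with precision and recall averaged over folds, can be sketched as follows (synthetic data, so the numbers will not match the table):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic binary-class data standing in for the crime dataset.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# 10-fold cross-validation, collecting precision and recall per fold.
cv = cross_validate(GaussianNB(), X, y, cv=10,
                    scoring=("precision", "recall"))

mean_precision = cv["test_precision"].mean()
mean_recall = cv["test_recall"].mean()
print(round(mean_precision, 3), round(mean_recall, 3))
```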

As shown in Table 1, precision and recall improved after applying the Chi-square feature selection technique. Although the differences are not large, they show that feature selection helps achieve better classification results. Having established the importance of feature selection, the next step is to determine the better classifiers, for which the five selected classifiers were tested. As the table shows, the precision of Naïve Bayesian (87.5%) and k-NN (87.3%) is better than that of the others, and the recall of k-NN (88.0%) is the best among all classifiers. Receiver Operating Characteristic (ROC) curves, graphical plots commonly used to present results for binary decision problems in data mining, illustrate the performance of the classifiers. A ROC curve shows how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples [29]. If the ROC curve of one classifier lies above the ROC curve of a second classifier everywhere, we can conclude that the first classifier is better than the second. Based on Figure 1 and Figure 2, this does not occur in this study: for instance, the SVM ROC curve lies above the Naïve Bayesian curve in one region, whereas the Naïve Bayesian curve lies above the SVM curve in another. This implies that the two classifiers are preferred under different loss conditions.
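Whether one classifier's ROC curve dominates another's can be checked by computing the curves' points directly; a sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic data with a train/test split for scoring on held-out examples.
X, y = make_classification(n_samples=600, n_features=20, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

# Compute the (fpr, tpr) points of each classifier's ROC curve; with both
# curves in hand, dominance can be checked point-by-point as discussed above.
curves = {}
for clf in (GaussianNB(), SVC(probability=True, random_state=1)):
    scores = clf.fit(Xtr, ytr).predict_proba(Xte)[:, 1]
    fpr, tpr, _ = roc_curve(yte, scores)
    curves[type(clf).__name__] = (fpr, tpr)
    print(type(clf).__name__, len(fpr))
```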



Figure 1. ROC comparison for set 1 (with 44 attributes)

Figure 2. ROC comparison for set 2 (with 94 attributes)

However, the precision and recall values of the five classifiers are not significantly different from one another, and there is no dominance relation between the ROC curves over the entire range. In this situation, AUC provides a good summary for comparing the classifiers. Ling et al. [30] also compared accuracy and the Area Under Curve (AUC) across different classifiers on various datasets. They



conclude that the best tool for classifier comparison is AUC, which helps users to better understand the performance of the classifiers.

Table 2. Accuracy and AUC

                                             Accuracy (%)               AUC
Method                                  Set 1: 44   Set 2: 94   Set 1: 44   Set 2: 94
                                        attributes  attributes  attributes  attributes
Naïve Bayesian                            84.646      84.395       0.894       0.898
Decision Tree (J48)                       84.997      86.251       0.731       0.727
Support Vector Machine (SVM)              85.649      86.452       0.660       0.678
Neural Network (MultilayerPerceptron)     85.298      87.054       0.882       0.892
k-Nearest Neighbor (k=10)                 87.506      88.008       0.897       0.895

Table 2 shows that the k-NN algorithm slightly outperformed the other four algorithms, with the highest accuracy (88.008%), particularly when combined with the Chi-square attribute selection procedure. It is followed by Neural Network, while Decision Tree and Support Vector Machine are virtually equivalent in terms of accuracy. In this experiment, the increases in accuracy confirm the impact of Chi-square-based feature selection. Based on the AUC results, Naïve Bayesian (0.898), k-NN (0.895), and Neural Network (0.892) are clearly differentiated from Decision Tree (0.727) and SVM (0.678). A classifier with a greater AUC is said to be better than one with a smaller AUC, so the AUC results indicate that Naïve Bayesian, Neural Network, and k-NN are indeed better classifiers than Decision Tree and SVM. The better performance of Naïve Bayesian, k-NN, and Neural Network may be attributed to the nature of the dataset: for example, independence between attributes increases the power of Naïve Bayesian, Neural Networks are commonly used for predicting numeric quantities, and k-NN works well when attributes are neither noisy nor unbalanced. It is also possible that some datasets are so easy to learn that classification without any feature selection already attains close to the maximal possible AUC, in which case improving AUC through feature selection is difficult.
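The AUC comparison of Table 2 reduces to computing cross-validated AUC for each classifier and ranking; a sketch on synthetic data (the values will differ from the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-class data standing in for the crime dataset.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# Mean 10-fold cross-validated AUC per classifier; on the real crime data
# this is the comparison summarized in Table 2.
aucs = {}
for clf in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    name = type(clf).__name__
    aucs[name] = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
    print(name, round(aucs[name], 3))
```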

6. Conclusion

The aim of this study was to classify the given experimental dataset into two categories, critical and non-critical. To this end, we used five classification algorithms combined with two feature selection techniques, manual selection and Chi-square, to determine the more accurate classifiers. From the experimental results, the k-Nearest Neighbor algorithm achieves the best accuracy, specifically when using the Chi-square feature selection technique. We have shown, via exploratory comparisons in terms of AUC, that Naïve Bayesian, Neural Networks, and k-Nearest Neighbor predict better than Support Vector Machine and Decision Tree on this dataset. Through the implementation of Chi-square feature selection in RapidMiner, it is demonstrated that feature selection is an important phase for enhancing mining quality.

References

[1] Usama M. Fayyad, S. George Djorgovski, Nicholas Weir, "Automating the Analysis and Cataloging of Sky Surveys", Advances in Knowledge Discovery and Data Mining, Cambridge, MA: MIT Press, pp. 471-494, 1996.
[2] Abdullah H. Wahbeh, Qasem A. Al-Radaideh, Mohammed N. Al-Kabi, Emad M. Al-Shawakfa, "A Comparison Study between Data Mining Tools over some Classification Methods", International Journal of Advanced Computer Science and Applications, The SAI Organization, Special Issue on Artificial Intelligence, pp. 18-26, 2011.
[3] Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", 3rd ed., Morgan Kaufmann Publishers, USA, 2012.
[4] Richard Wortley, Lorraine Mazerolle, "Environmental Criminology and Crime Analysis", Willan Publishing, UK, 2008.



[5] Tamas Abraham, Olivier de Vel, "Investigative Profiling with Computer Forensic Log Data and Association Rules", In Proceedings of the IEEE International Conference on Data Mining (ICDM'02), pp. 11-18, 2002.
[6] Anna L. Buczak, Christopher M. Gifford, "Fuzzy Association Rule Mining for Community Crime Pattern Discovery", In ACM SIGKDD Workshop on Intelligence and Security Informatics (ISI-KDD '10), 2010.
[7] Kevin L. Priddy, Paul E. Keller, "Artificial Neural Networks: An Introduction", SPIE Press, USA, 2005.
[8] A. Malathi, S. Santhosh Baboo, "Enhanced Algorithms to Identify Change in Crime Patterns", International Journal of Combinatorial Optimization Problems and Informatics, Aztec Dragon Academic Publishing, vol. 2, no. 3, pp. 32-38, 2011.
[9] Chung-Hsien Yu, Max W. Ward, Melissa Morabito, Wei Ding, "Crime Forecasting Using Data Mining Techniques", In Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW '11), pp. 779-786, 2011.
[10] Shyam Varan Nath, "Crime Pattern Detection Using Data Mining", In Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology, pp. 41-44, 2006.
[11] Peter Phillips, Ickjai Lee, "Mining Top-k and Bottom-k Correlative Crime Patterns through Graph Representations", In Proceedings of the IEEE International Conference on Intelligence and Security Informatics, pp. 25-30, 2009.
[12] Peter Phillips, Ickjai Lee, "Crime Analysis through Spatial Areal Aggregated Density Patterns", GeoInformatica, Springer, vol. 15, no. 1, pp. 49-74, 2011.
[13] Sikha Bagui, "An Approach to Mining Crime Patterns", International Journal of Data Warehousing and Mining, IGI Global, vol. 2, no. 1, pp. 50-80, 2006.
[14] Jeroen S. de Bruin, Tim K. Cocx, Walter A. Kosters, Jeroen F. J. Laros, Joost N. Kok, "Data Mining Approaches to Criminal Career Analysis", In Proceedings of the International Conference on Data Mining, pp. 171-177, 2006.
[15] Michael Chau, Jennifer J. Xu, Hsinchun Chen, "Extracting Meaningful Entities from Police Narrative Reports", In Proceedings of the 2002 Annual National Conference on Digital Government Research, pp. 1-5, 2002.
[16] Roslin V. Hauck, Homa Atabakhsh, Pichai Ongvasith, Harsh Gupta, Hsinchun Chen, "Using COPLINK to Analyze Criminal-Justice Data", Computer, IEEE Computer Society Press, vol. 35, no. 3, pp. 30-37, 2002.
[17] Vincent Ng, Stephen Chan, Derek Lau, Cheung Man Ying, "Incremental Mining for Temporal Association Rules for Crime Pattern Discoveries", In Proceedings of the Australasian Database Conference, pp. 123-132, 2007.
[18] Donald E. Brown, "The Regional Crime Analysis Program (ReCAP): A Framework for Mining Data to Catch Criminals", In Proceedings of the International Conference on Systems, Man, and Cybernetics, pp. 2848-2853, 1998.
[19] Michael Redmond, Alok Baveja, "A Data-driven Software Tool for Enabling Cooperative Information Sharing Among Police Departments", European Journal of Operational Research, vol. 141, no. 3, pp. 660-678, 2002.
[20] Michael A. Redmond, Timothy Highley, "Empirical Analysis of Case-Editing Approaches for Numeric Prediction", Innovations in Computing Sciences and Software Engineering, Springer, pp. 79-84, 2010.
[21] Hsinchun Chen, Wingyan Chung, Jennifer Jie Xu, Gang Wang, Yi Qin, Michael Chau, "Crime Data Mining: A General Framework and Some Examples", Computer, IEEE Computer Society Press, vol. 37, no. 4, pp. 50-56, 2004.
[22] UCI Machine Learning Repository, Available: http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized [Accessed: 2011-03-02].
[23] E.W.T. Ngai, Li Xiu, D.C.K. Chau, "Application of Data Mining Techniques in Customer Relationship Management: A Literature Review and Classification", Expert Systems with Applications, Elsevier, vol. 36, no. 2, pp. 2592-2602, 2009.
[24] Ali Hamou, Andrew Simmons, Michael Bauer, Benoit Lewden, Yi Zhang, Lars-Olof Wahlund, Eric Westman, Megan Pritchard, Iwona Kloszewska, Patrizia Mecocci, Hilkka Soininen, Magda Tsolaki, Bruno Vellas, Sebastian Muehlboeck, Alan Evans, Per Julin, Niclas Sjögren, Christian Spenger, Simon Lovestone, Femida Gwadry-Sridhar, "Cluster Analysis of MR Imaging in Alzheimer's Disease using Decision Tree Refinement", International Journal of Artificial Intelligence, vol. 6, no. S11, pp. 90-99, 2011.
[25] Ovidiu Ivanciuc, "Applications of Support Vector Machines in Chemistry", Reviews in Computational Chemistry, vol. 23, pp. 291-400, 2007.
[26] Yingxu Wang, Fugui Chen, Xilong Qu, "Research and Application of Large-Scale Data Set Processing Based on SVM", Journal of Convergence Information Technology (JCIT), AICIT, vol. 7, no. 16, pp. 195-200, 2012.
[27] Chen Zhenzhou, "Local Support Vector Machines with Clustering for Multimodal Data", Advances in Information Sciences and Service Sciences (AISS), AICIT, vol. 4, no. 17, pp. 266-275, 2012.
[28] Guoqiang Peter Zhang, "Neural Networks for Classification: A Survey", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451-462, 2000.
[29] Jesse Davis, Mark Goadrich, "The Relationship between Precision-Recall and ROC Curves", In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 233-240, 2006.
[30] Charles X. Ling, Jin Huang, Harry Zhang, "AUC: A Better Measure than Accuracy in Comparing Learning Algorithms", In Proceedings of the 16th Canadian Society for Computational Studies of Intelligence Conference on Advances in Artificial Intelligence (AI'03), pp. 329-341, 2003.
