Journal of Experimental & Theoretical Artificial Intelligence, 2015 http://dx.doi.org/10.1080/0952813X.2015.1055828

Intelligent breast cancer recognition using particle swarm optimization and support vector machines

Abbas Ahmadi* and Parnian Afshar

Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, 424 Hafez Ave, Tehran 15875-4413, Iran


(Received 29 January 2015; accepted 29 April 2015)

Breast cancer is the most common cancer among women apart from skin cancer, but early detection improves the chances of survival. Data mining is widely used for this purpose. As technology develops, a growing number of breast tumour features are being collected, and using all of these features for cancer recognition is expensive and time-consuming, so feature extraction is necessary to increase classification accuracy. The goal of this work is to recognise breast cancer using extracted features. To reach this goal, a combination of clustering and classification is used: particle swarm optimization is used to recognise tumour patterns, the membership degree of each tumour to the patterns is calculated and treated as a new feature, and a support vector machine is then employed to classify tumours. Finally, the method is analysed in terms of its accuracy, specificity, sensitivity and CPU time on the Wisconsin Diagnostic Breast Cancer data set.

Keywords: particle swarm optimization; breast cancer; support vector machine; intelligent systems

1. Introduction

Breast cancer is one of the most common types of cancer among women (Jiaquan Xu, Alteri, Graves, & Cheri, 2012). In most cases, early diagnosis significantly reduces the mortality rate. In recent years, rapid progress in information technology and data processing has created an important opportunity to employ sophisticated data mining algorithms for early diagnosis, and the accuracy and reliability of such models and algorithms are therefore of critical importance.

Breast cancer recognition is mainly treated as a classification problem rather than a clustering problem. However, the high dimensionality of the data reduces the accuracy of classification methods, so clustering is employed to reduce the dimensionality of the problem and thereby increase the accuracy. Zheng et al. first employed the K-means method to cluster benign and malignant tumours separately. Then, they calculated each instance's degree of membership in each cluster using a fuzzy membership function and used these degrees as new features. Afterwards, they applied support vector machine (SVM) classification to this new data set (Zheng, Yoon, & Lam, 2014).

*Corresponding author. Email: [email protected]
© 2015 Taylor & Francis

The K-means algorithm may become trapped in a local optimum, and the initial solution strongly affects the quality of the final solution (Ahmadi, Karray, & Kamel, 2010). Hence, in this article we adopt the same strategy as Zheng et al. (2014), with the difference that in our model particle swarm optimization (PSO) is used for clustering. PSO has the advantage over the K-means algorithm that it is far more likely to converge to the global optimum.

The organisation of this article is as follows: Section 2 reviews the literature and Section 3 describes the related concepts. Section 4 presents the proposed approach, followed by the results in Section 5. The article ends with concluding remarks and future work.

2. Literature review

Cancer is a group of diseases that cause body cells to change and grow uncontrollably. These cells may eventually form a tumour. Tumours are named according to the tissue in which they are located. Breast cancer starts in breast tissue, which consists of lobules and ducts. The seriousness of the cancer depends on the stage of the disease; hence, early diagnosis of breast cancer has a great impact on the patient's survival (Jiaquan Xu et al., 2012), and this is why data mining techniques are used in this area.

SVM classification is the most popular classification method in the field of breast cancer diagnosis, but most classification methods suffer from the curse of dimensionality. Akay (2009) addressed this problem by using the F-score as a feature selection method and then applying SVM classification to the Wisconsin Diagnostic Breast Cancer (WDBC) data set. Chen (2014) used cluster analysis with feature selection to analyse clinical breast cancer diagnosis. Nezafat, Tabesh, and Akhavan (1998) improved classification accuracy by trying several features and classification models, both linear and non-linear; to search the space of possible feature subsets, they employed the suboptimal sequential forward search to avoid a prohibitive computational burden. Delen, Walker, and Kadam (2005) compared three classification methods, decision trees, regression and neural networks, in predicting the survival rate of breast cancer patients. Several intelligent systems have been proposed for predicting breast cancer: Karabatak and Ince designed such a system using association rules to reduce the dimensionality and neural networks to classify the data (Karabatak & Ince, 2009). Sujatha and Akila (2012) presented a review article on predicting breast cancer with classification methods.

3. Related concepts description

The different concepts used in this work are described in the following sections.

3.1 Feature selection and extraction

A feature is an observable measure of the process. In feature reduction, a subset of features is selected and inappropriate features are eliminated. Feature reduction helps analysts understand the data, reduces the computation time, mitigates the effects of high dimensionality and improves prediction performance (Chandrashekar & Sahin, 2014). Feature selection focuses on choosing a set of features that describe the data effectively, since irrelevant features may reduce prediction accuracy.

There are several techniques for feature selection and extraction, such as filtering methods, wrapper methods, embedded methods and unsupervised learning (Chandrashekar & Sahin, 2014).

Filtering methods rank and sort the features. They are common because of their simplicity and their success in selecting the most relevant features: a ranking approach scores the features, and a threshold is used to eliminate the irrelevant ones. These methods are applied before the classification model is built (Chandrashekar & Sahin, 2014). Wrapper methods use the classification model as a black box, treating model performance as an objective function for evaluating subsets of features; heuristic search algorithms are used to select the most relevant subset. In embedded methods, feature selection is part of the training process (Chandrashekar & Sahin, 2014).

Feature extraction, by contrast, generates new features from the original ones. Unsupervised learning, as a feature extraction technique, discovers hidden structure in unlabelled data; clustering is an unsupervised learning approach that tries to find a natural grouping in the data. Such methods can outperform the previous ones (Chandrashekar & Sahin, 2014), and it is this kind of method that we adopt in our work.
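To make the filtering idea concrete, the following is a minimal sketch (not the authors' code) that ranks the 30 WDBC features by ANOVA F-score and keeps the top ten; the cutoff of ten features and the use of scikit-learn are illustrative assumptions, not choices from the paper.

```python
# Filter-method sketch: score features with the ANOVA F-test and keep
# the k best. k=10 is an arbitrary illustrative threshold.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)          # WDBC: 569 x 30
selector = SelectKBest(score_func=f_classif, k=10)  # rank, then threshold
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)               # (569, 30) -> (569, 10)
```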

3.2 K-means algorithm

The K-means algorithm is one of the most common algorithms for data clustering. First introduced by MacQueen in 1967, it is one of the simplest algorithms used to solve clustering problems: it places the data in K distinct clusters by moving towards a local minimum (Na, Liu, & Yong, 2010). The algorithm has two distinct phases. In the first phase, K centroids are chosen randomly, where K is an a priori selected value. The second phase allocates each data point to the nearest centroid; the cluster centroids are then recalculated, and this process continues until the objective function is minimised (Na et al., 2010). Consider $x$ as a data point and $\bar{x}_k$ as the centroid of cluster $C_k$. The objective function is then

$$E = \sum_{k=1}^{K} \sum_{x \in C_k} |x - \bar{x}_k|^2, \qquad (1)$$

where $E$ is the sum of squared errors over all data points (Na et al., 2010). Algorithm 1 shows the K-means process.

Algorithm 1. K-means clustering algorithm.
1: Determine the number of clusters.
2: Initialise each cluster's centre randomly.
3: Calculate the distance between points and the centres.
4: Allocate each point to the nearest cluster.
5: Recalculate the centres.
6: Calculate the distance between points and the centres.
7: If no point moved, stop. Otherwise, go back to step 4.
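A compact NumPy rendering of Algorithm 1 might look as follows; this is an illustrative sketch rather than the authors' implementation, and it ignores edge cases such as empty clusters.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means following Algorithm 1 (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialise centres as k randomly chosen data points.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 3-4: assign every point to its nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 7: stop once no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: recompute each centre as the mean of its cluster.
        centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centres
```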

3.3 PSO algorithm

PSO is an evolutionary computation method introduced in 1995. It is similar to the genetic algorithm in that it begins with a random population.

Each potential solution is called a particle. Each particle tracks the best coordinates it has achieved so far; this solution is called its pbest. In the global version of the PSO algorithm, the best global solution and its coordinates are also tracked; this solution is called the gbest (Eberhart & Shi, 2001). In each iteration of PSO, each particle's velocity moves it towards the pbest and the gbest according to Equations (2) and (3). The global version of PSO is shown in Algorithm 2.


Algorithm 2. PSO algorithm.
1: Initialise a first population of particles and their velocities randomly.
2: Calculate the fitness function for each particle.
3: Compare each particle's current fitness with its pbest. If the current fitness is better, update the pbest and its coordinates with the new values.
4: Compare the current best global solution with the gbest. If it is better, update the gbest and its coordinates.
5: Update the velocity and coordinates of each particle using Equations (2) and (3).
6: Return to step 2 until a previously defined criterion is satisfied.

$$v_{id} = w \times v_{id} + c_1 \times \mathrm{rand}() \times (p_{id} - x_{id}) + c_2 \times \mathrm{rand}() \times (p_{gd} - x_{id}), \qquad (2)$$

$$x_{id} = x_{id} + v_{id}. \qquad (3)$$

In these equations, $v_{id}$ is the velocity of the $d$th dimension of particle $i$ and $x_{id}$ is the value of the $d$th dimension of particle $i$. The constants $c_1$ and $c_2$ are the velocity weights and $w$ is the inertia weight; rand() is a uniformly distributed number between zero and one. $p_{id}$ and $p_{gd}$ are the pbest and gbest solutions obtained so far (Eberhart & Shi, 2001). The values of these parameters are described later, in the model order selection.
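As a concrete illustration of Equations (2) and (3), the following sketch applies one velocity-and-position update to a swarm; the coefficient values w = 0.72 and c1 = c2 = 1.49 are common defaults chosen for illustration, not the parameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, w=0.72, c1=1.49, c2=1.49):
    """One PSO update per Equations (2)-(3).

    x, v, pbest have shape (n_particles, n_dims); gbest has shape (n_dims,).
    Coefficient values are common defaults, not taken from the paper.
    """
    r1 = rng.random(x.shape)  # rand() drawn per particle and dimension
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (2)
    x = x + v                                                  # Eq. (3)
    return x, v
```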

3.4 Support vector machine

The main goal of classification is to separate two classes by a function and to build a model that performs correctly on unlabelled data. Many linear models can separate the data, but only one maximises the distance between the nearest points of the two classes; this model is called the optimal separating hyperplane, and SVM tries to find it. The model provided by this algorithm is as follows (Brereton & Lloyd, 2010):

$$f(x) = \operatorname{sgn}(\langle w^*, x \rangle + b), \qquad (4)$$

$$w^* = \sum_{i=1}^{l} a_i y_i x_i, \qquad (5)$$

$$b^* = -\frac{1}{2} \langle w^*, x_r + x_s \rangle, \qquad (6)$$

where $x_r$ and $x_s$ are the support vectors, $x_i$ is a data point, $y_i$ is the class of data point $x_i$ and $a_i$ is the Lagrange multiplier (Brereton & Lloyd, 2010). Because SVM classification is highly accurate and applicable to many areas, it has attracted attention ever since its introduction in 1992 by Vladimir Vapnik (Han, Kamber, & Pei, 2012).
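For illustration, a linear SVM of this form can be fitted to the WDBC data in a few lines; this sketch uses scikit-learn, which solves for the Lagrange multipliers internally, and the linear kernel and 70/30 split are illustrative choices rather than the paper's setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling matters because the SVM margin is distance-based.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```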

4. Proposed approach

Since data preparation techniques such as data transformation lead to higher quality data (Zhang, Zhang, & Yang, 2003), we normalise the data before applying the main algorithm; one typical normalisation is sketched below. The following sections describe the proposed algorithm and its details.
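The paper does not state which transformation it uses, so the min-max normalisation below is only one plausible choice, shown for concreteness.

```python
import numpy as np

def minmax_normalise(X):
    """Scale each feature to [0, 1]; an assumed choice, since the paper
    does not specify its normalisation."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard constant columns
    return (X - mins) / span
```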

4.1 Feature selection and extraction


Our method combines clustering and classification for breast cancer recognition, extracting benign and malignant tumour patterns to reduce dimensionality before the data are classified. The first step is to cluster the benign and malignant tumours separately. Then, the membership degree of each tumour to all patterns is calculated using the fuzzy membership function given in Equations (7) and (8), and these degrees are treated as new features. We therefore obtain a data set with as many features as there are clusters (Zheng et al., 2014). The main steps of the proposed method are shown in Algorithm 3, and also in Figures 1 and 2.

$$f_c(X_{ij}) =$$
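To show how membership degrees become features, here is a hedged sketch: the actual membership function is defined by Equations (7) and (8), so the inverse-distance form used below is a generic stand-in (fuzzy c-means style) and not the paper's exact formula.

```python
import numpy as np

def membership_features(X, centres):
    """Map each tumour to a vector of cluster-membership degrees.

    The inverse-distance membership below is an assumed stand-in for
    the paper's Equations (7)-(8), used purely for illustration.
    """
    # Distance from every point to every benign/malignant pattern centre.
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                      # avoid division by zero
    inv = 1.0 / d
    return inv / inv.sum(axis=1, keepdims=True)   # each row sums to one
```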