Int. J. Medical Engineering and Informatics, Vol. 2, No. 2, 2010

Data mining and CBR integrated methods in medicine: a review

Babita Pandey* and R.B. Mishra
Department of Computer Engineering, Institute of Technology, Banaras Hindu University, Varanasi, UP-221005, India
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: A large number of data mining (DM) techniques are now available for data analysis and prediction in medicine. However, little literature offers a comprehensive view that can serve as a guideline for using DM techniques in different medical areas (domains). This review provides a comprehensive view of the state of the art of single DM techniques and of integrated case-based reasoning (CBR) and DM techniques in different medical domains: general medicine, nephrology, dermatology, cardiology, urology, oncology, neurology and orthopaedic. We present our observations in tabular and graphic form, showing both the use of a particular DM method across the various medical domains and the use of different DM methods within a particular medical domain. The study and observations should help biomedical engineers to judge the applicability of a particular method in different medical domains of practice and research.

Keywords: data mining; DM; medicine; case-based reasoning; CBR; general medicine; nephrology; dermatology; cardiology; urology; oncology; neurology; orthopaedic.

Reference to this paper should be made as follows: Pandey, B. and Mishra, R.B. (2010) ‘Data mining and CBR integrated methods in medicine: a review’, Int. J. Medical Engineering and Informatics, Vol. 2, No. 2, pp.205–218.

Biographical notes: Babita Pandey is a full-time Research Scholar in the Department of Computer Engineering, IT, BHU, Varanasi-221005, UP, India. She holds an MCA from Indira Gandhi National Open University, India. Her research interests include expert systems (AI) and medical computing.

R.B. Mishra is a Reader in the Department of Computer Engineering, IT, BHU, Varanasi-221005, UP, India. He holds a BSc (Engineering), MTech and PhD. He has over 28 years of teaching experience and has published around 80 research papers and articles. He has supervised three PhD and 21 MTech dissertations. He visited the University of Bath, UK on an INSA faculty exchange from April to June 1997. His research interests are AI and multi-agent systems and their application to medicine, e-commerce and the semantic web.

Copyright © 2010 Inderscience Enterprises Ltd.

1 Introduction

Data mining (DM) is often treated as a synonym for knowledge discovery in databases (KDD): the identification of valid, novel, potentially useful and ultimately understandable patterns in data. Description and prediction are the two main tasks of DM, and several methods applicable to medicine have been developed for both. There is a lack of literature that depicts the application of the different DM methods and techniques, such as instance-based learning, decision trees, neural networks, rule induction, evolutionary algorithms, the naive Bayesian classifier, Bayesian networks (BN), rough set theory (RST) and temporal abstraction. In medical domains, the construction of decision models for procedures such as prognosis, diagnosis and treatment planning requires the deployment of these methods.

Decision tree techniques include classification and regression trees (C&RT) (Breiman, 1993), chi-square automatic interaction detection (CHAID) (Kass, 1980), quick, unbiased, efficient statistical tree (QUEST) (Ture et al., 2008), and the programs C4.5 (Quinlan, 1993), C5.0 (Quinlan, 1993) and Iterative Dichotomiser 3 (ID3) (Quinlan, 1993; Ture et al., 2008). The artificial neural network (ANN) has emerged as a powerful methodology for prediction in data-intensive environments; the approaches grouped under it here include the self-organising map (SOM), or Kohonen map (Zhang and Sun, 2008), and the support vector machine (SVM) (Yang and Su, 2008). Evolutionary algorithms such as genetic algorithms (GAs) (Holland, 1992), genetic programming (GP) (Koza, 1989), evolutionary programming (EP) (Fogel, 1964), evolutionary strategy (ES) (Rechenberg, 1965), generic genetic programming (GGP) (Wong et al., 2001) and the hierarchical evolutionary algorithm (HEA) (Tang et al., 1998; Lai and Chang, 2007) have also proved very effective in prediction tasks. The naive Bayesian classifier, BN, RST and temporal abstraction are generally used for classification and prediction. All of these methods, techniques and programs are heavily used in the medical domain.

Li et al. (2004) briefly review the application of DM techniques in proteomics for cancer detection and diagnosis. Bellazzi and Zupan (2008) discuss the extent and role of the research area of predictive DM and propose a framework for constructing, assessing and exploiting DM models in clinical medicine. In this paper, we present a systematic study of the various methods, techniques and programs of DM, with their salient features, functional attributes and applications in various domains of medical science such as general medicine, nephrology, cardiology, neurology, dermatology and orthopaedic. The integration of DM with case-based reasoning (CBR) in medical domains is also described. We present our observations in tabular and graphic form, showing both the use of a particular DM method across the various medical domains and the use of different DM methods within a particular medical domain.

The rest of the paper is organised as follows. Section 2 describes the various methods and techniques of DM used in the medical domain. Section 3 presents the integration of DM with CBR in specific medical applications. In Section 4, observations are presented in tabular and graphic form. Conclusions are drawn in Section 5.

Table 1 Application of DM techniques in medical domain

2 DM techniques in medicine

DM methods have been used in different disciplines of medicine, such as general medicine (GM), nephrology (NL), dermatology (DL), cardiology (CL), urology (UL), oncology (OL), neurology (NL) and orthopaedic (OP). The details are given below. Table 1 lists the DM techniques, the software used and the application domains.

2.1 General medicine (GM)

GM is a branch of medicine dealing with problems associated with the eye, ear, throat and diabetes. In general medicine, rule induction and instance-based learning have been deployed for the diagnosis of rheumatic diseases (Dzeroski and Lavrac, 1996); GA for the classification of epidemiological data (Congdon, 2000); RST to identify the most relevant attributes and to induce decision rules from diabetes mellitus data (Stepaniuk, 1999); ANN and SVM for the diagnosis of polycythemia vera (Kantardzic et al., 2002); BN for the diagnosis of pulmonary embolism (PE) (Luciani et al., 2003); feature selection, naive Bayes and the C4.5 algorithm to predict patients’ condition (Huang et al., 2007b); HEA for medical image segmentation (Lai and Chang, 2007); and the C4.5 decision tree classifier, ANN and least squares support vector machine (LSSVM) algorithms for the diagnosis of optic nerve disease (Polat et al., 2008). Serpen et al. (2008) showed that a knowledge-based artificial neural network (KBANN), a hybrid learning algorithm, offers good performance in the diagnosis of PE. Wiggins et al. (2008) used GA and BN to classify patients according to statistical features extracted from their ECG signals, deploying GA operators such as random crossover and mutation of networks within a population. Melliez et al. (2008) constructed a Markov decision tree to assess the effectiveness and cost-effectiveness of routine childhood vaccination with new vaccines against rotavirus in France.
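The crossover and mutation operators mentioned for the GA-based work above can be illustrated with a minimal sketch. It assumes a fixed-length bit-string genome (for example, one bit per candidate network edge or feature) and a caller-supplied fitness function; the function names, the truncation selection scheme and all parameter values are illustrative assumptions, not the configuration used by Wiggins et al. (2008).

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: the two children swap tails at a random cut."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(fitness, genome_len, pop_size=20, generations=30, rate=0.05, seed=0):
    """Toy GA (assumed design): keep the fitter half, refill with mutated offspring."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = (survivors + children)[:pop_size]
    return max(population, key=fitness)
```

A call such as `evolve(sum, genome_len=12)` maximises the number of set bits; in a medical setting the fitness function would instead score a candidate classifier on held-out patient data.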

2.2 Nephrology (NL)

The domain of nephrology covers the diagnosis and treatment of kidney diseases, including organ replacement (dialysis, transplantation). The number of patients on hemodialysis due to end-stage kidney disease is increasing. Shah et al. (2003) used the DM approaches RST and decision trees (DT) to extract knowledge in the form of decision rules. The resulting rule sets identified a list of significant features (diagnosis, total dialysis time, arterial pressure, potassium level, deviation from target weight, calcium level, blood flow rate and supine post-dialysis pulse rate) together with their ranges, to be incorporated into individualised dialysis treatment protocols.
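To make the rough set idea behind such rule extraction concrete, the following minimal sketch computes the lower and upper approximations of a decision class. Records that share the same condition-attribute values are indiscernible; a class's lower approximation contains the records that certainly belong to it (and so support certain rules), while the upper approximation contains those that possibly belong. The attribute names and the five toy records are hypothetical and are not taken from Shah et al. (2003).

```python
from collections import defaultdict


def approximations(records, condition_attrs, decision_attr, target):
    """Return (lower, upper) index sets for the class decision_attr == target."""
    # Group record indices into indiscernibility blocks.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec[a] for a in condition_attrs)
        blocks[key].append(idx)

    lower, upper = set(), set()
    for members in blocks.values():
        decisions = {records[i][decision_attr] for i in members}
        if decisions == {target}:   # every record in the block has the target decision
            lower.update(members)
        if target in decisions:     # at least one record has the target decision
            upper.update(members)
    return lower, upper


# Hypothetical toy data, for illustration only.
records = [
    {"potassium": "high", "pressure": "low", "outcome": "poor"},
    {"potassium": "high", "pressure": "low", "outcome": "poor"},
    {"potassium": "normal", "pressure": "low", "outcome": "good"},
    {"potassium": "normal", "pressure": "low", "outcome": "poor"},
    {"potassium": "normal", "pressure": "high", "outcome": "good"},
]
lower, upper = approximations(records, ["potassium", "pressure"], "outcome", "poor")
```

Here records 0 and 1 form the lower approximation of the "poor" class, which corresponds to a certain rule of the form "if potassium is high and pressure is low then outcome is poor"; records 2 and 3 are indiscernible but disagree, so they fall only in the upper approximation.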

2.3 Dermatology (DL)

Dermatology is a branch of medicine dealing with the skin and its appendages (hair, sweat glands, etc.). Grzymala-Busse and Hippe (2001) developed a DM system, Learning from Examples based on Rough Sets (LERS), for the prediction of melanoma and obtained an optimal ABCD formula using three basic algorithms: discretisation using cluster analysis, the Learning from Examples Module, Version 2 (LEM2) algorithm for rule induction and the LERS classification scheme for testing cases. Andrews et al. (2004) developed a DM system to improve the diagnosis of melanoma.
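The review does not state the ABCD formula itself. As a hedged illustration of what is being optimised, the sketch below computes a weighted sum of the four ABCD scores (asymmetry, border, colours, differential structures). The default weights (1.3, 0.1, 0.5, 0.5) are those conventionally quoted for the ABCD rule of dermatoscopy and are an assumption here, as are the function and parameter names; an optimised formula would replace these weights with values found by searching over the data.

```python
def total_dermatoscopy_score(asymmetry, border, colours, structures,
                             weights=(1.3, 0.1, 0.5, 0.5)):
    """Weighted sum of the four ABCD scores.

    The default weights are the conventional ones (an assumption, not taken
    from this review); pass other weights to evaluate an alternative formula.
    """
    w_a, w_b, w_c, w_d = weights
    return w_a * asymmetry + w_b * border + w_c * colours + w_d * structures
```

Keeping the weights as a parameter reflects the optimisation setting: each candidate weight vector defines a different formula whose diagnostic error can be compared on the same cases.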

2.4 Cardiology (CL)

Cardiology is a branch of internal medicine dealing with disorders of the heart and blood vessels. The field is commonly divided into the branches of congenital heart defects, coronary artery disease, heart failure, valvular heart disease and electrophysiology. Physicians specialising in this field are called cardiologists. Boiorczuk et al. (2000) use GP to discover comprehensible classification rules for the diagnosis of chest pain. Coulter et al. (2001) deployed Bayesian statistics implemented in a neural network architecture to examine the relation between antipsychotic drugs and myocarditis and cardiomyopathy. Kurgan et al. (2001) describe a computerised process for myocardial perfusion diagnosis from cardiac single photon emission computed tomography (SPECT) images using a DM and knowledge discovery approach. The SPECT images were processed to extract a set of features, and explicit rules were then generated using inductive machine learning and heuristic approaches to mimic the cardiologist’s diagnosis. Bortolan and Pedrycz (2002) introduce and discuss the development of a highly interactive and user-friendly environment for ECG signal analysis.

2.5 Urology (UL)

Urology is a branch of medicine that focuses on the urinary tracts of males and females and on the reproductive system of males. Beuscart et al. (1999) constructed a belief network for antibiotic prescription assistance in cases of urinary infection. Hunt et al. (2000) compare the effectiveness of BN and decision trees in modelling the integral theory of female urinary incontinence diagnostic algorithm.

2.6 Oncology (OL)

Oncology is a branch of medicine that studies tumours and seeks to understand their development, diagnosis, treatment and prevention. Setiono (1996) describes a new algorithm for neural network pruning, which is used to obtain networks with a small number of connections and high accuracy rates for breast cancer diagnosis. Ball et al. (2002) applied an ANN (Neuroshell 2) with a backpropagation algorithm to analyse mass spectra for predicting astroglial tumour grade (1 or 2). Tan et al. (2003) proposed a two-phase hybrid evolutionary classification technique to extract classification rules that can be used in clinical practice for better understanding and prevention of unwanted medical events. Jerez-Aragones et al. (2003) present a decision support tool for the prognosis of breast cancer that uses a novel top-down induction of decision trees (TDIDT) algorithm, control of induction by sample division method (CIDIM), to select the most relevant prognostic factors. The system is composed of different neural network topologies that take the selected variables as input and provide a good probability of correct classification.


2.7 Neurology (NL)

Neurology is a medical specialty dealing with disorders of the nervous system. Komosinski and Krawiec (2000) describe an application of evolutionary feature weighting for diagnosis support in neuropathology. The original data in the classification task are microscopic images of ten classes of central nervous system (CNS) neuroepithelial tumours. These images are segmented and described by features that characterise the regions resulting from the segmentation process. Some of the final features are irrelevant, so the authors employ an evolutionary algorithm both for feature selection and for adjusting the weights of attributes in order to improve classification accuracy. Herskovits and Gerring (2003) describe a Bayesian method based on DM techniques to represent structure-function associations for lesion-deficit analysis.

2.8 Orthopaedic (OP)

Ngan et al. (1999) developed a system for a fracture database that covers the diagnosis of fracture, operation, surgeon and side of fracture. Wong et al. (2001) describe two approaches for discovering knowledge from two medical databases, one on fractures and one on scoliosis. Two different representations of knowledge, rules and causal structures, are learned. Rules capture interesting patterns and regularities in the database. Causal structures, represented by BN, capture the causal relationships among the attributes.

Table 2 Medical application of integrated DM and CBR

| Research group | DM technique | CBR process and methods | Application; domain |
|---|---|---|---|
| Ong et al. (1997) | CART | Case representation: feature vector; Retrieval: nearest-neighbour matching; Implementation tool: ReMind | Prediction of the recurrence of colorectal cancer; oncology |
| Jurisica et al. (1998) | Clustering | Case representation: attribute-value pair; Retrieval: nearest-neighbour matching; Adaptation: automatically, semi-automatically or by the user | Prediction in in vitro fertilisation (IVF); others |
| Funk and Xiong (2006) | Bayesian network | Case representation: RSA series; Search: data mining technique for searching | Classification of respiratory sinus arrhythmia; general medicine |
| CDPD system (Huang et al., 2007a) | Decision tree induction algorithm and case association algorithm | Case representation: feature vector; Retrieval: integration of knowledge-guided method and WRF method | Prognosis and diagnosis of chronic disease; others |
| Zhuang et al. (2007) | SOM | Case storage: cases are stored as clusters; Retrieval: neighbourhood retrieval strategy; Adaptation: copy | Pathology ordering; others |

3 Integration of DM with CBR

The methods and tools used for the prediction and diagnosis of different diseases in the integrated model of DM and CBR are given in Table 2.
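Several of the systems in Table 2 represent cases as feature vectors and retrieve them by nearest-neighbour matching. A minimal sketch of that retrieve-and-reuse step is given below. It assumes numeric feature vectors, Euclidean distance and a simple copy adaptation that proposes the most common solution among the retrieved cases; the case base, field names and function names are hypothetical and do not reproduce any of the cited systems.

```python
import math
from collections import Counter


def retrieve(case_base, query, k=1):
    """Return the k stored cases closest to `query` (Euclidean distance)."""
    return sorted(case_base, key=lambda case: math.dist(case["features"], query))[:k]


def reuse(retrieved):
    """Copy adaptation: propose the most common solution among retrieved cases."""
    solutions = [case["solution"] for case in retrieved]
    return Counter(solutions).most_common(1)[0][0]


# Hypothetical case base with two normalised features per case.
case_base = [
    {"features": (0.1, 0.2), "solution": "no recurrence"},
    {"features": (0.2, 0.1), "solution": "no recurrence"},
    {"features": (0.9, 0.8), "solution": "recurrence"},
]
proposal = reuse(retrieve(case_base, (0.85, 0.9), k=1))
```

In an integrated DM-CBR system, the DM component would typically shape this step, for example by selecting or weighting the features used in the distance or by clustering the case base so that only one cluster is searched.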

4 Observation

Figure 1 shows, for each DM technique, the number of applications in each medical domain (department). BN is deployed most in general medicine (2), cardiology (2) and urology (2) and least in neurology (1) and orthopaedic (1). DT is used most in oncology (5) and least in cardiology (1) and urology (1). EA is used most in GM (3) and oncology (3) and least in cardiology (1), neurology (1) and orthopaedic (1). IBL is deployed only in GM. NN is deployed most in oncology (7) and least in dermatology (1) and orthopaedic (1). The naive Bayesian classifier is used most in GM (3) and least in cardiology (1). RI is deployed equally in GM (2), dermatology (2), cardiology (2) and orthopaedic (2). RST is used most in nephrology (2) and least in GM (1) and cardiology (1). TA is used only in nephrology.

Figure 1 Number of applications in different medical domains deployed by different DM techniques

[Chart: x-axis, medical domain (general medicine, nephrology, dermatology, cardiology, urology, oncology, neurology, orthopaedic); vertical scale from 0 to 8; legend, data mining techniques (BN, DT, EA, IBL, NN, NBC, RI, RST, TA)]

Notes: BN = Bayesian network, DT = decision tree, EA = evolutionary algorithm, IBL = instance-based learning, NN = neural network, NBC = naive Bayesian classifier, RI = rule induction, RST = rough set theory and TA = temporal abstraction
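The counting by inspection described in this section can be sketched as a small tally. The dictionary below transcribes only the counts stated explicitly in the text of this section; values that the text does not give (intermediate counts, and the counts for IBL and TA) are omitted rather than guessed, so it is a partial reconstruction and not the full data behind Figure 1. The names `STATED_COUNTS` and `extremes` are illustrative.

```python
# Counts stated in the text of Section 4 (partial; unstated values omitted).
STATED_COUNTS = {
    "BN": {"general medicine": 2, "cardiology": 2, "urology": 2,
           "neurology": 1, "orthopaedic": 1},
    "DT": {"oncology": 5, "cardiology": 1, "urology": 1},
    "EA": {"general medicine": 3, "oncology": 3, "cardiology": 1,
           "neurology": 1, "orthopaedic": 1},
    "NN": {"oncology": 7, "dermatology": 1, "orthopaedic": 1},
    "NBC": {"general medicine": 3, "cardiology": 1},
    "RI": {"general medicine": 2, "dermatology": 2, "cardiology": 2,
           "orthopaedic": 2},
    "RST": {"nephrology": 2, "general medicine": 1, "cardiology": 1},
}


def extremes(counts):
    """Return (domains with the highest count, domains with the lowest count)."""
    highest, lowest = max(counts.values()), min(counts.values())
    most = sorted(d for d, n in counts.items() if n == highest)
    least = sorted(d for d, n in counts.items() if n == lowest)
    return most, least


most_nn, least_nn = extremes(STATED_COUNTS["NN"])
```

For RI all stated counts are equal, so the same four domains are returned as both most and least used, which matches the statement that RI is deployed equally across them.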

5 Conclusions

In this review paper, an attempt has been made to collect as much literature as possible on DM techniques and integrated DM-CBR in the medical domain. The aim of the paper is to present a variety of DM techniques and to discuss some of their features used for medical problem solving. The numeric assessment by inspection (simply counting the occurrences of each entity) shows the most and least used individual DM techniques and integrated approaches in the different medical domains: neural networks and decision trees are used most, while RST and temporal abstraction are used least. RST is used only in general medicine, nephrology and cardiology, whereas TA is used only in nephrology. The integrated DM-CBR is used in general medicine, oncology and others. On the basis of these observations, we conclude that general medicine, cardiology and oncology use DM techniques widely, whereas nephrology, urology and neurology use them least. We also observed that the integrated approach can generally be deployed in every medical domain described above. The study and observations should help medical practitioners to judge the usefulness of DM techniques in the domain of their practice and research, and biomedical engineers to judge the applicability of a particular method in different medical domains of practice and research.

References

Andrews, R., Bajcar, S., Grzymala-Busse, J.W., Hippe, Z.S. and Whiteley, C. (2004) ‘Optimization of the ABCD formula for melanoma diagnosis using C4.5, a data mining system’, Lecture Notes in Computer Science, Springer Berlin/Heidelberg, Vol. 3066/2004, pp.1611–3349, online.
Ball, G., Mian, S., Holding, F., Allibone, R.O., Lowe, J., Ali, S., et al. (2002) ‘An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers’, Bioinformatics, Vol. 18, No. 3, pp.395–404.
Bellazzi, R. and Zupan, B. (2008) ‘Predictive data mining in clinical medicine: current issues and guidelines’, International Journal of Medical Informatics, Vol. 77, pp.81–97.
Bellazzi, R., Larizza, C., Magni, P. and Bellazzi, R. (2005) ‘Temporal data mining for the quality assessment of hemodialysis services’, Artificial Intelligence in Medicine, Vol. 34, pp.25–39.
Beuscart, R., Froidure, V., Duhamel, A. and Beuscart, C. (1999) ‘Bayesian networks for antibiotics prescription’, in Proceedings of First Joint BMES/EMBS Conference Serving Humanity, Advancing Technology, Atlanta, GA, USA, pp.13–16.
Boiorczuk, C.C., Lopes, H.S. and Freitas, A.A. (2000) ‘Genetic programming for knowledge discovery in chest-pain diagnosis’, IEEE Engineering in Medicine and Biology, Vol. 19, No. 4, pp.38–44.
Bortolan, G. and Pedrycz, W. (2002) ‘An interactive framework for an analysis of ECG signals’, Artificial Intelligence in Medicine, Vol. 24, pp.109–132.
Breiman, L. (1993) Classification and Regression Trees, Chapman & Hall, New York, London.
Chang, C-L. and Chen, C-H. (2008) ‘Applying decision tree and neural network to increase quality of dermatologic diagnosis’, Expert Systems with Applications, available at http://www.sciencedirect.com.
Congdon, C.B. (2000) ‘Classification of epidemiological data: a comparison of genetic algorithm and decision tree approaches’, in Proceedings of the Congress on Evolutionary Computation, La Jolla, CA, USA, Vol. 1, pp.442–449.
Coulter, D.M., Bate, A., Meyboom, R.H., Lindquist, M. and Edwards, I.R. (2001) ‘Antipsychotic drugs and heart muscle disorder in international pharmacovigilance: data mining study’, BMJ, Vol. 322, pp.1207–1209.
Delen, D., Walker, G. and Kadam, A. (2005) ‘Predicting breast cancer survivability: a comparison of three data mining methods’, Artificial Intelligence in Medicine, Vol. 34, pp.113–127.
Dzeroski, S. and Lavrac, N. (1996) ‘Rule induction and instance-based learning applied in medical diagnosis’, Technol. Health Care, Vol. 4, No. 2, pp.203–221.


Fogel, L.J. (1964) ‘On the organization of intellect’, PhD dissertation, University of California, Los Angeles.
Funk, P. and Xiong, N. (2006) ‘Case-based reasoning and knowledge discovery in medical applications with time series’, Computational Intelligence, Vol. 22, Nos. 3–4.
Grefenstette, J.J. (1987) A User’s Guide to GENESIS, Technical report, Navy Center for Applied Research in AI, Washington DC.
Grzymala-Busse, J.W. and Hippe, Z.S. (2001) ‘Melanoma prediction using data mining system LERS’, in Proceedings of 25th Annual International Conference on Computer Software and Applications, Chicago, IL, USA, pp.615–620.
Grzymala-Busse, J.W. and Hippe, Z.S. (2005) ‘Data mining methods supporting diagnosis of melanoma’, in Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05), IEEE Computer Society, Washington DC, 23–24 June, pp.371–373.
Herskovits, E.H. and Gerring, J.P. (2003) ‘Application of a data-mining method based on Bayesian networks to lesion-deficit analysis’, Neuroimage, Vol. 19, pp.1664–1673.
Holland, J.H. (1992) Adaptation in Natural and Artificial Systems, 2nd ed., MIT Press (1st ed., University of Michigan Press, Ann Arbor, 1975).
Huang, M-J., Chen, M-Y. and Lee, S-C. (2007a) ‘Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis’, Expert Systems with Applications, Vol. 32, pp.856–867.
Huang, Y., McCullagh, P., Black, N. and Harper, R. (2007b) ‘Feature selection and classification model construction on type 2 diabetic patients’ data’, Artificial Intelligence in Medicine, Vol. 41, pp.251–262.
Hunt, M., Konsky, B.V., Venkatesh, S. and Petros, P. (2000) ‘Bayesian networks and decision trees in the diagnosis of female urinary incontinence’, in Proceedings of the 22nd Annual EMBS International Conference, Chicago, IL, pp.23–28.
Jerez-Aragones, J.M., Gómez-Ruiz, J.A., Ramos-Jiménez, G., Munoz-Perez, J. and Alba-Conejo, E. (2003) ‘A combined neural network and decision trees model for prognosis of breast cancer relapse’, Artificial Intelligence in Medicine, Vol. 27, pp.45–63.
Jurisica, I., Mylopoulos, J., Glasgow, J.E., Shapiro, H. and Casper, R.F. (1998) ‘Case-based reasoning in IVF: prediction and knowledge mining’, Artificial Intelligence in Medicine, Vol. 12, pp.1–24.
Kantardzic, M., Djulbegovic, B. and Hamdan, H. (2002) ‘A data-mining approach to improving polycythemia vera diagnosis’, Computers & Industrial Engineering, Vol. 43, pp.765–773.
Kass, G.V. (1980) ‘An exploratory technique for investigating large quantities of categorical data’, Journal of Applied Statistics, Vol. 29, No. 2, pp.119–127.
Komosinski, M. and Krawiec, K. (2000) ‘Evolutionary weighting of image features for diagnosing of CNS tumors’, Artificial Intelligence in Medicine, Vol. 19, pp.25–38.
Koza, J.R. (1989) ‘Hierarchical genetic algorithms operating on populations of computer programs’, in Sridharan, N.S. (Ed.): Proceedings of the 11th International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers, San Mateo, CA, pp.768–774.
Kurgan, L.A., Cios, K.L., Tadeusiewicz, R., Ogiela, M. and Goodendey, L.S. (2001) ‘Knowledge discovery approach to automated cardiac SPECT diagnosis’, Artificial Intelligence in Medicine, Vol. 23, pp.149–169.
Kusiak, A., Caldaron, C.A., Kellehec, M.D., Lamd, F.S., Persoon, T.J. and Burns, A. (2006) ‘Hypoplastic left heart syndrome: knowledge discovery with a data mining approach’, Computers in Biology and Medicine, Vol. 36, pp.21–40.
Kusiak, A., Dixon, B. and Shah, S. (2005) ‘Predicting survival time for kidney dialysis patients: a data mining approach’, Computers in Biology and Medicine, Vol. 35, pp.311–327.
Lai, C-C. and Chang, C-Y. (2007) ‘A hierarchical evolutionary algorithm for automatic medical image segmentation’, Expert Systems with Applications, available at http://www.sciencedirect.com.


Lee, Z-J. (2008) ‘An integrated algorithm for gene selection and classification applied to microarray data of ovarian cancer’, Artificial Intelligence in Medicine, Vol. 42, No. 1, pp.81–93.
Li, L., Tang, H., Wu, Z., Gong, J., Gruid, M., Zou, J., Tockman, M. and Clark, R.A. (2004) ‘Data mining techniques for cancer detection using serum proteomic profiling’, Artificial Intelligence in Medicine, Vol. 32, pp.71–83.
Lin, C. (2001) ‘Formulations of support vector machines: a note from an optimization point of view’, Neural Computation, Vol. 13, pp.307–317.
Luciani, D., Marchesi, M. and Bertolini, G. (2003) ‘The role of Bayesian networks in the diagnosis of pulmonary embolism’, J. Thromb. Haemost., Vol. 1, pp.698–707.
Melliez, H., Levy, D., Boelle, P.Y., Dervaux, B., Baron, S. and Yazdanpanah, Y. (2008) ‘Cost and cost-effectiveness of childhood vaccination against rotavirus in France’, Vaccine, Vol. 26, pp.706–715.
Ngan, P.S., Wong, M.L., Lam, W., Leung, K.S. and Cheng, J.C.Y. (1999) ‘Medical data mining using evolutionary computation’, Artificial Intelligence in Medicine, Vol. 16, No. 1, pp.73–96.
Ong, L.S., Shepherd, B., Tong, L.C., Seow-Choen, F., Ho, Y.H., Tang, C.L., Ho, Y.S. and Tan, K. (1997) ‘The colorectal cancer recurrence support (CARES) system’, Artificial Intelligence in Medicine, Vol. 1, pp.175–188.
Phillips-Wren, G., Sharkey, P. and Morss-Dy, S. (2007) ‘Mining lung cancer patient data to assess healthcare resource utilization’, Expert Systems with Applications, available at http://www.sciencedirect.com.
Polat, K., Kara, S., Guvenc, A. and Gunes, S. (2008) ‘Utilization of discretization method on the diagnosis of optic nerve disease’, Computer Methods and Programs in Biomedicine, article in press.
Punch, D. and Rand, B. (1996) Lil-gap 1.01 – User’s Manual: Genetic Algorithms Research and Application Group (GARAGe), Department of Computer Science, Michigan State University.
Quinlan, J.R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, Calif.
Rechenberg, I. (1965) Cybernetic: Solution Path of an Experimental Problem, Ministry of Aviation, Royal Aircraft Establishment, UK.
Serpen, G., Tekkedil, D.K. and Orra, M. (2008) ‘A knowledge-based artificial neural network classifier for pulmonary embolism diagnosis’, Computers in Biology and Medicine, Vol. 38, pp.204–220.
Setiono, R. (1996) ‘Extracting rules from pruned neural networks for breast cancer diagnosis’, Artificial Intelligence in Medicine, Vol. 8, pp.37–51.
Shah, S., Kusiak, A. and Dixon, B. (2003) ‘Data mining in predicting survival of kidney dialysis patients’, in Proceedings of Photonics West – Bios 2003, L.S. Bass et al. (Eds.): Lasers in Surgery: Advanced Characterization, Therapeutics, and Systems XIII, pp.1–8, SPIE, Bellingham, WA, January, Vol. 4949.
Stepaniuk, J. (1999) ‘Rough set data mining of diabetes data’, Lecture Notes in Computer Science, Springer Berlin/Heidelberg, Vol. 1609/1999, pp.1611–3349, ISSN 0302-9743, online.
Tan, K.C., Yu, Q., Heng, C.M. and Lee, T.H. (2003) ‘Evolutionary computing for knowledge discovery in medical diagnosis’, Artificial Intelligence in Medicine, Vol. 27, pp.129–154.
Tang, K.S., Man, K., Kwong, F.S. and Liu, Z.F. (1998) ‘Design and optimization of IIR filter structure using hierarchical genetic algorithms’, IEEE Transactions on Industrial Electronics, Vol. 45, pp.481–487.
Ture, M., Tokatli, F. and Kurt, I. (2008) ‘Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients’, Expert Systems with Applications, available at http://www.sciencedirect.com.


Wiggins, M., Saad, A., Litt, B. and Vachtsevanos, G. (2008) ‘Evolving a Bayesian classifier for ECG-based age classification in medical applications’, Applied Soft Computing, Vol. 8, No. 1, pp.599–608.
Wiltgen, M., Gerger, A. and Smolle, J. (2003) ‘Tissue counter analysis of benign common nevi and malignant melanoma’, Int. J. of Medical Informatics, Vol. 69, pp.17–28.
Witten, I.H. and Frank, E. (2005) Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, San Francisco.
Wong, M.L., Lam, W., Leung, K.S., Ngan, P.S. and Cheng, J.C.Y. (2001) ‘Discovering knowledge from medical databases using evolutionary algorithm’, IEEE Engineering in Medicine and Biology, Vol. 19, No. 4, pp.45–55.
Xue, W., Sun, Y. and Lu, Y. (2006) ‘Research and application of data mining in traditional Chinese medical clinic diagnosis’, in Proceedings of 8th International Conference on Signal Processing (ICSP), Vol. 4.
Yang, M-D. and Su, T-C. (2008) ‘Automated diagnosis of sewer pipe defects based on machine learning approaches’, Expert Systems with Applications, Vol. 35, No. 3, pp.1327–1337.
Zhang, X. and Sun, C. (2008) ‘Dynamic intelligent cleaning model of dirty electric load data’, Energy Conversion and Management, Vol. 49, No. 4, pp.564–569.
Zhuang, Z.Y., Churilov, L., Burstein, F. and Sikaris, K. (2007) ‘Combining data mining and case-based reasoning for intelligent decision support for pathology ordering by general practitioners’, European Journal of Operational Research, available at http://www.sciencedirect.com.