Classification of ECG using some novel features

ICETACS 2013

Classification of ECG using Some Novel Features

Pratiksha Sarma

S. R. Nirmala

Kandarpa Kumar Sarma

Department of Electronics and Communication Engineering, Gauhati University Guwahati-781014, Assam, India Email: [email protected]

Department of Electronics and Communication Engineering, Gauhati University Guwahati-781014, Assam, India Email: [email protected]

Department of Electronics and Communication Technology, Gauhati University Guwahati-781014, Assam, India Email: [email protected]

Abstract— Heart disease is a frequent cause of death. Hence, there is always a need for systems that can give an early indication of the state of the heart. This need is heightened because medical facilities are not uniformly available everywhere. In such situations, signal processing techniques can provide considerable support. Accordingly, systems for the automatic recognition of cardiac arrhythmias have become necessary and important for the diagnosis of cardiac abnormalities. Several algorithms have been proposed in the literature to classify cardiac arrhythmias; however, many of them fail to perform optimally. Here, we propose a method for ECG arrhythmia classification using an Artificial Neural Network (ANN) and a novel feature set. The Fast Fourier Transform (FFT) is used for pre-processing the ECG recordings. Linear Prediction Coefficients (LPC) and Principal Component Analysis (PCA) are used for extracting features, and a Multi-Layer Perceptron (MLP) ANN then performs the classification.

Keywords— Electrocardiogram (ECG), Arrhythmia, Artificial Neural Network (ANN)

I. INTRODUCTION

The state of human physiological activities is monitored using signals or captured images. In recent times, a large part of research has focused on the processing of biomedical signals and images. Cardiac arrhythmias are a frequent cause of death around the world. Such arrhythmias arise from any abnormality in the rate, regularity, site of origin or activation sequence of the heart's electrical impulse. The classification of heartbeats is an important step in arrhythmia identification, since many arrhythmias manifest as a sequence of heartbeats with abnormal timing or ECG morphology [1]. The Electrocardiogram (ECG) is a non-invasive diagnostic and monitoring tool that records the electrical activity of the heart at the body surface. The amplitude and duration of each wave in the ECG signal are often used for manual analysis. The ECG provides accurate information about the performance of the heart and cardiovascular system [2]. From the plot of an ECG, a cardiologist can analyze the shape of the waveform and determine the nature of diseases affecting the heart. The abnormal beats pointing to a particular disease can be rare and scattered across a long record. Therefore, the work of a cardiologist tracking down abnormalities can be tedious, and computer-based analysis and classification techniques become helpful [3]. The Artificial Neural Network (ANN) has been treated as a powerful classifier for ECG arrhythmia classification problems. The features characterizing the different morphological properties of the ECG are used as input to the classifier. The performance of the classifier can be improved by reducing the dimensions of features.

978-1-4673-5250-5/13/$31.00 ©2013 IEEE

Fig. 1. A typical ECG waveform

In order to better understand how an ECG works, one needs a better knowledge of the signal output by the leads that monitor the heart. A typical heartbeat consists of a PQRST wave as shown in Figure 1. Each of these letters represents a different part of the signal. The P wave represents atrial contraction, when blood flows from the atria into the ventricles. The SA node sends out a signal to the AV node during this time. Next, the QRS complex marks the contraction of the ventricles. This depolarization occurs after the impulse originating at the SA node has stimulated the ventricles to contract. It has a higher amplitude because the ventricles are so powerful: they have to deliver blood through the lungs or through the entire body, depending on whether it is the right or left ventricle. After this, the T wave represents the ventricular recovery time. Each heartbeat contains each part of this signal except in some arrhythmia cases [4].

II. REVIEW OF LITERATURE

A review of the literature shows that various methods have been successfully applied to ECG arrhythmia classification. To name a few, in [2], N. Emanet used the Random Forest algorithm to classify five types of ECG beats (N, L, R, V, P) using ECG signals obtained from the MIT-BIH database. Feature extraction from the ECG signals was performed using the discrete wavelet transform (DWT), and a classification accuracy of 99.8% was achieved. In [5], Wang et al. describe an effective ECG arrhythmia classification scheme consisting of a feature reduction method combining principal component analysis (PCA) with linear discriminant analysis (LDA), and a probabilistic neural network (PNN) classifier to discriminate eight different types of arrhythmia from ECG beats. Their average classification accuracy was 99.71%. In [6], H. G. Hosseini et al. describe a multi-stage network comprising two multilayer perceptron (MLP) networks and one self-organizing map (SOM) network. The input of the network is a combination of independent features and compressed ECG data. They classified six common ECG waveforms using ten ECG records of the MIT-BIH arrhythmia database. Their system achieved an average recognition rate of 0.883 within a short training and testing time. In [7], H. Gothwal et al. presented a method of ECG classification using the Fast Fourier Transform (FFT) and a neural network. They classified arrhythmias into Tachycardia, Bradycardia, Supraventricular Tachycardia, Incomplete BBB and Ventricular Tachycardia using the MIT-BIH database with 98.48% accuracy.

In this paper, we propose an ECG arrhythmia classification scheme based on a feature extraction and reduction method and an ANN classifier to discriminate five types of ECG beats obtained from the MIT-BIH arrhythmia database. Section III briefly discusses the materials and methods used in the proposed system. The proposed scheme is illustrated in Section IV, the experimental details and results are presented in Section V, and the conclusion is given in Section VI.

III. MATERIALS AND METHODS

A. ECG Database

Data from the MIT-BIH arrhythmia database is used in this study, which includes recordings of many common and life-threatening arrhythmias along with examples of normal sinus rhythm [8]. The database contains 48 recordings, each containing two 30-min ECG lead signals (denoted lead A and B). The proposed method analyzes the ECG signal and extracts features for the classification of heartbeats according to different arrhythmias. Data were obtained from 30 records of the MIT-BIH arrhythmia database (one lead only), selected randomly. Our study focuses on the classification of the five largest heartbeat classes in the MIT-BIH arrhythmia database: (i) normal beat (N); (ii) premature ventricular contraction (PVC); (iii) left bundle branch block (LBBB); (iv) right bundle branch block (RBBB); (v) paced beat (PB). Table I shows the distribution of these heartbeat types among the ECG recordings in the database. Figure 2 shows the reference heartbeat types considered for this work.

TABLE I. DISTRIBUTION OF N, PB, LBBB, RBBB AND PVC BEATS IN MIT-BIH DATABASE

Heartbeat type    ECG recordings containing the respective type
N                 100-106, 108, 112-117, 119, 121-123, 200-203, 205, 208-210, 212, 213, 215, 217, 219-223, 228, 230, 231, 233, 234
PB                102, 104, 107, 217
LBBB              109, 111, 207, 214
RBBB              118, 124, 207, 212, 231, 232
PVC               100, 102, 104-109, 111, 114, 116, 118, 119, 121, 123, 124, 200-203, 205, 207-210, 213-215, 217, 219, 221, 223, 228, 230, 231, 233, 234

Fig. 2. Reference waveforms for the five heartbeat types under classification (N, PVC, LBBB, RBBB, PB)

B. Linear Predictive Coding

Among the various feature extraction methods, linear prediction is one of the most powerful signal analysis techniques. The basic problem in linear prediction is to determine the forward predictor coefficients of the signal. Its solution is based on a difference equation that expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, and the technique is better known as Linear Predictive Coding (LPC). The coefficients are estimated by minimizing the mean-square error between the predicted signal and the actual signal. The most common representation of the linear prediction model is

$$\hat{x}(n) = -\sum_{i=1}^{p} a_i \, x(n-i) \tag{1}$$

where $\hat{x}(n)$ is the predicted signal value, $x(n-i)$ are the previous observed values, $a_i$ are the predictor coefficients and $p$ is the predictor order. The error generated by the estimate is

$$e(n) = x(n) - \hat{x}(n) \tag{2}$$

where $x(n)$ is the true signal value.
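The linear predictor described above can be illustrated with a minimal NumPy sketch. It assumes the autocorrelation method for minimizing the mean-square error, which the paper does not specify, and uses the sign convention in which the prediction is the negated weighted sum of past samples. The function names `lpc_coefficients` and `predict`, the predictor order and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate predictor coefficients a_1..a_p with the autocorrelation method.

    Assumed convention: x_hat(n) = -sum_i a_i * x(n - i).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation estimates r(0)..r(p) of the (implicitly windowed) signal
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = -r(1..p); the minus sign follows the convention above
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, -r[1:])

def predict(x, a):
    """Return x_hat(n) for n = p .. len(x)-1 from the p previous samples."""
    x = np.asarray(x, dtype=float)
    p = len(a)
    # x[n-p:n][::-1] is [x(n-1), ..., x(n-p)], matching a_1..a_p
    return np.array([-np.dot(a, x[n - p:n][::-1]) for n in range(p, len(x))])

if __name__ == "__main__":
    # Illustrative signal only: a sinusoid is well described by a second-order predictor
    w = 2 * np.pi * 0.05
    x = np.sin(w * np.arange(500))
    a = lpc_coefficients(x, order=2)
    error = x[2:] - predict(x, a)   # prediction error e(n) = x(n) - x_hat(n)
    print("coefficients:", a)
    print("relative error energy:", np.sum(error ** 2) / np.sum(x ** 2))
```

For a real beat segment the order would have to be chosen empirically; the value 2 is used here only because it suits the synthetic sinusoid.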

C. Principal Component Analysis

Principal Component Analysis (PCA) has been widely used in statistical data analysis, feature extraction, feature reduction and data compression. It transforms a number of possibly correlated variables into a number of uncorrelated variables called principal components, related to the original variables by an orthogonal transformation. PCA is generally used for reducing the dimensionality of multivariate datasets. Considering a vector $\mathbf{x}$ of $n$ random variables for which the covariance matrix is $\Sigma$, the principal components (PCs) can be defined by

$$\mathbf{z} = A\mathbf{x} \tag{3}$$

where $\mathbf{z}$ is the vector of $n$ PCs and $A$ is the $n \times n$ orthogonal matrix whose rows are the eigenvectors of $\Sigma$ [9]. The eigenvalues of $\Sigma$ are proportional to the fraction of the total variance accounted for by the corresponding eigenvectors, so the PCs explaining most of the variance in the original variables can be identified.
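The transformation described in this subsection can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function names `principal_components` and `explained_fraction` are hypothetical, the data are centered before projection (an assumption), and the synthetic data only demonstrate that the projected variables come out uncorrelated.

```python
import numpy as np

def principal_components(X):
    """PCA by eigen-decomposition of the covariance matrix.

    X has one observation per row and one variable per column.
    Returns the eigenvalues in descending order, the matrix A whose rows
    are the corresponding eigenvectors, and the projected data (z = A x
    applied to every centered observation).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(eigvals)[::-1]       # re-sort to descending variance
    eigvals = eigvals[order]
    A = eigvecs[:, order].T                 # rows of A are eigenvectors
    Z = Xc @ A.T                            # each row of Z is A x for one observation
    return eigvals, A, Z

def explained_fraction(eigvals):
    """Fraction of the total variance carried by each principal component."""
    return eigvals / eigvals.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(size=200)
    # Two strongly correlated variables plus one independent variable (synthetic)
    X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(size=200)])
    eigvals, A, Z = principal_components(X)
    print("explained variance fractions:", explained_fraction(eigvals))
```

Keeping only the leading rows of `A` gives the reduced feature set; how many to keep is a design choice that depends on the variance fraction one wants to retain.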

D. Artificial Neural Network

Artificial Neural Networks (ANNs) have been applied to a number of real-world problems of considerable complexity. ANNs have inherently massively parallel, distributed architectures and are employed for information processing and data clustering in different systems. They are characterized as computational models and consist of a collection of processing elements called neurodes (neural nodes) and weighted connections (synapses). An ANN trains on the presented input examples and learns to extract their statistical properties. The trained ANN can then be employed to classify a set of information that includes the training examples, and it can also be applied to recognize each member of a similar set. Overall, ANNs have the ability to learn or adapt, recognize, classify and organize data by rapid processing of complex, non-linear, multidimensional and temporally varying information. They can deal with real-time processing problems that have persistently evaded solution by alternative methodologies. There are different kinds of neural networks, classified according to the operations they perform or the way their neurons are interconnected. Here, we have used the Multilayer Perceptron (MLP), which is one of the most popular neural networks used by researchers. It is trained by the (error) back propagation (BP) algorithm [7].

IV. PROPOSED METHOD

In this paper, we propose an ANN based algorithm for the classification of arrhythmias into the five basic types discussed above. Our algorithm uses a feature set obtained from LPC and PCA for each class, which is then applied as input to the MLP neural network for classification. The system model of the proposed work is given in Figure 3. It consists of three units: the preprocessing unit, the processing unit and the classifier unit.

[Block diagram: Input (ECG signal from MIT-BIH Arrhythmia database) -> Preprocessing (FFT filtered) -> Processing (feature extraction using PCA, LPC and PCA+LPC) -> Classification into 5 types: N, PB, RBBB, LBBB, PVC (using MLP NN)]

Fig. 3. Block diagram of the proposed system

In the preprocessing stage, the raw ECG signal is processed to remove the dc offset. This is done by filtering in the frequency domain. After taking the FFT of the original signal, the lower frequency components are set to zero. This results in a signal free of low-frequency offset, with its baseline at the zero reference line. The inverse FFT is then applied to obtain the preprocessed signal shown in Figure 4.

Fig. 4. A preprocessed signal of the MIT-BIH database

In the feature extraction procedure, a fraction of the signal centered on the R peak is extracted manually for each type of heartbeat. The R-peak points for all the heartbeat types are taken from the annotation text file available in the PhysioBank ATM [8]. It is found that beat-to-beat changes in ECG features such as the QRS complex, T wave or P wave can be identified by LPC and PCA from multi-beat, single-lead ECG recordings. The LPC gives the predictor coefficients of a particular heartbeat. If there are beat-to-beat changes in ECG features, they result in a change in the correlation between these features in the PCA. To capture the important characteristic points of the ECG (P, Q, R, S and T), a total of 250 sampling points at a 360 Hz sampling rate, 100 before and 150 after the R peak, are collected as one ECG beat sample for each type. The LPCs of 20 heartbeat samples are then added sample-wise and the mean LPCs are calculated. This process is repeated for the other types of ECG beats. The mean LPCs for each type are taken as a feature vector to be fed to the classifier. For PCA, a matrix of 20x250 samples is formed for each type, and the PCs are found from that matrix. The PCs are then sorted in descending order and 80% of the values are taken as a feature vector to be fed to the classifier. Finally, the two features are combined and fed as a feature matrix to the classifier.

In the classification stage, a feed-forward network with one input layer, two hidden layers and one output layer is proposed. The input layer consists of 250 neurons with the tan-sigmoid transfer function, the hidden layers consist of 150 and 50 neurons with the log-sigmoid transfer function, and the output layer consists of 5 neurons with the linear transfer function. The MLP learns the behavior of the input data using the BP algorithm. The BP algorithm compares the output obtained at each step with the expected output, and the MLP computes the error signal from the two. The computed error signal is then fed back to the ANN and used to adjust the weights so that the error decreases with each iteration and the neural model gets closer and closer to producing the desired output. There are different training algorithms, and it is very difficult to know which one is suitable for a given problem. Many parameters should be considered in choosing the training algorithm, for instance the complexity of the problem, the number of data points in the training set, the number of weights and biases in the network, and the error goal [9].
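The signal path of the proposed system (FFT-based offset removal, extraction of a 250-sample beat around the R peak, and a forward pass through a 250-150-50-5 feed-forward network) can be sketched as below. Several details are assumptions made only for illustration: the 0.5 Hz cutoff (the paper does not state which low-frequency components are zeroed), the random untrained weights, and feeding the beat samples directly to the network in place of the LPC and PCA features. All function names are hypothetical, and back-propagation training is not shown.

```python
import numpy as np

def remove_baseline(signal, fs=360.0, cutoff_hz=0.5):
    """Zero the FFT bins below cutoff_hz (dc included) and transform back.

    cutoff_hz is an assumed value, not one given in the paper.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def extract_beat(signal, r_index, before=100, after=150):
    """Return the 250 samples around an annotated R peak, or None near the record edges."""
    start, stop = r_index - before, r_index + after
    if start < 0 or stop > len(signal):
        return None
    return np.asarray(signal[start:stop], dtype=float)

def tansig(v):
    return np.tanh(v)

def logsig(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_mlp(n_inputs, sizes=(250, 150, 50, 5), seed=0):
    """Random (untrained) weights for layers of 250, 150, 50 and 5 neurons."""
    rng = np.random.default_rng(seed)
    layers, fan_in = [], n_inputs
    for size in sizes:
        layers.append((rng.normal(0.0, 0.1, (size, fan_in)), np.zeros(size)))
        fan_in = size
    return layers

def mlp_forward(layers, features):
    """Forward pass: tan-sigmoid, log-sigmoid, log-sigmoid, then linear output."""
    activations = (tansig, logsig, logsig, lambda v: v)
    out = np.asarray(features, dtype=float)
    for (weights, bias), activation in zip(layers, activations):
        out = activation(weights @ out + bias)
    return out

if __name__ == "__main__":
    fs = 360.0
    t = np.arange(3600) / fs
    raw = 2.0 + np.sin(2 * np.pi * 10.0 * t)     # synthetic stand-in for an ECG lead
    clean = remove_baseline(raw, fs)
    beat = extract_beat(clean, r_index=500)      # hypothetical R-peak position
    scores = mlp_forward(init_mlp(len(beat)), beat)
    print("output scores for the five classes:", scores)
```

In a trained system the index of the largest of the five outputs would be read as the predicted class; with the random weights used here the scores carry no meaning.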

V. EXPERIMENTAL DETAILS AND RESULTS

The analysis for the present work is carried out separately for LPC, PCA and the combination of features obtained from both (LPC+PCA). A total of 100 sample heartbeats covering all the types mentioned above is taken to train the classifier. Table II shows the records of the MIT-BIH database from which the training and testing samples are taken. Table III shows the number of training and testing samples used for each heartbeat type.

TABLE II. TRAINING AND TESTING SAMPLES TAKEN FROM THE MIT-BIH DATABASE

Heartbeat type    Training samples           Testing samples
N                 100, 101, 103, 105, 116    100, 108, 112, 115, 119
PB                102, 104, 107, 217         102, 104, 107, 217
LBBB              109, 111, 207, 214         109, 111, 207, 214
RBBB              118, 124, 212, 231, 232    118, 124, 207, 212, 231, 232
PVC               105, 106, 215, 233         215, 219, 221, 230, 228, 233

TABLE III. NUMBER OF SAMPLES FOR TESTING AND TRAINING

Heartbeat type    Total beats trained    Total beats tested
N                 20                     40
PB                20                     40
LBBB              20                     40
RBBB              20                     40
PVC               20                     40

First the classifier was trained with the LPC and PCA features separately, and then with the combination of the two. The classifier was then tested with 200 samples, 40 for each beat type. Tables IV, V and VI show the classification rates of the three feature sets for all the heartbeat types in the proposed system. The performance of the classifier is better for PCA based features than for LPC features, as shown in Tables IV and V. When the two features are combined, the classifier shows improved performance for the N and LBBB beats and matches the better of the two individual feature sets for the PB and RBBB beats. For PVC, the classifier performance is the average of that of the LPC and PCA features. From the tables, the overall classification rate is 86% for the LPC feature set (172 of 200 beats), 88% for the PCA feature set (176 of 200 beats) and 89.5% for the combined feature set (179 of 200 beats). As the beat selection is done manually, the complete dataset was not used. Signals in the dataset in which a particular heartbeat type formed the majority were generally considered for the purpose.

TABLE IV. PERFORMANCE OF THE LPC FEATURE USED

Heartbeat type    Total beats tested    Beats tested correctly    Classification rate for LPC
N                 40                    35                        87.5%
PB                40                    36                        90%
LBBB              40                    34                        85%
RBBB              40                    34                        85%
PVC               40                    33                        82.5%

TABLE V. PERFORMANCE OF THE PCA FEATURE USED

Heartbeat type    Total beats tested    Beats tested correctly    Classification rate for PCA
N                 40                    35                        87.5%
PB                40                    36                        90%
LBBB              40                    35                        87.5%
RBBB              40                    35                        87.5%
PVC               40                    35                        87.5%

TABLE VI. PERFORMANCE OF THE LPC+PCA FEATURE USED

Heartbeat type    Total beats tested    Beats tested correctly    Classification rate for LPC+PCA
N                 40                    37                        92.5%
PB                40                    36                        90%
LBBB              40                    37                        92.5%
RBBB              40                    35                        87.5%
PVC               40                    34                        85%
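The per-class and overall classification rates can be recomputed from the beat counts reported in Tables IV to VI. The short sketch below is only a bookkeeping aid: the counts are transcribed from the tables, while the dictionary layout and the function names `classification_rate` and `summarize` are hypothetical.

```python
# Correctly classified beats per heartbeat type, transcribed from Tables IV to VI
BEATS_TESTED_PER_TYPE = 40
CORRECT = {
    "LPC":     {"N": 35, "PB": 36, "LBBB": 34, "RBBB": 34, "PVC": 33},
    "PCA":     {"N": 35, "PB": 36, "LBBB": 35, "RBBB": 35, "PVC": 35},
    "LPC+PCA": {"N": 37, "PB": 36, "LBBB": 37, "RBBB": 35, "PVC": 34},
}

def classification_rate(n_correct, n_tested):
    """Percentage of tested beats that were classified correctly."""
    return 100.0 * n_correct / n_tested

def summarize(counts, n_tested_per_type):
    """Return the per-type rates and the overall rate for one feature set."""
    per_type = {beat: classification_rate(c, n_tested_per_type)
                for beat, c in counts.items()}
    overall = classification_rate(sum(counts.values()),
                                  n_tested_per_type * len(counts))
    return per_type, overall

if __name__ == "__main__":
    for feature_set, counts in CORRECT.items():
        per_type, overall = summarize(counts, BEATS_TESTED_PER_TYPE)
        print(f"{feature_set}: per-type {per_type}, overall {overall:.1f}%")
```

Because every type is tested with the same number of beats, the overall rate equals the mean of the five per-type rates.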

VI. CONCLUSION

In this paper, an ANN based ECG classification method was presented to classify heartbeats into five types (N, PB, PVC, LBBB, RBBB). The aim of using an Artificial Neural Network (ANN) is to decrease the error by grouping similar parameters in the training data. For classification, different feature extraction methods, namely LPC, PCA and the combination of the two, are used. The performance of the system is found to be better for the combined feature set than for the individual feature sets. The performance of the system may be improved by increasing the training and testing dataset for each type of heartbeat.

REFERENCES

[1] P. Chazal, M. O. Dwyer and R. B. Reilly, “Automatic Classification of Heartbeats Using ECG Morphology and Heartbeat Interval Features”, IEEE Transactions on Biomedical Engineering, Vol. 51, No. 7, pp. 1196-1206, July 2004.
[2] N. Emanet, “ECG Beat Classification by Using Discrete Wavelet Transform and Random Forest Algorithm”, Conference Publications, 2-4 Sept. 2009, pp. 1-4, Istanbul, Turkey.
[3] A. K. Khoureich, “ECG beats Classification using waveform similarity and RR interval”, Journal of Medical and Biological Engineering, pp. 417-422, Jun 2011; doi: 10.5405/jmbe.905
[4] R. Adams and A. Choi, “Using Neural Networks to Predict Cardiac Arrhythmias”, Florida Conference on Recent Advances in Robotics, pp. 1-6, 2012.
[5] J. S. Wang, W. C. Chiang, Y. C. Ting, Yang, and Y. L. Hsu, “An Effective ECG Arrhythmia Classification Algorithm”, in Proceedings of Bio-Inspired Computing and Applications, August 11-14, 2011, Zhengzhou, China.
[6] H. G. Hosseini, K. J. Reynolds and D. Powers, “A Multi-Stage Neural Network Classifier for ECG Events”, 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, October 25-28, 2001, Istanbul, Turkey.
[7] H. Gothwal, S. Kedawat and R. Kumar, “Cardiac Arrhythmias detection in an ECG beat signal using Fast Fourier Transform and Artificial Neural Network”, J. Biomedical Science and Engineering, pp. 289-296, April, 2001.
[8] MIT-BIH Arrhythmia Database, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139 (1998), http://www.physionet.org/physiobank/database/mitdb
[9] B. Anuradha and V. C. V. Reddy, “ANN for Classification of Cardiac Arrhythmias”, ARPN Journal of Engineering and Applied Science, Vol. 3, No. 3, June 2008.