Using Wearable Sensors to Monitor Physical Activities of ... - IEEE Xplore

2009 Body Sensor Networks

Using Wearable Sensors to Monitor Physical Activities of Patients with COPD: A Comparison of Classifier Performance

Shyamal Patel, Chiara Mancinelli, Paolo Bonato
Dept of PM&R, HMS Spaulding Rehabilitation Hospital, Boston, USA

Jennifer Healey
Intel Digital Health Group, Boston, USA

Marilyn Moy
VA Boston Healthcare System, Boston MA

Abstract-Chronic obstructive pulmonary disease (COPD) is a major public health problem. Early detection and treatment of an exacerbation in the outpatient setting are important to prevent worsening of clinical status and the need for emergency room care or hospital admission. In this study we use accelerometers to capture motion data, and heart rate and respiration rate to capture physiological responses, from patients with COPD as they perform a range of Activities of Daily Living (ADL) and physical exercises. We present a comparative analysis of the classification performance of a set of different classification techniques, and of the factors that affect classification performance, for activity recognition based on accelerometer data. This is the first step towards building a wearable sensor monitoring system for tracking changes in physiological responses of patients with COPD with respect to their physical activity level.

Keywords-wearable sensors; classification; COPD

I. INTRODUCTION

COPD is currently the fourth leading cause of death in the world [1], and is projected to rank fifth in 2020 as a worldwide burden of disease [2]. Exacerbations are a prominent part of the natural history of COPD, resulting in functional impairment and disability. An exacerbation is usually defined as an increase in dyspnea, cough, or change in character of sputum that requires therapy such as antibiotics, an intensified inhaler regimen, or oral steroids. Economic analyses have shown that over 70% of COPD-related health care expenditures result from emergency room visits and hospital care for exacerbations; this translates to > $10 billion annually in the United States [3]. Early identification of an exacerbation and prompt treatment improves recovery time, reduces the risk of emergency hospitalization, and is associated with better health-related quality of life (HRQL) [4]. Thus, strategies for early detection of exacerbations leading to early treatment, before patients need emergency room care or hospital admission, have potential substantial clinical and economic benefit.

In our study we examine the use of wearable sensors to monitor physical activities and related physiological responses of patients with COPD. We hypothesize that physical activity would be reduced and physiological responses altered at the time of an exacerbation. We also hypothesize that monitoring cumulative, free-living physical activity and physiological responses in the patient's home environment will detect these changes during an exacerbation. Monitoring physiological responses in the context of the activity level will allow us to build models to predict exacerbation episodes.

Prior work [5,6,14] demonstrates that activity recognition with a high degree of accuracy can be achieved using body-worn accelerometers. In an earlier study [13], undertaken by our group, we devised a technique based on hierarchical clustering to classify motor activities of patients with COPD based on accelerometer data.

In this paper we present results from our study on the use of accelerometers for physical activity recognition in 10 subjects with COPD. We collected data on 15 patients, but due to technical problems we could not include 5 of the patients in the data analysis. We present a comparison of the predictive capability of different classifiers. We examine, from the classification point of view, the information captured by the individual axes of a tri-axial accelerometer. We perform simulations to determine an appropriate window length for feature extraction and the features that are most informative. We also present an estimate of a reduced set of sensors that would provide classification performance comparable to that achieved with the 10 sensors used in this study.

978-0-7695-3644-6/09 $25.00 © 2009 IEEE DOI 10.1109/BSN.2009.53 10.1109/P3644.52


II. METHODS

A. Data Collection

15 patients with COPD were recruited for this study. All the patients were male, aged 71 ± 6 (mean ± SD) years, with moderate-to-severe COPD. We collected heart rate and respiration rate as indicators of physiological responses, and tri-axial accelerometer and gyroscope data as indicators of physical activity level. The equipment used to collect accelerometer and gyroscope data is the SHIMMER wireless system by Intel® (Figure 1(a)). SHIMMER consists of a TI MSP430 microprocessor; a Chipcon CC2420 IEEE 802.15.4 2.4 GHz radio; a MicroSD card slot; and a tri-axial MEMS accelerometer, the Freescale MMA7260Q, which can be configured with sensitivities of 1.5, 2, 4, or 6 g. Compared with other wireless sensors, SHIMMER achieves a smaller footprint using conventional board technology and integrates a lithium-polymer battery. We used 10 SHIMMER nodes: 2 on each arm, 2 on each leg, one on the sternum and one in the patient's trouser pocket. An illustration of the sensor setup is shown in figure 1. Raw accelerometer data were sampled at 50 Hz.

The activities that the patients performed can be broadly divided into two categories: 1) gym exercises and 2) activities of daily living (ADLs). As part of the gym exercises, the patients walked on a treadmill, rode a stationary bike and used an arm ergometer (sitting in a chair) for 3 min each, and climbed up/down a flight of five gym stairs 5 times. For ADLs, the patients walked on a level surface in the corridor, walked up and down an inclined ramp, climbed up/down a flight of stairs, folded laundry while sitting in a chair (5 sets of clothes), swept the floor with a broom (2-3 min), used the bathroom (use toilet, wash hands, comb hair, etc.), used the cafeteria (browse for food, sit at a table and eat) and took an elevator up/down 5 floors.

Figure 1. An illustration of the sensor positions on the body of a patient. (a) A SHIMMER node.

B. Feature Extraction

Sets of thirty 5-second epochs were randomly selected from the sensor data corresponding to each task. Before extracting features, the raw data were lowpass filtered with a cutoff frequency of 12 Hz to remove high-frequency noise. The features were chosen to represent characteristics such as intensity, range of motion, orientation, modulation, and signal complexity. Intensity was measured as the root-mean-square (RMS) value of the detrended accelerometer signal. The modulation of the output of each sensor was used to represent dynamic characteristics of the tasks, and was calculated as the range of the autocovariance of each channel. Large values of this feature were indicative of intervals of rapid movements interspersed with intervals of slow movements. The mean value was calculated as an indicator of the orientation of the body segment. Range was calculated as the maximum peak-to-peak signal value. Large values of range indicated high activity with significant movement of a body segment. An estimate of entropy was calculated as an indicator of signal complexity. Entropy captures the amount of randomness or the level of unpredictability of a signal. The correlation coefficient at zero lag between X, Y and Z (two axes at a time) was calculated as an indicator of the coordination of movement. The features were extracted for each of the axes (i.e. X, Y and Z) of each sensor. In total 180 features were extracted. Task type was used as the class label.
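As an illustration of the feature extraction step (Section II-B), the per-epoch features could be computed roughly as in the sketch below. It is not the authors' implementation. The sampling rate (50 Hz) and epoch length (5 s) come from the text; the histogram-based entropy estimate, the bin count, mean removal as "detrending", the biased autocovariance estimator and all function names are assumptions made for illustration, and the 12 Hz lowpass filter is omitted. One reading of the stated feature count is 5 features per axis plus 3 pairwise correlations, i.e. 18 per sensor and 180 for 10 sensors.

```python
import math
import random
from itertools import combinations

FS_HZ = 50                   # sampling rate stated in the paper
EPOCH_S = 5                  # epoch length stated in the paper
EPOCH_N = FS_HZ * EPOCH_S    # 250 samples per epoch

def mean(xs):
    return sum(xs) / len(xs)

def rms_detrended(xs):
    """Intensity: RMS after removing the mean (assumed form of detrending)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def peak_to_peak(xs):
    """Range: maximum peak-to-peak value."""
    return max(xs) - min(xs)

def autocov_range(xs):
    """Modulation: range of the biased autocovariance over all lags."""
    n, m = len(xs), mean(xs)
    acov = [sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
            for k in range(n)]
    return max(acov) - min(acov)

def hist_entropy(xs, bins=16):
    """Complexity: Shannon entropy (bits) of an amplitude histogram (assumed estimator)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def corr0(xs, ys):
    """Coordination: correlation coefficient at zero lag between two axes."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def epoch_features(epoch):
    """epoch: dict 'x'/'y'/'z' -> list of samples (g). Returns 18 features for one sensor."""
    feats = []
    for axis in "xyz":
        s = epoch[axis]
        feats += [mean(s), rms_detrended(s), peak_to_peak(s),
                  autocov_range(s), hist_entropy(s)]
    for a, b in combinations("xyz", 2):
        feats.append(corr0(epoch[a], epoch[b]))
    return feats

# Synthetic epoch (hypothetical data, not from the study).
rng = random.Random(0)
t = [i / FS_HZ for i in range(EPOCH_N)]
epoch = {
    "x": [math.sin(2 * math.pi * ti) + 0.05 * rng.gauss(0, 1) for ti in t],
    "y": [0.5 * math.sin(2 * math.pi * ti) for ti in t],
    "z": [1.0 + 0.01 * rng.gauss(0, 1) for _ in t],
}
features = epoch_features(epoch)
print(len(features), "features per sensor;", len(features) * 10, "for 10 sensors")
```

The mean of the z axis stays near 1 g in this toy epoch, which is the sense in which the mean value acts as an orientation indicator.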

C. Feature Set Visualization

As a first processing step we wanted to visually inspect the structure of the feature space. To do this it was necessary to reduce the dimensionality of the dataset while retaining the characteristic accelerometer patterns associated with different tasks. Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques. In order to reduce computational complexity and minimize the influence of redundant features, PCA was applied to the feature set, and the first 2 principal components (PCs) were used to produce scatter plots.
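The projection onto the first two PCs can be sketched as follows. This is a minimal standard-library illustration that finds the two leading eigenvectors of the covariance matrix by power iteration with deflation; the paper does not say how the PCA was computed or whether features were standardized first, so the function names, the toy data and the absence of scaling are assumptions.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigvec(mat, iters=500, seed=0):
    """Dominant eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, lam

def first_two_pcs(rows):
    """Return (scores, v1, v2): 2-D scores and the two leading principal axes."""
    x = center(rows)
    cov = covariance(x)
    v1, lam1 = top_eigvec(cov)
    d = len(cov)
    # Deflation: remove the first component so the next run finds the second.
    cov2 = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    v2, _ = top_eigvec(cov2, seed=1)
    scores = [(sum(a * b for a, b in zip(r, v1)),
               sum(a * b for a, b in zip(r, v2))) for r in x]
    return scores, v1, v2

# Toy feature matrix (hypothetical): 200 instances, 3 features with unequal variance.
rng = random.Random(42)
rows = [[10 * rng.gauss(0, 1), 2 * rng.gauss(0, 1), 0.1 * rng.gauss(0, 1)]
        for _ in range(200)]
scores, v1, v2 = first_two_pcs(rows)
print("first instance in PC space:", scores[0])
```

Each pair in `scores` is one point of a scatter plot like the one described for figure 3, to be colored by task label.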


D. Classification

To test classification performance we compared 6 different types of classifiers. The first algorithm was Instance Based Learning (IBL) [8]. IBL is based on the nearest neighbor approach: it uses Euclidean distance to find the training instance closest to the given test instance and assigns that instance's class. Contrary to the plain nearest neighbor approach, the IBL classifier is based on specific instances of the training dataset rather than the entire dataset, which makes it relatively faster. The second classifier was Naïve Bayes (NB) [9]. NB is a simple probabilistic classifier based on the assumption that the features for a given class are mutually independent, which means that the decisions are made as if all features are equally important. The third classifier was the J48 (a version of C4.5) decision tree [10]. We used the J48 algorithm with reduced error pruning using a 10-fold cross-validation. The fourth classifier was the Multilayer Perceptron (MLP) [9]. The MLP is trained with the backpropagation technique and is one of the most common neural network structures, as it is simple and effective. The hidden layers were determined automatically by the algorithm and all the nodes were sigmoid. The fifth classifier was Random Forest (RF) [11]. Random forests are ensembles of weakly correlated decision trees that "vote" on the correct classification of a given input. These ensembles minimize the risk of over-fitting the training set, a significant and well-known problem with individual decision trees. We populated our RF with 10 trees. The sixth classifier was a Support Vector Machine (SVM) [12] with a radial basis function kernel. An SVM is essentially a linear classifier that creates a maximum-margin hyperplane between data points belonging to different classes. Kernels are used to transform the data so that it is easier to create linear hyperplanes in the transformed feature space. We determined, empirically, that a Gaussian radial basis function kernel (gamma = 0.01, misclassification cost C = 1) performs best for our data set.

Classification was evaluated by two methods: 1) 10-fold cross-validation and 2) leave-one-subject-out. In the 10-fold cross-validation approach we divide the data from a single subject into 10 subsets. In every iteration we take 9 subsets as the training set and use the remaining subset as the testing set. The 10-fold cross-validation approach therefore evaluates the performance of a classifier when it has been trained using subject-specific information. Leave-one-subject-out is an approach where the data from 9 subjects are used as the training set and the data from the subject that is left out are used as the testing set. This approach evaluates how effective a technique is at generalization. All the classifiers were implemented using the Weka 3 data mining software [15].
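The difference between the two evaluation schemes can be made concrete with a small sketch. The paper's classifiers were run in Weka; the code below instead uses a plain 1-nearest-neighbor rule with Euclidean distance as a simple stand-in (in the spirit of IBL, not the same algorithm), and the synthetic subjects, class centers and function names are hypothetical. It shows only how the per-subject splits are formed and how a mean ± SD error across subjects could be reported.

```python
import math
import random
import statistics

def nearest_neighbor_predict(train, x):
    """Label of the training instance closest to x (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]

def error_rate(train, test):
    """Classification error in percent."""
    wrong = sum(nearest_neighbor_predict(train, x) != y for x, y in test)
    return 100.0 * wrong / len(test)

def leave_one_subject_out(data):
    """data: dict subject -> list of (features, label). Train on all other subjects."""
    errors = {}
    for held_out in data:
        train = [inst for s, insts in data.items() if s != held_out for inst in insts]
        errors[held_out] = error_rate(train, data[held_out])
    return errors

def k_fold_within_subject(instances, k=10, seed=0):
    """k-fold cross-validation on a single subject's data; error in percent."""
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [instances[i] for i in idx if i not in held]
        wrong += sum(nearest_neighbor_predict(train, instances[i][0]) != instances[i][1]
                     for i in fold)
    return 100.0 * wrong / len(instances)

def make_subject(rng, offset):
    """Hypothetical subject: 3 tasks, 30 instances each, shifted by a subject offset."""
    centers = {"sit": (0.0, 0.0), "walk": (3.0, 0.0), "bike": (0.0, 3.0)}
    return [((cx + offset + rng.gauss(0, 0.3), cy + offset + rng.gauss(0, 0.3)), label)
            for label, (cx, cy) in centers.items() for _ in range(30)]

rng = random.Random(1)
data = {s: make_subject(rng, rng.uniform(-0.5, 0.5)) for s in range(10)}
loso = list(leave_one_subject_out(data).values())
cv = [k_fold_within_subject(insts) for insts in data.values()]
print(f"leave-one-subject-out error: {statistics.mean(loso):.2f} (± {statistics.stdev(loso):.2f}) %")
print(f"10-fold CV error:            {statistics.mean(cv):.2f} (± {statistics.stdev(cv):.2f}) %")
```

In the within-subject scheme the test instances share a subject with the training instances, whereas in leave-one-subject-out they never do, which is why the second scheme is the one that probes generalization to a new patient.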

Figure 2. An example of raw accelerometer data (g) from (A) Left Upper Arm and (B) Left Thigh for different tasks. Both panels span -1 g to 1 g; tasks shown: Treadmill, Arm Ergometer, Stairs, Laundry, Bike, Sweeping, Bathroom.

Figure 3. Scatter plot of the 1st and 2nd principal components. The points are color coded by task (walk1, walk2, walk3, stairsUP, stairsDN, ramp, cafe, elv, bath, sweep, bike, lndry, GymStairs, ArmErg, Tdmill, stand, sit). Annotated regions: Low Intensity, Leg Movements, Arm Movements.

III. RESULTS

In figure 2, we can see an example of raw accelerometer data from the left upper arm and left thigh accelerometers. Different tasks have distinct signatures, which can be easily identified by visual inspection. The features we extract from the raw accelerometer data are intended to capture these differences.


TABLE I. CONFUSION MATRIX FOR CLASSIFICATION OF ACTIVITIES USING SVM CLASSIFIER (% CORRECT; ROWS: ACTUAL CLASS, COLUMNS: CLASSIFIED AS)

Actual         Sit    Stand     Walk   ArmErg   Stairs  Laundry     Bike    Sweep     Bath Elevator     Café
Sit             97        0        0        0        0        0        0        0        0        0        3
Stand            0       70        1        0        0        0        0        0       10    18.66     0.33
Walk             0     0.06    92.93        0      6.4        0        0        0      0.2        0      0.4
ArmErg           0        0        0    98.33        0     1.67        0        0        0        0        0
Stairs           0        0    6.111        0    93.22        0        0     0.11     0.55        0        0
Laundry          0        0        0        0        0       86        0        0       12        0        2
Bike             0        0        0        0     0.33        0    99.67        0        0        0        0
Sweep            0        0     0.67        0     0.67        0        0       95     3.67        0        0
Bath             0     5.67     1.33        0     2.33     1.33        0     3.33       70    12.67     3.33
Elevator         0    16.33     0.33        0        0        1        0        0        4    77.67     0.67
Café          7.67     0.67    14.33        0     4.33     5.67        0     0.67        7     3.33    56.33
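Row-normalized percentages like those in Table I can be derived from per-window actual and predicted labels. The sketch below is illustrative only: the function name and the toy labels are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

def confusion_percent(actual, predicted, labels):
    """Row-normalized confusion matrix: rows are actual classes, columns are 'classified as'."""
    counts = Counter(zip(actual, predicted))
    matrix = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        matrix[a] = {p: (100.0 * counts[(a, p)] / row_total if row_total else 0.0)
                     for p in labels}
    return matrix

# Toy labels (hypothetical): two easily confused low-intensity tasks.
labels = ["Stand", "Elevator"]
actual = ["Stand"] * 10 + ["Elevator"] * 10
predicted = ["Stand"] * 8 + ["Elevator"] * 2 + ["Elevator"] * 9 + ["Stand"] * 1
cm = confusion_percent(actual, predicted, labels)
for a in labels:
    print(a, [cm[a][p] for p in labels])
```

Each row sums to 100%, so the diagonal entry is the per-class accuracy and the off-diagonal entries show where that class's windows were sent.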

A scatter plot of the first and second principal components of the feature space for one of the patients is shown in figure 3. We can see that points belonging to similar tasks tend to cluster together. Most of the low intensity tasks (sitting, standing and going up an elevator) are clustered on the top left of the scatter plot. Tasks that involve predominantly arm movements (arm ergometer, sweeping and folding laundry) are clustered in the bottom portion of the scatter plot, whereas tasks that involve a lot of leg movement (walking, riding a stationary bike and going up/down stairs) are clustered in the top right portion. Tasks such as eating in the café and using the bathroom involve a combination of different basic activities such as walking, sitting and use of hands, and hence the points belonging to these activities have a larger spread.

TABLE II. CLASSIFICATION ERROR (%)

Classifier    10-fold Cross-Validation    Leave-One-Subject-Out
IBL           1.26 (± 0.56)               16.33 (± 4.95)
NB            2.92 (± 1.25)               23.19 (± 11.24)
J48           5.92 (± 2.30)               29.47 (± 11.72)
MLP           1.24 (± 0.43)               14.18 (± 9.79)
RF            2.08 (± 0.44)               18.57 (± 9.25)
SVM           1.31 (± 0.55)               12.09 (± 4.64)

In table 2 we can see classification results for the leave-one-subject-out and 10-fold cross-validation methods. For 10-fold cross-validation most of the techniques perform comparably. The J48 classifier performs worse than the others, but its error is still low. This suggests that high classification accuracy can be achieved using a subject-specific training approach. The results for the leave-one-subject-out approach show that the SVM classifier outperforms the others. The MLP and IBL classifiers perform well but fall short of the SVM.

In table 1 we can see the confusion matrix for the leave-one-subject-out classification using the SVM classifier. The rows are actual classes and the columns are what they have been classified as. We can see from the confusion matrix that gym exercises (walking, arm ergometer and bike) have been classified with a high degree of accuracy. Most of the misclassifications come from tasks that are very similar in nature, such as standing and taking the elevator. Approximately 19% of the standing instances have been classified as elevator and approximately 16% of elevator instances have been classified as standing. Another source of misclassifications is activities like eating in the café and using the bathroom. Both these tasks involve the patient performing a series of sub-tasks that involve standing, sitting and using the hands. So it is not surprising that almost 8% of café instances have been classified as sitting and 14% as walking. These observations are encouraging, as they mean that most of the misclassifications occur due to the similar nature of the tasks.


Next, we tried to estimate an appropriate window length for feature extraction. So far we have used a 5 s window length for feature extraction, based on empirical observation. In figure 4 we can see a bar plot of SVM classification error for window lengths from 1 s to 10 s. There is a gain of ~2% as we go from 5 s to 6 s, but beyond 6 s we do not gain much in terms of classifier accuracy by increasing the window length. All the results from this point onwards have been obtained using a 6 s window. This result is close to what was earlier reported (6.7 s) by Bao et al. [14].

Figure 4. Bar plot of SVM classification error (%) vs window length (s) for feature extraction.

One of the questions we try to answer here is, "How many sensors are required to achieve good classification performance?" In an attempt to find an answer, we first investigate the impact of the three axes (X, Y and Z) of a tri-axial accelerometer on classification error. In figure 5, we can see a comparison of SVM classification error when features are extracted from different combinations of the X, Y and Z axes. We can see that most of the information about a task is captured by the X axis. The XY and XZ combinations give results comparable to XYZ.

To find a reduced sensor set, we decided to try different combinations of sensors. First we tried dropping one sensor at a time, which did not result in any significant deterioration in classification performance. It is impractical to present the results of all combinations of sensors, so in table 3 we present the best case SVM classification error when only 1, 2, 3, 4, 6 and 8 sensors were used. In the table RUA/LUA stand for right/left upper arm, RFA/LFA stand for right/left forearm, RT/LT stand for right/left thigh and RS/LS stand for right/left shank. When using only 1 sensor, the best classification error (~30%) was achieved by RT and LT. Both the RT and LT sensors are useful for tasks involving leg activity. They can also differentiate between seated and standing positions, which is the main reason we get significantly better results than with any of the other sensor locations. The worst classification performance was achieved by using only the pocket sensor, followed by the one on the upper back. For the combination of 2 sensors, RFA/RT and LFA/LT performed equally well. This shows that with just 2 sensors we can achieve classification accuracy > 80%. Using 3 sensors (RFA/RT/RS) we achieved a classification error of ~15%. Using 3 sensors on only the left side (LFA/LT/LS) gave slightly poorer results than the right side. This might be attributed to the fact that most of the patients were right handed. We can see that as we increase the number of sensors the classification performance improves and approaches what can be achieved using the full sensor setup. These results are important from the point of view of long term home monitoring, as one would like to use as few sensors as possible to improve patient comfort.

Figure 5. Bar plot of SVM classification error for features extracted from different combinations of the axes of a tri-axial accelerometer.
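The best-case search over sensor combinations described above can be sketched as an exhaustive enumeration of subsets of a given size. The sensor abbreviations are those defined in the text; the sternum and pocket nodes are left out because the text gives no abbreviation for them. The `toy_error` function is a hypothetical stand-in for retraining and evaluating a classifier on the features of a subset, and its numbers are invented for illustration, not taken from the study.

```python
from itertools import combinations

SENSORS = ["RUA", "RFA", "LUA", "LFA", "RT", "RS", "LT", "LS"]

def best_subset(sensors, size, evaluate):
    """Return (subset, error) with the lowest error among all subsets of the given size."""
    return min(((subset, evaluate(subset)) for subset in combinations(sensors, size)),
               key=lambda pair: pair[1])

def toy_error(subset):
    """Hypothetical error model (NOT the paper's results): each sensor lowers the error."""
    gain = {"RT": 12.0, "LT": 11.5, "RFA": 6.0, "LFA": 5.5,
            "RS": 3.0, "LS": 2.5, "RUA": 1.0, "LUA": 0.8}
    return max(40.0 - sum(gain[s] for s in subset), 0.0)

for size in (1, 2, 3):
    subset, err = best_subset(SENSORS, size, toy_error)
    print(size, "/".join(subset), err)
```

In practice `evaluate` would run the leave-one-subject-out evaluation restricted to the features of the chosen sensors, so the cost grows with the number of subsets of each size.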


TABLE III. BEST CASE CLASSIFICATION ERROR (%) FOR DIFFERENT SENSOR COMBINATIONS

No.   Sensor Locations                 Mean (± STD)
1     RT                               29.07 (± 10.50)
1     LT                               28.17 (± 8.21)
2     RFA/RT                           17.72 (± 6.32)
2     LFA/LT                           17.56 (± 5.66)
3     RFA/RT/RS                        13.94 (± 5.99)
3     LFA/LT/LS                        16.70 (± 6.17)
4     RUA/RFA/RT/RS                    13.07 (± 5.86)
4     RFA/LFA/RT/LT                    13.31 (± 4.11)
6     RFA/LFA/RT/RS/LT/LS              11.33 (± 4.41)
6     RUA/RFA/LUA/LFA/RT/LT            12.94 (± 3.77)
8     RUA/RFA/LUA/LFA/RT/RS/LT/LS      10.78 (± 4.12)
10    ALL                              10.56 (± 4.46)

IV. DISCUSSION

We have shown that by using subject-independent data, classification of physical activities with high accuracy (~90%) can be achieved. Our simulations show that a 6 s window length for feature extraction results in a low classification error. We showed that by using a reduced sensor set we can achieve good classification performance if the sensors are selected carefully. By using only 3 sensors placed on the right side of the body we achieved a classification accuracy > 86%. For future work we would like to pursue techniques that can allow us to improve classification accuracy. We achieved a very high level of classification accuracy using the 10-fold cross-validation approach, which suggests that, by augmenting subject-independent training data with a small amount of training data derived from the target subject, the classification performance can be improved significantly. We would also like to do an exhaustive analysis of the feature space to find the most useful features and remove the noisy ones. Also, we would like to understand the changes in physiological responses of patients with respect to different activities.

ACKNOWLEDGMENT

We would like to thank Intel® for supporting this work.

REFERENCES

[1] World Health Report. Geneva, Switzerland: World Health Organization, 2000.
[2] C. J. Murray and A. D. Lopez, "Mortality by cause for eight regions of the world: Global Burden of Disease Study," Lancet, vol. 349, pp. 1269-76, 1997.
[3] S. D. Sullivan, S. D. Ramsey, and T. A. Lee, "The economic burden of COPD," Chest, vol. 117, pp. 5S-9S, 2000.
[4] T. M. Wilkinson, G. C. Donaldson, J. R. Hurst, T. A. Seemungal, and J. A. Wedzicha, "Early therapy improves outcomes of exacerbations of chronic obstructive pulmonary disease," Am J Respir Crit Care Med, vol. 169, pp. 1298-303, 2004.
[5] E. Munguia Tapia, S. Intille, and K. Larson, "Real-Time Recognition of Physical Activities and Their Intensities Using Wireless Accelerometers and a Heart Rate Monitor," in Proceedings of the 11th International Symposium on Wearable Computers (ISWC '07), 2007.
[6] N. Ravi, N. Dandekar, P. Mysore, and M. Littman, "Activity Recognition from Accelerometer Data," in Proceedings of the 17th Innovative Applications of Artificial Intelligence Conference, pp. 1541-1546, 2005.
[7] I. T. Jolliffe, Principal Component Analysis. New York: Springer-Verlag, 1986.
[8] D. Aha and D. Kibler, "Instance-based learning algorithms," Machine Learning, vol. 6, pp. 37-66, 1991.
[9] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition. Wiley Interscience, 2001.
[10] J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
[11] L. Breiman, "Random Forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[12] V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[13] D. M. Sherrill, M. L. Moy, J. J. Reilly, and P. Bonato, "Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data," J NeuroEngineering Rehab, vol. 2, pp. 16, 2005.
[14] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," in Proceedings of PERVASIVE 2004, vol. LNCS 3001, A. Ferscha and F. Mattern, Eds. Berlin Heidelberg: Springer-Verlag, pp. 1-17, 2004.
[15] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. San Francisco: Morgan Kaufmann, 2005.