User Daily Activity Classification from Accelerometry Using Feature Selection and SVM

Jordi Parera¹, Cecilio Angulo¹, A. Rodríguez-Molinero², and Joan Cabestany¹

¹ UPC. Technical University of Catalonia - CETpD. Technical Research Centre for Dependency Care and Autonomous Living. Neàpolis. Rambla de l'Exposició, 59-69. 08800 Vilanova i la Geltrú - Spain
² FHCSAA. Sant Antoni Abad Hospital - CETpD. Mobility and Gait Lab. Sant Josep, 21-23. 08800 Vilanova i la Geltrú - Spain

Abstract. Monitoring a user's daily activity is useful for physicians in geriatrics and rehabilitation as an indicator of the user's health and mobility. The main goal of the presented experimental work is real time activity recognition by means of a processing node that includes a triaxial accelerometer sensor placed on the user's chest. A two-phase procedure implementing feature extraction from the raw signal and SVM-based classification has been designed for real time monitoring. The designed procedure showed an overall accuracy of 92% when recognizing activities in experiments performed in daily conditions.

1 Introduction

One of the consequences of chronic diseases and strokes is a limitation of motion capacity and a resulting lack of physical activity, which has a direct impact on quality of life. Analyzing the user's daily activity would give medical treatments valuable additional information, allowing better diagnosis and treatment assessment. The usual instruments for supervising a patient's mobility are based on the subjective perceptions of the observer, collected through individual tests. Wearable acceleration-based activity monitoring systems are currently being investigated as a way to overcome this subjectivity and to provide a compact and robust technical solution. Previous studies on ambulatory activity monitoring rely on multiple body-fixed sensor configurations and/or, when only one sensor is used for activity identification, on off-line data processing. The purpose of the research presented here is to implement real time classification of the user's daily life activities by means of a Support Vector Machine, based on features computed from the signals of one small, low power consumption wearable sensor module, so that the system can be used anywhere during the user's daily activity without any external infrastructure needed.

The device holds a battery, a triaxial accelerometer, all the necessary electronic components and a low power dsPIC microprocessor; Fig. 1 shows a photo of this measuring system. By implementing SVM-based real time activity labeling instead of simple acceleration logging, interaction with the user becomes possible (drug-taking alerts, for instance), and the amount of data needed to store the user activity is reduced, which increases the device's operating time.

Fig. 1. 3D axis accelerometer sensor

The rest of this paper is organized as follows. Section 2 formulates the problem. Section 3 presents the proposed feature selection strategy, based on the discriminant power of the generated features. Section 4 is devoted to the experiment design for validating the performance of the procedure. Finally, conclusions and further research are given in Section 5.

This work has been partly supported by the Spanish MEC project ADA-EXODUS (DPI2006-15630-C02-01) and the FP6 EU project CAALYX (IST-2005-045215). The first author is supported by a UPC research grant. The authors would like to thank Carlos Pérez-López and Marc Torrent-Poch at CETpD for their technical support.

J. Cabestany et al. (Eds.): IWANN 2009, Part I, LNCS 5517, pp. 1137–1144, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Problem Formulation

2.1 Input Signals

The raw input signals come from the triaxial accelerometer of the sensor device located on the user's chest, sampled at 50 Hz. The activities to be classified last from 0.5 seconds, for the walk activity (steps), to 2.4 seconds, for the stand up activity; hence the input data at the processing block is windowed into a 3-column matrix with 120 samples (2.4 seconds at 50 Hz). Data is processed every half time window (60 samples), so the classifier identifies which activity is being performed with a delay of 1.2 seconds.
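The windowing described above can be sketched as follows. This is a minimal illustration and not the authors' embedded implementation: the function name `sliding_windows` and the constant names are hypothetical, and only the 50 Hz rate, the 120-sample window and the half-window step come from the text.

```python
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (AX, AY, AZ) reading, in G

SAMPLE_RATE_HZ = 50                # sampling rate given in the text
WINDOW_SAMPLES = 120               # 2.4 s at 50 Hz, as given in the text
HOP_SAMPLES = WINDOW_SAMPLES // 2  # a new window every half window (1.2 s)


def sliding_windows(samples: Sequence[Sample],
                    size: int = WINDOW_SAMPLES,
                    hop: int = HOP_SAMPLES) -> Iterator[List[Sample]]:
    """Yield consecutive windows of `size` samples, advancing by `hop` samples."""
    for start in range(0, len(samples) - size + 1, hop):
        yield list(samples[start:start + size])


# Usage: 6 seconds of a made-up constant signal (300 samples).
stream = [(0.0, 0.0, 1.0)] * (6 * SAMPLE_RATE_HZ)
windows = list(sliding_windows(stream))
```

Each window shares its first 60 samples with the previous one, which is why a new label can be produced every 1.2 seconds.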

2.2 User Daily Activities

The final objective of our research is to identify a set of five usual daily activities performed by the user while wearing the sensor device. The activities to be classified are:

1. Standing up from a sitting state. This action lasts for 1-2.5 seconds and has differentiated phases: forward bending, active raising, passive raising and downward bending. The timing and magnitude of these phases determine many pathologic characteristics.
2. Sitting down. This action also lasts 1-2.5 seconds. It resembles an inverse of standing up, and these two signals are the most similar pair in the group of activities.
3. Transition action. Since the sit down and stand up actions are very similar, a control variable has been defined that improves the final classification by isolating the signals that lie outside the stand up or sit down signals themselves. It is defined as an action occurring just before (after) a sit down (stand up) action.
4. Walk activity. The duration of a step varies between 0.5 and 1.5 seconds, hence the classifier is trained to search for a complete walking action instead of focusing on individual steps.
5. Steady stand activity. This signal has no significant peaks or timing, but it is not constant. The sitting steady state gives the same signal as the steady stand state, so the preceding sit-stand transition must be detected in order to label the activity correctly.

2.3 Signal Processing

Preprocessing the raw data provided by the accelerometer helps to avoid overfitting and to generalize the solution. The input dimension is reduced from a 3-column matrix with 120 samples to a vector of 12 features; this reduction also speeds up the training and classification procedures, at the added cost of preprocessing.

Fig. 2. 3 axis Acceleration signal and labeled action (vertical axis: acceleration in G; horizontal axis: time in s; series: accX, accY, accZ and labeled action)

The main features extracted from the triaxial accelerometer data are the following:

– Computing the modulus of the vector components is a simple way to reduce the data dimension by a factor of 3, and it also makes the data orientation independent [1]:

$$A_M = \sqrt{A_X^2 + A_Y^2 + A_Z^2} \tag{1}$$

– Orientation angles $\theta$ and $\phi$, based on the earth's gravity, give the orientation of the sensor device. This method works well in static conditions, and low centripetal accelerations do not affect the result [2]. Impacts or strongly accelerated movements will, however, add error:

$$\theta = \operatorname{arctan2}\left(A_X, \sqrt{A_Y^2 + A_Z^2}\right) \tag{2}$$

$$\phi = \operatorname{arctan2}(A_Y, A_Z) \tag{3}$$

– Vertical ($A_V$) and forward ($A_F$) acceleration: once the orientation angles are known, the accelerations in the earth-fixed reference frame can be computed from the mobile reference frame, that is, the wearable accelerometer, by applying the rotation matrices about the X and Y axes. These features are interesting because they give a value that is robust to the orientation of the measuring device:

$$A_V = -\cos(\phi) \cdot \sin(\theta) \cdot A_X + \sin(\phi) \cdot A_Y + \cos(\theta) \cdot (A_Z + G) \tag{4}$$

$$A_F = \cos(\theta) \cdot A_X + \sin(\theta) \cdot A_Z \tag{5}$$

– Energy expenditure indicators computed from the acceleration signals: the integral of absolute value (IAA) and the integral of magnitude (IAV)¹:

$$IAA = \int \left( |A_X| + |A_Y| + |A_Z| \right) dt \tag{6}$$

$$IAV = \int \sqrt{A_X^2 + A_Y^2 + A_Z^2} \, dt \tag{7}$$

– The increments of the acceleration modulus.

– Frequency features can be obtained by applying the Fast Fourier Transform (FFT) to the accelerometer signals, either on each axis or on the acceleration magnitude, returning the main frequency components and their magnitudes. However, calculating these features demands a high processing time, so they should be used only if proven effective.
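The time-domain features listed above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' firmware: all function names are hypothetical, the angles are computed per sample without any filtering, gravity is taken as G = 1 because the signals are expressed in G, the integrals are approximated with a rectangle rule at the 50 Hz sampling period, and Eqs. (4) and (5) are transcribed literally with the signs printed in the paper.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (AX, AY, AZ) reading, in G

G = 1.0          # gravity expressed in G units (assumption)
DT = 1.0 / 50.0  # sampling period for the 50 Hz rate of Section 2.1


def magnitude(ax: float, ay: float, az: float) -> float:
    """Acceleration modulus, Eq. (1)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def orientation(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Orientation angles (theta, phi) in radians, Eqs. (2) and (3)."""
    theta = math.atan2(ax, math.sqrt(ay * ay + az * az))
    phi = math.atan2(ay, az)
    return theta, phi


def vertical_forward(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Vertical and forward acceleration, transcribed from Eqs. (4) and (5)."""
    theta, phi = orientation(ax, ay, az)
    av = (-math.cos(phi) * math.sin(theta) * ax
          + math.sin(phi) * ay
          + math.cos(theta) * (az + G))
    af = math.cos(theta) * ax + math.sin(theta) * az
    return av, af


def window_features(window: Sequence[Sample]) -> Dict[str, float]:
    """A few per-window features; this particular selection is illustrative."""
    mags = [magnitude(*s) for s in window]
    thetas = [orientation(*s)[0] for s in window]
    increments = [b - a for a, b in zip(mags, mags[1:])]
    return {
        # Eq. (6) and Eq. (7), integrated with a rectangle rule
        "IAA": sum(abs(ax) + abs(ay) + abs(az) for ax, ay, az in window) * DT,
        "IAV": sum(mags) * DT,
        "STD AX": statistics.pstdev([s[0] for s in window]),
        "range theta": max(thetas) - min(thetas),
        "max increment": max(increments, default=0.0),
    }


# Usage: a motionless sensor with gravity along Z for one 120-sample window.
features = window_features([(0.0, 0.0, 1.0)] * 120)
```

For the motionless window both integrals reduce to the window duration (2.4 s) times 1 G, which is a quick sanity check on the units.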

3 Feature Selection

Combining the preceding features with standard statistical features (mean, max, min, range, standard deviation, entropy...), a large set of candidate features is obtained; such features have proved useful in other similar SVM-based work [4]. These features give relevant information about the signal, and they allow the input stream of data to be processed in batches, once every half input window, rather than at every acceleration sample.

The method used to incrementally select features is the discriminant power, measured as the distance between the group of data obtained from observations of a certain activity and the groups of observations of the other activities. If different activities are represented by non-overlapping ranges of values, then the feature can classify a number of activities by itself. In order to add robustness, we select several features that can identify different activities. Since the features have many different units, they are normalized so that the relevance and performance of each feature can be compared [5].

The data obtained from different observations of the same activity can be grouped by the mean value or by an interval of values. The indicators used to define intervals of data are percentiles, which generate boxplots and whiskers (see Fig. 3). The bounds of the box are the 25th and the 75th percentiles. The whiskers used to bound the intervals are defined as the most extreme data values within 1.5 · IQR, where IQR is the interquartile range of the sample.

Fig. 3. Intervals for the 5 activities based on ‘range θ’ feature (vertical axis: distance; activities: stand up, transit, sit down, steady, walk)

¹ IAV has been identified as less accurate than IAA [3].
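The whisker-bounded interval of one feature for one activity can be computed as in the sketch below. The function name `whisker_interval` is hypothetical, and the percentile interpolation (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not say which percentile definition was used.

```python
import statistics
from typing import Sequence, Tuple


def whisker_interval(values: Sequence[float]) -> Tuple[float, float]:
    """Return (low, high): the most extreme data values within 1.5 * IQR of the box."""
    # Quartiles of the sample; the interpolation method is an assumption.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return low, high


# Usage with made-up, already normalized observations of one feature for one
# activity; the value 30 is an outlier and falls outside the whiskers.
observations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
interval = whisker_interval(observations)
```

Bounding the interval by the whiskers rather than by the raw minimum and maximum keeps a single outlying observation from widening an activity's interval and hiding a feature that otherwise separates the classes.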

Once the intervals and boxplots of each activity for each designed feature are known, the distance between intervals is measured. For non-overlapping intervals the distance is positive, which means that the feature can be used to separate that pair of activities. The distances between the different activities are computed and stored in a matrix that records the distance from each activity to the others. The number of positive distances in each row, or the number of distances greater than a certain threshold, is the number of classes that differ from that row's class in that particular feature. We tested 3 indicators to select the most discriminant set of features:

1. The number of classes that a single feature classifies, i.e. the number of positive values in the distance matrix. Since 5 classes are considered, the maximum value for the 5 × 5 distance matrix is 10.² For our collected data set the maximum value obtained for a feature was 9, so no single feature exists that can classify all 5 classes.
2. The sum of distances between classes: a value that represents how different a feature is for each class. Since the features are normalized, and considering negative values for overlapping activities, values from −20 to +20 can be expected.
3. The sum of positive distances between classes: a value that represents how different a feature is in each class when intervals do not overlap. We can expect values from 0 to +20.
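The three indicators above can be sketched for a single feature as follows. The paper does not give the exact distance formula, so the definition used here (the gap between two intervals, which becomes the negative overlap length when they overlap) is an assumption, as are the function names and the interval values, which are invented for illustration.

```python
from itertools import combinations
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (low, high) whisker bounds of one activity


def interval_distance(a: Interval, b: Interval) -> float:
    """Gap between two intervals: positive if disjoint, negative if they overlap."""
    return max(a[0], b[0]) - min(a[1], b[1])


def indicators(intervals: Dict[str, Interval]) -> Tuple[int, float, float]:
    """Return (count of positive distances, sum of distances, sum of positive ones).

    Only the lower triangle of the symmetric distance matrix is visited,
    i.e. each unordered pair of activities is counted once.
    """
    dists = [interval_distance(intervals[p], intervals[q])
             for p, q in combinations(intervals, 2)]
    n_positive = sum(1 for d in dists if d > 0)
    total = sum(dists)
    positive_total = sum(d for d in dists if d > 0)
    return n_positive, total, positive_total


# Usage with made-up normalized intervals of one feature for the 5 activities.
example = {
    "stand up": (0.5, 1.5),
    "transit": (-0.2, 0.4),
    "sit down": (0.6, 1.4),
    "steady": (-1.5, -1.0),
    "walk": (-0.8, -0.3),
}
n_positive, total, positive_total = indicators(example)
```

In this invented example only the stand up and sit down intervals overlap, so the feature separates 9 of the 10 activity pairs, and ranking features by any of the three values gives a candidate ordering for the feature vector.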

4 Experimentation and Results

The exercise performed by the test subjects to collect data was: stay steady in a vertical position, walk about 4 meters, sit down on a chair, stay seated for a few seconds, stand up, walk, and finally sit down again and stay steady again. Each test subject repeated the exercise 3 times. Fig. 2 shows the raw acceleration signal (converted to G units) of a single exercise, together with the labeled action as a solid black line. The experiments were video recorded to enable the labeling of activities. The test group consisted of 6 healthy subjects with no mobility limitations, with mean age 38.17 and standard deviation 12.6.

Using the three previously defined indicators, the features were ordered, but the length of the feature vector was left variable in order to check the accuracy of the method with different feature vector lengths. Five one-versus-rest SVMs with Gaussian kernel were trained for the multiclass problem, and inputs were assigned to the class with the highest number of votes. When no positive votes exist, no label is assigned. A 10-fold cross-validation with 3 repetitions was performed for each length of the feature vector, ranging from 1 to 30 features, and accuracy was defined as the best value among the 3 defined indicators for each feature vector length. Fig. 4 shows the best performance result (the mean of the 3 repetitions of the 10-fold test) using any of the 3 feature selection methods versus the feature vector length.

Table 1 gives the classification result in the form of a confusion matrix, where the rows represent the actual activity and the columns the percentage of cases in which that activity received each label. The elements of a row do not have to sum to 100%, because the resulting vote of the 5 SVMs can be negative, in which case no activity is identified.
10% of stand up actions are labeled as transition, but this mislabeling is not a serious mistake, because transition is a state that happens just before or after stand up or sit down actions; hence the label still provides information, as long as we know which action preceded the falsely labeled transition.

² The distance matrix is symmetric with a diagonal of zeros, so only the lower triangle needs to be considered; hence the maximum distance value is 10.

Fig. 4. Accuracy results versus number of features used (vertical axis: accuracy (%); horizontal axis: length of the best feature vector; series: svm max)

Table 1. Confusion Matrix (%)

Label        stand up   sit down   steady   walk    transition
stand up     75.56      14.44      0        0       10
sit down     6.06       89.09      0        0       4.85
steady       0          0          93.33    6.67    0
walk         0          0          2.22     95.56   2.22
transition   1.39       1.39       0        3.06    94.17

The best accuracy, 91.06%, is achieved when the length of the feature vector is 7 and the selected features are ‘STD AX’, ‘min AX’, ‘STD AV’, ‘STD AF’, ‘min AF’, ‘STD θ’ and ‘range θ’. Calculating standard deviation (STD) features requires a relatively high processing effort. Slightly lower accuracy can be achieved using features that are easier to calculate: an accuracy of 90.03% is obtained with a vector of length 13.
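The labeling rule and the evaluation protocol used in this section can be sketched as follows. This is a hedged illustration, not the authors' code: the SVM training itself is omitted, `vote` and `repeated_kfold` are hypothetical names, and resolving several positive votes by the largest decision value is an assumption, since the text only states that the class with the most votes wins and that no label is assigned when no vote is positive.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple


def vote(decision_values: Dict[str, float]) -> Optional[str]:
    """Combine the outputs of the one-versus-rest classifiers into one label.

    `decision_values` maps each activity to the signed output of its own
    classifier. Returns None when no classifier votes positively.
    """
    positive = {label: v for label, v in decision_values.items() if v > 0}
    if not positive:
        return None
    # Assumption: several positive votes are resolved by the largest value.
    return max(positive, key=positive.get)


def repeated_kfold(n_samples: int, k: int = 10, repetitions: int = 3,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation, repeated."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


# Usage with made-up classifier outputs and a made-up data set size.
label = vote({"stand up": -0.3, "sit down": 0.8, "transition": 0.2,
              "walk": -1.1, "steady": -0.9})
unlabeled = vote({"stand up": -0.3, "sit down": -0.1, "transition": -0.2,
                  "walk": -1.1, "steady": -0.9})
splits = list(repeated_kfold(25))
```

The `None` outcome corresponds to the unlabeled case, which is why the rows of a confusion matrix built from these labels are not required to sum to 100%.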

5 Conclusions and Further Work

A two-phase procedure implementing (i) feature extraction from the raw signal of a triaxial accelerometer sensor placed on the user's chest, with features selected using a distance matrix, and (ii) SVM classification has been designed for a real time monitoring problem. The designed algorithm showed an overall accuracy of 91% when recognizing five different activities in experiments performed in daily conditions. Further research includes verifying the performance of the classifier in very long experiments and increasing the number of daily activities identified by the system.


References

1. Bidargaddi, N., Sarela, A., Klingbeil, L., Karunanithi, M.: Detecting walking activity in cardiac rehabilitation by using accelerometer, December 2007, pp. 555–560 (2007)
2. Giansanti, D.: Does centripetal acceleration affect trunk flexion monitoring by means of accelerometers? Physiological Measurement 27(10), 999–1008 (2006)
3. Bouten, C.V.C., Koekkoek, K.T.M., Verduin, M., Kodde, R., Janssen, J.D.: A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity. IEEE Transactions on Biomedical Engineering 44(3), 136–147 (1997)
4. Begg, R.K., Palaniswami, M., Owen, B.: Support vector machines for automated gait classification. IEEE Transactions on Biomedical Engineering 52(5), 828–838 (2005)
5. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)