Online Sequential Extreme Learning Machine ...

Yazan Al Jeroudi*, M. A. Ali**, Marsad Latief***, Rini Akmeliawati***
*Department of Mechanical Engineering, **Department of Electrical and Computer Engineering, ***Department of Mechatronics Engineering
International Islamic University Malaysia, Jl. Gombak, 53100 Kuala Lumpur, Malaysia
Email: [email protected]

Abstract— Human activity recognition (HAR) is the basis for many real-world applications in health care, sports, and the gaming industry. Different methodological perspectives have been proposed to perform HAR. One appealing approach is to take advantage of data collected from the inertial sensors embedded in an individual's smartphone. These data contain a rich amount of information about the daily activities of the user. However, there is no straightforward analytical mapping between a performed activity and its corresponding data. In addition, online training of the classifier is a concern in this type of application. This paper aims at classifying human activities based on the inertial data collected from a user's smartphone. An Online Sequential Extreme Learning Machine (OSELM) is implemented to train a single hidden layer feed-forward network (SLFN). Experimental results with an average accuracy of 82.05% are achieved.

Keywords—extreme learning machine; human activity recognition; online multi-classification; inertial sensing; pattern recognition

I. INTRODUCTION

Human Activity Recognition (HAR) is defined as identifying the actions carried out by a person from a set of observations. Actions can be any daily activities such as lying down, standing, walking, walking upstairs, or walking downstairs. HAR has become very popular in recent years because it has many real-world applications. In health care, for example, it can be used to monitor elderly people or people with disabilities and to identify their activities in order to provide assistive care whenever needed. Moreover, HAR is useful for more effective human-environment interaction: different events can be triggered automatically when a specific set of actions is recognized. For example, lights can be turned on when walking upstairs or walking downstairs is detected and turned off automatically when lying down is detected. Additionally, statistical analysis can be performed on the daily activity data of individuals or groups of people. Such analysis might be useful for decision-making purposes, for instance analyzing people's health conditions based on their daily activities. Consequently, an interaction scheme for preventive health care through mobile phones can be implemented. For example, a system has been deployed for monitoring patients through physiological body-worn sensors and contextual sensors in the environment, whose data can be analyzed by medical personnel.


In another example, a system has been proposed for monitoring the postures of elderly people in their daily life. Aside from the health sector, HAR is also deployed in various sports-related applications; for example, sport activities such as cycling, swimming, and rowing can be classified through HAR [1]. Moreover, the gaming industry has deployed HAR to develop near-realistic interactive games; in one example, a video gaming system was developed which required players to perform various physical activities [2].

Many techniques have been explored for HAR. Classical approaches are based on vision [3]. Although such approaches provide very rich information, they require specific environment illumination conditions, large computational resources, and pre-installation of vision equipment in the environment, which might not be feasible in all cases. Other, more appealing approaches to HAR are based on smartphone inertial data. Smartphone usage has grown significantly over the last several years. A smartphone has self-contained computational resources for processing data and the capability of transferring results to the cloud under social networking concepts. Almost every smartphone has built-in inertial sensors, and body inertial data such as accelerations and angular rates embed rich information about the activities carried out by the user. Consequently, incorporating a classification algorithm based on smartphone inertial data is an appealing strategy for HAR, and researchers have proposed using smartphones for recognizing human activity continuously at the individual level [4].

Numerous inertial-based systems have been developed for human activity recognition [5-10]. In one such system, the accelerometer readings along the three axes are synthesized, and the magnitude of the synthesized acceleration is used to obtain 17 statistical features. Principal Component Analysis (PCA) is used to remove noisy features and extract robust features for recognition; the adaptation of the recognition model is then achieved using an extreme learning machine [5].

In another system, a statistical motion primitive-based framework for human activity recognition was developed. Its framework is based on Bag of Features (BoF), which builds activity models using histograms of motion primitives; a total of nine activity classes are recognizable through this system [6]. Another system uses a single accelerometer attached at the lateral side of the lower right leg, with data collected at a sampling rate of 50 Hz. Each test subject was asked to perform simple day-to-day activities, and the collected data are represented as colors to differentiate one activity from another [7]. In a different system, [8] uses a mobile phone's accelerometer to collect data by placing the phone at six different locations on the body, with a total of five activities considered. The 3D acceleration data are modeled using autoregressive (AR) models; an augmented feature vector is then formed by augmenting the AR coefficients with the Signal Magnitude Area (SMA); Kernel Discriminant Analysis is used to extract significant non-linear discriminating features; and finally an artificial neural network (ANN) performs the activity recognition. Reference [9] employs a compressed sensing method for human activity sensing using a mobile phone's accelerometer and a remote server; the compressed sensing technique consists of simple matrix operations performed on the mobile device, while reconstruction is performed on the network side. Reference [10] proposes a system that uses a combination of a tri-axial accelerometer and an electrocardiograph, under the assumption that heart rate and acceleration data are related to the activity being performed. The system first processes the acceleration data, acquires the exercise intensity (EI) from the ECG data obtained from the electrocardiograph, calculates a distinction frequency from the acceleration data, and then employs a decision tree to process the collected data and output the result. The disadvantage of this approach is an increased error rate during activity transitions.

Although some of the above works report good accuracy, performance and robustness differ depending on the experimental setup and validation conditions. In addition, many of the above systems do not consider the real nature of this application. Sequential training is needed in real-world applications: by nature, the data become available sequentially with respect to time, so it is not possible to have a complete training dataset initially, and data have to be sent in packets to the training algorithm as soon as the packets are available. In the current context, "online" refers to the online availability of the inertial data; while a human subject is performing an activity, the inertial data from the smartphone sensors arrive sequentially. Moreover, it is not efficient to train the system only prior to the running mode. With an online approach, whenever the system performance degrades, learning can be triggered while the system is running.

In this paper, the extreme learning machine (ELM) has been selected to train the ANN for several reasons. It outperforms traditional gradient-based learning algorithms in terms of speed and simplicity, retains general approximation capability, and avoids local minima and overfitting problems. Furthermore, ELM is not constrained to differentiable activation functions, as is the case for gradient-based learning algorithms. More importantly, ELM can be formulated in an online sequential training form and thus has more potential to be implemented in real-time applications. In this paper, the OSELM classification algorithm is proposed and evaluated on benchmark data for multi-class classification of six different activities.

The rest of the paper is organized as follows: Section II gives a brief description of the dataset considered. Section III introduces our methodology. Section IV discusses the classification results and their validation. Finally, Section V presents the conclusion and recommendations for future work.

II. DATA SET

In this research, a publicly available benchmark dataset, provided to allow researchers to test and compare different classification algorithms, has been used [4]. The dataset was collected from a group of 30 volunteers. Each volunteer performed six monitored activities (walking on flat ground, walking upstairs, walking downstairs, sitting, standing, and lying down). All activities were performed while wearing a Samsung Galaxy S2 smartphone on the waist. This smartphone contains a tri-axial accelerometer and a tri-axial gyroscope. The data were sampled at 50 Hz, which is sufficient for capturing knowledge about human body motion.

The dataset is divided into two parts. The first part contains the pre-processed raw signals, organized into fixed-width sliding windows of 2.56 s with 50% overlap; each row of this part therefore contains 128 values (50 Hz × 2.56 s). A short sketch of this windowing scheme is given after Table I. The second part consists of features extracted in the time and frequency domains from each window; the total number of features is 561. Different types of features were extracted: statistical descriptors such as mean, standard deviation, minimum, maximum, and auto-regression coefficients; Fourier-based features; and signal information measures such as entropy and energy. Table I shows the detailed sizes of the training and testing feature data.

TABLE I. SIZE OF SUBCLASSES OF TRAINING AND TESTING FEATURE DATA

Activity               Training feature vectors   Testing feature vectors
Walking                1226                       496
Walking Upstairs       1073                       471
Walking Downstairs     986                        420
Sitting                1286                       491
Standing               1374                       532
Laying                 1407                       537
Total                  7352                       2947
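As a concrete illustration of the windowing scheme described above (not part of the original benchmark tooling), the following sketch segments a raw 50 Hz signal into 128-sample windows with 50% overlap. The function and variable names are illustrative assumptions.

import numpy as np

FS = 50                 # sampling rate [Hz]
WIN = int(FS * 2.56)    # 128 samples per window (2.56 s at 50 Hz)
STEP = WIN // 2         # 50% overlap -> stride of 64 samples

def sliding_windows(signal):
    """Split a 1-D signal into overlapping fixed-width windows.

    Returns an array of shape (num_windows, 128), matching the
    128-value rows of the raw part of the dataset.
    """
    n = (len(signal) - WIN) // STEP + 1
    return np.stack([signal[i * STEP:i * STEP + WIN] for i in range(n)])

# Hypothetical usage: 10 s of accelerometer data along one axis.
acc_x = np.random.randn(10 * FS)
print(sliding_windows(acc_x).shape)   # (6, 128)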

III. METHODOLOGY

ELM is, in general, an offline supervised batch learning algorithm [11]. The training data consist of N arbitrary distinct samples $(x_i, t_i)$, where

$x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$,  $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$.

A standard single hidden layer feed-forward network (SLFN) with activation function $g(x)$ and $\tilde{N}$ hidden neurons can be modeled as in equation (1):

$\sum_{j=1}^{\tilde{N}} \beta_j \, g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, \ldots, N$   (1)

where $w_j = (w_{j1}, w_{j2}, \ldots, w_{jn})^T$ and $b_j$ are the input weights and bias of the j-th hidden node, and $\beta_j$ is the weight vector connecting the j-th hidden node to the output nodes. A more compact form is equation (2):

$H\beta = T$   (2)

where

$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}$,  $\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}$,  $T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}$

and H is called the hidden layer output matrix of the neural network. It has been proven in [11] that if the activation function is infinitely differentiable, the required number of hidden neurons is no more than the number of training samples. The training algorithm of the SLFN described above consists of three steps:

i. Randomly assign the input weights $w_j$ and biases $b_j$.
ii. Calculate the hidden layer output matrix H.
iii. Calculate the output weights $\beta$ using equation (3):

$\beta = H^{*} T$   (3)

where $H^{*}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix.

This form of ELM assumes that all data samples are available for training. In a realistic application, however, not all samples are available at once: data are collected in packets, with more packets arriving over time. Therefore, the general ELM algorithm described above has to be modified to accommodate sequentially arriving training data. This form of ELM is called the Online Sequential Extreme Learning Machine (OSELM). OSELM is carried out in two phases. The first phase is the initialization phase. The steps involved are the same as in the general description of ELM; the only difference is that initially only a small part of the training data is used for learning, and the output of this phase is the initial hidden layer output matrix. The second phase is an iterative phase called the sequential learning phase. In this phase, newly observed data are sent to the training algorithm sequentially in packets [12], as stated in equation (4):

$\aleph_{k+1} = \{(x_i, t_i)\}_{i=(\sum_{j=0}^{k} N_j)+1}^{\sum_{j=0}^{k+1} N_j}$   (4)

where $N_{k+1}$ is the number of observations in the (k+1)-th packet. With every new packet arriving, a new output matrix, called the partial hidden layer output matrix $H_{k+1}$, is calculated and the output weights are updated. The process is repeated until the last packet has arrived.
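The two OSELM phases described above can be summarized in code. The following is a minimal numerical sketch, not the authors' implementation (their experiments were run in MATLAB, see Section IV): it follows the recursive least-squares update of [12], uses sigmoid additive hidden nodes for simplicity (the experiments in Section IV use RBF hidden nodes), and all names are illustrative.

import numpy as np

class OSELM:
    """Minimal OSELM sketch: batch ELM initialization plus sequential updates."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        # Step i: input weights and biases are assigned randomly and never retrained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_inputs))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.beta = np.zeros((n_hidden, n_outputs))
        self.P = None   # kept from the initialization phase for the sequential updates

    def _hidden(self, X):
        # Step ii: hidden layer output matrix H (sigmoid activation).
        return 1.0 / (1.0 + np.exp(-(X @ self.W.T + self.b)))

    def initialize(self, X0, T0):
        # Initialization phase: ordinary batch ELM on a small initial chunk,
        # i.e. the pseudo-inverse solution of equation (3) with P_0 = (H_0^T H_0)^{-1}.
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, Xk, Tk):
        # Sequential learning phase: recursive least-squares update for one packet.
        Hk = self._hidden(Xk)
        K = np.linalg.inv(np.eye(len(Xk)) + Hk @ self.P @ Hk.T)
        self.P = self.P - self.P @ Hk.T @ K @ Hk @ self.P
        self.beta = self.beta + self.P @ Hk.T @ (Tk - Hk @ self.beta)

    def predict(self, X):
        # Class decision: index of the largest output (one output column per activity).
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

Only P and beta are stored between packets, so the cost of each update depends on the packet size and the number of hidden neurons rather than on the amount of data seen so far.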

IV. RESULTS AND DISCUSSION

The length of the feature vector is relatively large (561 elements), larger than the length of the raw data vector (128 elements). Statistical and signal information features were chosen from the 561-element set to train OSELM in an incremental manner. The performance was characterized while increasing the number of selected features and hidden neurons. The activation function tested is the Radial Basis Function (RBF). Fig. 1 shows increasing accuracy as the number of neurons in the hidden layer of the ANN increases; it converges to an accuracy between 70% and 80%. A best accuracy of 82% was obtained at 210 neurons/71 features. Beyond 100 neurons in the hidden layer, the performance saturates regardless of any further increase in the number of neurons.

In addition to the overall accuracy curve, the separate classification decisions of OSELM are plotted against the actual data in Figures 2a-2f. The classes are walking, walking upstairs, walking downstairs, standing, lying down, and sitting. These figures show the activities only over a range of 500 observations (X axis) for better visualization; each observation point represents a window as defined in Section II. Blue indicates the actual activity being conducted by the subject and red indicates the estimated activity. Points valued +1 on the Y axis represent the respective activity being performed, while points valued -1 represent the respective activity not being performed. As can be observed, two types of misclassification occur: positive and negative. A positive misclassification is a case in which the estimate is positive (the estimated value of the observation point is +1) while the corresponding true value is negative (the actual value is -1). Conversely, a negative misclassification is a case in which the estimated observation point takes the value -1 while the corresponding actual observation point has the value +1. For more detailed performance information, the confusion matrix for 90 features/100 neurons is given in Table II.
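To make the evaluation procedure concrete, the loop below sketches how an accuracy-versus-neurons curve such as Fig. 1 could be produced with the OSELM sketch from Section III. It is only an illustration, not the authors' actual evaluation code: the OSELM class, the packet size, and the arrays X_train, y_train, X_test, y_test (assumed to hold the 561-feature matrices and integer activity labels already loaded from the benchmark files) are all assumptions.

import numpy as np

PACKET = 200   # hypothetical packet size for the sequential phase

def one_hot(y, n_classes=6):
    # Encode integer labels as +1 / -1 targets, one output column per activity.
    T = -np.ones((len(y), n_classes))
    T[np.arange(len(y)), y] = 1.0
    return T

def evaluate(n_hidden, X_train, y_train, X_test, y_test):
    net = OSELM(n_inputs=X_train.shape[1], n_hidden=n_hidden, n_outputs=6)
    T_train = one_hot(y_train)
    n0 = n_hidden + 50   # initialization chunk must contain at least n_hidden samples
    net.initialize(X_train[:n0], T_train[:n0])
    for start in range(n0, len(X_train), PACKET):
        net.partial_fit(X_train[start:start + PACKET], T_train[start:start + PACKET])
    return np.mean(net.predict(X_test) == y_test)

# Sweep over hidden layer sizes, as in Fig. 1:
# for n in range(10, 260, 20):
#     print(n, evaluate(n, X_train, y_train, X_test, y_test))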

Figure 1. Accuracy of the ELM-trained ANN with respect to the number of hidden neurons.
Figure 2a. Actual vs. ANN-predicted walking decision.

TABLE II. CONFUSION MATRIX OF OSELM CLASSIFICATION OF HAR
(rows: actual activity, columns: predicted activity; the bottom-right entry is the overall accuracy)

                     Walking  Walking   Walking     Sitting  Standing  Laying   Recall %
                              Upstairs  Downstairs
Walking                 443       38        15          0        0        0      89.31
Walking Upstairs        121      333        15          2        0        0      70.70
Walking Downstairs       53       45       321          1        0        0      76.43
Sitting                   2        0         0        307      178        4      62.53
Standing                  3        0         0         48      481        0      90.41
Laying                    0        1         0          3        0      533      99.26
Accuracy %            71.20    79.90     91.45      85.00    73.00    99.00      82.05
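For clarity (and as an illustration not taken from the paper), the per-class figures in Table II follow directly from the raw counts: each Recall % entry divides the diagonal count by its row total (the number of actual test windows of that class), the Accuracy % row divides the diagonal count by its column total (the number of windows predicted as that class), and the overall accuracy is the trace divided by the total number of test windows. The short sketch below reproduces these values up to rounding.

import numpy as np

# Confusion matrix from Table II (rows: actual activity, columns: predicted activity).
labels = ["Walking", "Walking Upstairs", "Walking Downstairs",
          "Sitting", "Standing", "Laying"]
C = np.array([
    [443,  38,  15,   0,   0,   0],
    [121, 333,  15,   2,   0,   0],
    [ 53,  45, 321,   1,   0,   0],
    [  2,   0,   0, 307, 178,   4],
    [  3,   0,   0,  48, 481,   0],
    [  0,   1,   0,   3,   0, 533],
])

recall = 100 * np.diag(C) / C.sum(axis=1)     # per actual class (row-wise)
accuracy = 100 * np.diag(C) / C.sum(axis=0)   # per predicted class (column-wise)
overall = 100 * np.trace(C) / C.sum()         # 82.05 %

for name, r, a in zip(labels, recall, accuracy):
    print(f"{name:20s} recall {r:6.2f} %   accuracy {a:6.2f} %")
print(f"Overall accuracy: {overall:.2f} %")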

Sitting was frequently misclassified because of confusion with standing, which reduced the recall for sitting to 63%. Confusion between walking and walking upstairs was also observed, which reduced the walking upstairs figure to about 70%. Meanwhile, an accuracy of 99% is obtained for the lying down activity. The overall accuracy of this example is 82%. In addition, training and testing times were recorded for different combinations of features and neurons, as shown in Table III. These results were generated using MATLAB 2014a on an Intel i7 2.5 GHz processor. The achieved testing time is less than the length of the window that generates one feature vector, which means that the system can run in real time, provided that training is performed periodically while the system is running.

Figure 2b. Actual vs. ANN-predicted walking upstairs decision.

Figure 2c. Actual vs. ANN-predicted walking downstairs decision.

TABLE III. EXAMPLES OF TIME PERFORMANCE VERSUS ACCURACY OF DIFFERENT COMBINATIONS OF FEATURES AND NEURONS

Features/Neurons        50/100    90/100    60/250    71/250
Testing time [sec]      0.5       0.6094    1.25      0.82
Training time [sec]     4.5938    5.0781    12.56     25.59
Testing Accuracy %      70.00     79.13     74.55     82.05

Figure 2d. Actual vs. ANN-predicted standing decision.
Figure 2e. Actual vs. ANN-predicted lying down decision.
Figure 2f. Actual vs. ANN-predicted sitting decision.

V. CONCLUSION AND FUTURE WORK

In this paper, we proposed the use of OSELM for human activity recognition based on benchmark data that include both accelerometer and gyroscope data collected from a set of volunteers. The proposed technique has great potential for online classification of HAR, provided that time dependency, periodic patterns, and optimization of the randomly generated weights are considered. It achieved an overall accuracy of 82%. However, there are several aspects that could be considered to enhance the ANN performance. Firstly, ELM does not take time dependency into account, while the nature of the activity is strongly dependent on the time sequence; for example, when a person is walking, it is very unlikely that the recognized activity changes to lying down in between. Secondly, there are two categories of human activities: those that involve periodic trends, such as walking, walking upstairs, and walking downstairs, and those that do not, such as sitting, standing, and lying down. Therefore, features related to the periodic nature could be used to separate these groups and reduce the set of classes before feeding them to the ELM. Thirdly, the ELM algorithm has an infinite number of degrees of freedom for approximating a given training dataset, caused by the random nature of the input-to-hidden-layer weights. Consequently, one promising potential improvement in performance is to assign these weights based on an objective function. Additionally, a deeper mathematical analysis of the impact of the number of neurons and the type of activation function on the performance can be further investigated.

REFERENCES

[1] A. Avci, S. Bosch, M. Marin-Perianu, R. Marin-Perianu, and P. Havinga, "Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: A survey," in Architecture of Computing Systems (ARCS), 2010 23rd International Conference on, 2010, pp. 1-10.
[2] N. Alshurafa, W. Xu, J. J. Liu, M.-C. Huang, B. Mortazavi, C. K. Roberts, et al., "Designing a robust activity recognition framework for health and exergaming using wearable sensors," Biomedical and Health Informatics, IEEE Journal of, vol. 18, pp. 1636-1646, 2014.
[3] R. Poppe, "A survey on vision-based human action recognition," Image and Vision Computing, vol. 28, pp. 976-990, 2010.
[4] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine," in Ambient Assisted Living and Home Care, ed: Springer, 2012, pp. 216-223.
[5] Y. Chen, Z. Zhao, S. Wang, and Z. Chen, "Extreme learning machine-based device displacement free activity recognition model," Soft Computing, vol. 16, pp. 1617-1625, 2012.
[6] M. Zhang and A. A. Sawchuk, "Motion primitive-based human activity recognition using a bag-of-features approach," in Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, 2012, pp. 631-640.
[7] H. S. AlZubi, S. Gerrard-Longworth, W. Al-Nuaimy, Y. Goulermas, and S. Preece, "Human activity classification using a single accelerometer," in Computational Intelligence (UKCI), 2014 14th UK Workshop on, 2014, pp. 1-6.
[8] A. M. Khan, Y. K. Lee, S. Y. Lee, and T. S. Kim, "Human activity recognition via an accelerometer-enabled-smartphone using kernel discriminant analysis," in Future Information Technology (FutureTech), 2010 5th International Conference on, 2010, pp. 1-6.
[9] D. Akimura, Y. Kawahara, and T. Asami, "Compressed sensing method for human activity sensing using mobile phone accelerometers," in Networked Sensing Systems (INSS), 2012 Ninth International Conference on, 2012, pp. 1-4.
[10] T. Fujimoto, H. Nakajima, N. Tsuchiya, H. Marukawa, K. Kuramoto, S. Kobashi, et al., "Wearable human activity recognition by electrocardiograph and accelerometer," in Multiple-Valued Logic (ISMVL), 2013 IEEE 43rd International Symposium on, 2013, pp. 12-17.
[11] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, pp. 489-501, 2006.
[12] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "A fast and accurate online sequential learning algorithm for feedforward networks," Neural Networks, IEEE Transactions on, vol. 17, pp. 1411-1423, 2006.