2017 3rd International Conference on Frontiers of Signal Processing

Emotion Recognition System Based on Physiological Signals with Raspberry Pi III Implementation

Mimoun Ben Henia Wiem

Zied Lachiri

Université de Tunis El Manar, Ecole Nationale d'Ingénieurs de Tunis, LR-11-ES17, Signal, Images et Technologies de l’Information (LR-SITI-ENIT) BP. 37 Belvédère, 1002, Tunis, Tunisie e-mail: [email protected]

Université de Tunis El Manar, Ecole Nationale d'Ingénieurs de Tunis, LR-11-ES17, Signal, Images et Technologies de l’Information (LR-SITI-ENIT) BP. 37 Belvédère, 1002, Tunis, Tunisie e-mail: [email protected]

Abstract—The human-machine interaction field has potential applications in several domains, such as medical therapies for vulnerable persons. Allowing the machine to identify and understand emotional states is therefore a primordial stage for affective interactivity with humans. Recent studies have proved that physiological signals contribute to emotion recognition. In this paper, we aim to classify affective states into two defined classes in the arousal-valence model using peripheral physiological signals. To this end, we explored the recent multimodal MAHNOB-HCI database, which contains the bodily responses of 24 participants to 20 affective videos. After preprocessing the data and extracting features, we classified the emotion using the Support Vector Machine (SVM). The classification stage was implemented on a Raspberry Pi III model B using the Python platform. The obtained results are encouraging compared to recent related works.

Keywords—emotion recognition; peripheral physiological signals; arousal-valence model; Raspberry Pi III; SVM

I. INTRODUCTION

With the development of the artificial intelligence and affective computing fields, endowing the machine with the ability to sense human emotional states has become an innovative research topic [1]. Thanks to emotion recognition systems, many applications have been enhanced. We cite those investigated in the medical domain, especially for persons who are unable to explicitly express their emotions, such as aged persons [2] or children with autism [3], [4], [5]. The most popular modality to recognize affective states is facial expression [6], [7], [8]. Moreover, emotion can be noticed from speech [9], [10] and motion [11]. However, these modalities cannot always identify the real emotion, because it is easy to hide a facial expression or fake a tone of voice; they are also not suitable for vulnerable people with serious psychiatric disabilities. Picard [12] proved that human physiological signals overcome these problems [13]. In fact, the effectiveness of this modality comes from the fact that these signals originate in the Autonomic Nervous System (ANS) and cannot be falsified or hidden [2]. Among these physiological signals, we cite the electroencephalogram, electrocardiogram, heart rate variability, galvanic skin response, muscle activity or electromyogram, skin temperature, blood volume pulse and respiratory volume [14], [15], [2]. To improve emotion recognition systems and enrich human-computer interactivity, physiological signals can also be merged with other modalities [16], [17].

In prior studies, emotion has been represented with different models. It can be defined using Ekman's model, which presents six discrete basic emotions: (1) happiness, (2) sadness, (3) surprise, (4) fear, (5) anger and (6) disgust [18]. On the other hand, emotion can be modeled in a continuous space with two or more dimensions, namely valence, arousal [17], dominance [19] and liking [20]. Other works merged the two models to represent affective states in the arousal-valence space using emotional keywords [21], [19].

In this paper, we focus on emotion recognition applications implemented on embedded systems. These studies are difficult to compare for several reasons: the modality used to recognize the human emotional state differs, the way the emotion is induced or even defined is not the same, and the hardware support also differs. For example, in the study developed in [22], the authors explored an FPGA (Field-Programmable Gate Array) to implement an emotion recognition system based on speech. The contribution described in [23] used the Raspberry Pi II as hardware support, and the authors relied on facial expressions to assess the emotional states. These two studies define emotion in the same way: both describe the feeling using affective keywords, as proposed by Ekman's model [18]. In contrast, Cheng et al. [24], who implemented their system on a PC with a GPU as an accelerator, defined the emotion using the arousal-valence evaluation.

This study aims to recognize and classify human emotional assessments into two specified classes in the arousal-valence space using the self-reported rating values (discrete scales from 1 to 9). We used the peripheral physiological signals collected in the MAHNOB-HCI database. A complete processing chain was then applied, covering data preprocessing, feature extraction and classification.



In the last step, we used the support vector machine classifier and implemented it on the Raspberry Pi III model B using the Python platform.

This paper is organized as follows: Section II presents the proposed approach, including the MAHNOB-HCI database description, the system overview and the Raspberry Pi structure. The experimental setup is reported in Section III. Section IV discusses the obtained results. The conclusion and future work are given in Section V.

II. PROPOSED METHOD

A. Multimodal MAHNOB-HCI Database

Recent studies have proved the relevance of physiological signals in the emotion recognition problem. Thus, several databases have been established to test the pertinence of this modality, such as MIT [25], HUMAINE [26], DEAP [20] and MAHNOB-HCI [26]. Several differences between these databases can be reported: the type of video used to induce the emotion, the number of participants, the number of videos per participant, the recorded modalities and the collected physiological signals.

In this work, we used the recent multimodal MAHNOB-HCI database. It contains the bodily responses of 24 participants whose emotional states were induced with 20 affective movies. The collected peripheral physiological signals were the electrocardiogram (ECG), galvanic skin response (GSR), skin temperature (Temp) and respiration amplitude (RESP). We rely on these signals because we aim to use wearable and non-obtrusive sensors in future work; consequently, the EEG and eye-gaze signals were excluded. We chose this database for several reasons. First, it provides five well-synchronized modalities. In addition, a comparative study between DEAP and MAHNOB done by Godin et al. [27] showed that the results obtained with the signals recorded in the MAHNOB database were better than those obtained with DEAP; their explanation was that the videos chosen for MAHNOB induce the emotion more strongly than those used in DEAP. Moreover, the authors demonstrated that the heart rate variability computed from the ECG (not available in DEAP) is a very relevant feature for the emotion recognition task and is more accurate than the HRV computed from the PPG signal recorded in the DEAP database.
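As an illustration, the snippet below sketches how one trial's peripheral channels might be loaded for processing. It is a minimal sketch only: the file path, the use of MNE-Python and the channel labels are assumptions on our part (MAHNOB-HCI distributes Biosemi BDF recordings whose exact labels depend on the release), not the authors' code.

```python
# Minimal sketch: load one MAHNOB-HCI trial and keep the peripheral channels.
# Assumptions: MNE-Python is installed, the trial is a BDF file, and the
# channel labels below are illustrative.
import mne

TRIAL_FILE = "Sessions/10/Part_1_Trial1_emotion.bdf"      # hypothetical path
PERIPHERAL = ["ECG1", "GSR1", "Resp", "Temp"]              # assumed labels

raw = mne.io.read_raw_bdf(TRIAL_FILE, preload=True, verbose=False)
fs = raw.info["sfreq"]                                     # sampling rate (Hz)
data, names = raw.get_data(), raw.ch_names

# Dictionary of the peripheral physiological signals used in this work.
signals = {ch: data[names.index(ch)] for ch in PERIPHERAL if ch in names}
print(f"{len(signals)} peripheral channels, {data.shape[1] / fs:.1f} s at {fs:.0f} Hz")
```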

Figure 1. Block diagram of the proposed approach: MAHNOB-HCI data, pre-processing, feature extraction and normalization, SVM classification with Raspberry Pi implementation, and the resulting accuracy rate.

B. System Process

In an emotion classification task, three main steps should be carefully carried out to obtain promising results. The first is data preprocessing, which aims to eliminate baseline wandering and noise in order to smooth the signals. The second is feature extraction. The third is the classification step; in the proposed approach, we used the Support Vector Machine as classifier, and this last stage was implemented on the Raspberry Pi III model B (footnote 1). Fig. 1 presents the block diagram of the proposed approach, and all the steps are detailed in the following sections. A high-level sketch of this chain is given below.
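The following skeleton illustrates how the three stages of Fig. 1 could be chained in Python. It is only an outline under stated assumptions: the helper bodies are placeholders standing in for the concrete steps of Section III, and scikit-learn's SVC (which wraps LIBSVM internally) is used here, whereas the paper calls LibSVM directly.

```python
# Outline of the processing chain in Fig. 1; helper bodies are placeholders.
import numpy as np
from sklearn.svm import SVC

def preprocess(signal):
    # Placeholder: mean removal only; Section III.A uses Butterworth filtering.
    return signal - np.mean(signal)

def extract_features(signal):
    # Placeholder statistics; the full 169-feature set is described in Section III.A.
    return [np.mean(signal), np.std(signal), np.max(signal), np.median(signal)]

def run_pipeline(train_signals, train_labels, test_signals, test_labels):
    """Preprocess each trial, extract features, train the SVM, return accuracy."""
    X_train = [extract_features(preprocess(s)) for s in train_signals]
    X_test = [extract_features(preprocess(s)) for s in test_signals]
    clf = SVC(kernel="rbf")              # SVM classification stage
    clf.fit(X_train, train_labels)
    return clf.score(X_test, test_labels)
```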

C. Raspberry Pi III Implementation

In the proposed approach, we used the recent Raspberry Pi III model B, released early in 2016, to enrich the interactivity between human and computer. It is a small single-board computer built around a Broadcom BCM2837 system on chip, with an ARMv8-A instruction set running at 1.2 GHz and 1 GB of SDRAM. The Raspberry Pi can easily be mounted on robots because it is a small, lightweight card that needs little power, and it allows machines to identify and recognize emotional states for affective interaction with humans. Moreover, thanks to its different ports and its on-board Bluetooth and Wi-Fi, it can easily interact with vulnerable people and children with autism wearing physiological sensors. Table I summarizes the important technical specifications of this card.

The steps of the offline Raspberry Pi III implementation of the emotion assessment system using peripheral physiological signals are as follows:

Step 1: Raspberry Pi software configuration and first boot into the Raspbian operating system (OS) (footnote 2).

Step 2: To allow remote access from another device on the same network, we configured the Raspberry Pi as a host using an SSH (Secure Shell) server. We then set up the client computer or mobile device with an SSH client, available for different operating systems; in this work, we used the PuTTY client (footnote 3) for Windows and Android.

Step 3: SSH only gives access to the command line. For a full remote desktop, we used Virtual Network Computing (VNC Viewer, footnote 4), available for different operating systems.

Step 4: After setting up the Raspberry Pi connection, we implemented the classification stage of the emotion recognition system using the LibSVM library (footnote 5) under the Python platform [28]. The obtained results are reported and discussed in Section IV. A minimal usage sketch of this step is given below.
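As a usage illustration for Step 4, the snippet below sketches how a pre-trained model could be loaded and queried on the Raspberry Pi with LibSVM's Python interface. It is a hedged sketch: the model file name, the import path of svmutil (which varies between LibSVM distributions) and the feature vector are assumptions, not the authors' deployment code.

```python
# Minimal sketch: classify one normalized feature vector on the Raspberry Pi.
# Assumptions: a model was trained offline and saved with svm_save_model(),
# and the LibSVM Python bindings are importable (path differs by distribution).
from libsvm.svmutil import svm_load_model, svm_predict   # or: from svmutil import ...

MODEL_FILE = "arousal_rbf.model"        # hypothetical model produced offline

model = svm_load_model(MODEL_FILE)
features = [0.12, -0.87, 1.03, 0.45]    # in practice, one normalized 169-D vector
# svm_predict expects label/feature lists; a dummy label suffices for inference.
labels, accuracy, values = svm_predict([0], [features], model)
print("Predicted class:", int(labels[0]))   # e.g. +1 = high arousal, -1 = low
```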

1 https://www.element14.com/community/docs/DOC-80899/l/raspberry-pi3-model-b-technical-specifications
2 https://www.raspbian.org/
3 http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
4 https://www.realvnc.com/download/viewer/
5 https://www.csie.ntu.edu.tw/~cjlin/libsvm/


TABLE I. TECHNICAL SPECIFICATIONS OF RASPBERRY PI III MODEL B

Introduction date:  29/2/2016
SoC:                BCM2837
CPU:                Quad-core Cortex-A53 @ 1.2 GHz
Instruction set:    ARMv8-A
GPU:                400 MHz VideoCore IV
RAM:                1 GB SDRAM
Storage:            Micro-SD (16 GB)
Ethernet:           10/100
Wireless:           802.11n / Bluetooth 4.0
Video output:       HDMI / Composite
Audio output:       HDMI / Headphone
GPIO:               40 pins

III. EXPERIMENTAL STUDY

The typical emotion recognition system can be divided into three main steps. The first is the preprocessing stage. Next, we extract selected features from the signals. After normalizing them, we classify the data according to their labels using the SVM.

A. Preprocessing and Feature Extraction

Prior to the feature extraction step, we pre-processed the data to smooth the signals by removing noise and baseline wandering. According to the MAHNOB-HCI documentation, each trial (the signal recorded during one affective stimulus video) also contains 30 seconds recorded before the start and 30 seconds after the end of the trial, during which a neutral video is shown. The baseline of the galvanic skin response (GSR) was computed over the neutral 30 seconds and subtracted from the original signal, so that only relative amplitudes are considered. Afterwards, the two 30-second segments were removed from all the signals to keep only the relevant information. The next step was to apply Butterworth filters to the peripheral physiological signals [29], with cutoff frequencies of 0.3 Hz, 0.7 Hz and 1 Hz for the galvanic skin response, electrocardiogram and respiration amplitude signals, respectively. We computed the heart rate variability from the electrocardiogram signal [29] and the breathing rate from the respiration amplitude. Then, statistical features were computed for all the signals; we cite the mean, maximum, median and standard deviation of the first and second derivatives. Overall, 169 features were extracted from the peripheral physiological signals.

To reduce the inter-individual differences and variability, we normalized the 169 extracted features over the videos for each participant as follows:

    \bar{f}_{i,j} = \frac{f_{i,j} - \mu_{i,j}}{\sigma_{i,j}}    (1)

with i = {1, ..., 169}, where 169 is the number of features, f_{i,j} is the original extracted feature, and \mu_{i,j}, \sigma_{i,j} are respectively the mean and standard deviation of feature i over the videos of participant number j. After extracting and normalizing the features from each physiological signal, namely the electrocardiogram, respiration amplitude, skin temperature and galvanic skin response, an early feature-level fusion was applied before the training stage [30] in order to compare the proposed approach with related works. A sketch of this preprocessing and feature extraction chain is given below.
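To make the preprocessing and feature steps concrete, the sketch below applies Butterworth low-pass filters with the stated cutoffs, computes a few of the statistical features and z-scores them per participant. It is a simplified illustration under assumptions (the sampling rate, filter order and reduced feature set are ours); it does not reproduce the full 169-feature set nor the HRV and breathing-rate computation.

```python
# Simplified sketch of Section III.A: filtering, statistical features and
# per-participant normalization. Sampling rate, filter order and the reduced
# feature list are illustrative assumptions, not the paper's exact settings.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256.0                                        # assumed sampling rate (Hz)
CUTOFFS = {"GSR": 0.3, "ECG": 0.7, "RESP": 1.0}   # low-pass cutoffs from the text

def smooth(signal, name):
    """Zero-phase Butterworth low-pass filter for one peripheral channel."""
    b, a = butter(N=4, Wn=CUTOFFS[name] / (FS / 2.0), btype="low")
    return filtfilt(b, a, signal)

def stat_features(x):
    """Mean, max, median and std of the first and second derivatives."""
    feats = []
    for d in (np.diff(x), np.diff(x, n=2)):
        feats += [np.mean(d), np.max(d), np.median(d), np.std(d)]
    return feats

def normalize_per_participant(features):
    """Z-score each feature over one participant's videos, as in Eq. (1)."""
    f = np.asarray(features, dtype=float)         # shape: (n_videos, n_features)
    return (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-12)
```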

B. SVM Classification

Different machine learning algorithms have been successfully applied to classify human emotional states from physiological features; we cite the Artificial Neural Network (ANN) [31], [32], k-Nearest Neighbors (k-NN) [33], [34], the Bayesian network [35] and the Regression Tree (RT) [36]. In this approach, we employed the support vector machine, which is the most popular and pertinent classifier for this problem [37]. Indeed, the comparative study described in [35] showed that the SVM gave better accuracy rates than other machine learning techniques such as k-NN, regression trees and Bayesian networks.

The SVM is mainly a pair-wise binary classifier based on hyperplanes that separate two classes by maximizing the margin. For N-dimensional inputs x_i (i = 1, ..., N) belonging to two classes, with label L_i = 1 for class "1" and L_i = -1 for class "2", the hyperplane is defined as follows:

    f(x) = w \cdot x + b    (2)

where w is an N-dimensional weight vector and b is a scalar. For linearly separable data, the SVM hyperplane corresponds to f(x) = 0. Otherwise, the input data are projected into a high-dimensional feature space using a mapping function \varphi:

    f(x) = w \cdot \varphi(x) + b = 0    (3)

The decision function is then expressed as:

    D(x) = \mathrm{sign}\Big( \sum_{i=1}^{N} \alpha_i L_i \, \varphi(x_i) \cdot \varphi(x) + b \Big)    (4)

where the coefficients \alpha_i are obtained during training. We applied the LibSVM library [28] under the Python platform on the Raspberry Pi III. The best parameters for each kernel were selected through a k-fold cross-validation technique, as sketched below.
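The snippet below sketches one way the kernel comparison and parameter selection could be run. It is an assumption-laden illustration: it uses scikit-learn's SVC and GridSearchCV (SVC wraps LIBSVM internally), whereas the paper calls LibSVM directly, and the parameter grid, fold count and toy data are ours.

```python
# Hedged sketch of kernel/parameter selection by k-fold cross-validation.
# scikit-learn's SVC wraps LIBSVM; the grid and k = 5 folds are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def best_svm(X, y, kernel):
    """Grid-search C (and gamma for non-linear kernels) with 5-fold CV."""
    grid = {"C": [0.1, 1, 10, 100]}
    if kernel != "linear":
        grid["gamma"] = ["scale", 0.01, 0.1, 1.0]
    search = GridSearchCV(SVC(kernel=kernel), grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    return search.best_score_, search.best_params_

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 169))          # stand-in for 169-D feature vectors
    y = np.repeat([-1, 1], 20)              # stand-in arousal/valence labels
    for k in ("linear", "poly", "sigmoid", "rbf"):
        acc, params = best_svm(X, y, k)
        print(f"{k:>8}: CV accuracy = {acc:.3f}  best params = {params}")
```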

C. Emotion Classes

We classified the emotion into two defined classes along the arousal and valence dimensions, which are the most commonly used in related works. We consider two classes per dimension: "High" and "Low" for arousal, and "Negative" and "Positive" for valence. In this work, unlike the label partition used in the previous work developed in [38], we normalized the labels for each participant in order to minimize the inter-individual variability, using the following equation:

    \bar{r}_{j} = \frac{r_{j} - \mu_{r,j}}{\sigma_{r,j}}    (5)

where \bar{r}_{j} is the normalized label for arousal or valence, r_{j} is the original label given in the MAHNOB database (discrete scales from 1 to 9), and \mu_{r,j}, \sigma_{r,j} are respectively the mean and standard deviation of the label over the videos of participant number j. After normalizing the labels, the two classes are defined as in Table II; a small sketch of this binarization is given after the table.

TABLE II. TWO CLASSES IN THE AROUSAL-VALENCE MODEL

Arousal class    Valence class    Normalized rating values
High             Negative         0 <= \bar{r}
Low              Positive         \bar{r} <= 0
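A minimal sketch of this label normalization and binarization follows, assuming the self-reported ratings are grouped per participant; the threshold at zero follows Table II, while the array shape and the handling of ties at exactly zero are our assumptions.

```python
# Hedged sketch of Eq. (5) and Table II: per-participant z-scoring of the
# 1-9 self-reports, then thresholding at 0 into two classes per dimension.
import numpy as np

def binarize_ratings(ratings):
    """ratings: 1-D array of one participant's ratings (arousal or valence)."""
    r = np.asarray(ratings, dtype=float)
    r_bar = (r - r.mean()) / (r.std() + 1e-12)      # Eq. (5)
    # Table II: one class for r_bar >= 0, the other for r_bar <= 0.
    # Ties at exactly zero are assigned to the first class here (our choice).
    return np.where(r_bar >= 0, 1, -1)

# Example: one participant's arousal self-reports for their 20 videos.
arousal = [3, 7, 5, 8, 2, 6, 4, 9, 1, 5, 7, 3, 6, 8, 2, 4, 5, 7, 6, 3]
print(binarize_ratings(arousal))
```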

IV. RESULTS AND ANALYSIS

This section presents and discusses the results obtained after classifying the emotional states into two defined classes in the arousal-valence space using the support vector machine. To assess the SVM classification accuracy, we tested several kernels: linear, polynomial, sigmoid and Gaussian (RBF). As illustrated in Table III, we achieved 68.5% in the arousal dimension and 68.75% in the valence dimension using the RBF kernel. These results slightly improve on those reported in [38]. For a comparative study, Table IV summarizes the obtained results together with two recent related works and shows that the achieved accuracies are promising and encouraging. In fact, Koelstra et al. [20] obtained 62.7% in valence and 57% in arousal, while Torres-Valencia et al. [39] achieved 55.00% ± 3.9 and 57.50% ± 3.9 in arousal and valence, respectively. Both of these related studies used the DEAP database.

TABLE III. CLASSIFICATION RATES USING DIFFERENT SVM KERNELS

Kernel        Arousal    Valence
Linear        61.25%     62.5%
Polynomial    65%        58.75%
Sigmoid       58.75%     60%
Gaussian      67.5%      68.75%

TABLE IV. EVALUATION OF THE OBTAINED ACCURACIES

              Our work    [20]      [39]
Arousal       67.5%       57%       55.00% ± 3.9
Valence       68.75%      62.7%     57.50% ± 3.9

The obtained results prove the potential of the data recorded in the MAHNOB-HCI database, whose chosen videos evoke the emotion more strongly than the video clips used in DEAP. In addition, the improved accuracies can be explained by the fact that the electrocardiogram is not recorded in the DEAP dataset, as justified in the comparative study described in [27]. Moreover, we first correctly pre-processed the signals to retain the significant information, and then selected features that are more relevant than those chosen in the previously mentioned studies [20], [39]. As mentioned earlier, this contribution is, to our knowledge, the first to implement an emotion recognition system on the Raspberry Pi using physiological signals; for this reason, we compared our work with studies that did not use any hardware support to classify the emotion.

V. CONCLUSION

In this paper, an improved emotion recognition system using peripheral physiological signals was presented. For evaluation, we used the recent multimodal MAHNOB-HCI database, which is freely available to the research community. After the pre-processing stage, followed by feature extraction, we implemented the classification step on the Raspberry Pi III model B (ARMv8-A, 1 GB SDRAM). This hardware support can easily be mounted on a robot to enrich the emotional interactivity with humans. The affective human states were defined as two classes in the arousal-valence space using the self-reported rating values. After testing several SVM kernels, the best results were achieved using the RBF kernel: 68.5% in the arousal dimension and 68.75% in the valence dimension.

As future work, several additional mechanisms can be implemented to improve the classification accuracy. We cite feature selection techniques to eliminate redundant information and select the most relevant features. In addition, we aim to implement the whole emotion recognition process on the Raspberry Pi.

ACKNOWLEDGMENT

The authors would like to thank the MAHNOB-HCI team for providing the database used to develop this research (www.mahnob-db.eu/hci-tagging).

REFERENCES

[1] P. Santi-Jones and D. Gu, "Static face detection and emotion recognition with FPGA support," presented at the 3rd International Conference on Informatics in Control, Automation and Robotics, 2006, pp. 390–397.
[2] S. Basu et al., "Emotion recognition based on physiological signals using valence-arousal model," 2015, pp. 50–55.
[3] M. A. Miskam, S. Shamsuddin, M. R. A. Samat, H. Yussof, H. A. Ainudin, and A. R. Omar, "Humanoid robot NAO as a teaching tool of emotion recognition for children with autism using the Android app," 2014, pp. 1–5.
[4] K. G. Smitha and A. P. Vinod, "Low complexity FPGA implementation of emotion detection for autistic children," in 2013 7th International Symposium on Medical Information and Communication Technology (ISMICT), 2013, pp. 103–107.
[5] K. G. Smitha and A. P. Vinod, "Hardware efficient FPGA implementation of emotion recognizer for autistic children," in 2013 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), 2013, pp. 1–4.
[6] Suchitra, Suja P., and S. Tripathi, "Real-time emotion recognition from facial images using Raspberry Pi II," 2016, pp. 666–670.
[7] N. Chanthaphan, K. Uchimura, T. Satonaka, and T. Makioka, "Facial emotion recognition based on facial motion stream generated by Kinect," 2015, pp. 117–124.
[8] C. Turan, K.-M. Lam, and X. He, "Facial expression recognition with emotion-based feature fusion," 2015, pp. 1–6.
[9] C. Busso et al., "Analysis of emotion recognition using facial expressions, speech and multimodal information," 2004, p. 205.
[10] S. G. Koolagudi and K. S. Rao, "Emotion recognition from speech: a review," Int. J. Speech Technol., vol. 15, no. 2, pp. 99–117, Jun. 2012.
[11] A. Kapur, A. Kapur, N. Virji-Babul, G. Tzanetakis, and P. F. Driessen, "Gesture-based affective computing on motion capture data," in Affective Computing and Intelligent Interaction, vol. 3784, J. Tao, T. Tan, and R. W. Picard, Eds. Berlin, Heidelberg: Springer, 2005, pp. 1–7.
[12] R. W. Picard, E. Vyzas, and J. Healey, "Toward machine emotional intelligence: analysis of affective physiological state," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.
[13] J. Kim and E. André, "Emotion recognition based on physiological changes in music listening," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 12, pp. 2067–2083, Dec. 2008.
[14] J. Kim and E. André, "Emotion recognition based on physiological changes in music listening," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 12, pp. 2067–2083, Dec. 2008.
[15] Y. Velchev, S. Radeva, S. Sokolov, and D. Radev, "Automated estimation of human emotion from EEG using statistical features and SVM," 2016, pp. 40–42.
[16] S. Thushara and S. Veni, "A multimodal emotion recognition system from video," 2016, pp. 1–5.
[17] C. A. Torres, Á. A. Orozco, and M. A. Álvarez, "Feature selection for multimodal emotion recognition in the arousal-valence space," in 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013, pp. 4330–4333.
[18] J. Fleureau, P. Guillotel, and Q. Huynh-Thu, "Physiological-based affect event detector for entertainment video applications," IEEE Trans. Affect. Comput., vol. 3, no. 3, pp. 379–385, Jul. 2012.
[19] M. Soleymani, J. Lichtenauer, T. Pun, and M. Pantic, "A multimodal database for affect recognition and implicit tagging," IEEE Trans. Affect. Comput., vol. 3, no. 1, pp. 42–55, Jan. 2012.
[20] S. Koelstra et al., "DEAP: A database for emotion analysis using physiological signals," IEEE Trans. Affect. Comput., vol. 3, no. 1, pp. 18–31, Jan. 2012.
[21] H. Xu and K. N. Plataniotis, "Subject independent affective states classification using EEG signals," 2015, pp. 1312–1316.
[22] M. Shah, L. Miao, C. Chakrabarti, and A. Spanias, "A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 2553–2557.
[23] Suchitra, Suja P., and S. Tripathi, "Real-time emotion recognition from facial images using Raspberry Pi II," 2016, pp. 666–670.
[24] J. Cheng, Y. Deng, H. Meng, and Z. Wang, "A facial expression based continuous emotional state monitoring system with GPU acceleration," in 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2013, pp. 1–6.
[25] J. A. Healey and R. W. Picard, "Detecting stress during real-world driving tasks using physiological sensors," IEEE Trans. Intell. Transp. Syst., vol. 6, no. 2, pp. 156–166, Jun. 2005.
[26] E. Douglas-Cowie et al., "The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data," in Affective Computing and Intelligent Interaction, vol. 4738, A. C. R. Paiva, R. Prada, and R. W. Picard, Eds. Berlin, Heidelberg: Springer, 2007, pp. 488–500.
[27] C. Godin, F. Prost-Boucle, A. Campagne, S. Charbonnier, S. Bonnet, and A. Vidal, "Selection of the most relevant physiological features for classifying emotion," in ResearchGate, 2015.
[28] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1–27, Apr. 2011.
[29] J. Wagner, J. Kim, and E. André, "From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification," 2005, pp. 940–943.
[30] Z. Guendil, Z. Lachiri, C. Maaoui, and A. Pruski, "Emotion recognition from physiological signals using fusion of wavelet based features," 2015, pp. 1–6.
[31] S. K. Yoo, C. K. Lee, Y. J. Park, N. H. Kim, B. C. Lee, and K. S. Jeong, "Neural network based emotion estimation using heart rate variability and skin resistance," in Advances in Natural Computation, vol. 3610, L. Wang, K. Chen, and Y. S. Ong, Eds. Berlin, Heidelberg: Springer, 2005, pp. 818–824.
[32] A. Choi and W. Woo, "Physiological sensing and feature extraction for emotion recognition by exploiting acupuncture spots," in Affective Computing and Intelligent Interaction, vol. 3784, J. Tao, T. Tan, and R. W. Picard, Eds. Berlin, Heidelberg: Springer, 2005, pp. 590–597.
[33] F. Nasoz, K. Alvarez, C. L. Lisetti, and N. Finkelstein, "Emotion recognition from physiological signals using wireless sensors for presence technologies," Cogn. Technol. Work, vol. 6, no. 1, pp. 4–14, Feb. 2004.
[34] J. Wagner, J. Kim, and E. André, "From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification," 2005, pp. 940–943.
[35] C. Liu, P. Rani, and N. Sarkar, "An empirical study of machine learning techniques for affect recognition in human-robot interaction," 2005, pp. 2662–2667.
[36] C. Liu, P. Rani, and N. Sarkar, "Human-robot interaction using affective cues," 2006, pp. 285–290.
[37] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Min. Knowl. Discov., vol. 2, no. 2, pp. 121–167.
[38] M. B. H. Wiem and Z. Lachiri, "Emotion classification in arousal valence model using MAHNOB-HCI database," Int. J. Adv. Comput. Sci. Appl. (IJACSA), vol. 8, no. 3, 2017.
[39] C. A. Torres-Valencia, H. F. Garcia-Arias, M. A. A. Lopez, and A. A. Orozco-Gutierrez, "Comparative analysis of physiological signals and electroencephalogram (EEG) for multimodal emotion recognition using generative models," 2014, pp. 1–5.