Interacting with Computers Advance Access published August 1, 2013 © The Author 2013. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: [email protected] doi:10.1093/iwc/iwt039

Emotion Prediction from Physiological Signals: A Comparison Study Between Visual and Auditory Elicitors

Feng Zhou1,2, Xingda Qu1,∗, Jianxin (Roger) Jiao2 and Martin G. Helander1

1 Center for Human Factors and Ergonomics, School of Mechanical and Aerospace Engineering, Nanyang Technological University, Blk N3, North Spine, Nanyang Avenue, Singapore 639798
2 The George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA, USA
∗Corresponding author: [email protected]

Unlike visual stimuli, little attention has been paid to auditory stimuli in terms of emotion prediction with physiological signals. This paper aimed to investigate whether auditory stimuli can be as effective an elicitor as visual stimuli for emotion prediction using physiological channels. For this purpose, a well-controlled experiment was designed, in which standardized visual and auditory stimuli were systematically selected and presented to participants to induce various emotions spontaneously in a laboratory setting. Numerous physiological signals, including facial electromyogram, electroencephalography, skin conductivity and respiration data, were recorded when participants were exposed to the stimulus presentation. Two data mining methods, namely decision rules and k-nearest neighbor based on the rough set technique, were applied to construct emotion prediction models based on the features extracted from the physiological data. Experimental results demonstrated that auditory stimuli were as effective as visual stimuli in eliciting emotions in terms of systematic physiological reactivity. This was evidenced by the best prediction accuracy quantified by the F1 measure (visual: 76.2% vs. auditory: 76.1%) among six emotion categories (excited, happy, neutral, sad, fearful and disgusted). Furthermore, we also constructed culture-specific (Chinese vs. Indian) prediction models. The results showed that model prediction accuracy was not significantly different between culture-specific models. Finally, the implications of affective auditory stimuli in human–computer interaction, limitations of the study and suggestions for further research are discussed.

RESEARCH HIGHLIGHTS

• Emotions can be predicted accurately based on physiological data using computational models.
• Prediction accuracy was on the same level for auditory and visual stimuli.
• Auditory stimuli can thus have potential in affective human–computer interaction applications.

Keywords: human computer interaction; interaction paradigms; empirical studies in HCI

Editorial Board Member: Timothy Bickmore

Received 5 November 2012; Revised 3 May 2013; Accepted 29 June 2013

1. INTRODUCTION

Predicting a user’s emotion in real-time provides opportunities for the computer system to address the idiosyncratic needs of the user. Many studies have been published on automatic emotion prediction using behavioral measures, such as facial expressions, vocal expressions and gestures or postures (Calvo and D’Mello, 2010; Zeng et al., 2009). Although the usefulness of emotion prediction based on behavioral measures is widely agreed upon, its accuracy tends to be overestimated in a controlled laboratory environment, since acted or elicited emotions are often considered artificial (Kleinsmith et al., 2011; Schuller et al., 2010). A limitation of using facial expressions is that they are subject to social protocols. Gestures or postures have been reported to be less influenced by social protocols (Kleinsmith and Bianchi-Berthouze, 2012; Kleinsmith et al., 2011) and are often combined with other emotional cues for better prediction performance (Kapoor et al., 2007), especially when different modalities of emotional cues are synchronized both semantically and temporally (Gunes and Piccardi, 2009). This is also evidenced when multiple physiological measures are used to predict emotions (Kim and André, 2008; Liu et al., 2008; Mandryk and Atkins, 2007; Picard et al., 2001), despite the fact that the mapping relationship between specific emotions and physiological patterns is not entirely one-to-one but rather many-to-one, one-to-many or many-to-many (Cacioppo and Tassinary, 1990).

Emotion prediction using physiological signals also has attractive merits. Physiological responses are generally involuntary and cannot be easily triggered by any conscious or intentional control (Kim and André, 2008). In this sense, they are least vulnerable to human social protocols and can be used to facilitate capturing spontaneous and subconscious facets of user states continuously. As a result, physiological data can be transformed into emotion features without any overt response of the user, and help predict user emotions in real time. This attractive real-time feature of emotion prediction can aid in developing applications of affective human–computer interaction (HCI). For example, real-time emotion prediction can help alert sleepy drivers and pilots with low vigilance, or help adjust content and forms in the online learning context based on the predicted user’s emotional states. Furthermore, previous studies have evidenced that emotion prediction using physiological measures has comparable accuracies to those using audio and/or visual measures extracted from facial and vocal expressions (Kim and André, 2008; Koelstra and Patras, 2013; Lisetti and Nasoz, 2004; Liu et al., 2008; Picard et al., 2001; Soleymani et al., 2009, 2012).

Auditory stimuli (e.g. growls, cries and shouts) were as effective as visual stimuli in information processing of environmental cues with direct survival significance and complex affective information (Verona et al., 2004). One notable fact is that investigations of emotion prediction based on the reactions to affective auditory stimuli are relatively few. In the area of HCI and affective computing, many works employed emotive music for emotion elicitation and prediction using computational models (Baumgartner et al., 2006; Kim and André, 2008). Recently, auditory stimuli have been extended to other forms, such as baby crying, snoring, laughing and screaming, for emotion elicitation and prediction (Uzun et al., 2012; Yisi and Sourina, 2012). In the area of psychology, most researchers employed statistical methods (e.g. analysis of variance) to compare the physiological responses with different affective auditory stimuli, including music, environmental noise, laughs, sighs and grunts (Carpentier and Potter, 2007; Jäncke et al., 1996; Sauter et al., 2013). Visual stimuli, however, were applied widely in previous studies for emotion elicitation and prediction (Frantzidis et al., 2010; Gu et al., 2010; Haag et al., 2004; Heraz and Frasson, 2007; Lang et al., 1993; Petrantonakis and Hadjileontiadis, 2010). Although film clips were used as stimuli to elicit emotions in emotion prediction studies (Bailenson et al., 2008; Lisetti and Nasoz, 2004), it is difficult to isolate the effects arising from affective features of auditory stimuli due to the existence of visual input (Bradley and Lang, 2000).

The physiological reactivity to affective stimuli might depend on the sensory modality of the processed information. The primary difference is that auditory stimuli are essentially dynamic and presented in a chronological way, while visual information (by viewing still pictures) is static, and scanning and focus may change over time (Bradley and Lang, 2000). To gain a complete understanding of affective responses to different sense modalities, it is necessary to examine physiological responses to auditory stimuli compared with visual stimuli. Toward this end, the purpose of this research was to examine whether emotion prediction based on physiological changes due to auditory affective stimuli can be as accurate as that due to visual affective stimuli. If this purpose can be achieved, we would be able to determine whether auditory affective stimuli can elicit emotions as effectively as their visual counterparts. Besides, as humans are a socially living species, cultural backgrounds to a great extent affect the way in which people respond to emotions (Zhou et al., 2011a,b). Therefore, we also considered the cultural factor (Chinese vs. Indian) in the comparison results. The emotion prediction models were built using decision rules (DRs) and k-nearest neighbor (k-NN) based on the rough set technique, since rough approximations are good at dealing with vague data (Pawlak, 1991; Zhou et al., 2011a). Six emotion categories were labeled by the participants for the prediction purpose. Multiple physiological signals were recorded to measure participants’ reactions to affective stimuli, including facial electromyography (EMG), respiration rate, electroencephalography (EEG) and skin conductance response (SCR).

2. RELATED WORK

The emotion elicitation techniques and physiological differentiation of emotions are reviewed in this section.

2.1. Emotion elicitation

When studying emotion prediction, reference emotions must be provided. Emotion elicitation is the procedure to induce reference emotions in emotion prediction. Therefore, the effectiveness of emotion elicitation is essential for the success of an emotion prediction model. The early work of


2008, 2010a,b) and it is recommended that emotions be elicited spontaneously.

2.2. Physiological differentiation of emotions

In psychophysiology, statistical methods, such as regression analysis and analysis of variance, are primarily used for data analysis. Specifically, for emotions modeled in the valence–arousal space, the correlation or covariation between self-constructed ratings on valence and arousal and physiological changes was often studied (Bradley and Lang, 2000; Lang et al., 1993; Smith, 1989; Tajadura-Jiménez et al., 2010a,b). For categorical emotions, physiological changes with regard to different emotions were tested statistically to find out whether there were any significant differences (Baumgartner et al., 2006; Ekman et al., 1983; Levenson et al., 1990). In affective computing, advanced computational models were constructed based on various physiological features extracted from different physiological signals to predict emotions (see Table 1). Picard (1997) is one of the pioneers who advocated affective interaction by coordinating emotions and cognitive tasks. Among the advanced computational models, many were constructed specifically for one participant (i.e. subject dependent). Thus, they are less likely to be applicable and generalizable to other participants due to individual differences. Unlike subject-dependent models, subject-independent models have also been developed, which were constructed from physiological data collected from several participants to dozens of them.

3. EXPERIMENT DESIGN FOR EMOTION ELICITATION

3.1. Participants

Twenty-eight college students (14 Chinese and 14 Indian, gender-balanced) from Nanyang Technological University, Singapore, participated in the experiment. All the participants were aged between 20 and 30 (mean = 24.6; standard deviation = 2.5). To increase homogeneity within cultural groups, Chinese and Indian students were required to have been raised in mainland China and India, respectively, and to have lived in Singapore for

where C is the total number of emotions, x_c∗j is the jth column of the cth emotion in X, u_c is the mean vector of class c, N_c is the total number of samples in class c and u is the grand mean across the C classes. In this research, there were six (C = 6) classes of emotions, 24 (N_c = 24) samples in each class and 42 features (N = 42). According to Equations (10) and (11), we obtained the between-class and within-class scatter matrices S_B and S_W, from which the projection matrix W was derived. The final 5D (L = 5) feature subspace was determined.
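To make the projection step concrete, the following is a minimal NumPy sketch (not the authors' code) of how between-class and within-class scatter matrices and a 5-dimensional LDA projection could be computed; the array shapes and variable names are illustrative assumptions based on the dimensions reported above.

```python
import numpy as np

def lda_projection(X, y, n_components=5):
    """Project features onto an LDA subspace.

    X: (n_samples, n_features) feature matrix, y: (n_samples,) class labels.
    Returns W (n_features, n_components) and the projected data.
    """
    classes = np.unique(y)
    n_features = X.shape[1]
    grand_mean = X.mean(axis=0)

    S_B = np.zeros((n_features, n_features))  # between-class scatter
    S_W = np.zeros((n_features, n_features))  # within-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - grand_mean)[:, None]
        S_B += len(Xc) * diff @ diff.T
        S_W += (Xc - mean_c).T @ (Xc - mean_c)

    # Solve the eigenproblem of pinv(S_W) @ S_B and keep the leading directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return W, X @ W

# Toy usage with the dimensions reported in the paper: 6 classes x 24 samples, 42 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(144, 42))
y = np.repeat(np.arange(6), 24)
W, X_proj = lda_projection(X, y, n_components=5)
print(W.shape, X_proj.shape)  # (42, 5) (144, 5)
```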

5.2.

The general form of a decision rule constructed for reduct φ and object z is thus given as the following:

$$\bigwedge_{q=1}^{Q} f_q^{\phi}(z) \rightarrow d^{\phi}(z),$$

where Q denotes the total number of features instantiated by this rule, the predecessor of the rule takes the conjunction of certain feature instances, f_q^φ(z), and the successor takes on specific values of the decision variable, d^φ(z). An example is ‘(Feature 2 ∈ (−6.66, 5.09)) ∧ (Feature 3 ∈ (1.64, Inf)) ∧ (Feature 5 ∈ (−Inf, 3.14)) → (emotion = excited, 8)’. It means that if the values of Features 2, 3 and 5 belong to the intervals (−6.66, 5.09), (1.64, Inf) and (−Inf, 3.14), respectively, then the corresponding emotion is excited with a support of eight. Support is the number of training samples from the decision table for which this rule applies correctly. These rules provide the basis for predicting emotions. For a given test sample, tst, the subset of rules matched by tst is selected. If tst matches only rules with the same emotion, then the affective state predicted by those rules is assigned to tst. However, if tst matches rules with different emotions, the conflict is resolved with the commonly used measure in Equation (13), so that the emotion with the highest measure value is chosen (Gora and Wojna, 2002):

$$\mathrm{Strength}(tst, c) = \Bigl|\bigcup_{r \in \mathrm{MatchRules}(tst,\, c)} \mathrm{SupportSet}(r)\Bigr|, \qquad (13)$$

where c denotes the cth emotion, SupportSet(r) is the set of training examples matching the rule r, MatchRules(tst, c) is the subset of minimal rules that are applicable to tst and whose decision is emotion c, and |∗| denotes the cardinality of a set ‘∗’.

k-nearest neighbor: This method predicts an emotion by a majority vote of the test sample’s neighbors, i.e. the class that is most common among its k nearest neighbors. Here the k-NN method (k = 10, 20, 30, chosen on the training set and applied to the test set) does not compute the whole support set of the minimal rules covering tst, but is restricted to its neighborhood S(tst, k), the set of k training examples defined by the similarity measure (Gora and Wojna, 2002):

$$\delta_f(x, y) = \left|\frac{f(x) - f(y)}{\max(f) - \min(f)}\right|, \qquad (14)$$

where x = (f_1(x), f_2(x), ..., f_5(x), d(x)) and y = (f_1(y), f_2(y), ..., f_5(y), d(y)) are two training samples, and max(f) and min(f) are the maximum and minimum values of feature f among the training examples (f is not discretized here, as the measure δ_f in Equation (14) is numeric). The algorithm proceeds as follows: for any training sample, trn, within S(tst, k), the algorithm constructs a local rule r_tst(trn) for tst. It then checks whether this rule is consistent with the remaining training examples in S(tst, k). If so, r_tst(trn) is added to the support set for predicting the emotion. Finally, the algorithm selects the emotion whose support set has the highest cardinality.
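As an illustration of how the rule matching and the conflict resolution of Equation (13), plus the similarity measure of Equation (14), could be implemented, here is a small self-contained sketch (not the authors' implementation; the rules, feature indices and values are made up):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    intervals: dict          # feature index -> (low, high) interval
    emotion: str             # decision (emotion label)
    support_set: frozenset   # ids of training samples the rule covers correctly

def matches(rule, sample):
    """A sample matches a rule if every instantiated feature falls in its interval."""
    return all(lo < sample[f] < hi for f, (lo, hi) in rule.intervals.items())

def predict_by_rules(rules, sample):
    """Equation (13): pick the emotion whose matching rules cover the most training samples."""
    strength = {}
    for r in rules:
        if matches(r, sample):
            strength.setdefault(r.emotion, set()).update(r.support_set)
    if not strength:
        return None
    return max(strength, key=lambda c: len(strength[c]))

def similarity(x, y, f_min, f_max):
    """Per-feature normalized differences of Equation (14), summed into one distance."""
    return sum(abs((x[f] - y[f]) / (f_max[f] - f_min[f])) for f in range(len(x)))

# Toy usage with two hypothetical rules over features 2, 3 and 5 (0-indexed here).
rules = [
    Rule({2: (-6.66, 5.09), 3: (1.64, float("inf")), 5: (float("-inf"), 3.14)},
         "excited", frozenset(range(8))),          # support of eight
    Rule({2: (0.0, 10.0)}, "happy", frozenset(range(8, 11))),
]
sample = [0.0, 0.0, 1.2, 2.0, 0.0, 0.5]
print(predict_by_rules(rules, sample))  # 'excited' (8 supporting samples vs. 3)

f_min, f_max = [0.0] * 6, [1.0, 1.0, 10.0, 5.0, 1.0, 5.0]
other = [0.5, 0.2, 3.0, 2.5, 0.1, 1.0]
print(round(similarity(sample, other, f_min, f_max), 3))  # 1.18
```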


Table 4. Confusion matrix of emotion labeling by self-report for visual stimuli.

Emotions    Excited  Happy  Neutral   Sad  Fearful  Disgust   Total  Recall
Excited        81      7       3       0      5        0       96    0.844
Happy          10     76       7       1      2        0       96    0.792
Neutral         0      2      94       0      0        0       96    0.979
Sad             0      1       4      82      4        5       96    0.854
Fearful         2      0       5       1     84        4       96    0.875
Disgust         0      0       0       8     10       78       96    0.813
Total          93     86     113      92    105       87      576    0.859+
Precision   0.871  0.884   0.832   0.891  0.800    0.897   0.862+    0.859*

0.862+ and 0.859+ are the mean precision and the mean recall, respectively; 0.859* is the mean F1 measure.

Table 5. Confusion matrix of emotion labeling by self-report for auditory stimuli.

Emotions    Excited  Happy  Neutral   Sad  Fearful  Disgust   Total  Recall
Excited        80      8       8       0      0        0       96    0.833
Happy           8     76      12       0      0        0       96    0.792
Neutral         0      0      92       1      0        3       96    0.958
Sad             0      2       2      77      8        7       96    0.802
Fearful         2      1       0      16     73        4       96    0.760
Disgust         0      0      12       1      4       79       96    0.823
Total          90     87     126      95     85       93      576    0.828+
Precision   0.889  0.874   0.730   0.811  0.859    0.849   0.835+    0.828*

0.835+ and 0.828+ are the mean precision and the mean recall, respectively; 0.828* is the mean F1 measure.


6. RESULTS

Four models based on the rough set technique were constructed to predict emotions, including DR, 10-NN, 20-NN and 30-NN. A 10-fold cross-validation method was adopted to validate these prediction models. The level of significance (α) was set at 0.05 for all the statistical tests in this study. For the results reported here, precision, recall and the F1 measure were calculated. In the context of classification, there are true positives (t_p), in which a match is correctly predicted, false positives (f_p), in which a non-match is declared to be a match, and false negatives (f_n), in which an actual match is not detected. Based on these terms, precision, recall and the F1 measure are defined as follows:

$$\mathrm{Precision} = \frac{t_p}{t_p + f_p}, \qquad (15)$$

$$\mathrm{Recall} = \frac{t_p}{t_p + f_n}, \qquad (16)$$

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \qquad (17)$$

Hence, precision is a measure of exactness or fidelity, whereas recall is a measure of completeness. The F1 measure, which combines precision and recall, is their harmonic mean and thus gives an overall measure of accuracy.
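For reference, the per-class precision, recall and F1 values in the confusion matrices can be reproduced from Equations (15)–(17) with a few lines of code; the sketch below (not part of the original study) uses the visual self-report matrix of Table 4, where rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Rows: actual class, columns: predicted class (Table 4, visual self-report).
labels = ["excited", "happy", "neutral", "sad", "fearful", "disgusted"]
cm = np.array([
    [81,  7,  3,  0,  5,  0],
    [10, 76,  7,  1,  2,  0],
    [ 0,  2, 94,  0,  0,  0],
    [ 0,  1,  4, 82,  4,  5],
    [ 2,  0,  5,  1, 84,  4],
    [ 0,  0,  0,  8, 10, 78],
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)   # Equation (15): tp / (tp + fp), column-wise
recall = tp / cm.sum(axis=1)      # Equation (16): tp / (tp + fn), row-wise
f1 = 2 * precision * recall / (precision + recall)  # Equation (17)

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:10s} precision={p:.3f} recall={r:.3f} F1={f:.3f}")
print(f"means: precision={precision.mean():.3f} "
      f"recall={recall.mean():.3f} F1={f1.mean():.3f}")
# e.g. excited: precision=0.871 recall=0.844 F1=0.857, matching Table 4.
```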

6.1. Results for self-report

Therefore, the self-report process generated the confusion matrices (see Kohavi and Provost, 1998) for the visual and auditory stimuli (Tables 4 and 5, respectively). Note that in a confusion matrix, each row represents the instances in an actual class (96 instances in total), while each column represents the instances in a predicted class (thus the total number of instances changes). Based on the recall and precision in Tables 4 and 5, the F1 measures of excited, happy, neutral, sad, fearful and disgusted were 0.857, 0.835, 0.900, 0.872, 0.835 and 0.853 for the visual stimuli, and 0.860, 0.831, 0.829, 0.806, 0.806 and 0.836 for the auditory stimuli, respectively.

6.2. Results for visual stimuli

Figure 3 shows the results of the F1 measures for each emotion induced by the visual stimuli. Of all the emotions, neutral (88.2%) was predicted most accurately and happy (67.4%) least accurately based on the mean F1 measures across the four predictive models. Secondly, it is observed that 30-NN (76.2%) performed best of the four models based on the mean F1 measures across the six emotions. The confusion matrix produced by 30-NN is presented in Table 6. In terms of both precision and recall (see Table 6), the greatest numbers of false predictions and misses were found in the context of happiness.

6.3. Results for auditory stimuli

The results of the F1 measures for predicting emotions induced by auditory stimuli are presented in Fig. 4. Of all the emotions, neutral (90.0%) was predicted most accurately and happy (52.2%) least accurately based on the mean F1 measures across the four prediction models. Secondly, 20-NN (76.1%) performed slightly better than the other models according to the mean F1 measures across the six emotions. Furthermore, the confusion matrix produced by 20-NN is presented in Table 7. According to the precision and recall in Table 7, both the greatest number of false predictions and the greatest number of misses lie in happiness.

Figure 4. F1 measure of predictive models for each emotion for auditory stimuli.

When comparing the results in Figs 3 and 4 and Tables 6 and 7, similar patterns were found. In both cases, neutral was predicted most accurately and happy least accurately. The lowest precision and recall of the best prediction model were also obtained when predicting happiness. This was also manifested in the last column and last row of Tables 6 and 7. However, when the individual emotions predicted by all four models in Figs 3 and 4 were compared, significant differences were found. Specifically, F1 measures for ‘excited’ in Fig. 4 were significantly larger than those in Fig. 3 (t(6) = 2.83, p < 0.05), whereas F1 measures for ‘happy’ in Fig. 3 were significantly larger than those in Fig. 4 (t(6) = 5.20, p < 0.01). Furthermore, prediction results based on physiological data are less accurate than those based on subjective self-reports in terms of the mean F1 measures (visual vs. self-report: 76.2% vs. 85.9%; auditory vs. self-report: 76.1% vs. 82.8%).
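The pairwise comparisons reported above (e.g. t(6) = 2.83) are standard paired t-tests on matched F1 measures. A generic version can be run as below; the two arrays are placeholders, not the study's actual values, and the exact pairing used by the authors is not reproduced here.

```python
from scipy import stats

# Placeholder F1 measures for one emotion under two stimulus conditions,
# matched pairwise; replace with the real values.
f1_auditory = [0.82, 0.79, 0.85, 0.80, 0.83, 0.78, 0.81]
f1_visual   = [0.76, 0.74, 0.80, 0.77, 0.78, 0.73, 0.79]

t_stat, p_value = stats.ttest_rel(f1_auditory, f1_visual)
df = len(f1_auditory) - 1  # degrees of freedom for a paired t-test
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.4f}")
```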

6.4. Cultural differences

Firstly, the results of the F1 measures for predicting emotions induced by visual stimuli for Chinese and Indian participants are presented in Fig. 5. Of all the emotions, neutral (86.6 vs. 89.2%) was predicted most accurately and happy (66.1 vs. 69.0%) least accurately based on the mean F1 measures across the four prediction models for Chinese and Indian participants, respectively.

Table 6. Confusion matrix of 30-NN for visual stimuli.

Emotions    Excited  Happy  Neutral   Sad  Fearful  Disgust   Total  Recall
Excited        18      5       0       0      0        0       23    0.783
Happy           3     15       4       1      0        0       23    0.652
Neutral         0      2      22       0      0        0       24    0.917
Sad             0      0       0      19      2        2       23    0.826
Fearful         0      0       0       3     16        4       23    0.696
Disgust         2      0       0       3      2       17       24    0.708
Total          23     22      26      26     20       23      140    0.764+
Precision   0.783  0.682   0.846   0.731  0.800    0.739   0.763+    0.762*

0.763+ and 0.764+ are the mean precision and the mean recall, respectively; 0.762* is the mean F1 measure.


Figure 3. F1 measure of predictive models for each emotion for visual stimuli.


Table 7. Confusion matrix of 20-NN for auditory stimuli.

Emotions    Excited  Happy  Neutral   Sad  Fearful  Disgust   Total  Recall
Excited        22      2       0       0      0        0       24    0.917
Happy           5     10       2       0      4        2       23    0.435
Neutral         0      2      22       0      0        0       24    0.917
Sad             0      2       0      18      0        2       22    0.818
Fearful         1      4       0       1     18        0       24    0.750
Disgust         0      2       0       4      0       17       23    0.739
Total          28     22      24      23     22       21      140    0.763+
Precision   0.786  0.455   0.917   0.783  0.818    0.810   0.761+    0.761*

0.761+ and 0.763+ are the mean precision and the mean recall, respectively; 0.761* is the mean F1 measure.


Figure 5. F1 measure of predictive models for each emotion for visual stimuli: (a) Chinese participants and (b) Indian participants.

30-NN (76.1%) and DR (79.9%) performed best of all the models according to the mean F1 measures across the six emotions for Chinese and Indian participants, respectively.

The results of the F1 measures for predicting emotions induced by auditory stimuli for Chinese and Indian participants are presented in Fig. 6. Of all the emotions, neutral (85.6 vs. 90.9%) was predicted most accurately and happy (62.2 vs. 56.1%) least accurately based on the mean F1 measures across the four prediction models for Chinese and Indian participants, respectively.

Figure 6. F1 measure of predictive models for each emotion for auditory stimuli: (a) Chinese participants and (b) Indian participants.

20-NN (76.3%) and 20-NN (78.8%) performed best of all the models according to the mean F1 measures across the six emotions for Chinese and Indian participants, respectively. Secondly, when comparing the results of Fig. 5a and b, no significant difference was found between Chinese and Indian participants for visual stimuli across the six emotions. Similarly, when comparing the results of Fig. 6a and b, no significant difference was found between Chinese and Indian participants for auditory stimuli, except for fearful, for which F1 measures for Indian participants were significantly larger than those


for Chinese participants (t(6) = −2.68, p < 0.05). When comparing the results of Figs 5a and 6a, within the Chinese participants, a significant difference was found for excited, for which measures for auditory stimuli were significantly larger than those for visual stimuli (t(6) = −3.04, p < 0.05). When comparing the results of Figs 5b and 6b, within the Indian participants, significant differences were found for excited and happy: measures of excited for auditory stimuli were significantly larger than those for visual stimuli (t(6) = −2.48, p < 0.05), and measures of happy for auditory stimuli were significantly smaller than those for visual stimuli (t(6) = 4.75, p < 0.01).

7. DISCUSSION

7.1. Visual stimuli vs. auditory stimuli

Unlike previous studies in the area of emotion prediction that mainly used musical stimuli, this study expands the auditory stimuli to a wide range of semantics, such as music, laughs, growls, cries and shouts, and thus incorporates the important role of auditory information in survival significance and in daily life. More importantly, the present study facilitates comparisons between different studies by using standardized auditory and visual stimuli from the IADS and the IAPS, respectively. Note that the standard deviations in Table 2 for auditory stimuli seem larger than those for visual stimuli in terms of self-reported ratings. However, comparing the prediction results using physiological data, auditory stimuli were as effective as visual stimuli in eliciting emotions in the laboratory environment, as evidenced by the prediction accuracies. The discrepancy between the results from self-report ratings and physiological signals is probably due to the fact that self-report ratings are subjective in nature, whereas physiological signals are objective per se, in that they are generally involuntary and cannot be easily triggered by any conscious or intentional control. Another possible reason is the individual difference in the common ground truth obtained by self-report (felt emotions) and in the specificity of physiology. Our findings are consistent with those from previous studies that reported auditory cues were effective in activating the same appetitive and defensive motivational systems underlying emotional expressions activated by visual cues (Bradley and Lang, 2000).

It is noted that excited was predicted more accurately in the auditory scene than in the visual scene. This might be explained by the erotic stimuli employed. Compared with the erotic visual stimuli (e.g. a half-naked couple), the erotic auditory stimuli might be even more potent and can lead to stronger emotional effects. This may be due to the fact that physiological systems, such as SCR, can react reflexively in a relatively stronger fashion to dynamic and ever-changing stimulation in the form of auditory stimuli than to static visual information (Bradley and Lang, 2000). This situation tends to happen when the arousal level of the stimuli is high. For example, fearful (mean value 69.7% for visual in Fig. 3 vs. mean value 72.9% for auditory in Fig. 4) was also predicted more accurately in the auditory scene than in the visual scene, although the difference was not significant.

Another finding is that neutral was most accurately predicted using physiological measures (see Figs 3–6). This result is partially consistent with the self-reported result. One possibility accounting for this is that the selected stimuli, especially the visual stimuli, have highly agreed valence ratings (see Table 2), which might boost the predicted results to some extent. In addition, this result might also be due to the change scores used in the feature extraction process. These scores in the neutral condition are similar to baseline, as no particular physiological responses to neutral stimuli were found.

Furthermore, in both situations, happy was not well predicted compared with other emotions. This might be because happiness is a complicated construct with a multiplicity of meanings, such as pleasure, life satisfaction, positive emotions and so on (Diener et al., 2003). This may also explain the difference between the prediction accuracy for visual stimuli (around 67%) and that for auditory stimuli (around 52%). The auditory stimuli consisted of laughing and giggling, while the visual stimuli primarily presented happiness in the family. In this context, the stimuli might bias toward one component or another of its meanings, and thus the physiological responses may scatter more strongly in the context of happiness than in other emotions, and more in the auditory scene than in the visual scene. However, the self-report results showed a reasonably good result for predicting happiness. From close examination of the emotion-labeling procedure, we conjectured that this may be ascribed to the forced choice among the six emotions, which can turn a range of different interpretations into a single response category (in this case, happy) and thus exaggerate the degree of prediction (Russell et al., 1993). Nevertheless, we adopted this strategy in this research largely because of its ease of data processing when compared with free choice methods.

7.2. Cultural differences

Despite the fact that emotion is modulated by culture (Mesquita, 2003; Soto et al., 2005), it has also been argued that subjective reports of emotions and behavioral emotional responses often tend to differ among cultural groups, while physiological aspects of emotions are less susceptible to diverse cultures (Tsai et al., 2002). This may help explain why cultural differences between Chinese and Indian participants in emotion prediction accuracy were not well identified by the proposed models. Another possibility is that the strictly controlled laboratory environment limited the social–cultural interactions where cultural differences often tend to occur. However, culture-specific models do perform better than general models with mixed participants, as seen by comparing Fig. 5 (Visual and Chinese: 75.0%; Visual and Indian: 77.0%) with Fig. 3 (Visual and Mixed: 74.8%) and by comparing Fig. 6 (Auditory and Chinese: 75.1%; Auditory and Indian: 76.7%) with Fig. 4 (Auditory and Mixed: 74.5%). In this sense, culture does play some role in prediction models, and participants in the same cultural group generally show higher homogeneity than those in mixed cultural groups. In particular, when the confounding factor, i.e. culture, is excluded, the specificity of physiological patterns increases. This finding is consistent with previous studies that reported emotional responses were culturally shaped (e.g. Mesquita, 2003). Nevertheless, it is both theoretically and practically meaningful to compare culture-specific models with mixed models (Zhou et al., 2011a). Theoretically, this can help identify to what extent physiological signals account for different cultural effects (e.g. sensitivity) on specific emotions. It also provides practical guidelines for designing products and HCI applications, since people might respond to culturally specific clothing and accessories, for example.




7.3. Affective auditory stimuli in HCI

Recently, affective computing has attracted much attention (Picard, 1997). An affective HCI system can detect and predict users’ emotions in real time, and can accordingly seek to respond to the emotions in ways that improve the interaction (Bailenson et al., 2008). For example, earlier studies have suggested that physiological data collected in real time can help optimize user pleasure and efficiency (Fairclough, 2009) and detect unnecessary irritation or frustration caused by HCI applications without having to interrupt users (Scheirer et al., 2002), thereby helping designers find target areas for redesign and improvement. However, the effects of auditory stimuli in affective computing are less well known, and this is partly related to the less well-developed work on recognizing emotions from auditory stimuli. In this study, we found that auditory stimuli are also effective in eliciting emotions from users. Thus, auditory stimuli can be incorporated into human–computer interfaces designed for sighted and, particularly, partially sighted and congenitally blind users (i.e. blind from birth) (Brewster, 2002; Klinge et al., 2010). For example, companies could provide customer services on the phone in which customers’ emotional reactions are detected in real time (e.g. agitated or frustrated) and the system changes its programming and dialogue accordingly (calming the customer down) to improve communication, or refers the customer to a human correspondent. For blind users, visual information is not available and the haptic modality is usually inadequate in everyday social interaction. Therefore, they rely heavily on the auditory modality in order to interact efficiently and effectively with others. For instance, ideal alerts and alarms in industrial systems and games often keep blind users informed of what is going on and induce the right emotional state for blind users to react appropriately.

The results from our work indicate that affective auditory stimuli can effectively change users’ emotional states. First, this finding can be used to aid in enhancing the interface of mobile devices that have very limited screen space. One possible solution is to use physiological signals to detect the user’s emotional state and then use affective auditory information to interact with the user. It is possible that smart phones can record the physiological signals of users, such as SCR, heart rate and temperature, using embedded sensors. Under these circumstances, smart phones with a speech-recognition-based personal assistant like Siri could not only talk to users about their requests, but could also offer affective tones to soothe users’ negative emotional states or promote positive emotional states. Secondly, a wide range of affective auditory stimuli can be employed in applications on smart phones and tablets for the purposes of entertainment and education. For example, ‘Fart cushion’, as a classic practical joke, can embarrass all your friends by emitting a loud farting noise; ‘iFight’ mimics realistic weapon sounds to scare people; ‘Auditory workout’ can help improve auditory attention and memory through frequent, challenging and intense auditory training (Bellis, 2002); ‘Baby learns ABCs’ attracts the attention of babies by emitting vivid nursery rhymes; and ‘Baby loves Salsa’ can foster the musical talent of kids, and so forth. The preeminent functions of music are social and psychological beyond cultures, and thus another important implication for HCI is to employ music to regulate mood, release stress, heal love pangs, etc. We also employed musical auditory stimuli for emotion elicitation and prediction, and their effectiveness has already been proved in previous studies (Baumgartner et al., 2006; Kim and André, 2008). Certain emotional labels prominently apply to music in particular genres, e.g. angry for punk music, sad for slow blues, happy for children’s music and so on (Tao and Ogihara, 2006). Based on the physiological responses to certain music and songs, they can be tagged with particular emotional labels, according to which the emotional content of the music and songs can be applied to such scenarios as a DJ choosing music to regulate the emotional level of people on the dance floor, a composer scoring a film, someone preparing the soundtrack for a daily workout, people in love, music therapists monitoring their patients’ emotional responses, putting a baby to sleep and so on.

7.4. Data mining methods and feature extraction

The results obtained from both visual and auditory stimuli demonstrated that the two data mining models, namely DR and k-NN based on rough set theory, were capable of predicting emotions with accuracy comparable to previous multi-subject emotion prediction models. The proposed emotion measurements and data analysis methods are likely to work well in both visual and auditory affective HCI applications, because these methods can effectively identify the patterns of physiological responses to both affective visual and auditory stimuli. In this sense, a universal emotion prediction system may be developed, which would work in the context of different affective HCI applications and (at least visual and auditory) modalities. Other advanced computational methods reported in the previous literature are also promising, such as the support vector machine (Liu et al., 2008) and Marquardt back propagation (Lisetti and Nasoz, 2004). A combination of different techniques may be more successful. For example, Gu et al. (2010) applied a two-stage Gaussian mixture model with a combination of sequential floating forward selection and k-NN, and obtained the best prediction accuracy of 90.3% among four emotions.

Feature extraction plays an important role when using data mining methods for emotion prediction. Some useful information regarding feature extraction can be obtained from the present study. First, it is not necessarily true that more features will lead to better results. In fact, in our initial attempt, besides what we report in this paper, we also extracted features from heart rate and peripheral temperature. However, the inclusion of these additional features was found to make no difference in the prediction accuracy. This may be because heart rate and peripheral temperature are not sensitive to emotion changes (Haag et al., 2004) or because the features used are not relevant for heart rate and temperature. Secondly, instead of using statistical tests, advanced feature selection methods, such as greedy forward selection, greedy backward elimination and genetic algorithms, may be used to figure out the combination of features that is associated with the optimal prediction performance. However, a problem with advanced feature selection methods is that the constructed model can overfit the particular problem. Thus, it may be difficult to generalize the results to other scenarios. Another caveat to note is that the dimension reduction using LDA without cross-validation in the present study may preclude comparisons of absolute accuracy with other studies incorporating proper cross-validation.
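As an illustration of the greedy forward selection idea mentioned above, the following sketch (not used in the study; the classifier and scoring choices are arbitrary assumptions) adds one feature at a time as long as cross-validated accuracy improves.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, estimator, cv=10):
    """Add one feature at a time, keeping it only if cross-validated accuracy improves."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    improved = True
    while improved and remaining:
        improved = False
        scores = {f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] > best_score:
            best_score = scores[f_best]
            selected.append(f_best)
            remaining.remove(f_best)
            improved = True
    return selected, best_score

# Toy usage: 144 samples (6 emotions x 24), 42 synthetic features.
rng = np.random.default_rng(1)
X = rng.normal(size=(144, 42))
y = np.repeat(np.arange(6), 24)
subset, score = greedy_forward_selection(X, y, KNeighborsClassifier(n_neighbors=10))
print(subset, round(score, 3))
```

Because the subset is tuned inside the same cross-validation loop, an outer hold-out set would be needed for an unbiased accuracy estimate, which is exactly the overfitting concern raised above.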




7.5. Limitations and future studies

Firstly, the prediction accuracy in this study is comparable with numerous previous studies (e.g. Frantzidis et al., 2010), but it is inferior to that of Lisetti and Nasoz (2004). In Schiano et al. (2004), film clips over a longer period were used as stimuli for emotion elicitation, and the best classification accuracy was 84.1% among six emotions. Furthermore, it was also evidenced by Baumgartner et al. (2006) that auditory stimuli (e.g. music) markedly enhance the emotional experience evoked by affective pictures compared with unimodal (i.e. visual or auditory) affective stimuli alone. Therefore, it might be helpful to extend our study to the case where visual and auditory information is used as elicitors at the same time. Secondly, the emotion elicitation was passive, with a simple stimulus–response paradigm, and took place in a well-controlled lab environment. However, this is often not the case for real affective HCI applications. Physiological reactivity to visual and auditory stimuli might be different when complex social interactions are involved. Human beings can be actively involved in complex social interactions, in which culture plays an important role and modulates their emotional reactions to stimuli to a great extent (Calvo and D’Mello, 2010). Therefore, it is important to study cultural differences in social interaction processes in the future. During social interactions, it is also promising to explore other emotion-related behaviors, such as facial expressions, speech and postures, to study possible differences between visual and auditory stimuli. It is important for future studies to identify an appropriate set of emotion-related behaviors that correlates more highly with a certain modality of information presentation in order to improve prediction accuracy. Thirdly, although we tried to control the factors that influence the results, such as the within-subject experiment design, the environment and the stimuli themselves, caution should be exercised in applying and generalizing the current results to other stimuli significantly different from those in this study, as we selected only a portion of visual and auditory stimuli from all possible stimuli with much wider affective tones. Therefore, a wider range of auditory stimuli is needed to further understand the relationships between auditory stimuli and emotion elicitation and prediction. Fourthly, we adopted a forced choice strategy for emotion labeling. It is limited, because it accounts for only six emotions and can turn a range of different interpretations into a single response category, whereas participants might have experienced other emotions. This effect can be exaggerated when the participants are from different cultures and speak different languages. Moreover, forced choice requires participants to treat the emotion choices offered as mutually exclusive, which may not be the case in reality (Russell et al., 1993). However, free-response labels for emotions are quite difficult to deal with, as slight differences in one dimension (e.g. intensity of emotion) of one word from another need to be distinguished, such as surprise and startle. Therefore, a more appropriate emotion-labeling strategy is needed for future studies in order to improve the validity of the prediction results, for example, using a forced choice strategy plus another category noted as ‘others’.

8. CONCLUSION

In this paper, we demonstrated that auditory stimuli were able to induce emotional states as effectively as visual stimuli by using emotion prediction results obtained from physiological computational models. In order to do so, a number of physiological measures were recorded, including facial EMG, SCR, EEG and respiration data. The emotion prediction models were constructed using rough set-based data mining methods (i.e. DR and k-NN) based on the features extracted from the physiological measures. For the six selected emotions from 24 participants, we obtained average F1 measures of 76.2 and 76.1% in the visual and auditory scenes, respectively. Furthermore, cultural differences were not well identified between Chinese and Indian participants in terms of prediction accuracy. Based on these findings, auditory stimuli can be better understood as emotional elicitors that may influence practical HCI applications.





REFERENCES

Asutay, E., Västfjäll, D., Tajadura-Jiménez, A., Genell, A., Bergman, P. and Kleiner, M. (2012) Emoacoustics: a study of the physical and psychological dimensions of emotional sound design. J. Audio Eng. Soc., 60, 21–28.

Bailenson, J.N., Pontikakis, E.D., Mauss, I.B., Gross, J.J., Jabon, M.E., Hutcherson, C.A.C., Nass, C. and Oliver, J. (2008) Real-time classification of evoked emotions using facial feature tracking and physiological responses. Int. J. Hum.-Comput. Stud., 66, 303–317.

Diener, E., Scollon, C.N. and Lucas, R.E. (2003) The evolving concept of subjective well-being: the multifaceted nature of happiness. Adv. Cell Aging Gerontol., 15, 187–219.

Duntsch, I. and Gediga, G. (1998) Uncertainty measures of rough set prediction. AI Commun., 106, 109–137.

Ekman, P., Friesen, W.V. and Hager, J.C. (1978) Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, CA, USA. Ekman, P., Levenson, R.W. and Friesen, W.V. (1983) Autonomic nervous system activity distinguishes among emotions. Science (New Series), 221, 1208–1210. Fairclough, S.H. (2009) Fundamentals of physiological computing. Interact. Comput., 21, 133–145.

Bellis, T.J. (2002) Developing deficit-specific intervention plans for individuals with auditory processing disorders. Semin. Hear., 23, 287–296.

Frantzidis, C.A., Bratsas, C., Klados, M.A., Konstantinidis, E., Lithari, C.D., Vivas, A.B., Papadelis, C.L., Kaldoudi, E., Pappas, C. and Bamidis, P.D. (2010) An integrated data-mining-based approach for healthcare applications. IEEE Trans. Inf. Technol. Biomed., 14, 309–318.

Bongard, S., Martin, N.M., Seip, M. and Al’absi, M. (2011) Evaluation of a domain-specific anger expression assessment strategy. J. Pers. Assess., 93, 56–61.

Gora, G. and Wojna, G. (2002) Riona: a new classification system combining rule induction and instance-based learning. Fundam. Inform., 51, 369–390.

Bos, D.O. (2008) EEG-based emotion recognition: the influence of visual and auditory stimuli. Capita Selecta Paper, http://hmi.ewi.utwente.nl/verslagen/capita-selecta/CS-Oude_BosDanny.pdf.

Gu, Y., Tan, S.L., Wong, K.J., Ho, M.H.R. and Qu, L. (2010) A GMM Based 2-Stage Architecture for Multi-Subject Emotion Recognition using Physiological Responses. Proc. 1st Augmented Human International Conference, Megève, France,ACM NewYork, NY, USA.

Bradley, M.M. and Lang, P.J. (2000) Affective reactions to acoustic stimuli. Psychophysiology, 37, 204–215. Bradley, M.M. and Lang, P.J. (2007) The International Affective Digitized Sounds (2nd edn; IADS-2). Affective Ratings of Sounds and Instruction Manual, University of Florida, Gainesville, FL, 2007. Brewster, S.A. (2002) Non-Speech Auditory Output. In Jacko, J.A. and Sears, A. (eds), Human–Computer Interaction Handbook, pp. 220–239. Lawrence Erlbaum Associates, Mahwah, NJ. Cacioppo, J.T. and Tassinary, L.G. (1990) Inferring psychological significance from physiological signals. Am. Psychol., 45, 16–28. Calvo, R.A. and D’Mello, S. (2010) Affect detection: an interdisciplinary review of models, methods, and their applications. IEEE Trans. Affect. Comput., 1, 18–37. Carpentier, F.R.D. and Potter, R.F. (2007) Effects of music on physiological arousal: explorations into tempo and genre. Media Psychol., 10, 339–363.

Gunes, H. and Piccardi, M. (2009) Automatic temporal segment detection and affect recognition from face and body display. IEEE Trans. Syst. Man Cybern. Part B 39, 64–84. Haag, A., Goronzy, S., Schaich, P. and Williams, J. (2004) Emotion Recognition using Bio-Sensors: First Steps Towards an Automatic System. In André, E., Dybkjær, L., Minker, W. and Heisterkamp, P. (eds), Affective Dialogue Systems, pp. 36–48. Springer, Berlin. Heraz, A. and Frasson, C. (2007) Predicting the three major dimensions of the learner’s emotions from brainwaves. World Acad. Sci. Eng. Technol., 25, 323–329. Jäncke, L., Vogt, J., Musial, F., Lutz, K. and Kalveram, K.T. (1996) Facial EMG responses to auditory stimuli. Int. J. Psychophysiol., 22, 85–96. Kandel, E.R., Schwartz, J.H. and Jessell, T.M. (2000) Principles of Neural Science (4th edn). McGraw-Hill, New York.

Chanel, G., Kronegg, J., Grandjean, D. and Pun, T. (2006) Emotion Assessment: Arousal Evaluation using EEG’s and Peripheral Physiological Signals. In Gunsel, B., Jain, A.K., Tekalp, A.M. and Sankur, B. (eds), Multimedia Content Representation, Classification and Security, Lecture Notes in Computer Science, Vol. 4105, pp. 530–537. Springer, Heidelberg.

Kapoor, A., Burleson, B. and Picard, R. (2007) Automatic prediction of frustration. Int. J. Hum.-Comput. Stud., 65, 724–736.

Choppin, A. (2000) EEG-based human interface for disabled individuals: emotion expression with neural networks. MSc Thesis, Tokyo Institute of Technology, Yokohama, Japan.

Kim, J. and André, E. (2008) Emotion recognition based on physiological changes in music listening. IEEE Trans. Pattern Anal. Mach. Intell., 30, 2067–2083.

Katsis, C.D., Katertsidis, N., Ganiatsas, G. and Fotiadis, D.I. (2008) Toward emotion recognition in car-racing drivers: a biosignal processing approach. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum., 38, 502–512.


Baumgartner, T., Esslen, M. and Jäncke, L. (2006) From emotion perception to emotion experience: emotions evoked by pictures and classical music. Int. J. Psychophysiol., 60, 34–43.

Ekman, P. (1992) An argument for basic emotions. Cogn. Emotion, 6, 169–200.


Kleinsmith, A. and Bianchi-Berthouze, N. (2012) Affective body expression perception and recognition: a survey. IEEE Trans. Affect. Comput. DOI: 10.1109/T-AFFC.2012.16.

Petrantonakis, P.C. and Hadjileontiadis, L.J. (2010) Emotion recognition from brain signals using hybrid adaptive filtering and higher order crossings analysis. IEEE Trans. Affect. Comput., 1, 81–97.

Kleinsmith, A. Bianchi-Berthouze, N. and Steed, A. (2011) Automatic recognition of non-acted affective postures. IEEE Trans. Syst. Man Cybern. Part B, 41, 1027–1038.

Picard, R.W. (1997) Affective Computing. The MIT Press, Cambridge, MA, USA.

Klinge, C., Röder, B. and Büchel, C. (2010) Increased amygdala activation to emotional auditory stimuli in the blind. Brain, 133, 1729–1736. Koelstra, S. and Patras, I. (2013) Fusion of facial expressions and EEG for implicit affective tagging. Image Vision Comput., 31, 164–174. Kohavi, R. and Provost, F. (1998) Glossary of terms. Mach. Learn., 30, 271–274.

Lazarus, R.S. (1991) Emotions and Adaptation. Oxford University Press, Oxford, UK. Levenson, R.W., Ekman, P. and Friesen, W.V. (1990) Voluntary facial action generates emotion-specific autonomic nervous system activity. Psychophysiology, 27, 363–384. Leventhal, H. (1980) Toward a Comprehensive Theory of Emotion. In Leonard, B. (ed.), Advances in Experimental Social Psychology, pp. 139–207. Academic Press, New York. Lisetti, C.L. and Nasoz, F. (2004) Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J. Appl. Signal Process., 2004, 1672–1687.

Ribeiro, R.L., Teixeira-Silva, F., Pompéia, S. and Bueno, O.F.A. (2007) IAPS includes photographs that elicit low-arousal physiological responses in healthy volunteers. Physiol. Behav., 91, 671–675. Russell, J.A., Suzuki, N. and Ishida, N. (1993) Canadian, Greek, and Japanese freely produced emotion labels for facial expressions. Motiv. Emotion, 17, 337–351. Sauter, D.A., Panattoni, C. and Happé, F. (2013) Children’s recognition of emotions from vocal cues. Br. J. Dev. Psychol., 31, 97–113. Scheirer, J., Fernandez, R., Klein, J. and Picard, R.W. (2002) Frustrating the user on purpose: a step toward building an affective computer. Interact. Comput., 14, 93–118. Schiano, D.J., Ehrlich, S.M. and Sheridan, K. (2004) Categorical Imperative Not: Facial Affect is Perceived Continuously. Proc. SIGCHI Conf. Human Factors in Comput. Syst., Vienna, Austria, pp. 49–56, ACM New York, NY, USA. Schuller, B., Vlasenko, B., Eyben, F., Wöllmer, M., Stuhlsatz, A., Wendemuth, A. and Rigoll, G. (2010) Cross-corpus acoustic emotion recognition: variances and strategies. IEEE Trans. Affect. Comput., 1, 119–131. Smith, C.A. (1989) Dimensions of appraisal and physiological response in emotion. J. Pers. Soc. Psychol., 56, 339–353.

Liu, C., Conn, K., Sarkar, N. and Stone, W. (2008) Physiology-based affect recognition for computer-assisted intervention of children with autism spectrum disorder. Int. J. Hum.-Comput. Stud., 66, 662–677.

Soleymani, M., Chanel, G., Kierkels, J.J.M. and Pun, T. (2009) Affective characterization of movie scenes based on content analysis and physiological changes. Int. J. Semant. Comput., 3, 235–254.

Mandryk, R. and Atkins, M. (2007) A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies. Int. J. Hum.-Comput. Stud., 65, 329–347.

Soleymani, M., Pantic, M. and Pun, T. (2012) Multimodal emotion recognition in response to videos. IEEE Trans. Affect. Comput., 3, 211–223.

Martinez, A.M. and Kak, A.C. (2001) PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell., 23, 228–233.

Soto, J.A., Levenson, R.W. and Ebling, R. (2005) Cultures of moderation and expression: emotional experience, behavior, and physiology in Chinese Americans and Mexican Americans. Emotion, 5, 154–165.

Mesquita, B. (2003) Emotions as Dynamic Cultural Phenomena. In Davison, R.J., Scherer, K.R. and Goldsmith, H.H. (eds), Handbook of Affective Sciences. Oxford University Press, New York.

Nasoz, F., Alvarez, K., Lisetti, C.L. and Finkelstein, N. (2003) Emotion recognition from physiological signals for presence technologies. Int. J. Cogn. Technol. Work (Special Issue on Presence), 6, 4–14.

Niedermeyer, E. and Lopes Da Silva, F. (2004) Electroencephalography: Basic Principles, Clinical Applications, and Related Fields (5th edn). Lippincott Williams & Wilkins, Philadelphia, PA, USA.

Tajadura-Jiménez, A., Väljamäe, A. and Västfjäll, D. (2008) Self-representation in mediated environments: the experience of emotions modulated by auditory–vibrotactile heartbeat. CyberPsychol. Behav., 11, 33–38.

Tajadura-Jiménez, A., Larsson, P., Väljamäe, A., Västfjäll, D. and Kleiner, M. (2010a) When room size matters: acoustic influences on emotional responses to sounds. Emotion, 10, 416–422.

Partala, T. and Surakka, V. (2003) Pupil size variation as an indication of affective processing. Int. J. Hum.-Comput. Stud., 59, 185–198.

Tajadura-Jiménez, A., Väljamäe, A., Asutay, E. and Västfjäll, D. (2010b) Embodied auditory perception: the emotional impact of approaching and receding sound sources. Emotion, 10, 216–229.

Pawlak, Z. (1991) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Boston, MA, USA.

Tao, L. and Ogihara, M. (2006) Toward intelligent music information retrieval. IEEE Trans. Multimedia, 8, 564–574.


Lang, P.J., Greenwald, M.K., Bradley, M.M. and Hamm, A.O. (1993) Looking at pictures: affective, facial, visceral, and behavioral reactions. Psychophysiology, 30, 261–273.

Picard, R.W., Vyzas, E. and Healey, J. (2001) Toward machine emotional intelligence: analysis of affective physiological state. IEEE Trans. Pattern Anal. Mach. Intell., 23, 1175–1191.


Tsai, J.L., Chentsova-Dutton, Y., Friere-Bebeau, L.H. and Przymus, D. (2002) Emotional expression and physiology in European Americans and Hmong Americans. Emotion, 2, 380–397. Uzun, S.S., Yildirim, S. and Yildirim, E. (2012) Emotion Primitives Estimation from EEG Signals using Hilbert Huang Transform. IEEE-EMBS Int. Conf. Biomedical and Health Informatics (BHI), Hong Kong. Verona, E., Patrick, C., Curtin, J., Bradley, M. and Lang, P. (2004) Psychopathy and physiological response to emotionally evocative sounds. J. Abnorm. Psychol., 113, 99–108. Welch, P.D. (1967) The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust., 15, 70–73.

Liu, Y. and Sourina, O. (2012) EEG-Based Dominance Level Recognition for Emotion-Enabled Interaction. IEEE Int. Conf. Multimedia and Expo (ICME), Melbourne, VIC, Australia.

Zeng, Z., Pantic, M., Roisman, G.I. and Huang, T.S. (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell., 31, 39–58.

Zhang, Q. and Lee, M. (2012) Emotion development system by interacting with human EEG and natural scene understanding. Cogn. Syst. Res., 14, 37–49.

Zhou, F., Qu, X., Helander, M.G. and Jiao, R.J. (2011a) Affect prediction from physiological measures via visual stimuli. Int. J. Hum.-Comput. Stud., 69, 801–819.

Zhou, F., Xu, Q. and Jiao, R. (2011b) Fundamentals of product ecosystem design for user experience. Res. Eng. Des., 22, 43–61.


Wrase, J., Klein, S., Gruesser, S.M., Hermann, D., Flor, H., Mann, K., Braus, D.F. and Heinz, A. (2003) Gender differences in the processing of standardized emotional visual stimuli in humans: a functional magnetic resonance imaging study. Neurosci. Lett., 348, 41–45.

Wu, D., Courtney, C.G., Lance, B.J., Narayanan, S.S., Dawson, M.E., Oie, K.S. and Parsons, T.D. (2010) Optimal arousal identification and classification for affective computing using physiological signals: virtual reality Stroop task. IEEE Trans. Affect. Comput., 1, 109–118.

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 42, NO. 1, JANUARY 2012


User Experience Modeling and Simulation for Product Ecosystem Design Based on Fuzzy Reasoning Petri Nets
Feng Zhou, Roger J. Jiao, Qianli Xu, and Koji Takahashi

Abstract—Product ecosystem design entails complex user experience (UX) that involves interactions among multiple users, products, and the ambience. This paper aims to capture causal relationships between UX and design elements and in turn to provide decision support to product ecosystem analysis. A fuzzy reasoning Petri net is developed to deal with the uncertainty, complexity, and dynamics associated with UX modeling. Reasoning of diverse constructs of UX is embedded in the fuzzy production rules that are derived from self-report UX data based on rough set mining. A fuzzy reasoning algorithm is implemented to perform parallel inference by multicriteria rules and to simulate most likely UX under different ambient factors. A case study of subway station UX design demonstrates the potential of product ecosystem FRPN formulation. Index Terms—Fuzzy reasoning Petri net (FRPN), modeling and simulation, product ecosystem, user experience (UX) design.

I. INTRODUCTION

Many companies across industries have echoed the importance of user experience (UX) as a key success factor for product design [1]. The designers' ability to create meaningful UX goes far beyond usability itself. It is also indispensable to consider other cognitive, sociocognitive, and affective aspects of UX in the interaction process, such as users' enjoyment, aesthetic experience, brand loyalty, and mental models [2]. Depending on the context in which designers work, they can utilize the latest technologies, combine multimedia platforms with services, or make use of sensory information to create meaningful UX [3]. Since many products are no longer islands of their own fulfilling self-contained functionality, the most promising approach is arguably a product ecosystem paradigm, whereby multiple interconnected products are considered as a consistent whole to create unique UX and, importantly, to achieve high economic value [4]. For example, the Apple

Manuscript received July 6, 2010; revised November 17, 2010; accepted January 22, 2011. Date of publication May 23, 2011; date of current version December 16, 2011. This paper was recommended by Associate Editor W. Pedrycz. F. Zhou and R. Jiao are with the G.W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0405 USA (e-mail: [email protected]). Q. Xu is with Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore 138632. K. Takahashi is with Department of Electrical and Electronic Engineering, Tokyo Institute of Technology, Tokyo 152-8552, Japan. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCA.2011.2147309

iPhone is upholstered by a suite of software, hardware, services, retailing stores, and system developers and user groups, all of which forms a product ecosystem. The concept of product ecosystems coincides with the notion of experience economy, where strategic business advantages depend less on the technological power of a given product, but more on UX within a product ecosystem [5]. However, a major area of research on UX is how positive UX can be promoted for a single product, overlooking influences of other factors by interactions with the ambience. A. Product Ecosystem Design for UX Product ecosystem design for UX underscores interactions of interdependent products and fulfillment of UX in terms of affective and cognitive dimensions. However, current practices in product development appear to be limited in tackling these aspects. 1) Product Ecosystem: A product ecosystem can be specified as hard functional requirements, necessitating the technical system of a product, along with soft functional requirements, characterizing UX [6]. Therefore, to design an effective product ecosystem, two aspects need to be considered. The first aspect involves creation of products in the form of resources, tools, services, and/or technologies that offer technical solutions and create meaningful UX. It inherently implies a customization strategy, which compels the producer to examine different combinations of existing design elements and value profiles to identify key success factors. The second aspect entails complex human–product interactions and user integration to anticipate and adapt to users’ latent needs. It implies the notion of personalization that requires profound understanding of user decision-making processes when interacting with the product and the ambience where the products are collectively operating. Such an ecosystem thinking of product innovation and service operation is more likely to guarantee pleasurable UX at both the individual and community levels. 2) UX Measure: In order to design effective product ecosystems, appropriate UX measures with both construct validity and predictive power are required to address the particularities with regard to the evaluation of product ecosystems [2]. Law et al. [7] argue that UX is dynamic, context-dependent, and subjective, and that it is scoped to products, systems, services, and objects that a person interacts with through a user interface. Therefore, all the possible elements relevant to UX in the product ecosystem should be investigated for a deep


understanding of complex interactions between users and products. Furthermore, UX is associated with a broad range of fuzzy and dynamic concepts, including emotional, affective, experiential, hedonic, and aesthetic variables, and so on [8]. While numeric values are preferred by engineering designers, many human–computer interaction practitioners have controversies over the measurability of these fuzzy concepts [2]. Thus, the compelling concern is whether UX measures are meaningful and valid to reflect the nature of UX in question and are amenable to be quantified, modeled, and formalized. 3) UX Modeling: Models of UX are necessary as a starting point for the purpose of understanding, predicting, and reasoning about processes of UX for informing system design. According to Hassenzahl [9], the primary constructs of UX consist of hedonic quality, pragmatic quality, beauty, and goodness. Basically, two types of models are identified in behavioral sciences, i.e., measurement models and structural models [2]. The former is used to measure the constructs of UX in a particular domain, while the latter is used to establish (causal) relations between constructs [2], [10]. Although measurement models contribute to a sound basis for measuring UX, it is more helpful if structural models can provide theoretical understanding of (casual) relations between UX constructs and design elements for informing system design. Another challenge is to take the temporal dimension of UX into account, i.e., the dynamics of UX, in modeling UX. Traditional ways to validate UX models rely on data of different phases of product use. The initial judgment may not be confirmed by the subsequent UX as relevant constructs might only become salient in the user’s prolonged interaction with the product [11]. As a result, the content validity of the measurement model and predicative power of the structural model tend to suffer. Therefore, it is crucial to monitor UX as a process throughout a sufficiently long product use and to use these findings as a guide for future product design. B. Strategy for Solution Product ecosystem design encompasses multiple interrelated products, users, and their ambience. It is necessary to develop a systematic formulation of the product ecosystem in order to characterize interdependent products, whose functionalities complement with one another, in addition to multiple users, whose objectives and behaviors are diverse. Moreover, it must be able to capture interactions among users and products, while being adaptable to the evolution of the ambience [12]. Graphbased methods suggest themselves to be powerful tools to represent different physical elements in the product ecosystem [13], [14]. Additionally, their extension can capture fuzzy relations between different elements with great uncertainty, such as emotion and cognition [14]. Among them, Petri nets (PNs), especially fuzzy PNs, are suitable to deal with the uncertainty in knowledge representation and cognitive reasoning [15]. PNs have an exact mathematical definition of execution semantics with a well-developed mathematical theory for process analysis [16]. The dynamic and nondeterministic nature is well suited to model UX in product ecosystems. The qualitative nature of PNs is expected to foster the development of systematic methods

for the representation of product ecosystems, based on which UX can be analyzed in a quantitative manner with simulation methods. It is also possible to carry out “what-if” analysis for constructing UX and simulating the business processes. The outcome of the analysis can thus provide designers with decision support to improve UX, and enhance the operational efficiency of the product ecosystem. Towards this end, this paper develops a fuzzy reasoning PNs (FRPNs) for product ecosystem design and UX modeling. UX can be defined as dynamic context-dependent internal states of users, including two essential aspects, namely, the affect and the cognition [8]. Consistent with this understanding, both cognitive and affective factors are considered as important aspects to measure UX. The former focuses on instrumental aspects, such as usability and user mental models in the interaction process, while the latter focuses on affective responses stemming from user expectations, ambience, tasks, evaluations, etc. Moreover, UX measurement is self-reported, trajectorybased, and dynamic per se [7], [8]. In this regard, data about UX should be collected mainly from users’ self-reports. By trajectory-based, it means that UX evolves over time in different contexts [2]. Therefore, various aspects of UX should be measured at different points of time within possible contexts. As such, simulation is applied to generate various contexts in which UX is measured for a large number of users over time. II. R ELATED W ORK A. Design for UX Traditional design theory in human factors engineering emphasizes mainly the cognitive aspects, which empower a principled approach to the design and development of humancentered systems [17]. For example, Johnson and Turley [18] apply a think-aloud protocol to investigate cognitive requirements of nurses and physicians for designing a healthcare system. Coffey and Carnot [19] develop concept maps from retiring engineers to analyze cognitive requirements in rocket science for designing learning systems. These cognitive requirements are subsequently used to inform system analysis and design in various areas, such as information systems and marketing research, and so on [17]. These endeavors in the cognitive aspects help improve systems’ pragmatic quality, usability, as well as user mental models. However, with the focus being fixed on cognitive aspects only, the consideration of affective aspects is generally missing. An experiential perspective to UX has also attracted much attention. It stresses affect and emotions, such as users’ imaginary expectation and momentary emotions in different contexts and at different points of time [2], [20]. These ideas have been disseminated into various research areas to improve affective UX. For example, Kansei engineering improves affective UX (AUX) by selecting appropriate design elements that are able to elicit positive emotions [21]. In the field of consumer and marketing research, many studies address AUX from the perspective of advertisements and brands. They capitalize on the role of affect in purchase decision making and judgment by anticipated or expected interactions with the company, its products, and services [7], [22]. Jiao et al. [4] define the product


ecosystem structure and provides a solution strategy based on ambient intelligence for AUX design. To better understand UX, it has been advocated to study both affective and cognitive aspects. For example, a multimodal affective user interface is developed to investigate how affect interacts with cognition [23]. Helander and Khalid [24] propose an integrated model combining affective and cognitive aspects for pleasurable design. B. UX Modeling Systematic modeling, design, and evaluation methods of product ecosystems for UX are generally in their infancy. It is imperative to model UX in the context of product ecosystems, addressing both cognitive and affective aspects. Two types of UX models are distinguished in behavioral sciences, namely, measurement model and structural model [10]. As for measurement models, the constructs are regarded as latent variables, which have one or more manifest variables. Data about the manifest variables are then collected (e.g., with self-reports) to test the measurement model using statistic techniques based on the specially designed experiments, such as analysis of variance and correlation regression analysis. For example, Hassenzahl [9] explore the interplay of beauty, goodness, and usability in interactive products. van Schaik and Ling [25] model UX with web sites where relationships among usability, hedonic value, beauty, and goodness are explored. A well-tested measurement model enables UX measures to be meaningful and validated, to a good extent, and the nature and properties of UX constructs can inform its operationalization and manifest measures [2]. Subsequently, structural models are used to identify causeand-effect relations between UX constructs using statistic techniques, such as structural equation modeling [26] and Cohen’s path analysis method [27]. Other models of UX often assume that (psychological) need fulfillment leads to positive experiences [20]. For example, Jordan [28] proposes to design pleasurable products with four pleasures: physiopleasure, sociopleasure, psychopleasure, and ideopleasure. Hassenzahl et al. [20] model UX to accommodate ten important psychological needs. Although these models give insights and inform system design based on the relationships among different UX constructs, they are often statically structured at one point or interval in time. Therefore, it is recommended that the relationships between antecedents of design elements of the system and consequents of UX should be modeled dynamically over time. III. E XAMPLE OF S UBWAY S TATION E COSYSTEM A subway station is exemplified for subsequent discussions. Typical design elements involved in the subway station are ticketing machines (TMs), gates that control entry and exit, lifts, staircases, escalators, information boards, shops, restaurants, trains, service persons, users, their luggage, and the environmental settings, such as lighting conditions and noise levels. In this example, the subway station provides services to transport users from one station to another via interactions with multiple networked products. It thus involves a series of interaction events in the service process, such as “buying a

Fig. 1. Evolution of UX in a subway station ecosystem.

ticket,” “entering the gate,” “going to the platform,” “boarding the train,” “going to the toilet,” “shopping,” and “dining.” The first four events are necessary while the remaining ones are optional based on personal needs. The user interacts with the system through a series of interaction events during which UX evolves. A product ecosystem is defined as a dynamic unit that consists of interdependent products and users, functioning together with its surrounding ambience, as well as their interactive relations and business processes. Generally, the user is interacting with only one product. Nevertheless, the interaction can be directly influenced by other factors within the user’s ambience. Therefore, ambience is defined according to the user’s relationships with his/her surroundings, including other users, products, as well as environmental, social, and cultural factors. For example, in Fig. 1, user k is queuing in order to go to the platform via gate 1. This process is directly influenced by other products, such as TMs (whether user k needs a ticket), the wide gate, his or her luggage (using wide gate or not), gate 2 (length of its queue), user k − 1 (who is just in front of user k), and surrounding lighting and noise, and so on. These elements are called ambient factors with regard to user k, and they collectively constitute user k’s ambience. A third notion is the interaction sequence the user adopts to board the train. For example, whether the user looks up an information board before boarding if s/he is not familiar with the station? In this sense, the interaction sequence also influences the efficiency and UX in the whole process. Only in a way that interdependent products and services are well designed and coordinated can the efficiency and productivity goal, as well as UX, be better accomplished. Based on the product ecosystem definition, UX is described as an evolution of the user’s internal states along the chain of interaction events. Fig. 1 shows the UX evolution of user k along the event sequence in the subway station ecosystem. Therefore, UX, in the context of a specific product ecosystem, stems from a sequence of interactions regarding all the events needed to perform a particular task. IV. UX S AMPLING Experience sampling has been employed for decades to collect assessments of users’ intentions, needs, and affective


TABLE I. CONTEXTS OF "BUYING A TICKET" AND THE RELATED USERS' AFFECTIVE STATES AND COGNITIVE DECISIONS

states [29]. In order to build an effective UX model that can predict UX in different contexts of the product ecosystem, UX measures need to be collected over time.
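As a rough illustration of how such context profiles can be enumerated before data collection, the sketch below builds the full factorial of a few assumed context variables and draws a subset for sampling; the variable names, levels, and random subsetting are illustrative assumptions, and a formal design of experiments would select the subset far more systematically.

```python
import itertools
import random

# Illustrative context variables and two levels each (assumed for this sketch;
# the actual variables are defined per interaction event, cf. Table I).
context_levels = {
    "queue_tm1": ["long", "not long"],
    "queue_tm2": ["long", "not long"],
    "queue_tm3": ["long", "not long"],
    "operation": ["easy", "not easy"],
    "noise":     ["high", "not high"],
    "lighting":  ["low", "not low"],
    "price":     ["expensive", "not expensive"],
}

# Full factorial of all context combinations.
all_contexts = [dict(zip(context_levels, combo))
                for combo in itertools.product(*context_levels.values())]

# Placeholder for a proper optimal experimental design: draw a random fraction.
sampled_contexts = random.sample(all_contexts, k=30)
print(len(all_contexts), "possible contexts;", len(sampled_contexts), "sampled")
```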

TABLE II. DESIGN ELEMENTS FOR SUBWAY STATION UX DATA COLLECTION

A. Affective and Cognitive Measures In order to effectively measure UX, two major dimensions of UX are formally defined in this paper, including (1) users’ affective states and (2) cognitive processes. 1) Affective States: Affective states are developed from core affect when it is attributed to product ecosystem elements through interaction events and are changeable over time. Russell [30] defines human core affect as a neurophysiological state that is consciously accessible as a simple feeling, an integral blend of valence (pleasure–displeasure) and arousal (sleep-activated). It is often complicated how one’s core affect is attributed to the elements and then developed into affective states. This can be caused by various factors, including the elements themselves and representations of the elements, either from the hedonic quality of the products and surrounding environmental settings (e.g., noise level), or internally generated from users’ expectation (e.g., train arrival). Moreover, affective states can shift from one to another due to the interaction events in different contexts and at different points of time. Hence, they are dynamic and can be short-lived with regard to emotions in the current state or become a long-lasting attitude as a chronic disposition [30]. 2) Cognitive Processes: Human cognition deals with the information processing tasks with respect to the interaction events in the product ecosystem. It is advocated that product ecosystems should cater to human cognitive capabilities and limitations, including numerous components of cognition that are attributed to attention, action and control, memory, and decision making [31]. Various aspects in the product ecosystem play a role in influencing users’ cognitive responses, such as the usability issues of TM interfaces, the decision-making process in different contexts, and user satisfaction by the cost needed and other services provided. B. Data Collection In order to collect UX related data, self-reports about users’ affective states and cognitive decisions are recorded based on users’ navigation through one real subway station in Tokyo, Japan. Fifteen users (nine males and six females) were recruited. Among them, five were new exchange students, and others were local students at Tokyo Institute of Technology. Hence, given the familiarity with the station, the exchange

students tend to need more cognitive efforts than the local ones when navigating through the station. Various contexts about the interaction events were created, leveraging various elements in the product ecosystem as listed in Table I. In addition to the products in Table I, environmental settings, including lighting conditions (low versus not low) and noise levels (high versus not high) are also employed as context variables. The resultant cognitive decisions depend on the specific interaction events; affective states have three general ones, i.e., pleasant, neutral, and unpleasant. For example, with regard to the interaction event, “buying a ticket,” queues of three TMs, the operation, the ticket price, and surrounding lighting and noise are used to create corresponding contexts. Ninety contexts (i.e., product profiles) are formulated through design of optimal experiment [32] using software SPSS 15.0 (www.spss.com), based on which users’ affective states and cognitive decisions were reported, as illustrated in Table II. Similar procedure was also applied to other interaction events in order to collect UX selfreport data.
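For concreteness, the self-report data for one interaction event can be organized as a flat decision table with the context variables as condition attributes and the reported affect and choice as decision attributes. A minimal sketch of one possible record layout is given below; the field names are assumptions, the first row mirrors the first entry of Table II as described later in the text, and the second row is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class UXRecord:
    """One row of the decision table for the 'buying a ticket' event."""
    queue_tm1: str   # "long" / "not long"
    queue_tm2: str
    queue_tm3: str
    operation: str   # "easy" / "not easy"
    noise: str       # "high" / "not high"
    lighting: str    # "low" / "not low"
    price: str       # "expensive" / "not expensive"
    affect: str      # decision variable 1: "pleasant" / "neutral" / "unpleasant"
    choice: str      # decision variable 2: "TM1" / "TM2" / "TM3"

decision_table = [
    UXRecord("not long", "long", "not long", "not easy", "high", "not low",
             "not expensive", "neutral", "TM1"),
    UXRecord("long", "long", "not long", "easy", "not high", "not low",
             "expensive", "unpleasant", "TM3"),
]
```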


TABLE III. GENERATED RULES FOR THE INTERACTION EVENT OF BUYING A TICKET

V. ROUGH SET-BASED FUZZY PRODUCTION RULES

The prediction of users' affective states and cognitive decisions involves immense uncertainty and dynamics. This necessitates an effective inference engine. As indicated in Table I, some variables are fuzzy, using terms like long, low, and high. The classical means to tackle this issue is to employ fuzzy production rules. Based on fuzzy logic theories, fuzzy production rules are used to represent uncertain knowledge and fuzzy reasoning processes [33], taking the form of

r_j: P_1(θ_1) & P_2(θ_2) & ... & P_k(θ_k) ⇒ P_l(θ_l), c_j    (1)

where θl = min{θ1 , θ2 , . . . , θk } × cj ; θi ∈ [0, 1]; i = 1, . . . , k, l denotes the truth degree of proposition Pi (i.e., fuzzy variable) in rule rj ; and cj ∈ [0, 1] denotes the confidence degree of applying rj . In the context of UX modeling, the fuzzy production rules are developed with respect to the events along the chain of UX. For a specific event, let P denote a set of propositions, some of which are fuzzy. These propositions are assertions about the status and attributes of elements in the product ecosystem. Furthermore, these fuzzy rules are knowledge base about the system, which can be codified as a fuzzy rule set. Take one user buying a ticket in the subway station as an example. Provided that the products, ambient factors, and users involved in this interaction event are three TMs, the operation of buying a ticket, the noise level, the light condition, the price of the ticket to the destination. The resultant UX is concerned with the user’s cognitive decision over which TM to choose and the user’s affective state. The aforementioned elements can be represented by 13 propositions, i.e., P = {P1 , P2 , . . . , P13 } (see Table III for their detailed definitions). Then, one possible rule is (¬P1 (0.8))&(P2 (0.7))&(P3 (0.9)) ⇒ (P8 (0.7)) with a confidence degree of 1, where θ8 = min{θ1 , θ2 , θ3 } × c = 0.7. It means that if the queue of TM1 is not (see the negation sign “¬” before P1 ) long with a truth degree of 0.8, the queues of TM2 and TM3 are long with truth degrees of 0.7 and 0.9, respectively, then the user will use TM1 with a truth degree of 0.7.
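A single rule of form (1) can be evaluated in a few lines: the consequent truth degree is the minimum of the antecedent truth degrees, with 1 − θ used for a negated proposition (consistent with the complementary-arc treatment introduced later), scaled by the rule confidence. The sketch below is a minimal illustration that mirrors the worked TM example above; the function name and proposition labels are assumptions.

```python
def rule_truth_degree(antecedents, theta, confidence):
    """Evaluate one fuzzy production rule of form (1).

    antecedents: list of (proposition_id, negated) pairs
    theta: dict mapping proposition_id -> truth degree in [0, 1]
    Returns the truth degree assigned to the consequent proposition.
    """
    degrees = [(1.0 - theta[p]) if negated else theta[p]
               for p, negated in antecedents]
    return min(degrees) * confidence

# The rule from the text: (not P1)(0.8) & P2(0.7) & P3(0.9) => P8, confidence 1.0.
# Here theta["P1"] = 0.2, so the negated antecedent contributes 0.8.
theta = {"P1": 0.2, "P2": 0.7, "P3": 0.9}
print(rule_truth_degree([("P1", True), ("P2", False), ("P3", False)], theta, 1.0))  # 0.7
```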


However, the challenge is how to obtain typical fuzzy production rules that can be regarded as the inference engine. In this regard, the rough set theory excels in tackling vagueness and uncertainty using rough approximations [34]. It can produce a complete set of consistent and minimal decision rules using an objective knowledge induction process. Furthermore, it provides criteria for selecting and refining mined patterns, such as support and confidence. Therefore, the rough set theory is applied to mine the fuzzy production rules in this paper [34]. The general reasoning process is described in the following. The self-report data are first organized as a set of feature vectors which depict different contexts of interaction events. Generally, these feature vectors can be seen as an information system, S = (U, V), such that ∀v ∈ V, v: U → V*, where U is a nonempty finite set called the universe, V is a nonempty finite set of context feature variables, and V* is the value set of a feature vector v. For example, in the context of buying a ticket, the feature vector is defined with the first seven context variables in Table II, i.e., v = {Queue1, Queue2, Queue3, Operation, Noise, Lighting, Price}, describing the contexts of buying a ticket. Corresponding to the context variables, the last two decision variables in Table II, i.e., affect and cognition, are defined to characterize UX. Let d = {d1, d2} ∈ D denote a decision vector, where d1 and d2 are the user's resulting affective state and cognitive decision, respectively. Accordingly, D*_h = {d*_1h, d*_2h}, h = 1, ..., H, is the value set of d, where H is the total number of decision scenarios. In the example of buying a ticket, the resulting UX can be D*_h = {pleasant, TM1}, and the total number of decision scenarios is H = 3 × 3 = 9. For all the data with regard to one interaction event, training data are composed as a decision table (see Table II for an example), Ω = (V ∪ D, C), where V ∪ D constitutes the universe of inference C, i.e., c = {C*_l}, l = 1, ..., L, where L denotes the total number of training patterns in Ω. A specific entry of Ω embodies an inference relationship from one context, V*_j, to the corresponding decision, D*_h, i.e., C*_l ∼ (V*_j ⇒ D*_h). For instance, for the first entry in Table II, the potential inference relationship is V*_1 = {not long, long, not long, not easy, high, not low, not expensive} ⇒ D*_1 = {neutral, TM1}. The mining of fuzzy production rules is based on the concept of reduction [34]. A reduct is defined as a subset of variables in S such that Φ = {φ_g}, g = 1, ..., G, Φ ⊂ Ω, where φ_g = (v^φ_g, d^φ_g) is subject to an indiscernibility relation, in which, for objects X ∈ U and Y ∈ U, a pair (X, Y) ∈ U × U belongs to Φ. Therefore, for any object P ∈ U, we can generate a rule such that, ∀i ∈ [1, K], the antecedent of the rule takes the conjunction of certain variable instances, v^φ_i(P), and the consequent takes specific values of the decision variables, d^φ(P), where K denotes the total number of variables instantiated by this rule. The general form of a decision rule constructed for reduct Φ and object P is

(v^φ_1 = v^φ_1(P)) & ... & (v^φ_k = v^φ_k(P)) ⇒ d^φ = d^φ(P).    (2)

For the previously mentioned rule (¬P1 (0.8))&(P2 (0.7))& (P3 (0.9)) ⇒ (P8 (0.7)), it can be represented as (v1 = Queue1 is not long)&(v2 = Queue2 is long)&(v3 = Queue3


Fig. 2. FRPN model of the interaction event “buying a ticket.” The detailed definitions of propositions are given in Table III. (a) Predicting user’s cognitive decision over choosing a TM, (b) predicting user’s affective state.

is long) ⇒ (d2 = TM1 ). The truth degrees of propositions can be predefined to create different contexts in the product ecosystem. The confidence of a rule “X ⇒ Y ” is defined as conf(X ⇒ Y ) = supp(X ∪ Y )/supp(X) and supp(∗) is the proportion of item “∗” in the database. Although (1) and (2) take two different forms, they express the same rule. In order to mine rules, the rough set software system (RSES 2.2.2) is applied [35]. With the decision table (e.g., Table II) as input, and the refining measure of minimum support at 0.6, it generates 34 rules for “buying a ticket” shown in Table III. Therefore, concerning all the interaction events, the fuzzy production rules can be mined using the similar method mentioned earlier. VI. UX FRPN M ODEL A. FRPN Definition The PN’s graphical nature allows users to visualize the structure of a rule-based system, and the mathematical foundation makes it possible to analyze the dynamic behavior of the system in algebraic forms [36]. Towards this end, FRPN is applied, which is defined as an eight-tuple [37]: For 1 ≤ i ≤ n, 1 ≤ j ≤ m below, FRPN = (P, R, I, H, O, θ, γ, C), where 1) P = {p1 , p2 , . . . , pn } is a finite set of n propositions, called places, describing element status of the product ecosystem. 2) R = {r1 , r2 , . . . , rm } is a finite set of m rules, called transitions, representing the fuzzy rule base. 3) I : P × R → {0, 1} is an n × m input matrix defining the noncomplementary directed arcs from nonnegation propositions to rules; I(pi , rj ) = 1, if there is a directed arc from pi to rj and I(pi , rj ) = 0, otherwise. 4) H : P × R → {0, 1} is an n × m matrix defining the complementary directed arcs from negation propositions

to rules; H(pi , rj ) = 1, if there is a complementary arc directed from pi to rj and H(pi , rj ) = 0, otherwise. 5) O : R × P → {0, 1} is an n × m output matrix defining the directed arcs from rules to propositions; O(rj , pi ) = 1, if there is a directed arc from rj to pi and O(rj , pi ) = 0, otherwise. 6) θ is a truth degree vector, θ = (θ1 , θ2 , . . . , θn )T , where θi denotes the truth degree of pi . 7) γ : P → {0, 1} is a marking vector, γ= (γ1 , γ2 , . . . , γn )T . γi = 1, if there is a token in pi and γi = 0, otherwise. 8) C = diag{c1 , . . . , cm }, where cj is the confidence of rj , and c1 , . . . , cm is the main diagonal of the confidence matrix C. As an example, the fuzzy production rules for the interaction event “buying a ticket” in Table III have been converted into the FRPN model as shown in Fig. 2. Note that complementary and noncomplementary arcs are represented by a directed arc terminated with a small circle and a small triangle, respectively. In Fig. 2, “|” denotes transitions to which both complementary and noncomplementary arcs are directing while “ ” denotes transitions to which no complementary arcs are directing. For instance, rule 1: (P1 )&(¬P2 )&(P3 ) ⇒ (P9 ) in Table III is represented as two noncomplementary arcs from P1 and P3 and one complementary arc from P2 directed to transition R1 and then an arc from R1 to P9 ; the associated truth degree is omitted. B. Rule Execution Mechanism For any rule rj ∈ R, it is enabled if and only if all the input propositions of rj are marked. Unlike ordinary PNs, the number of token cannot be larger than one, and the token is not removed from the input places of a transition after firing in FRPNs [37].
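The eight-tuple lends itself to a direct matrix representation. The sketch below assembles the incidence matrices I, H, and O and the confidence matrix C from a list of rules supplied as (antecedents, consequent, confidence) triples; this encoding format is an assumption, and a companion firing-step sketch after the reasoning equations below operates on matrices of exactly this shape.

```python
import numpy as np

def build_frpn(places, rules):
    """Assemble the incidence matrices of the FRPN eight-tuple.

    places: list of proposition names, e.g. ["P1", "P2", "P3", "P9"]
    rules:  list of (antecedents, consequent, confidence), where antecedents
            is a list of (place, negated) pairs, e.g.
            ([("P1", False), ("P2", True), ("P3", False)], "P9", 1.0)
    Returns I, H, O (n x m binary matrices) and C (m x m diagonal confidences).
    """
    n, m = len(places), len(rules)
    idx = {p: i for i, p in enumerate(places)}
    I, H, O = (np.zeros((n, m)) for _ in range(3))
    conf = np.zeros(m)
    for j, (antecedents, consequent, c) in enumerate(rules):
        for place, negated in antecedents:
            (H if negated else I)[idx[place], j] = 1.0   # complementary vs. not
        O[idx[consequent], j] = 1.0                      # rule j outputs to place
        conf[j] = c
    return I, H, O, np.diag(conf)
```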


All enabled rules can be fired in parallel. Assume the initial marking and truth degree vector are γ^0 and θ^0, respectively; the new marking γ^1 and truth degree vector θ^1 are calculated as follows after firing all the enabled rules once:

γ^1 = γ^0 ⊕ [O ⊗ μ^0]    (3)

where μ^0 = (μ^0_1, μ^0_2, ..., μ^0_m)^T is a firing vector such that μ^0_j = 1 if r_j fires; ⊕ and ⊗ are max-algebra operators [38], with [A_{m×n} ⊕ B_{m×n}]_{ij} = a_{ij} ⊕ b_{ij} = max(a_{ij}, b_{ij}) and [A_{n×k} ⊗ B_{k×m}]_{il} = ⊕_{j=1}^{k}(a_{ij} ⊗ b_{jl}) = max_{j∈{1,2,...,k}}(a_{ij} b_{jl}).

θ^1 = θ^0 ⊕ [(O · C) ⊗ ρ^0]    (4)

where ρ^0 = (ρ^0_1, ρ^0_2, ..., ρ^0_m)^T is a control vector, with ρ^0_j = min_{p_i ∈ •r_j}{x_i | x_i = θ_i if I(p_i, r_j) = 1; x_i = 1 − θ_i if H(p_i, r_j) = 1}, where •r_j represents all the input propositions of rule r_j, i.e., •r_j = {p_i | I(p_i, r_j) = 1 or H(p_i, r_j) = 1}.

For illustration purposes, the first 15 rules in Table III are taken as an example: P = {P1, P2, P3, P8, P9, P10}, R = {r1, ..., r15}. Based on the FRPN definition, the output matrix O is given as

O = [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 1 0 1 0 0 0 1 0 1 0 0 1
      1 1 0 0 0 0 1 0 0 0 1 0 1 0 0
      0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 ]

the confidence matrix is C = diag([1 0.5 0.5 0.5 0.5 0.33 0.33 0.33 1 0.5 0.5 0.33 0.33 0.33 1]); the initial marking is γ^0 = [1 1 1 0 0 0]^T; and assume the initial truth degree vector is θ^0 = [0.4 0.9 0.8 0 0 0]^T. As shown in Fig. 2(a), all fifteen rules are enabled. Hence, μ^0 = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]^T and ρ^0 = [0.1 0.1 0.1 0.2 0.2 0.1 0.1 0.1 0.2 0.1 0.1 0.4 0.4 0.4 0.6]^T. Based on (3) and (4), γ^1 = [1 1 1 1 1 1]^T and θ^1 = [0.4 0.9 0.8 0.6 0.132 0.2]^T. If the marking is updated again, we find that γ^2 = γ^1 and θ^2 = θ^1; therefore, the update is halted. This example demonstrates that the user will most probably (θ^1_4 = 0.6 being the highest truth degree) use TM1, which has the shortest queue (θ^0_1 = 0.4).

TABLE IV. ASSUMPTIONS IN THE SIMULATION

C. Reasoning Algorithm

According to the FRPN model and the mechanism of rule execution, it is possible to predict users' behavior and affective states from the initial marking and truth degrees of the input propositions. In this regard, a reasoning algorithm of FRPNs is proposed [36], [37]. First, a neg operator is introduced [39] so that, for example, neg θ^0 = 1_m − θ^0, where 1_m = [1, 1, ..., 1]^T. With the input of I, H, O, C, the initial truth degree vector θ^0, and the initial marking vector γ^0, for the kth reasoning step, μ^k and ρ^k can be calculated as follows:

μ^k = neg[(I + H)^T ⊗ neg γ^k]    (5)

ρ^k = neg{[I^T ⊗ (neg γ^k ⊕ neg θ^k)] ⊕ [H^T ⊗ (neg γ^k ⊕ θ^k)]}    (6)

Then, based on (3), (4), (5), and (6), θ^{k+1} and γ^{k+1} are calculated as follows:

θ^{k+1} = θ^k ⊕ [(O · C) ⊗ ρ^k]    (7)

γ^{k+1} = γ^k ⊕ [O ⊗ μ^k].    (8)
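The update (3)-(8) can be realized directly with NumPy. The sketch below is a minimal illustration that interprets the ⊗ composition in the max-product form given above and encodes only rule r1 of Table III (the remaining rules are not reproduced in the text), so the printed numbers cover that single rule; the place ordering and array layout are assumptions.

```python
import numpy as np

def maxprod(A, x):
    """Max-product composition used for ⊗ here: (A ⊗ x)_i = max_j A[i, j] * x[j]."""
    return (A * x).max(axis=1)

def neg(v):
    return 1.0 - v

def frpn_step(I, H, O, C, gamma, theta):
    """One parallel firing step of the FRPN, following (3)-(8)."""
    mu = neg(maxprod((I + H).T, neg(gamma)))                                  # (5)
    rho = neg(np.maximum(maxprod(I.T, np.maximum(neg(gamma), neg(theta))),
                         maxprod(H.T, np.maximum(neg(gamma), theta))))        # (6)
    theta_new = np.maximum(theta, maxprod(O @ C, rho))                        # (7)
    gamma_new = np.maximum(gamma, maxprod(O, mu))                             # (8)
    return gamma_new, theta_new

# Tiny fragment encoding only rule r1: (P1) & (not P2) & (P3) => P9, confidence 1.
# Places are ordered [P1, P2, P3, P9].
I = np.array([[1.], [0.], [1.], [0.]])   # non-complementary inputs of r1
H = np.array([[0.], [1.], [0.], [0.]])   # complementary input (negated P2)
O = np.array([[0.], [0.], [0.], [1.]])   # r1 outputs to P9
C = np.array([[1.0]])                    # confidence of r1
gamma, theta = np.array([1., 1., 1., 0.]), np.array([0.4, 0.9, 0.8, 0.0])

gamma, theta = frpn_step(I, H, O, C, gamma, theta)
print(theta)   # theta(P9) becomes 0.1 = min(0.4, 1 - 0.9, 0.8) * 1.0
```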

When γ^{k+1} = γ^k or θ^{k+1} = θ^k, the loop stops, and θ^{k+1} is the final output. Consider the example in Fig. 2, where P = {P1, P2, ..., P13} and R = {r1, r2, ..., r34}; matrices I, O, and H can be obtained based on the definition of FRPNs; C = diag([1 0.5 0.5 0.5 0.5 0.33 0.33 0.33 1 0.5 0.5 0.33 0.33 0.33 1 1 1 1 1 1 1 0.87 0.91 0.91 1 0.82 1 1 0.82 0.91 0.90 0.88 0.92 0.90]) according to the rules in Table III; the initial marking vector is γ^0 = [1 1 1 1 1 1 1 0 0 0 0 0 0]^T; and suppose the initial truth degree vector is θ^0 = [0.4 0.9 0.8 0.2 0.2 0.4 0.6 0 0 0 0 0 0]^T. According to the reasoning algorithm, the final θ^{k+1} = [0.4 0.9 0.8 0.2 0.2 0.4 0.6 0.6 0.132 0.2 0.184 0.4 0.522]^T when k = 2. With the highest truth degree of the propositions among P8, P9, and P10 and among P11, P12, and P13 as criteria, respectively, the user's choice over "buying a ticket" is most likely TM1 with the probability of 0.6, and the user's affective state is most likely unpleasant with the probability of 0.522. Likewise, all other interaction events can also be modeled using FRPNs, and users' cognitive decisions and affective states can be obtained with the reasoning algorithm. According to Gao et al. [37], the reasoning algorithm for the acyclic FRPN


Fig. 3. PN simulation of the subway station ecosystem.

terminates at the (h + 1)th step for the worst case, where h is the number of transitions in the longest place-transition direct path. Hence, the computational complexity in the worst case is O(mnh), where m and n are the numbers of transitions and places, respectively. Thus, both users’ affective states and cognitive decisions can be predicted efficiently as the values of h, m, and n are small in this paper. VII. UX S IMULATION With regard to all the interaction events that one user experiences, the complete UX can be constructed by connecting individual FRPNs that model the individual interaction events accordingly with appropriate sequencing and branching relations. The UX thus formed is characterized by a series of interaction events with a number of what-if scenarios. Since it is difficult to validate decisions on the product ecosystem using analytical methods, simulation methods are appropriate. In this paper, the design of the subway station ecosystem involves determining the product attributes and the service processes that contribute to positive UX. Among the design elements that constitute the subway station ecosystem in Table I, assumptions are made for the simulation, as shown in Table IV. Moreover, the chain of interaction events in accordance with the UX is partially controllable, i.e., some interaction events are indispensible while others are optional in view of the business process and user needs. Additional constraints are exerted on diverse relationships among users, the ambience, and the cause-and-effect relations between design elements and UX. With these considerations, the objectives of subway ecosystem design are to: 1) optimize UX and to 2) reduce the operational cost by maximizing the utilization of the ecosystem capacities. With the aforementioned objectives, a PN simulation based on the software SIMUL8 (www.simul8.com) has been conducted. Fig. 3 shows the corresponding PN structures, where the FRPN model is codified using the Visual Logic programming in SIMUL8. The simulation is run based on the assump-

tions made in Table IV, which can be preset with SIMUL8. In order to deal with the fuzzy variables, membership functions are applied to produce truth degrees of fuzzy variables in Table I, such as queue length, ticket prices, crowdedness of shops, and so on. During the simulation, the performance of the subway ecosystem is measured by system capacity utilization and UX. System capacity utilization is computed from the average utilization of the major design elements, including TMs, gates, service persons, escalators, lifts, staircases, toilets, information boards, shops, and restaurants. In the simulation environment, all these performance factors can be monitored and computed automatically. UX is measured with users' affective states and cognitive decisions, which are indirectly evaluated with user stay-time (or processing time) due to cognitive decisions made in the subway station. Assume that these two parts have the same weight; thus, UX is the mean of them. Furthermore, they are quantified between −1 and 1 with an initial value of 0, and the larger the value, the more positive the UX. An individual user's affective state at a particular time is defined as AF_i = AF × θ_i, where AF = 1 if the affective state is pleasant, AF = 0 if it is neutral, AF = −1 if it is unpleasant, and θ_i is the truth degree associated with the affective state. Thus, the average affective state of an individual user during his/her stay-time T_i is given as AF̄_i = (1/T_i) ∫_0^{T_i} AF_i dt. Then, the AUX of M users in the product ecosystem is given as

AUX = (1/M) Σ_{i=1}^{M} AF̄_i    (9)

And the cognitive UX (CUX) according to the user stay-time is given as

CUX = (1/M) Σ_{i=1}^{M} (1 − T_i/T_exp)    (10)

where T_exp is the expected user stay-time, which is assumed to have no perceptible influence on UX. Assuming that 0 ≤ T_i ≤


Fig. 4. Aggregated UX of 6000 users evolving during simulation.

Fig. 6. Influence of individual design elements on UX.

Fig. 5. Capacity utilization of major design elements during simulation: (a) design elements with high-capacity utilization; (b) design elements with low-capacity utilization. ES: escalator, LT: lift, TR: train, WG: wide gate, G1: gate 1, G2: gate 2, REST: restaurant, TM1: ticketing machine 1, TM2: ticketing machine 2, TM3: ticketing machine 3, SC: staircase, IB: information board, SP: service person.

2T_exp, −1 ≤ CUX ≤ 1. Then, UX is the aggregated sum of AUX and CUX with the same weight, i.e.,

UX = (AUX + CUX)/2.    (11)
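Given per-user outputs from the simulation, (9)-(11) reduce to simple averages. A minimal sketch, assuming a list of per-user records that already contain the time-averaged affective state and the stay-time:

```python
def aggregate_ux(users, t_exp):
    """Aggregate UX over M simulated users, following (9)-(11).

    users: list of dicts with keys
        'af_mean' -- time-averaged affective state AF̄_i in [-1, 1]
        'stay'    -- stay-time T_i (same unit as t_exp), assumed 0 <= T_i <= 2*t_exp
    """
    m = len(users)
    aux = sum(u["af_mean"] for u in users) / m                 # (9)
    cux = sum(1.0 - u["stay"] / t_exp for u in users) / m      # (10)
    return (aux + cux) / 2.0                                   # (11)

# Illustrative call with made-up numbers (not the paper's simulation output).
print(aggregate_ux([{"af_mean": -0.2, "stay": 12.0},
                    {"af_mean": 0.5, "stay": 6.0}], t_exp=10.0))   # 0.125
```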

VIII. R ESULTS The simulation was conducted with regard to the ecosystem configuration specified in Table I. The performance is evaluated against UX and system capacity utilization. The simulation results bring about useful guidelines for the subway ecosystem design in the following aspects. Fig. 4 shows the aggregated UX evolving along the simulation time during which about 6000 users utilized the subway ecosystem. Overall, the ecosystem produced an average negative UX of −0.013 (std = 0.024), indicating an unacceptable service process. Further, capacity utilization of major design elements in the ecosystem was also calculated as shown in Fig. 5. Note that the capacity utilization of trains, restaurants, shops, and toilets is defined as the current numbers of users divided by their amounts to hold people, while the capacity utilization of

other products is defined as the busy time divided by the total simulation time. As seen from Fig. 5(a), escalator, lift, trains, wide gate, gates 1 and 2, and restaurant are characterized by high utilization rates. However, the design elements in Fig. 5(b), including three TMs, staircase, information boards, and service persons, show relatively low utilization rates. Although a higher average utilization rate indicates a lower unit cost per service process, it suggests a possible bottleneck of the service process, provided that UX is significantly reduced by the product. In addition, the general influence of individual design elements on the average UX of all the users is also shown in Fig. 6. While service persons, shops, and toilet lead to positive UX substantially, TMs, two gates, escalator, trains, and restaurants lead to UX deterioration. From the design perspective, it is imperative to examine the causes of the negative UX. Considering the high utilization rates of escalator, lift, trains, gates 1 and 2, as well as restaurants, they seem to be designed with insufficient capacities with regard to the subway ecosystem configuration in Table I. However, it seems implausible that low utilization rates of TMs also give rise to negative UX. Due to the reasoning algorithm of the FRPN model, it is able to trace the rules that were fired during the simulation. Table V shows the rules of major interaction events that cause negative AUX and the percentages of the users invovled for each rule. Consistent with the fact that TMs leading to negative UX in Fig. 6, 53.8% of the users (by r22 , r23 , and r24 ) reported negative AUX when “buying a ticket.” Thus, the major causes are the difficult operation (i.e., ¬P4 in r22 , r23 , and r24 ), the long queues of TM1 (i.e., P1 in r23 and r24 ) and TM3 (i.e., P3 in r23 ), and the high ticket price (i.e., P7 in r22 and r24 ). As for “entering the gate,” 47.8% of the users reported negative AUX mainly caused by long queues of gates 1 and 2 and the wide gate (i.e., P1 , P2 , and P3 in r44 ). The major antecedents that result in negative AUX of “going to the platform” are the crowded escalator (i.e., P1 in r14 and r19 ), the crowded lift (i.e., P2 in r14 ), luggage with the users and the low light intensity in the surrounding area (i.e., both P7 and P9 in r19 ). The eighth rule most frequently fired in the interaction event, “boarding the train” links to negative affective UX, which is due to long waiting time (i.e., P1 ), no seats available (i.e., ¬P2 ), and the high noise level (i.e., P5 ).
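Because the FRPN reasoning traces which rules fire, a Table V-style summary can be produced by counting, per interaction event, the share of users whose negative affective UX coincides with each fired rule. The sketch below assumes a simple trace record format; it is illustrative rather than the simulator's actual logging.

```python
from collections import defaultdict

def negative_aux_rules(trace):
    """Share of users, per interaction event, whose negative affective UX
    coincides with each fired rule (cf. Table V).

    trace: iterable of (event, user_id, fired_rule, aux) tuples; this record
           format is assumed for the sketch, not taken from the paper.
    """
    users_per_event = defaultdict(set)
    negative_users = defaultdict(set)   # (event, rule) -> users with aux < 0
    for event, user, rule, aux in trace:
        users_per_event[event].add(user)
        if aux < 0:
            negative_users[(event, rule)].add(user)
    return {key: len(users) / len(users_per_event[key[0]])
            for key, users in negative_users.items()}

# Hypothetical trace fragment.
trace = [("buying a ticket", 1, "r22", -0.3),
         ("buying a ticket", 2, "r9", 0.4),
         ("entering the gate", 1, "r44", -0.1)]
print(negative_aux_rules(trace))
# {('buying a ticket', 'r22'): 0.5, ('entering the gate', 'r44'): 1.0}
```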


TABLE V. SUMMARY OF THE RULES LEADING TO NEGATIVE AUX

During the interaction event, “going to the restaurants,” 40.2% of the users complained a long waiting time (i.e., P1 in r6 ) and a high level of noise (i.e., P4 in r6 ) that give rise to negative affective UX. IX. D ISCUSSIONS 1) Integration of Affect and Cognition: The notion of product ecosystem integrates multiple relevant products and services into a coherent model. Thus, interdependent products and ambient factors codetermine dynamic UX in terms of both the affective and cognitive dimensions along the series of interaction events. In a holistic fashion, it facilitates the integration of affect and cognition and their interactions in one design paradigm. This is consistent with recent findings that a unity and interrelatedness of the cognitive and affective processes is needed [40]. Moreover, UX dynamically evolves along the progression of interaction events. It is important to capture UX within a legitimately valid time horizon so that the relationship between UX constructs and design elements can most likely approximate the fact. 2) Human–Object–Ambience Interaction: Consideration of the ambience where users’ behavior is contextualized is generally helpful to achieve reliable UX prediction with a relatively high fidelity. It involves interactions not only between the user and a product (object), but also with the ambience, namely, a form of human–object–ambience interactions. Compared with traditional product design for UX, it expands the scope and facilitate UX design both for users and the producer. For users, it is possible to better reflect the causal relationships between UX and design elements in the product eocsystem. For the producer, relevant products and services are designed at a system level that can avoid the design pitfalls where overly narrow optimization of one aspect of UX can be made invalid by a broader context of the design problem [41]. 3) Complexity, Dynamics, and Uncertainty: For product ecosystem design, it would be often difficult, if not impossible,

to find out a comprehensive and unambiguous list of design elements that can significantly influence UX in a broader scope. In addition, since UX is influenced by many factors inside the product ecosystem, and UX data are collected by selfreports, UX modeling exhibits high complexity, dynamics, and uncertainty. In order to tackle the complexity issue, various system design elements, including products, ambient factors, and users, as well as their interactive relationships, should be represented in a systematic form. It is reasonable to model UX as dynamic evolution along individual series of interaction events, in which system elements and their interactive relationships can be specified in a relatively simple way. Although it is not possible to record UX all the time, it is often adequate to sample dynamic UX at the interval of individual interaction events. With regard to the uncertainty issue, the rough set technique helps generate fuzzy production rules that can be used to represent vague concepts. In order to effectively predict UX in different contexts of a product ecosystem, the FRPN reasoning algorithm can handle the complexity of decision making with multicriteria. It allows one to exploit the maximum parallel reasoning capacity so that UX with the largest possibility can be prognosticated [36]. By linking (nesting) individual FRPNs corresponding to individual interaction events under various circumstances, UX can be aggregated to explore the whole product ecosystem. 4) Simulation: Simulation is used to predict the performance of a predefined subway station ecosystem in terms of two measures, i.e., UX and capacity utilization of individual products. The outcome of simulation analysis can provide designers with decision support to improve UX, enhance the operational efficiency of the product ecosystem, reduce costs, and finally optimize the product ecosystem configuration. For example, it is obvious that the operation of “buying a ticket” is not easy; two gates and the escalator are not adequate to accommodate the current traffic flow; there should be more restaurants; the interval of trains should also be shortened. In this sense, for product ecosystem design, it is easy and


cost-effective to obtain redesign solutions by applying simulation for what-if analysis. 5) Limitations and Future Work: As an exploratory study, the approach proposed in this paper also suffers a few limitations, which deserve further investigation. First, this paper does not support designers to optimize the ecosystem configuration with respect to the identified system objectives. Therefore, it is desirable to develop methods that can optimize product ecosystem design problem in the future. Such a process might be automated by applying artificial intelligence for configuration design, such as genetic algorithms and constraint satisfaction techniques. Second, UX modeling based on FRPNs requires profound understanding of human behavior, product information, as well as user affective states and cognitive processes. However, much knowledge related to the FRPN model is based on assumptions. Logical construction of UX necessitates more accurate replication of human affect and cognition in various scenarios. Third, the nonlinear nature of the FRPN model makes it difficult to apply analytical methods to study the system behavior. Further work may be directed to study of UX modeling using other possible methods with more flexibility and learning capabilities, such as machine learning PNs [42]. Moreover, it is meaningful to expand the FRPN model to large scale applications with rigorous experimentation and replicable methodologies, such as airport terminals, the iPhone ecosystem, shopping malls, and the like.

X. CONCLUSION

UX suggests itself to be fundamental to product ecosystem design. In this paper, multiple users and relevant system design elements have been formulated within a coherent framework. Two important aspects of UX, i.e., the cognitive and affective dimensions, are used to measure UX. Unlike traditional methods that approximate the relationships between different constructs of UX by statistical techniques (e.g., regression), the FRPN model exploits fuzzy production rules and an efficient reasoning algorithm for knowledge-based multicriteria decision making. It captures the causal relationships between UX and design elements so as to provide design decision support. The simulation results based on the subway station ecosystem design show the feasibility and potential of FRPN modeling. The case study also underscores the importance and prospect of UX modeling in product ecosystem design.

REFERENCES

[1] J. Nielsen, Usability ROI Declining, But Still Strong, Jan. 22, 2008. [Online]. Available: www.useit.com/alertbox/roi.html
[2] E. L. C. Law and P. van Schaik, "Modelling user experience—An agenda for research and practice," Interact. Comput., vol. 22, no. 5, pp. 313–322, 2010.
[3] P. Hekkert and H. N. J. Schifferstein, "Introducing product experience," in Product Experience, P. Hekkert and H. N. J. Schifferstein, Eds. New York: Elsevier, 2008, pp. 1–8.
[4] J. Jiao, Q. Xu, J. Du, Y. Zhang, M. Helander, H. M. Khalid, P. Helo, and C. Ni, "Analytical affective design with ambient intelligence for mass customization and personalization," Int. J. Flexible Manuf. Syst., vol. 19, no. 4, pp. 570–595, Dec. 2007.
[5] B. J. Pine and J. H. Gilmore, The Experience Economy. Boston, MA: Harvard Business School Press, 1999.


[6] M. M. Tseng, R. J. Jiao, and C. Wang, “Design for mass customization and personalization,” CIRP Annals, vol. 59, no. 1, pp. 175–178, 2010. [7] E. L. C. Law, V. Roto, M. Hassenzahl, A. P. O. S. Vermeeren, and J. Kort, “Understanding, scoping and defining user experience: A survey approach,” in Proc. 27th Int. Conf. Human Factors in Computing Systems, 2009, pp. 719–728. [8] M. Hassenzahl and N. Tractinsky, “User experience—A research agenda,” Behav. Inf. Technol., vol. 25, no. 2, pp. 91–97, Mar./Apr. 2006. [9] M. Hassenzahl, “The interplay of beauty, goodness and usability in interactive products,” Human Comput. Interact., vol. 19, no. 4, pp. 319–349, Dec. 2004. [10] J. R. Edwards and R. Bagozzi, “On the nature and direction of relationships between constructs and measures,” Psychological Methods, vol. 5, no. 2, pp. 155–174, Jun. 2000. [11] E. Karapanos, J. Zimmerman, J. Forlizzi, and J.-B. Martens, “Measuring the dynamics of remembered experience over time,” Interact. Comput., vol. 22, no. 5, pp. 328–335, Sep. 2010. [12] E. H. Melan, Process Management—Methods for Improving Products and Services. New York: McGraw-Hill, 1999. [13] F. Zhou, Q. Xu, and R. Jiao, “Fundamentals of product ecosystem design for user experience,” Res. Eng. Des., vol. 22, no. 1, pp. 43–61, Jan. 2010. [14] Y. Tang, M. Zhou, and M. Gao, “Fuzzy-Petri-net-based disassembly planning considering human factors,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 36, no. 4, pp. 718–726, Jul. 2006. [15] W. Pedrycz and F. Gomide, “A generalized fuzzy Petri net model,” IEEE Trans. Fuzzy Syst., vol. 2, no. 4, pp. 295–301, Nov. 1994. [16] T. Murata, “Petri nets: Properties, analysis and applications,” Proc. IEEE, vol. 77, no. 4, pp. 541–580, Apr. 1989. [17] E. M. Roth, “Uncovering the requirements of cognitive work,” Human Factors, vol. 50, no. 3, pp. 475–480, Jun. 2008. [18] C. M. Johnson and J. P. Turley, “The significance of cognitive modeling in building healthcare interfaces,” Int. J. Med. Inf., vol. 75, no. 2, pp. 163– 172, Feb. 2006. [19] J. W. Coffey and M. J. Carnot, “Graphical depictions for knowledge generation and sharing,” in Proc. Int. Conf. Inf. Know. Sharing, Scottsdale, AZ, 2003, pp. 18–23. [20] M. Hassenzahl, S. Diefenbach, and A. Göritz, “Needs, affect, and interactive products—Facets of user experience,” Interact. Comput., vol. 22, no. 5, pp. 353–362, Sep. 2010. [21] J. Jiao, Y. Zhang, and M. G. Helander, “A Kansei mining system for affective design,” Expert Syst. Appl., vol. 30, no. 4, pp. 658–673, May 2006. [22] J. D. Morris, C. Woo, J. A. Geason, and J. Kim, “The power of affect: Predicting intention,” J. Advertising Res., vol. 42, no. 3, pp. 7–17, May/Jun. 2002. [23] C. L. Lisetti and F. Nasoz, “MAUI: A multimodal affective user interface,” in Proc.10th ACM Multimedia, Juan-les-Pins, France, 2002, pp. 161–170. [24] M. G. Helander and H. M. Khalid, “Affective and pleasurable design,” in Handbook of Human Factors and Ergonomics, G. Salvendy, Ed., 3rd ed. New York: Wiley-Interscience, 2006. [25] P. van Schaik and J. Ling, “Modelling user experience with web sites: Usability, hedonic value, beauty and goodness,” Interact. Comput., vol. 20, no. 3, pp. 419–432, May 2008. [26] D. Cyr, M. Head, and A. Ivanov, “Design aesthetics leading to m-loyalty in mobile commerce,” Inf. Manage., vol. 43, no. 8, pp. 950–963, Dec. 2006. [27] P. Zhang, N. Li, and H. Sun, “Affective quality and cognitive absorption: Extending technology acceptance research,” in Proc. 39th Annu. Hawaii Int. Conf. Syst. Sci., 2006, vol. 
8, pp. 207–217. [28] P. W. Jordan, Designing Pleasurable Products: An Introduction to the New Human Factors. London, U.K.: Taylor & Francis, 2000. [29] A. Kapoor and E. Horvitz, “Experience sampling for building predictive user models: A comparative study,” in Proc. 26th Annu. SIGCHI Conf. Human Factors Comput. Syst., 2008, pp. 657–666. [30] J. A. Russell, “Core affect and the psychological construction of emotion,” Psychol. Rev., vol. 110, no. 1, pp. 145–172, Jan. 2003. [31] M. H. Ashcraft, “Math anxiety: Personal, educational, and cognitive consequences,” Current Directions Psychol. Sci., vol. 11, no. 5, pp. 181–185, Oct. 2002. [32] S. K. Nair, L. S. Thakur, and K. Wen, “Near optimal solutions for product line design and selection: Beam search heuristics,” Manage. Sci., vol. 41, no. 5, pp. 767–785, May 1995. [33] C. V. Negoita, Expert Systems and Fuzzy Systems. Reading, MA: Benjamin Cummings, 1985. [34] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data. Boston, MA: Kluwer, 1991.


[35] J. G. Bazan and M. Szczuka, “The rough set exploration system,” in Transactions on Rough Sets III, J. F. Peters and A. Skowron, Eds. Berlin, Germany: Springer-Verlag, 2005, pp. 37–56. [36] M. Gao, M. Zhou, and Y. Tang, “Intelligent decision making in disassembly process based on fuzzy reasoning Petri nets,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 5, pp. 2029–2034, Oct. 2004. [37] M. Gao, M. Zhou, X. Huang, and Z. Wu, “Fuzzy reasoning Petri nets,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 3, pp. 314– 324, May 2003. [38] R. A. Cuninghame-Green and P. Butkovic, “Bases in max-algebra,” Linear Algebra Appl., vol. 389, pp. 107–120, Sep. 2004. ˇ [39] S. G. Tzafestas and F. Capkoviˇ c, “Petri net-based approach to synthesis of intelligent control systems for DEDS,” in Computer Assisted Management and Control of Manufacturing Systems, S. G. Tzafestas, Ed. New York: Springer-Verlag, 1997, pp. 325–351. [40] J. Storbeck and G. L. Clore, “On the interdependence of cognition and emotion,” Cogn. Emotion, vol. 21, no. 6, pp. 1212–1237, Sep. 2007. [41] T. T. Hewett, ACM SIGCHI Curricula for Human-Computer Interaction. New York: ACM Press, 1992. [42] V. Shen, Y. Chang, and T. T. Y. Juang, “Supervised and unsupervised learning by using Petri nets,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 40, no. 2, pp. 363–375, Mar. 2010.

Feng Zhou was born in Hangzhou, China, in 1982. He received the Bachelor degree in electronic engineering from Ningbo University, Ningbo, China, in 2005 and the M.S. degree in computer engineering from Zhejiang University, Hangzhou, China, in 2007 and is currently working toward the Ph.D. degree in mechanical engineering at the Georgia Institute of Technology, Atlanta. He is a Student Member of ASME. His main research interests include affective product design, product ecosystem design, and human–computer interaction.

Roger J. Jiao received the Ph.D. degree in industrial engineering from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, in 1998 and the Bachelor degree in mechanical engineering and Master degree in manufacturing engineering from the Tianjin University of Science and Technology, Tianjin, China. He is an Associate Professor of Design and Manufacturing Systems Engineering with the G.W. Woodruff School of Mechanical Engineering at the Georgia Institute of Technology (Georgia Tech), Atlanta. Prior to joining Georgia Tech, he has worked as an Assistant Professor and Associate Professor with the School of Mechanical and Aerospace Engineering at Nanyang Technological University, Singapore. His research involves engineering design, manufacturing systems and logistics, and systems engineering.

Qianli Xu received the B.Eng. and M.Eng. degrees from Tianjin University, Tianjin, China, in 1999 and 2002, respectively, and the Ph.D. degree from the National University of Singapore, Singapore, in 2007. He is a Research Fellow with the Institute for Infocomm Research, Agency for Science, Technology, and Research, Singapore. His major areas of interest include product–service ecosystem design, design reuse, and intelligent products and manufacturing systems. His publications appear in Design Studies, CIRP, Journal of Mechanical Design, Research in Engineering Design, International Journal of Production Research, etc.

Koji Takahashi received the Bachelor, Master, and Doctor degrees in control engineering from the Tokyo Institute of Technology, Tokyo, Japan, in 1980, 1982, and 1985, respectively. He has been an Associate Professor with the Department of Electrical and Electronic Engineering, Tokyo Institute of Technology, since 1991. Prior to that he was an Assistant Professor with the Department of Control Engineering, Tokyo Institute of Technology. His main research areas include discrete event systems control, Petri net application, and hybrid control. Dr. Takahashi is a member of The Society of Instrument and Control Engineers, Japan, The Institute of Electronics, Information, and Communication Engineers, Japan, The Institute of Electrical Engineers of Japan, and The Robotics Society of Japan.

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 41, NO. 2, MARCH 2011


A Case-Driven Ambient Intelligence System for Elderly In-Home Assistance Applications

Feng Zhou, Member, IEEE, Jianxin (Roger) Jiao, Member, IEEE, Songlin Chen, Member, IEEE, and Daqing Zhang

Abstract—Elderly in-home assistance (EHA) has traditionally been tackled by human caregivers to equip the elderly with homecare assistance in their daily living. The emerging ambient intelligence (AmI) technology suggests itself to be of great potential for EHA applications, owing to its effectiveness in building a context-aware environment that is sensitive and responsive to the presence of humans. This paper presents a case-driven AmI (C-AmI) system, aiming to sense, predict, reason, and act in response to the elderly activities of daily living (ADLs) at home. The C-AmI system architecture is developed by synthesizing various sensors, activity recognition, case-based reasoning, along with EHA-customized knowledge, within a coherent framework. An EHA information model is formulated through the activity recognition, case comprehension, and assistive action layers. The rough set theory is applied to model ADLs based on the sensor platform embedded in a smart home. Assistive actions are fulfilled with reference to a priori case solutions and implemented within the AmI system through human–object–environment interactions. Initial findings indicate the potential of C-AmI for enhancing context awareness of EHA applications.

Index Terms—Ambient intelligence (AmI), case-based reasoning, context awareness, elderly assisted living.

I. INTRODUCTION

Healthcare systems, in particular for the elderly, have attracted enormous attention worldwide [1]. The National Association of State Units on Aging has reported an urgent need for elderly in-home assistance (EHA), whereby the elderly suffering from various cognitive impairments are equipped with homecare assistance in their daily living. Due to the high cost of institutional living, it is imperative for social security and healthcare systems to take advantage of the prevailing assistive technologies [2], such as wireless sensor networks (WSNs), human–computer interaction (HCI), and artificial intelligence (AI). The potential of assistive technologies for EHA applications manifests itself through a smart system design that is capable of


detecting those undesirable situations that are building up, like a hazard or security threat [3]. An important aspect of EHA is to monitor the behavior of the elderly regarding their activities of daily living (ADLs) when they are not accompanied by caregivers. Such activities include self-care, work, homemaking, and leisure [4]. Elderly people tend to exhibit certain symptoms like memory loss, and as a result may forget, for example, to turn off the power after cooking. It is desirable for the system to act upon the ongoing EHA situation based on activity recognition, reminding users to turn off the power if it has detected any anomaly compared with the normal user activity patterns. However, successful operations of such systems rely not only on hardware and software infrastructures, but also on soft computing techniques, as well as the integration of sensing, predicting, reasoning, and acting in a coherent and systematic fashion [5]. In this regard, ambient intelligence (AmI), which leverages pervasive computing, WSNs, HCI, and AI, has emerged as a promising platform and attracted much attention [6], [7]. The key feature of AmI is context awareness, i.e., enabling systems to understand user needs and situational contexts so as to provide personalized services by proactively tailoring their reactions to the environment and user needs. To make the user–system interaction more natural, the system is embedded in the environment and adaptive to user feedback. AmI lends itself to be a new paradigm of information and communication technologies, taking the integration provided by pervasive computing one step further to realize context awareness [8]. While AmI provides a powerful technological platform to meet users’ needs, the EHA solutions must be embedded into the unique decision-making scenarios of specific EHA applications. Moreover, a system model for coherent integration of various AmI components is imperative [9]. In this regard, this paper proposes a case-driven AmI (C-AmI) model to sense, predict, reason, and act in response to the elderly ADLs in a home environment. A C-AmI system architecture is developed by synthesizing various sensors, activity recognition, case-based reasoning (CBR), along with EHA customized knowledge, within a coherent framework.

II. RELATED WORK

Various sensors have been employed in EHA applications to increase the mobility, safety, security, and self-care abilities of the elderly. Ambient sensors can be embedded in a smart home environment to collect data about lighting, heating, ventilation, humidity, etc.




They are used to monitor users’ ADLs and to obtain general activity patterns based on spatial and temporal information processing. Should any anomaly, deviation, or hardship from the activity pattern be detected, assistive messages will be automatically delivered to the user, the environment, and/or caregivers. Miskelly [10] gives an overview of the devices embedded in a home environment for EHA, such as electronic sensors, fall detectors, door monitors, bed alerts, pressure mats, smoke, and heat alarms, to name but a few. More recent work on ambient sensors involves cameras [11], and lighting, gas, and humidity sensors [12]. While supporting natural interactions with users, ambient sensors often perform too ambiguously to differentiate detailed information concerning specific users [13]. In this regard, wearable sensors can provide more detailed and user-specific information. These sensors often form a body sensor network and provide a platform to establish a pervasive health monitoring system [14]. Wearable sensors include ECG sensors for measuring heart rhythms [15], accelerometers for measuring human motion [16], and microphones for recording sounds [17], etc. While these signals can indicate the trends or symptoms of certain diseases, they can hardly offer information about users’ behavior or locations. For example, accelerometers can provide certain motion information, but are unable to tell whether the user is standing or sitting [13]. Hence, it is preferable to combine ambient and wearable sensors to provide rich contextual information for EHA services. In the field of pervasive computing, various sensors, including RF identification (RFID), cameras, and motion sensors, are available for activity recognition [18]. Activity recognition usually involves collecting a sequence of observations and training appropriate activity models for interpreting new activities. Recognition models can be probabilistic or logic-based. The former are popular owing to their ability to handle noise and uncertainty in sensor readings, using static or temporal classification methods. Typical static classification methods include naïve Bayes [19], decision trees [19], [20], and k-nearest neighbor [20], etc. In temporal classification methods, state-space models are used to infer hidden states (i.e., activities) based on observations, such as hidden Markov models [21], dynamic Bayesian networks [22], and conditional random fields [23]. Data mining methods have been reported to excel in activity recognition. The gist of these methods is to treat activity recognition as a pattern-based classification problem. For instance, Gu et al. [24] employ emerging patterns to recognize activities. Contextual information, such as location and time, also plays an important role in addressing personalized EHA. The value of such information, however, can only be realized when it is modeled and interpreted appropriately [22]. Dey et al. [25] describe a generic toolkit to build context-aware applications. Winograd [26] presents a blackboard model with a data-centric method, using a pattern matching mechanism. However, these models lack formality and may encounter difficulty in capturing rich contextual information, particularly regarding ambiguous contexts. To address these drawbacks, Henricksen et al. [27] propose an object-based model to tackle variations in information quality and the temporal aspect of contexts. An activity theory has also been applied to model contexts from a sociotechnical perspective [28].

Fig. 1. Levels of abstraction in EHA.

AI techniques have attracted much attention as a powerful tool for understanding complex and dynamic contexts and offering intelligence and quality judgment [6]. For instance, spatiotemporal reasoning has been applied in smart home environments to tackle emergencies [29]. Nonetheless, AI-based methods are primarily built upon rule-based reasoning, which might be insufficient for EHA. For complex systems and decision making, the number of rules can be overwhelming, leading to inconsistency and conflict among rules [30]. One might argue that logic-based systems, such as inductive logic programming (ILP) [31] with background knowledge, should be suitable for AmI applications if employing appropriate agent architectures, such as the Knowledge-Goals-Plan model of agency [32]. Nonetheless, for EHA applications, the elderly users often expect explanations of a solution, instead of only descriptions of the solution. As such, rule-based ILP might contain “open-textured” (i.e., imprecise and not well defined) terms that hinder the interpretation of solutions for elderly users [33]. Therefore, a hybrid approach combining rules with other AI methods turns out to be desirable. One potential method is CBR, capable of offering explanations by describing solutions in the form of cases [34]. CBR has been introduced to identify and assess user cases and provide context-sensitive information for a mobile system [35]. Zimmermann [36] argues that CBR excels in identifying correct combinations of contextual information for situation reasoning and then taking appropriate actions. The challenge, however, lies in how to integrate CBR with WSNs, pervasive computing, and HCI within a coherent AmI environment for EHA applications.

III. EHA INFORMATION MODEL

The EHA has been envisioned to shift from the traditional emphasis on providing physical assistance by caregivers to health promotion and quality-of-life conservation through a context-aware AmI system embedded in a home environment [12]. To achieve AmI context awareness, enormous EHA information needs to be organized systematically in accordance with the EHA decision-making process. As shown in Fig. 1, EHA decisions imply a sophisticated information model, which entails a pyramid of abstraction from data to knowledge, involving four levels of decision making. The activity level is a physical layer that comprises all the hardware (including sensors) and interactions between users


and the environment. A context space exists corresponding to various activities that take place in relation to each particular EHA scenario, yet explicit contexts can hardly be inferred at this level. Based on these raw data, a specific EHA context can be identified at the context level. Activity recognition is fulfilled through diverse sensors embedded in the environment. The AmI system aggregates and interprets the collected activity-related sensor data into important contextual labels (e.g., John, aged 68, arthritis), based on which an EHA context model can be constructed. Then the AmI decision-making process further goes up to the case comprehension level, whereby concrete stories of an EHA scenario, namely, an EHA case, are articulated by a reasoning engine. Structured descriptions of each individual EHA case are stored in the knowledge base. The AmI system analyzes potential problems that the elderly are facing in the EHA case, by matching them with previous stored EHA patterns. These EHA cases must be knowledge intensive and capable of explaining the ongoing activities in the situation. At the assistive action level, the system suggests appropriate assistive actions based on similar cases retrieved from the case base. Each assistive action (e.g., a reminder through a personal digital assistant, PDA) is transited down to the activity level for acting upon the user and the environment. It is important that the entire process constitutes a closed-loop system such that it can be adaptive and sensitive to user feedback.

IV. CASE-DRIVEN AMBIENT INTELLIGENCE

The complexity of EHA decision making manifests itself through such key technical challenges for achieving EHA context awareness as: 1) context identification—how to identify user needs through ADLs and other relevant contexts embodied in the activities; 2) context modeling—how to acquire activity-related data and describe, interpret, and organize such data as structured information; 3) case comprehension—how to reason adequately about the contextual information and articulate user needs and behavior; and 4) assistive action—how to act appropriately to provide homecare assistance corresponding to the ongoing EHA situation. To deal with these challenges, we propose a C-AmI system, which performs as an intelligent system to sense, predict, reason, and act in response to the elderly ADLs at home. As a well-established AI paradigm, CBR lends itself to many advantages toward modeling and implementation of EHA context awareness. Cases are composed and represented from multiple sources of contextual information, and thus benefit from prior experience and make comprehensive knowledge easy to understand. In addition, each prior case implies an actual problem-solving episode of experience. Representative cases chosen after validation are conducive to producing clarification of causes and consequences of problems. Moreover, a C-AmI system can solve new problems based on solutions of similar past cases with a presumably extensive, multirelational model of general domain knowledge [30].

Fig. 2. Case-driven AmI system architecture.

As illustrated in Fig. 2, a C-AmI system exemplifies a layered architecture comprising four steps, including context identification, context modeling, case comprehension, and assistive actions [37], [38]. While the context space represents the physical environment where ADLs of users take place, the mediation of the C-AmI system starts from context identification, encompassing sensing, perception, context middleware, and recognizing layers. The sensing layer is to capture the raw contextual data, using various sensors in the AmI environment (e.g., RFID tags and readers, motion, and environmental sensors). The perception layer is to transform a continuous sensor stream into discrete percepts of data in a multimodal form, e.g., a hexadecimal ID representing a tagged object. Percepts can be further processed by cognitive subsystems and constructed as a conceptual network of knowledge by the upper layers of the C-AmI system. The context middleware layer describes the hardware and software on the server, and converts the physical space, where heterogeneous contexts are acquired, into a semantic space. It facilitates the contexts to be easily accessed by context-aware services, including a context interpreter and aggregator [39]. The interpreter maps contextual information from a lower to a higher abstraction level. The aggregator gathers logic-related information and makes it available within a single component for the next layer [25]. The recognizing layer formulates activity models to predict key contextual information of ADLs by processing location, time, personal profiles, environmental, and other relevant information obtained from the context interpreter and aggregator. The next step is context modeling. It constitutes a representation layer, where context models are built for a rigorous

Fig. 3. Layout of the STARhome with each testing case.

description and management of diverse contextual information. Following this step is case comprehension, which aims to understand specific EHA cases through the CBR engine at the reasoning layer. CBR entails a mapping mechanism from the problem space (new cases) to the solution space (the case base). The last step comprises three layers that perform as a problem solver to provide homecare assistance services. The assisting layer finds solutions of assistance actions for a specific EHA case. The controlling layer then executes commands by decomposing these solutions into single actions [37]. The acting layer performs physical actions upon the user and/or the environment, e.g., turning off the power switch of a coffee maker, or sending a message reminder via a mobile phone. It is noteworthy that the C-AmI architecture is structured as a closed-loop system, which is adaptive to the dynamic context, responsive to users, and consequently conducive to natural human–object–environment interactions.
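To make the layered control flow concrete, the following Python sketch (ours, not part of the original system) wires the four steps into the closed loop described above. All class and function names, such as identify_context and comprehend_case, are hypothetical stand-ins for the layers in Fig. 2.

# Minimal closed-loop sketch of the C-AmI layered architecture (illustrative only).
def identify_context(sensor_readings):
    """Sensing/perception/recognizing layers: raw readings -> context labels."""
    return {"activity_name": "use_pc", "user_role": "John", "location": "study_room"}

def build_context_model(labels):
    """Representation layer: organize labels into context components."""
    return {"personal": {"user": labels["user_role"]},
            "task": {"activity": labels["activity_name"]},
            "spatiotemporal": {"location": labels["location"]}}

def comprehend_case(context_model, case_base):
    """Reasoning layer: retrieve the most similar solved case (stubbed here)."""
    return case_base[0] if case_base else None

def act(solution):
    """Assisting/controlling/acting layers: deliver the assistive action."""
    print("Assistive action:", solution)

def closed_loop(sensor_readings, case_base):
    labels = identify_context(sensor_readings)
    model = build_context_model(labels)
    case = comprehend_case(model, case_base)
    if case is not None:
        act(case["solution"])

closed_loop(sensor_readings=[], case_base=[{"solution": "remind user to try Google"}])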

Fig. 4. EHA testing cases. (a) Testing case 1. (b) Testing case 2. (c) Testing case 3. (d) Testing case 4.

V. TESTING CASES

Four EHA cases have been composed to test the C-AmI system in a smart home environment, called STARhome (starhome.i2r.a-star.edu.sg). Fig. 3 shows the layout of the STARhome along with the testing cases. These cases involve five different types of contexts: personal, task, social, spatiotemporal, and environmental. Case 1 [see Fig. 4(a)] is assumed to be a new EHA case that seeks smart solutions, while the other three cases, depicted in Fig. 4(b)–(d), are solved cases stored in the case base. To solve a new case, it is important to decide precise feedback for triggering the assistive action. In the case of taking medication or downloading videos, the system needs to know if the medicine has really been taken or the video has been successfully downloaded. This means that, even though CBR can make high-level decisions, the exact decision still needs to be fine-tuned. Hence, a rule-based customized knowledge model is adopted for solution adaptation within the CBR framework.

VI. CONTEXT IDENTIFICATION

A. Sensor Platform for Data Acquisition

Fig. 5 shows the configuration of a sensor platform for raw data collection, involving a user’s hand motion, locations, user ID, temperature, humidity, and lighting. Wearable sensors include three Crossbow iMote2 sets and two RFID readers.

Fig. 5. Sensor platform.

Ambient sensors consist of multiple video cameras installed on the ceilings of the testing area, and more than 100 RFID tags attached to the daily objects involved in the testing. The user wears an RFID wristband reader and a Crossbow iMote2 set on each of his/her wrists and hands, and a third Crossbow iMote2 set on the waist. The RFID reader is used to detect tagged objects involved in an activity within a distance of 6 to 8 cm and with a sampling rate of 2 Hz. The iMote2 set detects hand and body motion (e.g., via a three-axis accelerometer), surrounding temperature, lighting, and humidity with a sampling rate of 128 Hz. The detected tag ID is transmitted wirelessly to a Mica2Dot module linked to the server through a serial port.


The raw data sensed by the iMote2 set are transmitted wirelessly to an iMote2 module linked to the server via a USB port. The server runs on a TinyOS-based laptop with a MIB510CA serial interface board. A UHF RFID reader is embedded in each room to identify the user ID and his/her location, using two different UHF tags. The cameras record the overall situation of an activity. These ambient sensors are useful for constructing the ground truth for activity recognition testing.
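A minimal sketch of the multi-rate fusion implied here, assuming the 128-Hz iMote2 stream and the 2-Hz RFID stream arrive as timestamped tuples; all function and variable names are ours, not from the platform software.

# Illustrative sketch: fuse multi-rate sensor streams into one observation per second,
# averaging numeric channels and keeping the last tag ID seen in the interval
# (or "Null" if no tagged object was detected in that second).
from collections import defaultdict

def aggregate_per_second(numeric_samples, tag_samples):
    """numeric_samples: list of (t_seconds, {channel: value}), e.g. 128 Hz iMote2 data.
    tag_samples: list of (t_seconds, reader, tag_id), e.g. 2 Hz RFID data."""
    per_sec = defaultdict(lambda: {"numeric": defaultdict(list), "tags": {}})
    for t, channels in numeric_samples:
        for name, value in channels.items():
            per_sec[int(t)]["numeric"][name].append(value)
    for t, reader, tag_id in tag_samples:
        per_sec[int(t)]["tags"][reader] = tag_id
    observations = []
    for sec in sorted(per_sec):
        bucket = per_sec[sec]
        obs = {name: sum(vals) / len(vals) for name, vals in bucket["numeric"].items()}
        obs["left_object"] = bucket["tags"].get("left", "Null")
        obs["right_object"] = bucket["tags"].get("right", "Null")
        observations.append((sec, obs))
    return observations

# Example: one accelerometer channel sampled twice in second 0, right hand near a tag.
print(aggregate_per_second(
    [(0.0, {"accel_body_x": 160}), (0.3, {"accel_body_x": 168})],
    [(0.5, "right", "A8E2")]))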


TABLE I ADLS INVOLVED IN THE EXPERIMENT

B. Rough Sets for Activity Model Construction

The rough set theory is applied to synthesize diverse sources of ADLs to construct activity models as rule-based classification problems. A rough set software system, RSES 2.2.2 (logic.mimuw.edu.pl/rses/), is adopted for data analysis. The rough set model excels in tackling vagueness and uncertainty using rough approximations [40]. Noise and missing data inevitably exist during data acquisition. RFID readers might return “Null” values when the tagged objects are outside the sensing area. Unlike probabilistic activity recognition methods, rough-set data analysis is self-contained and does not require a priori assumptions of probabilistic distributions [41]. A rough-set model produces a complete set of consistent and minimal decision rules, using an objective knowledge induction process [40]. It also facilitates data fusion from various sensors without considering their distributions. Moreover, it can handle both symbolic and numeric data [42], and is thus valuable for dealing with the qualitative and quantitative reasoning that is always involved in EHA applications.

The raw data from multiple sensor readings are preprocessed and organized as feature vectors. In general, a dataset of sensor readings (i.e., observations) can be represented as an information system S = (U, V), such that ∀v ∈ V, v: U → V*, where U is a nonempty finite set called the universe, V is a nonempty finite set of sensor variables, and V* is the value set of an observation vector v. An observation vector of symbolic and numeric sensor readings depicts an EHA state, including user motion, environmental information, and the IDs of tagged objects, in the form of v = v_s ∪ v_n, where v_s = {v_s,i}_S and v_n = {v_n,j}_N denote the symbolic and numeric vectors with S and N variables, respectively. Accordingly, V* = V_s* ∪ V_n*, where V_s* and V_n* are the respective value sets of v_s and v_n. For the testing cases at STARhome, an observation vector is defined with 16 sensor variables, i.e., v = {accel_body_x, accel_body_y, accel_body_z, accel_right_x, accel_right_y, accel_right_z, accel_left_x, accel_left_y, accel_left_z, temperature, humidity, lighting, location, left_object, right_object, user_ID}. These variables describe the contextual information in a particular EHA case. The first 12 variables are numeric measures and the last four are symbolic. The data are collected at a fixed frequency (i.e., once per second) such that the numeric ones take average values of the raw data, whereas the symbolic ones might be “Null,” indicating that no tagged objects are involved during that sampling interval. All the data are then tabulated into a standard vector form.
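The decision table described above can be illustrated with a short Python sketch. The 16 sensor variable names and the two decision variables follow the paper; the helper make_row and the concrete row are our own illustration, not the RSES data format.

# Sketch of the decision table: each row couples a 16-variable observation vector
# with its decision labels (activity_name, user_role).
SENSOR_VARIABLES = [
    "accel_body_x", "accel_body_y", "accel_body_z",
    "accel_right_x", "accel_right_y", "accel_right_z",
    "accel_left_x", "accel_left_y", "accel_left_z",
    "temperature", "humidity", "lighting",
    "location", "left_object", "right_object", "user_ID",
]
DECISION_VARIABLES = ["activity_name", "user_role"]

def make_row(values, decisions):
    """Tabulate one observation (values) and its labels (decisions) as a dict."""
    assert len(values) == len(SENSOR_VARIABLES)
    assert len(decisions) == len(DECISION_VARIABLES)
    row = dict(zip(SENSOR_VARIABLES, values))
    row.update(dict(zip(DECISION_VARIABLES, decisions)))
    return row

decision_table = [
    make_row([164, 9513, 1894, 36501, 5457, -54704, -3420, -122147, -28837,
              26, 49, 12, "D269", "Null", "A8E2", "D4B0"],
             ["taking_medication", "Mary"]),
]
print(decision_table[0]["activity_name"])   # taking_medication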

For example, a specific instance of the observation vector could be V*_10 = [164, 9513, 1894, 36501, 5457, −54704, −3420, −122147, −28837, 26, 49, 12, D269, Null, A8E2, D4B0]. The first 12 numerical values indicate the user’s body and hand motion and the environment status. The last four tag ID numbers suggest the “location” of the activity, the “objects” that interacted with the left hand and right hand, and the “user” involved in the activity, respectively. To facilitate rule generation, numeric measures are discretized using cuts that are produced with the global strategy based on the maximal discernibility heuristics [42].

Corresponding to sensor data collected at the activity level, a set of EHA decision variables are defined to characterize the context labels at the context level (see Fig. 1). Let d = {d_k}_K ∈ D denote a decision vector with K decision variables. Accordingly, D_h* = {d_1h*, …, d_Kh*}_H is the value set of d, where H is the total number of decision scenarios. For the testing cases, training data are composed as a decision table, Ω = (V ∪ D, C), where V ∪ D constitutes the universe of inference C, i.e., c = {C_m*}_M ∈ C, where M denotes the total number of training patterns in Ω. A specific entry of Ω, C_m* ∼ (V_j* ⇒ D_h*), embodies an inference relationship from an observation of an activity (i.e., the predecessor) V_j* to the corresponding decision (i.e., the successor) D_h*. Two types (K = 2) of decision variables exist for the testing cases: “activity_name” and “user_role.” The former refers to the context of an activity, performing one of the 20 ADLs in Table I. The latter indicates who is carrying out the activity, either “John” or “Mary.” For example, an instance of the context vector would be D*_10 = [“taking medication,” “Mary”], corresponding to observation vector V*_10.

Rule generation is based on the concept of reduction [40]. A reduct is defined as a subset of variables in S, such that Φ = {φ_n}_N ⊂ Ω, where φ_n = ⟨v_n^φ, d_n^φ⟩ is subject to an indiscernibility relation, in which, for objects x ∈ U and y ∈ U, a pair (x, y) ∈ U × U belongs to Φ. Therefore, for any object z ∈ U, we can generate a decision rule, such that ∀q ∈ [1, Q], the predecessor of the rule takes the conjunction of certain sensor variable instances v_q^φ(z), and the successor takes on specific values of the decision variables d^φ(z), where Q denotes the total number of sensor variables instantiated by this rule. The general form of a decision rule constructed for reduct Φ and object z is thus given as follows:

(v_1^φ = v_1^φ(z)) ∧ … ∧ (v_Q^φ = v_Q^φ(z)) ⇒ (d^φ = d^φ(z)).   (1)

The extent to which a rule matches an entry is determined by the degree of support, i.e., the number of objects from Ω for which this rule applies correctly. However, finding a minimal reduct is NP-hard, as the size of the reduct set can be exponential with respect to the size of the decision table.
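The rule form in (1) and its support can be sketched in a few lines of Python. This is our own illustration of conjunctive decision rules, not the rule engine generated by RSES; rule_fires, support, and classify are hypothetical names.

# Decision rules as in (1): a conjunction of sensor-variable conditions implies
# values for the decision variables. Support counts the training rows the rule
# classifies correctly.
def rule_fires(rule, observation):
    """True if every condition (variable == value) in the rule predecessor holds."""
    return all(observation.get(var) == val for var, val in rule["if"].items())

def support(rule, decision_table):
    """Number of rows where the rule fires and the decision labels also match."""
    return sum(1 for row in decision_table
               if rule_fires(rule, row)
               and all(row.get(d) == v for d, v in rule["then"].items()))

def classify(observation, rules):
    """Return the decision of the firing rule with the highest support, if any."""
    firing = [r for r in rules if rule_fires(r, observation)]
    return max(firing, key=lambda r: r["support"])["then"] if firing else None

rules = [{"if": {"left_object": "water", "right_object": "Null",
                 "location": "kitchen", "user_ID": "D4B0"},
          "then": {"activity_name": "taking_medication", "user_role": "Mary"},
          "support": 4}]
obs = {"left_object": "water", "right_object": "Null",
       "location": "kitchen", "user_ID": "D4B0"}
print(classify(obs, rules))   # {'activity_name': 'taking_medication', 'user_role': 'Mary'}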

Usually, genetic algorithm-based heuristics are adopted to handle a large number of reducts within acceptable time [40]. We set up a fivefold cross-validation experiment for the testing EHA cases. All sample data are partitioned into five subsets. At each run of the experiment, one of the five subsets serves as testing data, while the remaining four subsets are used as training data. There are 810, 923, 893, 927, and 839 rules generated by the five iterations, respectively. Among the five reducts, the maximal support is 16 and the minimum is one. For example, one identified rule reads as: “((left_object = water) ∧ (right_object = Null) ∧ (location = kitchen) ∧ (user_ID = D4B0)) ⇒ (activity_name = taking_medication, user_role = Mary (4)).” It means that if the left hand holds “water,” the right hand touches “nothing,” the location is “kitchen,” and the user ID is “D4B0,” then we can infer that “Mary” is “taking medication,” for which the support level is four. Such mined rules provide the basis to classify various ADLs associated with particular EHA cases.

C. Experimental Results

The experiment in the STARhome involves 20 common ADLs (see Table I). Four participants are involved on an alternate basis. For each ADL, two persons are tested. One carries out the ADLs sequentially, while the other annotates each activity and controls the time. Each participant is asked to repeat all 20 ADLs five times. All sensor data for ADLs are recorded and manually labeled with corresponding decision variables. Along with the particular ADL context, each observation is tabulated into the decision table, from which a large number of entries can be obtained for the training dataset. For illustrative simplicity, this study selects the 330 most representative activity records for analysis.

TABLE II. EXPERIMENT RESULTS OF 20 ADLS

Table II shows the experiment results. “True positive (TP)” indicates that an activity is correctly classified as the right class, whereas “false positive (FP)” means that an activity is labeled as a wrong class. “False negative (FN)” corresponds to activities not reported when they occur. The column “precision,” defined as TP/(TP + FP), is the measure of the probability that a given classification is correctly identified, and “recall” is defined as TP/(TP + FN), indicating the probability of correctly inferring a true activity. For all testing cases, the precision and recall for “take medication” are 91% and 71%, respectively (case 3), while they are 88% and 100% for “using a computer” (cases 2 and 4). The average precision and recall across all activities are both 92%, indicating an acceptable activity model for the 20 common ADLs.
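The per-activity precision and recall reported above follow the standard TP/FP/FN definitions and can be computed as in the short sketch below; the label lists are made-up examples, not the experimental data.

# Per-activity precision = TP/(TP+FP) and recall = TP/(TP+FN) from true vs.
# predicted labels.
from collections import Counter

def precision_recall(true_labels, predicted_labels):
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = {}
    for activity in set(true_labels) | set(predicted_labels):
        p_den = tp[activity] + fp[activity]
        r_den = tp[activity] + fn[activity]
        scores[activity] = (tp[activity] / p_den if p_den else 0.0,
                            tp[activity] / r_den if r_den else 0.0)
    return scores

truth = ["taking_medication", "taking_medication", "using_computer", "cooking"]
pred  = ["taking_medication", "cooking",           "using_computer", "cooking"]
print(precision_recall(truth, pred))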

VII. CONTEXT MODELING

In order to deal with imperfect and ambiguous information, the context space is described as five components, namely, personal (ct_1), task (ct_2), social (ct_3), spatiotemporal (ct_4), and environmental (ct_5), based on the activity theory [28]. Fig. 6 shows the taxonomy of the EHA context representation for case 1. Each component is characterized by one of three properties: profiled, sensed, or predicted. Each property further assumes a confidence level regarding the truth of each piece of contextual information. Profiled contexts are mainly descriptive and relatively unchanged in most situations. Hence, we assume their confidence level to be P_profiled = 98%. Sensed contexts are those directly acquired from the sensors. The confidence level is determined by the sensor itself, which is set at P_sensed = 95%. Predicted contexts correspond to the information derived from the activity recognition module. The confidence level is the overall accuracy (recall), i.e., P_predicted = 92%. Each context has several feature variables to describe detailed contextual information. For example, the feature variables in the spatiotemporal context are time, event sequence, and location, whereas user location is at the granularity of rooms. Time is sensed using the iMote2 set, and the event sequence is based on the time. Consistent with a bottom-up approach, pieces of contextual information are first specified in great detail and then organized into different contexts associated with different properties to form upper levels, which in turn are synthesized to form a complete context space at the top level, as shown in Fig. 6.

Fig. 6. An example of context model for case 1.

VIII. CASE COMPREHENSION

The C-AmI system employs a hybrid method combining case-based and rule-based reasoning for case comprehension. High-level decisions are first derived based on CBR, and then an EHA customized knowledge model compatible with rule-based reasoning is deployed to fine-tune the decisions for particular EHA cases.

A. Case Base Organization

The case base is denoted as C^b = ⟨C_1, …, C_20, I, C^k⟩, where C_1, …, C_20 are 20 classes of cases, I = ⟨A, L⟩ is a case indexing model, and C^k is the EHA customized knowledge model for case adaptation (see an example in Fig. 7). An ADL in a case is denoted as A, performing as the major index, while L depicts the location where the ADL takes place, acting as the subindex. All cases are organized hierarchically according to the case-indexing model. They are first grouped into 20 classes based on the major index, i.e., ADLs, and within each case class, cases are further categorized into subclasses according to their locations, i.e., the subindex. Unlike the context model, each case is represented in a top-down fashion, where contexts with different properties are described with multiple feature–value pairs, as shown in Table III.

TABLE III. CASE REPRESENTATION OF CASE 3

It is not uncommon in EHA applications that the same activity recognized from the sensor platform may imply different stories (i.e., contexts) when taking place at different locations. It is hence necessary to customize general case knowledge further according to the locations of the activities. Fig. 7 illustrates an EHA customized knowledge model developed for case adaptation. It is constructed based on the personal habits of the elderly and is modifiable for individuals based on adaptation knowledge derived through expert interviews. Corresponding to the cases at different locations, this model cohesively links major tagged objects (e.g., personal computers) that relate to typical tasks (e.g., online searching). Based on the customized knowledge model, EHA solutions (e.g., Google) are fine-tuned through a downward-branching hierarchical reasoning process.

Fig. 7. EHA customized knowledge model.

Such knowledge can be articulated using IF-THEN rules. For example, the rule recommending “Google and/or Yahoo” is expressed as “IF (use PC for online searching in the study room), THEN (try Google and/or Yahoo).”
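One simple way to hold such location- and task-specific IF-THEN knowledge is a keyed lookup, as in the sketch below. The two entries come from the examples in the text; the data structure and the function refine_solution are our own illustration.

# EHA customized knowledge as location/object/task-specific IF-THEN rules.
CUSTOMIZED_KNOWLEDGE = {
    ("study_room", "use_pc", "online_searching"): "try Google and/or Yahoo",
    ("study_room", "use_pc", "watch_video_online"): "try YouTube",
}

def refine_solution(location, obj, task):
    """Return the location/task-specific recommendation, or None if no rule applies."""
    return CUSTOMIZED_KNOWLEDGE.get((location, obj, task))

print(refine_solution("study_room", "use_pc", "watch_video_online"))  # try YouTube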

B. Case Retrieval

Case retrieval is the process of finding prior solved cases that are closest to the current case. The retrieval process starts with case 1 and proceeds with the following steps:
1) Identify case class k that equals the class of case 1;
2) Set j = 1; select case j from class k;
3) Compare the location of case 1 with that of case j;
4) If their locations are identical, then go to Step 5; else set j = j + 1 and go to Step 3;
5) Calculate the similarity between cases 1 and j;
6) Set j = j + 1; if j ≤ K_c (the total number of cases in class k), then go to Step 3; else go to Step 7;
7) Rank the retrieved cases by a similarity measure and choose those with similarity larger than the predefined threshold.
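A minimal sketch of this retrieval procedure follows, assuming a similarity function such as the LSA-based measure defined in (3) and (4) below; the function retrieve and the toy case records are ours, not the authors' implementation.

# Retrieval steps 1)-7): scan the case class of the new case, keep cases at the
# same location, score them, and threshold the ranked results.
def retrieve(new_case, case_base, similarity, threshold=0.70):
    candidates = []
    for case in case_base:
        if case["adl"] != new_case["adl"]:              # Step 1: same case class (ADL)
            continue
        if case["location"] != new_case["location"]:    # Steps 3-4: same location
            continue
        score = similarity(new_case, case)              # Step 5
        candidates.append((score, case))
    candidates.sort(key=lambda pair: pair[0], reverse=True)  # Step 7: rank
    return [case for score, case in candidates if score > threshold]

# Toy usage with a dummy similarity measure.
base = [{"adl": "use_pc", "location": "study_room", "id": 2},
        {"adl": "use_pc", "location": "study_room", "id": 4}]
new = {"adl": "use_pc", "location": "study_room"}
print(retrieve(new, base, similarity=lambda a, b: 0.80 if b["id"] == 2 else 0.59))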



The cosine coefficient between two cases is adopted as the similarity measure, which is calculated based on latent semantic analysis (LSA) [43]. A semantic space for the case base must be constructed before LSA can be applied. A case base composed of n cases and m key terms (extracted from these cases, usually m ≫ n) can be represented as a term-by-case matrix A_m×n, where element a_ij is the frequency of the ith key term that appears in the jth case. Then, A_m×n is weighted using a log-entropy transformation to improve retrieval performance. The weighted matrix A^w_m×n is further decomposed into orthogonal components with singular value decomposition, that is, A^w_m×n = U_m×r Σ_r×r (V_n×r)^T, where the rows of U_m×r describe the key-term vectors and the rows of V_n×r the case vectors. The matrix Σ_r×r is a diagonal matrix with scaling values in descending order. In order to remove noise, A^w_m×n is reconstructed with a reduced dimension by keeping the first k largest scaling values while setting the others to zero in Σ_r×r, such that Ã^w_m×n = U_m×k Σ_k×k (V_n×k)^T. Hence, case vectors can be derived by back multiplying, i.e.,

V_n×k = (Ã^w_m×n)^T U_m×k (Σ_k×k)^−1.   (2)
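The pipeline behind (2) can be sketched with NumPy as follows. This is our own illustration under stated assumptions: the log-entropy weighting shown is one common variant, and the function names are not from the paper or any LSA package.

# LSA sketch: term-by-case matrix -> log-entropy weighting -> truncated SVD ->
# case vectors recovered by back multiplication as in equation (2).
import numpy as np

def log_entropy_weight(A):
    """A: m x n term-by-case frequency matrix. Returns the weighted matrix A^w."""
    m, n = A.shape
    gf = A.sum(axis=1, keepdims=True)              # global term frequencies
    p = np.divide(A, gf, out=np.zeros_like(A, dtype=float), where=gf > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = np.where(p > 0, p * np.log(p), 0.0).sum(axis=1, keepdims=True)
    g = 1.0 + ent / np.log(n)                      # global weights in [0, 1]
    return np.log1p(A) * g                         # local weight log(1 + f_ij)

def case_vectors(A, k):
    """Return (U_k, S_k, V_k), with V_k the n x k case vectors from equation (2)."""
    Aw = log_entropy_weight(A.astype(float))
    U, s, Vt = np.linalg.svd(Aw, full_matrices=False)
    U_k, S_k = U[:, :k], np.diag(s[:k])
    A_tilde = U_k @ S_k @ Vt[:k, :]                # rank-k reconstruction
    V_k = A_tilde.T @ U_k @ np.linalg.inv(S_k)     # equation (2): back multiplication
    return U_k, S_k, V_k

# Toy example: 5 key terms, 3 cases, 2 retained dimensions.
A = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 0], [0, 0, 2], [1, 0, 1]])
U_k, S_k, V_k = case_vectors(A, k=2)
print(V_k.shape)   # (3, 2): one 2-D vector per case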

In order to obtain a fine-grained similarity, the similarity between two corresponding case contexts (e.g., an identical task that occurs in two different cases) is computed first. Such measures are then aggregated using a weighted function. Any context can be projected into the case semantic space. Based on (2), the pth contexts of cases c_x and c_y are represented in the case semantic space as ct^x_p = (ct^xw_p)^T U_m×k (Σ_k×k)^−1 and ct^y_p = (ct^yw_p)^T U_m×k (Σ_k×k)^−1, respectively. The entries of the vectors ct^xw_p = (α_1, α_2, …, α_m)^T and ct^yw_p = (β_1, β_2, …, β_m)^T are zeros and the weighted frequencies (i.e., α_1 … α_m, β_1 … β_m) of the key terms specified for the pth contexts. If the angle between ct^x_p and ct^y_p is θ_p, then the similarity between two corresponding contexts is computed as follows:

cos(θ_p) = (ct^x_p · ct^y_p) / (‖ct^x_p‖ ‖ct^y_p‖).   (3)

The more semantically similar the two contexts are, the closer the value is to one (i.e., the maximum of a cosine value). Accordingly, the similarity between cases c_x and c_y is defined by taking weights into account, i.e.,

sim(c_x, c_y) = (Σ_{p=1}^{5} w_p cos(θ_p)) / (Σ_{p=1}^{5} w_p),   (4)

where w_p is the weight of the pth contexts ct^x_p and ct^y_p. For our context model (e.g., Fig. 6), the weights of the context properties can be determined as w_1 = w_3 = P_profiled = 0.98, w_4 = w_5 = P_sensed = 0.95, and w_2 = P_predicted = 0.92, by assessing the truth of each value associated with every context property. For the testing cases, the cosine coefficients of the similarity measure are calculated using the online LSA software (lsa.colorado.edu).
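Equations (3) and (4) reduce to a weighted average of per-context cosine similarities, as in the sketch below. The context vectors are assumed to be already projected into the semantic space; the random example vectors and function names are ours.

# Weighted case similarity: cosine per context component (3), aggregated with the
# confidence weights from the context model (4).
import numpy as np

WEIGHTS = {"personal": 0.98, "task": 0.92, "social": 0.98,
           "spatiotemporal": 0.95, "environmental": 0.95}

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def case_similarity(contexts_x, contexts_y):
    """contexts_x/y: dict mapping context name -> projected context vector."""
    num = sum(WEIGHTS[p] * cosine(contexts_x[p], contexts_y[p]) for p in WEIGHTS)
    return num / sum(WEIGHTS.values())

cx = {p: np.random.rand(3) for p in WEIGHTS}
cy = {p: np.random.rand(3) for p in WEIGHTS}
print(round(case_similarity(cx, cy), 2))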

The demo version provides one-to-many and document-to-document comparisons, whereby “general reading up to 1st year college” is used as the LSA space, containing 200 dimensions, corresponding to k = 200 in (2). First, cases 2 and 4 are obtained as a result of the retrieval process. Then, according to (3), the similarity values between the corresponding contexts in case 1 and case 2 from ct_1 to ct_5 are calculated as 1.00, 0.28, 1.00, 0.93, and 0.75, respectively. The similarity between the corresponding contexts in case 1 and case 4 is measured as 0.36, 0.59, 0.73, 0.93, and 0.54, respectively. The fine-grained similarity values are calculated using (4), i.e., sim(c_1, c_2) = 0.80 and sim(c_1, c_4) = 0.59. Finally, case 2 is returned as the result of case retrieval, if the threshold is set at 0.70.

C. Case Adaptation

The EHA customized knowledge model constitutes the basis of case adaptation. The adaptation is implemented by integrating the substitution and rule-based approaches into a soft reasoning mechanism [44], involving the following steps.
1) Substitution: It replaces invalid parts of the old solution with new content, according to key differences of the new case from the old one.
2) Rule-based adaptation: The system further refines the solution according to the EHA customized knowledge model.
3) Evaluation: The user performs evaluation and gives feedback to the system for improvement.
4) Storage: If the adaptation is successful, the new case, along with the adaptation knowledge, is stored for future use; the customized knowledge model is also updated if necessary.
This procedure can be illustrated with the testing cases. Assume the most similar case is retrieved as case 2. It is first analyzed that the main difference is the task context (similarity = 0.28), which is critical for the solution. Therefore, the solution is adapted as “remind John to use Google for searching political news video,” by substituting the main task context in case 2 with the new task. Then the EHA customized knowledge model is deployed to refine the solution by applying the rule “IF (use PC for watching video online in the study room), THEN (try YouTube).” The refined solution becomes “remind the user to try YouTube for searching videos of political news.” At Step 3, the user, “John,” should evaluate the refined solution. If he is satisfied, the new case and the knowledge about the adaptation are stored in the case base. Otherwise, the proposed adaptation needs to be revised based on the feedback.
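The first two adaptation steps (substitution and rule-based refinement) can be sketched as below; the evaluation and storage steps are omitted. All names and the threshold are illustrative assumptions of ours, not the authors' implementation.

# Case adaptation sketch: substitute the context that differs most from the
# retrieved case, then refine the solution with a customized-knowledge rule.
def adapt(new_case, retrieved_case, context_similarities, knowledge, threshold=0.5):
    # Step 1: substitution - replace the least similar context with the new one.
    weakest = min(context_similarities, key=context_similarities.get)
    solution = dict(retrieved_case["solution"])
    if context_similarities[weakest] < threshold:
        solution[weakest] = new_case["contexts"][weakest]
    # Step 2: rule-based adaptation via the customized knowledge model.
    key = (new_case["contexts"]["spatiotemporal"], new_case["contexts"]["task"])
    if key in knowledge:
        solution["recommendation"] = knowledge[key]
    return solution

knowledge = {("study_room", "watch_video_online"): "try YouTube"}
new_case = {"contexts": {"task": "watch_video_online", "spatiotemporal": "study_room"}}
retrieved = {"solution": {"task": "online_searching", "recommendation": "try Google"}}
sims = {"task": 0.28, "spatiotemporal": 0.93}
print(adapt(new_case, retrieved, sims, knowledge))
# -> task substituted and recommendation refined to "try YouTube"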



IX. CASE-DRIVEN ASSISTIVE ACTIONS

There are generally three categories of EHA assistive services, including emergency treatment (e.g., sudden fall, stroke), autonomy enhancement (e.g., a cooking assistance system), and comfort services (e.g., infotainment assistance) [45]. For emergency treatment, it is desirable for the system to be able to predict any emergency situation proactively. Taking “arthritis” in case 1 as an example, it has been reported that arthritis has a certain relation with the environment (e.g., temperature and humidity). In order to prevent the aggravation of any symptom, the system should give an early warning. In this sense, the system should be sensitive to those ADLs related to the symptom. For autonomy enhancement, the key issue is to what extent a user is willing to be controlled by the decisions of the C-AmI system. One decisive factor is whether the usefulness of a C-AmI service outweighs the cost and inconvenience of the control. Therefore, the system should be reliable and trustworthy with an appropriate degree of autonomy for different user profiles.

Fig. 8 shows the procedure of case-driven assistance decision making. The assisting layer announces a call for the type of services as recommended by the CBR engine. The controlling layer then triggers action conditions to check whether it should change the control state. If “yes,” the acting layer executes the recommended actions upon the user or the environment; otherwise, it returns to the assisting layer for the next round. Whenever an action is performed, the context state will be changed; otherwise, it goes back to execute the input action again. If the problem is solved, the process terminates; otherwise, the assisting layer issues another command based on the user feedback.

Fig. 8. The process flow of assistive actions.

Regardless of the various types of services, usability is an important issue of the EHA. The C-AmI system entails a paradigm of implicit HCI [46] and facilitates human–object–environment interactions. The system first takes the user’s behavior as input implicitly, then recognizes, interprets, and understands user needs, and finally provides proactive assistance. Nevertheless, such proactive behavior of the system could puzzle the users if explanations of the behavior are provided inadequately. Fortunately, CBR is helpful, to some extent, by showing the tracks of decision making. Another issue is the interface between users and the system. It is desirable that users have dialogues rather than monologues with the system. The current implementation of the system takes advantage of existing interfaces. For instance, the reminder can be sent to the user’s PDA or home telephone. Considering that certain impairments, such as hearing or visual loss, are not uncommon among the elderly, assistive actions may be executed via the same sensor platform that is employed for activity recognition. For example, the lighting and noise sensors can also be used to draw attention from the users for taking action.
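The process flow described around Fig. 8 is essentially a closed loop, which can be sketched as below. The callbacks propose_action, condition_met, execute, and problem_solved are hypothetical stand-ins for the assisting, controlling, and acting layers.

# Sketch of the assistive-action control loop (illustrative only).
def assistance_loop(propose_action, condition_met, execute, problem_solved,
                    max_rounds=10):
    for _ in range(max_rounds):
        action = propose_action()            # assisting layer (CBR recommendation)
        if not condition_met(action):        # controlling layer: trigger condition
            continue                         # back to the assisting layer
        execute(action)                      # acting layer: act on user/environment
        if problem_solved():
            return True
    return False

# Toy run: the reminder is sent once and the problem is then considered solved.
state = {"reminded": False}
def propose(): return "send reminder to PDA"
def ok(action): return True
def run(action): state["reminded"] = True
def solved(): return state["reminded"]
print(assistance_loop(propose, ok, run, solved))   # True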

X. DISCUSSIONS

The proposed C-AmI system takes advantage of wearable and ambient sensors to provide distributed and pervasive sensing capabilities. One example is the tagged objects that the user interacts with using the RFID technique, which offers a natural way to predict the ongoing activity. The limitation is that these prototype sensors are obtrusive when worn for a long time. Further work lies in the consideration of unobtrusive off-the-shelf sensors that are now available in the market.

The rough set-based model achieves a reasonably good performance of activity recognition in terms of the averaged recall and precision. However, it might still be inadequate for some activities. The recall of “taking medication,” e.g., is 71%, implying that the system might remind the user to take medication even though the user has already done so. Upon examining the rules, unlike other activities that all take place in one location only, this particular activity may occur in three locations (i.e., kitchen, living room, or bedroom), corresponding to three times of taking medication per day. Although the EHA customized knowledge model deliberately incorporates location information into activity patterns, the limited set of training data used in the testing cases leads to such a low recall, which on the other hand suggests possible means for improvement.

The LSA method can capture indirect information contained in myriads of contextual information. The example presented in this study shows the potential of CBR. Case adaptation capitalizes on customized knowledge, contributing to a relatively high quality of solutions. However, adaptation knowledge is obtained by a tedious process of interviews with domain experts. Usually, a causal model between the problem space and the solution space is needed. In addition, complex activity patterns (e.g., interleaved and concurrent activities) are not taken into consideration in the experiment, which may restrict the applicability of the system in the real world. Several soft computing techniques can be introduced to case adaptation, such as fuzzy decision trees and neural networks, for supervised or unsupervised learning of adaptation knowledge from prior cases [44]. Future work also lies in the consideration of privacy and security issues and in designing EHA applications as sociotechnical systems.

XI. CONCLUSION

AmI lends itself to be a promising technology for developing smart and cost-effective solutions for EHA applications. It excels in revealing the context awareness of ongoing EHA situations through interactions of human users with the environment. A C-AmI system integrates WSNs, activity recognition, CBR, and EHA customized knowledge within a coherent framework. It supports the sensing, predicting, and reasoning of assistive actions by taking advantage of a priori knowledge about solved cases to meet the elderly’s needs in a smart home environment.



It entails a multilayered architecture that coincides with the EHA information model encompassing the activity, context, case comprehension, and assistive action levels. The C-AmI concept implies a unique perspective for home control, security, safety, and comfort services, specifically geared toward the elderly. The C-AmI system can also be leveraged to other environments, such as hospitals and nursing homes, with similar hardware and software infrastructures. It has meaningful implications for developing architectures, methods, and tools that are capable of combining various technologies into AmI systems across different usage environments, and thus promises personalized services for the elderly.

ACKNOWLEDGMENT

The authors would like to thank Dr. Tao Gu for his support with the sensor platform setup and the Institute for Infocomm Research of Singapore for providing the STARhome facilities. They would also like to thank the anonymous reviewers for their constructive comments.

REFERENCES

[1] U. Cortés, C. Urdiales, and R. Annicchiarico, “Intelligent healthcare managing: An assistive technology approach,” in Proc. 9th IWANN, San Sebastián, Spain, 2007, pp. 1045–1051. [2] D. J. Blake and C. Bodine, “An overview of assistive technology for persons with multiple sclerosis,” J. Rehabil. Res. Develop., vol. 39, pp. 229–312, 2002. [3] E. Aarts, “Ambient intelligence: A multimedia perspective,” IEEE Multimedia, vol. 11, no. 1, pp. 12–19, Jan.–Mar. 2004. [4] Definition of ADLs. (1998, Oct.). [Online]. Available: www.medterms.com [5] U. Cortés, R. Annicchiarico, J. Vázquez-Salceda, C. Urdiales, L. Cañamero, M. López, M. Sànchez-Marrè, and C. Caltagirone, “Assistive technologies for the disabled and for the new generation of senior citizens: The e-tools architecture,” AI Commun., vol. 16, pp. 193–207, 2003. [6] J. C. Augusto, “Ambient intelligence: The confluence of ubiquitous/pervasive computing and artificial intelligence,” in Intelligent Computing Everywhere, A. Schuster, Ed. London, U.K.: Springer-Verlag, 2007, pp. 213–234. [7] P. Remagnino and G. L. Foresti, “Ambient intelligence: A new multidisciplinary paradigm,” IEEE Trans. Syst., Man, Cybern. A, vol. 35, no. 1, pp. 1–6, Jan. 2005. [8] K. Ducatel, M. Bogdanowicz, F. Scapolo, J. Leijten, and J.-C. Burgelman, “Scenarios for ambient intelligence in 2010,” Eur. Commun., IST Advisory Group, Belgium, Final Rep., Feb. 2001. [9] R. Chen, Y. B. Hou, Z. Q. Huang, and J. He, “Modeling the ambient intelligence application system: Concept, software, data, and network,” IEEE Trans. Syst., Man, Cybern. C, vol. 39, no. 3, pp. 299–314, May 2009. [10] F. G. Miskelly, “Assistive technology in elderly care,” Age Ageing, vol. 30, no. 6, pp. 455–458, 2001. [11] J. Pansiot, D. Stoyanov, B. P. L. Lo, and G. Z. Yang, “Towards image based modeling for ambient sensing,” presented at the 2006 Int. Workshop Wearable and Implantable Body Sensor Networks, Cambridge, MA, pp. 195–198. [12] D. J. Cook, “Health monitoring and assistance to support aging in place,” J. Universal Comput. Sci., vol. 12, no. 2, pp. 15–29, 2006. [13] J. Pansiot, D. Stoyanov, D. McIlwraith, B. P. L. Lo, and G. Z. Yang, “Ambient and wearable sensor fusion for activity recognition in healthcare monitoring systems,” presented at the Int. Workshop Wearable and Implantable Body Sensor Networks, Aachen, Germany, 2007. [14] G. Z. Yang, Body Sensor Networks. London, U.K.: Springer-Verlag, 2006. [15] F. Havasi and Á. Kiss, “Ambient assisted living in rural areas: Vision and pilot application,” in Constructing Ambient Intelligence, M. Mühlhäuser, A. Ferscha, and E. Aitenbichler, Eds. Berlin, Germany: Springer-Verlag, 2007, pp. 246–252.

[16] K. V. Laerhoven, H. W. Gellersen, and Y. G. Malliaris, “Long term activity monitoring with a wearable sensor node,” presented at the Int. Workshop Wearable and Implantable Body Sensor Networks, Cambridge, MIT, 2006. [17] D. Chen, R. Malkin, and J. Yang, “Multimodal detection of human interaction events in a nursing home environment,” presented at the 6th Int. Conf. Multimodal Interfaces, New York, Oct. 13–15, 2004. [18] M. Tentori and J. Favela, “Activity-aware computing for healthcare,” IEEE Pervasive Comput., vol. 7, no. 2, pp. 51–57, Apr.–Jun. 2008. [19] B. Logan, J. Healey, M. Philipose, E. Munguia-Tapia, and S. Intille, “A long-term evaluation of sensing modalities for activity recognition,” presented Int. Conf. UbiComp, Innsbruck, Austria, Sep. 16–19, 2007. [20] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” presented at the 2nd Int. Conf. Pervasive, Vienna, Austria, Apr. 21–23, 2004. [21] N. P. Cuntoor, B. Yegnanarayana, and R. Chellappa, “Activity modeling using event probability sequences,” IEEE Trans. Image Proc., vol. 17, no. 4, pp. 594–607, Apr. 2008. [22] T. P. Moran and P. Dourish, “Introduction to this special issue on contextaware computing,” Human-Computer Interaction, vol. 16, no. 2, pp. 87– 95, 2001. [23] T. Wu, C. Lian, and J. Y. Hsu, “Joint recognition of multiple concurrent activities using factorial conditional random fields,” presented at the AAAI Workshop Plan, Activity, and Intent Recognit Menlo Park, CA, AAAI Press, 2007. [24] T. Gu, Z. Wu, X. Tao, H. Pung, and J. Lu, “epSICAR: An emerging patterns based approach to sequential, interleaved and concurrent activity recognition,” presented at the PerCom 2009, Dallas, TX, Mar. 9–13. [25] A. K. Dey, G. D. Abowd, and D. Salber, “A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications,” Human-Computer Interaction, vol. 16, no. 2, pp. 97–166, 2001. [26] T. Winograd, “Architectures for context,” Human-Computer Interaction, vol. 16, no. 2, pp. 401–419, 2001. [27] K. Henricksen, J. Indulska, and A. Rakotonirainy, “Modeling context information in pervasive computing systems,” in Proc. 1st Int. Conf. Pervasive, Zurich, Switzerland, 2002, pp. 167–180. [28] A. Kofod-Petersen and J. Cassens, “Using activity theory to model context awareness,” in Modeling and Retrieval of Context, T. R. Roth-Berghofer, S. Schulz, and D. B. Leake, Eds. Berlin, Germany: Springer-Verlag, 2005, pp. 1–17. [29] J. Liu, J. C. Augusto, H. Wang, and J. B. Yang, “Considerations on uncertain spatio-temporal reasoning in smart home systems,” presented at the 7th Int. Conf. Applied AI, Genova, Italy, Aug. 29–31, 2006. [30] J. Prentzas and I. Hatzilygeroudis, “Categorizing approaches combining rule-based and case-based reasoning,” Expet. Syst., vol. 24, no. 2, pp. 97– 122, 2007. [31] F. Sadri, “Ambient intelligence for care of the elderly in their homes,” in Proc. 2nd Workshop Artif. Tech. Ambient Intell., Hyderabad, India, 2007, pp. 62–67. [32] K. Stathis and F. Toni, “Ambient intelligence using KGP agents,” in Proc. 2nd Eur. Symp. Ambient Intell., Eindhoven, The Netherlands, 2004, pp. 351–362. [33] E. L. Rissland and D. B. Skalak, “CABARET: Rule interpretation in a hybrid architecture,” Int. J. Man-Mach. Stud., vol. 34, no. 6, pp. 839–887, 1991. [34] A. Aamodt and E. Plaza, “Case-based reasoning: foundational issues, methodological variations, and system approaches,” AI Commun., vol. 7, no. 1, pp. 39–59, 1994. [35] A. Kofod-Petersen and A. 
Aamodt, “Case-based situation assessment in a mobile context-aware system,” presented at the Workshop on Artificial Intelligence in Mobile Systems, UbiComp, Seattle, WA, Oct. 12–15, 2003. [36] A. Zimmermann, “Context-awareness in user modeling: Requirements analysis for a case-based reasoning application,” in Case-Based Reasoning Research and Development, vol. LNCS 2689, K. D. Ashley and D. G. Bridge, Eds. Berlin/Heidelberg, Germany: Springer-Verlag, 2003. [37] M. Becker, E. Werkman, M. Anastasopoulos, and T. Kleinberger, “Approaching ambient intelligent home care systems,” in Proc. Pervasive Health Conf. Workshops, Innsbruck, Austria, 2006, pp. 1–10. [38] J. Coutaz, J. L. Crowley, S. Dobson, and D. Garlan, “Context is key,” Commun. ACM, vol. 48, no. 3, pp. 49–53, 2005. [39] T. Gu, H. K. Pung, and D. Zhang, “A service-oriented middleware for building context-aware services,” J. Netw. Comput. Appl., vol. 28, no. 1, pp. 1–18, 2005.

ZHOU et al.: CASE-DRIVEN AMBIENT INTELLIGENCE SYSTEM FOR ELDERLY IN-HOME ASSISTANCE APPLICATIONS

[40] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data. Dordrecht, The Netherlands: Kluwer, 1991. [41] I. Duntsch and G. Gediga, “Uncertainty measures of rough set prediction,” AI Commun., vol. 106, no. 1, pp. 109–137, 1998. [42] J. Bazan, H. S. Nguyen, S. H. Nguyen, P. Synak, and J. Wr´oblewski, “Rough set algorithms in classification problem,” in Rough Set Methods and Applications, L. Polkowski, S. Tsumoto, and T. Lin, Eds. Heidelberg/New York: Physica-Verlag, 2000, pp. 49–88. [43] Handbook of Latent Semantic Analysis, 1st ed. Lawrence Erlbaum, Mahwah, NJ, 2007, pp. 35–55. [44] S. K. Pal and S. Shiu, Foundations of Soft Case-Based Reasoning. Hoboken, NJ: Wiley-Interscience, 2004. [45] J. Nehmer and A. Karshmer, “Living assistance systems–an ambient intelligence approach,” in Proc. ICSE 2006, Shanghai, pp. 43–50. [46] A. Schmidt, “Interactive context-aware systems interacting with ambient intelligence,” in Ambient Intelligence, G. Riva, F. Vatalaro, F. Davide, and M. Alca˜niz, Eds. Amsterdam: IOS Press, 2005.

Feng Zhou (M’08) was born in Hangzhou, China, in 1982. He received the M.S. degree in computer engineering from Zhejiang University, Hangzhou, China, in 2007. He is currently working toward the Ph.D. degree in human factors engineering at the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore. His current research interests include affective product design, product ecosystem design, and human–computer interaction. Mr. Zhou is a Student Member of the American Society of Mechanical Engineers.

Jianxin (Roger) Jiao (M’01) received the Bachelor’s degree in mechanical engineering from Tianjin University of Science and Technology, Tianjin, China, the Master’s degree in manufacturing engineering from Tianjin University, Tianjin, China, and the Ph.D. degree in industrial engineering from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, in 1998. He is currently an Associate Professor of enterprise systems engineering in the G. W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta. He was previously an Assistant Professor and then an Associate Professor in the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore. His research interests include engineering design, manufacturing systems and logistics, affective computing, and engineering management.


Songlin Chen (M’08) received the Bachelor’s degree in aerospace engineering from the National University of Defense Technology, Changsha, China, in 2001, the Master’s degree in aeronautics and astronautics from Stanford University, Palo Alto, CA, in 2003, and the Ph.D. degree in industrial engineering and engineering management from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, in 2008. He is currently an Assistant Professor at the Division of System and Engineering Management, School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore. His research interests include the design and operations of advanced manufacturing/service systems, with special interest in mass customization.

Daqing Zhang received the Ph.D. degree from the University of Rome “La Sapienza,” Rome, Italy, and the University of L’Aquila, L’Aquila, Italy, in 1996. Since 2007, he has been a Professor of ambient intelligence and pervasive system design at the Networks and Telecommunication Services Department, Institute TELECOM SudParis, France. From 2000 to 2007, he was with the Institute for Infocomm Research, Singapore, where he was engaged in research on smart home, healthcare/elderly care, and context-aware computing. He is the author or coauthor of more than 100 papers published in refereed journals, conferences, and books.