Emotion Elicitation in an Empathic Virtual Dialog Agent

Magalie Ochs ([email protected])
France Telecom, R&D Division, Technology Center, France
LINC Lab., IUT of Montreuil, Université Paris VIII, France

Catherine Pelachaud ([email protected])
LINC Lab., IUT of Montreuil, Université Paris VIII, France

David Sadek ([email protected])
France Telecom, R&D Division, Technology Center, France

Abstract
Recent research has shown that a virtual agent able to express empathic emotions enhances human-machine interaction. To identify under which circumstances a virtual agent should express such emotions, we analyze the conditions of users' expression of emotion during their interaction with a virtual dialog agent. We have developed a new method of annotation based both on the psychological theory of emotion elicitation and on the philosophical speech act theory. Moreover, the tags of the coding scheme have been specifically designed to be easily integrated in a virtual dialog agent. A human-machine dialog corpus has been annotated with this scheme. The method of annotation and the results of the analysis of the annotated dialogs are presented in this paper. Hypotheses as to the appropriate conditions of emotion elicitation for an empathic virtual dialog agent are set out.

Introduction
A growing interest in using virtual agents as interfaces to computational systems has been observed in recent years, motivated by an attempt to enhance human-machine interaction. Humanoid-like agents are generally used to embody roles typically performed by humans, such as a tutor (Johnson, Rickel & Lester, 2000) or an actor (André, Klesen, Gebhard et al., 2001). The expression of emotions can increase their believability by creating an illusion of life (Bates, 1994). Recent research has shown that a virtual agent's expression of empathic emotions enhances users' satisfaction, engagement, perception of the virtual agent, and performance in task achievement (Brave, Nass & Hutchinson, 2005; Klein, Moon & Picard, 1999; Partala & Surakka, 2004; Prendinger, Mori & Ishizuka, 2005). In our research, we are particularly interested in the use of virtual dialog agents as information systems: users interact in natural language to find information on a specific domain. We aim to give such agents the capability of expressing empathic emotions towards users during dialog, and thus to improve interaction (Klein, Moon & Picard, 1999; Prendinger, Mori & Ishizuka, 2005). Empathy can be defined as the capacity to "put yourself in someone else's shoes". Through the empathic process, someone may feel the same emotion as another person because the former thinks that the latter feels (or could or should feel) this emotion (Poggi, 2004).

Introducing empathy into a virtual dialog agent means giving it the ability to identify emotions potentially felt by a user during the interaction. This requires that the virtual dialog agent knows the circumstances under which a user may feel an emotion during the interaction. To identify them, we study, in light of the psychological theory of emotion elicitation and the philosophical speech act theory, real human-machine dialogs that lead users to express emotions. In this paper, we first introduce the theoretical foundations that guide the design of our corpus-based method. We then set out a coding scheme that enables us to highlight the circumstances of emotion elicitation in human-machine dialogs. Finally, we present hypotheses on the conditions of users' emotion elicitation that we have identified through the analysis of dialogs annotated with our scheme.

Emotion Elicitation in Dialog: Theoretical Foundations
An empathic virtual agent should be able to identify the emotional meaning of a situation to determine the emotions potentially felt by its interlocutors.
The Appraisal Theory of Emotion. To highlight the characteristics of emotion-eliciting situations, the Appraisal Theory of Emotion (Scherer, 2000), which aims to explain how human emotions are triggered, can be used. According to this theory, emotions are elicited by the evaluation of an event. This evaluation depends mostly on a person's beliefs and goals. Indeed, an event may trigger an emotion only if the person thinks that it affects one of her goals (Lazarus, 1991). The consequence of the event for the individual's goal determines the elicited emotion. For instance, fear is triggered when a survival goal is threatened. Generally, failed goals elicit negative emotions whereas achieved goals trigger positive ones. Emotions also depend on the causes of the event (another person, for example). For instance, a goal failure caused by another agent may trigger anger.
The Speech Act Theory. To be able to identify a user's potentially felt emotions, a virtual dialog agent has to know, first, the user's goals and beliefs during the dialog. Researchers in philosophy have observed that language is not only used to describe
something or to give a statement, but also to do something with intention, i.e. to act (Austin, 1962; Searle, 1969). A communicative act (or speech act) is thus defined as the basic unit of language used to express an intention. Based on the Speech Act Theory (Austin, 1962; Searle, 1969), we suppose that a user's goal during human-machine dialog is to achieve the perlocutionary effects of the performed communicative act. The perlocutionary effects describe the intention that the user wants to see achieved through the performed communicative act. For instance, the perlocutionary effect of the act of informing agent j of proposition p is that agent j knows proposition p. In addition, we suppose that the user has the intention that her interlocutor knows her intention to produce the perlocutionary effects of the performed communicative act. This intention corresponds to the intentional effect of the act (Sadek, 1991). For instance, the intentional effect of the act of asking agent j for some information p is that agent j knows that the speaker has the intention to know information p. The achievement of the intentional effect of an act is a precondition for the feasibility of the act's perlocutionary effects. In the context of dialog, an event corresponds to a communicative act. Consequently, according to the appraisal theory of emotion (Scherer, 2000), a communicative act may trigger a user's emotion if it affects one of her goals. The elicited emotion depends on the consequences of the communicative act for her goal and on its causes. Before modeling an empathic dialog agent, we have to answer several questions. During human-machine dialog, how can communicative acts affect a user's goals? What consequences on a user's goals can lead her to feel emotions? Do the causes have an impact on a user's emotion elicitation? To answer these questions, we aim to extract the properties that should drive the emotion elicitation of a dialog agent by analyzing real human-machine dialogs that have triggered users' emotions.
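
To make this reasoning concrete, the following minimal Python sketch (not part of the original paper; the names CommunicativeAct, user_goals and appraise are illustrative assumptions) derives a user's goals from a performed communicative act, following the intentional/perlocutionary distinction above, and appraises an outcome against them:

```python
from dataclasses import dataclass

@dataclass
class CommunicativeAct:
    """A user utterance interpreted as a speech act (hypothetical representation)."""
    speaker: str
    act_type: str   # e.g. "ask", "inform"
    content: str    # the requested or asserted proposition

def user_goals(act: CommunicativeAct) -> list:
    """Derive the goals assumed to underlie the act: the intentional effect
    (the hearer knows what the speaker wants) and the perlocutionary effect
    (what the speaker wants actually comes about)."""
    intentional = f"the agent knows that {act.speaker} wants: {act.content}"
    perlocutionary = f"{act.speaker} obtains: {act.content}"
    return [intentional, perlocutionary]

def appraise(goal_outcomes: dict) -> str:
    """Toy appraisal: achieved goals trigger a positive emotion, failed goals a negative one."""
    if all(goal_outcomes.values()):
        return "positive emotion (goals achieved)"
    return "negative emotion (at least one goal failed)"

act = CommunicativeAct("user", "ask", "information on the Hippopotamus restaurant")
goals = user_goals(act)
# Suppose the agent's reply shows that it misunderstood the request:
print(appraise({goals[0]: False, goals[1]: False}))
```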

A Coding Scheme for Annotating Conditions of Emotion Elicitation

In order to identify what triggers users' emotions, we annotate real human-machine dialogs to emphasize the characteristics of the situations that lead users to express emotions.

The Coding Scheme
An emotion may be elicited when a communicative act affects a user's goal. The consequences and causes of the communicative act on the user's goal determine the elicited emotion. Therefore, the annotation has to highlight the impacts of communicative acts on users' goals and their causes. We have manually analyzed 10 real human-machine dialogs (corresponding to approximately 1000 dialog turns) during which users interact with a virtual dialog agent to find a specific restaurant in Paris or to obtain information on the stock exchange. During these dialogs, users express emotions. We have studied them in order to identify the different impacts of communicative acts on users' goals, and their causes, that can occur in human-machine interaction. From this analysis, we have identified the following consequences and causes of communicative acts, which appear most often. Let u be the user, a the agent, e an event, and g a goal.

• Consequences of an event on a user's goal. Event e is annotated by one of these tags if:
  – Goal achievement tag, goal_achieve_u(e, g): event e has enabled u to achieve goal g, which she expected to bring about by e.
  – Goal failure tag, goal_failure_u(e, g): event e has not enabled u to achieve goal g, which she expected to bring about by e.

• Causes of an event that lead to goal failure. Event e is annotated by one of these tags if it is caused by the fact that:
  – Unfeasibility tag, unfeasibility_u(e, a, g): a does not have the capacity to achieve goal g of u.
  – Belief conflict tag, belief_conflict_u(e, a, g_a): a believes that u has a goal g_a other than her own.
  – Goal conflict tag, goal_conflict_u(e, a, g): a has a goal g that u thought was already achieved.

These tags constitute the coding scheme that enables us to annotate human-machine dialog corpora. To use this coding scheme, the beliefs and goals of the users (and of the virtual dialog agent) have to be known. Based on the Speech Act Theory (introduced previously), we suppose that if the user (or the virtual dialog agent) performs a communicative act, it means that:
• she has the goal to achieve the intentional and perlocutionary effects of the act;
• she believes that she can achieve the intentional and perlocutionary effects of the act that she expresses.

For instance, a communicative act e performed by user u is annotated by the tag goal_failure_u(e, g) if the expression of act e has not allowed u to achieve the intentional or perlocutionary effect g of act e. Keeping in mind that we aim at using the results of the annotation to create an empathic virtual dialog agent, we have represented these tags in a computational way. In the next section, after introducing the concept of a rational dialog agent used to create virtual dialog agents, we present our computational tag representation.
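
As an informal illustration (our own sketch, not the authors' annotation tool; the class, enum and field names are hypothetical), each tag of the coding scheme can be stored as a simple record attached to the annotated communicative act:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class TagType(Enum):
    """The five tag types of the coding scheme described above."""
    GOAL_ACHIEVE = auto()
    GOAL_FAILURE = auto()
    UNFEASIBILITY = auto()
    BELIEF_CONFLICT = auto()
    GOAL_CONFLICT = auto()

@dataclass
class Annotation:
    """One coding-scheme tag attached to a dialog event (hypothetical format)."""
    tag: TagType
    event: str                   # identifier of the communicative act, e.g. "e1"
    user: str                    # the user u whose goal is concerned
    goal: str                    # description of the goal g
    agent: Optional[str] = None  # the agent a, for the cause tags only

# Example: the agent misunderstood the user's request at event e1.
a1 = Annotation(TagType.GOAL_FAILURE, "e1", "user",
                "the agent knows the user's intention to have information "
                "on the Hippopotamus restaurant")
print(a1.tag.name, a1.event, "-", a1.goal)
```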

A computational representation of the coding scheme

The concept of a rational dialog agent
To create a virtual dialog agent, we use a model of a rational agent based on a formal theory of interaction (called Rational Interaction Theory) (Sadek, 1991). This model
uses a BDI-like approach (Rao & Georgeff, 1991; Cohen & Levesque, 1990). An implementation of this theory produced the rational dialog agent technology (named Artimis) that provides a generic framework to instantiate intelligent agents able to engage in a rich interaction with both human interlocutors and artificial agents (Sadek, Bretier & Panaget, 1997). The mental state of a rational agent is composed of two mental attitudes: belief and intention. They are formalized with the modal operators B and I (p being a closed formula denoting a proposition): B_i(p) means "agent i thinks that p is true"; I_i(p) means "agent i intends to bring about p". Based on its mental state, a rational agent acts to achieve its intentions. Several other operators have been introduced to formalize actions that have occurred, the agent that performed them, and temporal relations. For instance, the formula Done(e, p) means that event e has just taken place and that p was true before e occurred. For more details, see (Sadek, 1991; Sadek, Bretier & Panaget, 1997).
The computational tag representation
To easily integrate the results of the annotation in such rational dialog agents, we describe the tags of the coding scheme in terms of beliefs and intentions. Examples of tags follow:
• The goal failure tag can be described by the following combination of mental attitudes:
  goal_failure_u(e, g) =def B_u(Done(e, I_u(g) ∧ B_u(Done(e) => g))) ∧ B_u(¬g)
This formula means: "agent u believes that event e has just taken place (B_u(Done(e))); before the occurrence of event e, agent u had the intention g (I_u(g)) and believed that e would enable the achievement of g (B_u(Done(e) => g)); after the occurrence of e, agent u believes that g has not been achieved (B_u(¬g))".
• The belief conflict tag can be described by the following combination of mental attitudes:
  belief_conflict_u(e, a, g_a) =def B_u(Done(e, ¬I_u(g_a) ∧ B_a(I_u(g_a))))
This formula means: "agent u believes that event e has just taken place (B_u(Done(e))); before the occurrence of event e, agent u did not have the intention g_a (¬I_u(g_a)) and believed that agent a thought that u had this intention (B_a(I_u(g_a)))".
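
As a rough illustration of this notation (our own sketch, not the Artimis implementation; the constructor names B, I, Done, Not, And, Implies and the helper functions are hypothetical), the formulas above can be transcribed into nested terms as follows:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class B:        # B_i(p): agent i believes proposition p
    agent: str
    prop: object

@dataclass(frozen=True)
class I:        # I_i(p): agent i intends to bring about p
    agent: str
    prop: object

@dataclass(frozen=True)
class Done:     # Done(e, p): e has just taken place and p held before e; Done(e) when prop is None
    event: str
    prop: object = None

@dataclass(frozen=True)
class Not:      # ¬p
    prop: object

@dataclass(frozen=True)
class And:      # p ∧ q
    left: object
    right: object

@dataclass(frozen=True)
class Implies:  # p => q
    left: object
    right: object

def goal_failure(u: str, e: str, g: object) -> object:
    """goal_failure_u(e, g) =def B_u(Done(e, I_u(g) ∧ B_u(Done(e) => g))) ∧ B_u(¬g)."""
    before = And(I(u, g), B(u, Implies(Done(e), g)))
    return And(B(u, Done(e, before)), B(u, Not(g)))

def belief_conflict(u: str, a: str, e: str, g_a: object) -> object:
    """belief_conflict_u(e, a, g_a) =def B_u(Done(e, ¬I_u(g_a) ∧ B_a(I_u(g_a))))."""
    before = And(Not(I(u, g_a)), B(a, I(u, g_a)))
    return B(u, Done(e, before))

# Example: the goal-failure formula for event e1 and a concrete goal description.
print(goal_failure("user", "e1", "have information on the Hippopotamus restaurant"))
```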

The Conditions of User's Emotion Elicitation in Human-Machine Dialogs
In order to study the conditions of users' emotion elicitation, we have annotated human-machine dialogs with the tags described in the previous section. After introducing the corpora we use, we present an example of an annotated dialog.

Annotation of Human-Machine Dialog Corpora

Description of the Corpora
To construct the corpora, we collected 20 real vocal human-machine dialogs which led users to express emotions. The corpora we use are derived from two vocal applications developed at France Telecom R&D: PlanResto and Dialogue Bourse. PlanResto is a restaurant guide for Paris. Users interact in natural language with a rational dialog agent to find a restaurant (for more details, see Ancé, Bretier, Breton et al., 2004). Dialogue Bourse is a stock exchange application that allows a user to consult her accounts, to find rates and to perform transactions by interacting with a rational dialog agent by phone. The dialogs are composed of 86 dialog turns on average. The users' utterances are generally short, averaging 4 words per utterance. The dialogs have been annotated with two emotion labels (positive and negative) by two annotators. However, only the negative emotion label has been retained, since no agreement appeared between the two annotators' annotations of positive emotion. The annotations were made based on vocal and semantic cues of users' emotions. In the dialogs transcribed into text, these labels mark the moment at which a user expresses her emotion. An example of an annotated dialog extracted from the corpora (translated from French) is given below; annotations based on the coding scheme described above are shown after the corresponding turns:

Agent: Welcome to PlanResto. PlanResto enables you to look for a restaurant according to its speciality, its localization and your budget. What are you looking for?
User: The Hippopotamus restaurant hmmm in the 14th district
Agent: A restaurant in the 14th district. I have found more than a hundred. You can specify a speciality, for example a specific type of food. You can speak now. (event e1)
  goal_failure_user(e1, the agent knows the user's intention is to have information on the Hippopotamus restaurant)
User: hmmm
Agent: A Greek restaurant (event e2)
  goal_failure_user(e2, the agent knows the user's intention is to have information on the Hippopotamus restaurant)
  belief_conflict_user(e2, agent, to have information on a Greek restaurant)
User: no, grill food
Agent: I now have 8 restaurants. You can ask for them to be listed. What price range are you looking for? (event e3)
  goal_achieve_user(e3, the agent knows the user's intention to find a grill restaurant)
User: hmmm more than fifteen euros
Agent: I have not understood you (event e4)
  goal_failure_user(e4, the agent knows the user's intention to find a restaurant with a price above 15 euros)
[...] (A user's expression of negative emotion)
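
The annotated excerpt can also be encoded as plain data; the sketch below (our illustration, not the corpus format actually used) stores each tagged event and counts the goal failures leading up to the user's expression of a negative emotion:

```python
# Hypothetical encoding of the annotated excerpt above; the tuple layout is ours.
annotated_events = [
    ("e1", "goal_failure", "the agent knows the user's intention is to have "
                           "information on the Hippopotamus restaurant"),
    ("e2", "goal_failure", "the agent knows the user's intention is to have "
                           "information on the Hippopotamus restaurant"),
    ("e2", "belief_conflict", "to have information on a Greek restaurant"),
    ("e3", "goal_achieve", "the agent knows the user's intention to find a grill restaurant"),
    ("e4", "goal_failure", "the agent knows the user's intention to find a restaurant "
                           "with a price above 15 euros"),
]

# Count the goal failures preceding the expression of the negative emotion.
failures = sum(1 for _, tag, _ in annotated_events if tag == "goal_failure")
print(f"goal failures before the negative emotion: {failures}")  # -> 3
```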

Results of the Annotation
From the 20 dialogs annotated with the coding scheme presented above, 47 dialogical situations that led a user to express negative emotions have been studied manually. These situations are sequences of communicative acts that occurred just before an emotional expression. First of all, we have observed that negative emotions generally appeared after goal failures. In 45 situations, the user expresses a negative emotion after a sequence of communicative acts led to the failure of one of her goals (Table 1). In the corpora studied, the most frequent situation is the failure of the goal to achieve the intentional effect of the performed communicative act. In other words, the goal failure is mainly due to the fact that the agent does not understand the user's request.

Table 1: The situations leading to a user's expression of emotion

  Type of Situation    Number of Situations Observed
  Goal Failure         45
  Belief Conflict       2
  Goal Conflict         3

We have observed that in some cases the expression of a negative emotion seems to be triggered by a belief conflict, a goal conflict, or both. In two situations, the user expressed a negative emotion when it appeared that the virtual dialog agent tried to achieve a goal different from the user's own (belief conflict). In three situations, the user's expression of negative emotion appeared when the virtual agent tried to achieve a goal that the user knew had already been achieved, and thus was no longer one of her active goals (goal conflict) (Table 1). Goal failure, belief conflict and goal conflict can appear in a single dialogical situation. For instance, in some cases, the virtual agent believed that the user had a goal different from her own (belief conflict) and the user thought that this goal had already been achieved (goal conflict). A belief conflict or a goal conflict can also lead to a goal failure. We have studied more precisely the situations of goal failure that led a user to express emotion. We have observed that the causes of these goal failures are belief conflicts and goal conflicts. In the dialogs leading to a user's expression of emotion, the user's goals failed in the majority of observed situations (30 situations out of 45) because the virtual agent thought that the user had a goal different from her own (belief conflict). In one situation, a goal failure led to a negative emotion due to the fact that the agent tried to achieve a goal that had already been achieved (goal conflict). However, the goal failures caused by the unfeasibility for the agent of achieving the user's goal do not seem to elicit the expression of a negative emotion (Table 2). In the other goal failure situations, our coding scheme has not enabled us to highlight their causes, which is why we are able to identify the causes of only 31 out of 45 goal failures.

Table 2: The causes of goal failures

  Causes of Goal Failure    Number of Situations Observed
  Belief Conflict           30
  Goal Conflict              1
  Unfeasibility              0

Expressions of a negative emotion are generally elicited after several successive failures to complete a goal. On average, three to four goal failures led a user to express a negative emotion (Table 3). We have also studied the influence of an emotion already expressed during the dialog on the elicitation of a new emotion. Fewer goal failures are required to trigger the expression of a negative emotion when a negative emotion has already been expressed during the dialog. Indeed, the first negative emotion is triggered after 3 or 4 successive goal failures, while the second negative emotion is triggered after 2 goal failures (see Table 3). A negative emotion thus seems to be triggered more rapidly when the user has already expressed another negative emotion since the beginning of the dialog.

Table 3: The average number of goal failures that lead to a user's expression of emotion

  Situation                Number of Goal Failures
  General Case             3-4
  2nd Emotion Expressed    2

Discussion
Hypotheses on users' conditions of emotion elicitation
According to the results of the annotation, negative emotions seem to be triggered after goal failures or after belief or goal conflicts. From these results, we introduce two definitions:
Primary conflicting mental state. A goal failure can be described as a conflict between a user's beliefs before and after the goal failure: before the goal failure, the user believes that her goal is going to be achieved, and after the goal failure she realizes that it is not. We introduce the concept of primary conflicting mental state. A user has this mental state when one of her beliefs about her environment is different from the reality that she has
just observed. The reality observed is then in conflict with her beliefs. A goal failure is a primary conflicting mental state.
Secondary conflicting mental state. A belief or goal conflict corresponds to a conflict between the user's and the virtual agent's beliefs. Indeed, in the case of a belief conflict, the virtual agent thinks that the user has a goal that the user does not have. In a goal conflict, the user thinks a goal has already been achieved whereas the virtual agent does not. To describe these conflicting mental states, we use the term secondary conflicting mental state. A user has this mental state when one of her beliefs about her environment is different from her belief about the mental state of another agent. For instance, let p be a user's belief about her environment. The user has a secondary conflicting mental state if she thinks that the agent thinks (not p). The user's beliefs about her environment are then in conflict with her beliefs about the mental state of another agent. The belief conflict and the goal conflict are secondary conflicting mental states.
Given the definitions just introduced and the results of the annotation, we venture the following hypotheses:
Hypothesis 1. A negative emotion may be elicited by a primary conflicting mental state. A negative emotion can also be triggered by a secondary conflicting mental state.
We have observed that the goal failures leading to a user's expression of emotion are caused by a belief or a goal conflict, which leads us to make the following hypothesis:
Hypothesis 2. The primary conflicting mental states that elicit emotions are caused by a secondary conflicting mental state.
In the dialogs observed, several goal failures are required to elicit a user's expression of emotion. The number of goal failures depends on the dialogic situation (see Table 3). We can suppose that this number informs us about the intensity of the emotion. Indeed, not all felt emotions are expressed. More precisely, emotions with low intensity are generally not perceptible: only emotions whose intensity reaches a certain threshold are expressed. We can presume that an emotion with low intensity requires more goal failures to be expressed, whereas an emotion with high intensity is expressed after few goal failures. We have observed in the dialog corpora that fewer goal failures are necessary to elicit an emotional expression if an emotion has already been expressed during the dialog. This leads us to our third hypothesis:
Hypothesis 3. The intensity of an elicited emotion depends on the emotions already triggered during the dialog.

This influence may be explained by the fact that the intensity of the first elicited emotion is not yet null when the second one is triggered (since the dialogs we have studied are relatively short). The intensity of the first emotion is then added to that of the second one: the intensities are cumulative.
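
As a toy illustration of Hypothesis 3 and of the counts in Table 3 (our own sketch; the increment, threshold and decay values are made up, not estimated from the corpus), an additive intensity model with a carry-over term behaves as described:

```python
INCREMENT = 0.3   # intensity added by each goal failure (illustrative value)
THRESHOLD = 1.0   # intensity at which the emotion becomes visible (illustrative value)
DECAY = 0.9       # fraction of intensity kept from one turn to the next (illustrative value)

def failures_until_expression(residual: float = 0.0) -> int:
    """Return how many goal failures are needed before the accumulated
    intensity crosses the expression threshold, starting from a residual
    intensity left over from a previously elicited emotion."""
    intensity = residual
    failures = 0
    while intensity < THRESHOLD:
        intensity = intensity * DECAY + INCREMENT
        failures += 1
    return failures

first = failures_until_expression()              # no emotion expressed yet
second = failures_until_expression(residual=0.6) # some intensity carried over
print(first, second)  # -> 4 2: fewer failures once an emotion has already been elicited
```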

Future Work
According to psychological theories of emotion elicitation, other variables may influence the intensity of emotion. For instance, the effort invested to complete a goal influences the intensity of an emotion elicited by a goal failure (Ortony, Clore & Collins, 1988). If one fails to achieve something after trying very hard to achieve it, the triggered negative emotion is likely to be of greater intensity than if one fails after trying less hard. In the context of dialog, the effort can be represented by the number of communicative acts performed to achieve a goal. Our analysis of the corpora has not enabled us to show the influence of this variable on the intensity of emotion. This may be due to the small number of dialogs studied. Other variables, such as the importance for the agent of achieving its goal or the unexpectedness of an event, can also significantly influence the intensity of the elicited emotion. However, the current coding scheme does not highlight such variables. Given the corpus used in this work, some affective information has not been taken into account, such as the influence of a user's personality and mood on emotion elicitation. The emotion labels (positive and negative) do not enable us to study the elicitation of different kinds of emotion (such as anger, satisfaction, joy, relief or disappointment) or their intensity. This work is, of course, not sufficient to model a user's emotions, but it provides a virtual agent with some information on the dialogical situations that may trigger a user's emotions and thus enables us to model an empathic dialog agent.

Conclusion
An empathic virtual agent should express emotions in situations that may potentially elicit a user's emotion. To identify these emotional situations, we have annotated human-machine dialogs that lead users to express negative emotions. We have constructed a dedicated coding scheme based on a combined theoretical and empirical approach. Each tag is described in terms of the mental attitudes of belief and intention. This semantically grounded formal representation will enable us to easily integrate the results of the analysis of the annotated corpora in a rational agent system. The dialogs annotated with this coding scheme enable us to emphasize some features of emotional human-machine dialogs. A user's negative emotions seem to be elicited by particular conflicting mental states. The intensity of these emotions depends on whether another emotion has already been triggered during the dialog. The number of dialog situations that have been analyzed and that lead to a user's expression of emotion is not sufficient to draw conclusions on the user's conditions
of emotion elicitation during human-machine dialogs. The work presented in this paper represents a first step towards the creation of an empathic virtual dialog agent. We are currently implementing these conditions of emotion elicitation in a rational dialog agent. A subjective evaluation will be performed to verify the believability of the conditions under which the agent expresses empathic emotions.

Acknowledgments
We thank Emilie Chanoni for her valuable comments on the concept of conflicting mental states.

References
Ancé, C., Bretier, P., Breton, G., Damnati, G., Moudenc, T., Pape, J.-P., Pele, D., Panaget, F., & Sadek, D. (2004). Find a restaurant with the 3D embodied conversational agent Nestor. Proceedings of the 5th Workshop on Discourse and Dialogue (SIGdial), Boston, USA.
André, E., Klesen, P., Gebhard, P., Allen, S., & Rist, T. (2001). Integrating models of personality and emotions into lifelike characters. In A. Paiva (Ed.), Affective interactions: towards a new generation of computer interfaces. New York: Springer-Verlag.
Austin, J. (1962). How to do things with words. London: Oxford University Press.
Bates, J. (1994). The role of emotion in believable agents. Communications of the ACM, 37, 122–125.
Brave, S., Nass, C., & Hutchinson, K. (2005). Computers that care: Investigating the effects of orientation of emotion exhibited by an embodied computer agent. International Journal of Human-Computer Studies, 62, 161–178.
Cohen, P., & Levesque, H. (1990). Intention is choice with commitment. Artificial Intelligence, 42(2-3), 213–232.
Johnson, W., Rickel, J., & Lester, J. (2000). Animated pedagogical agents: Face-to-face interaction in interactive learning environments. International Journal of Artificial Intelligence in Education, 11, 47–78.
Klein, J., Moon, Y., & Picard, R. (1999). This computer responds to user frustration. Proceedings of the Conference on Human Factors in Computing Systems (pp. 242–243). New York: ACM Press.
Lazarus, R. S. (1991). Emotion and adaptation. New York: Oxford University Press.
Ortony, A., Clore, G., & Collins, A. (1988). The cognitive structure of emotions. Cambridge, United Kingdom: Cambridge University Press.
Partala, T., & Surakka, V. (2004). The effects of affective interventions in human-computer interaction. Interacting with Computers, 16, 295–309.
Poggi, I. (2004). Emotions from mind to mind. Proceedings of the AAMAS Workshop on Empathic Agents (pp. 11–17).

Prendinger, H., Mori, J., & Ishizuka, M. (2005). Using human physiology to evaluate subtle expressivity of a virtual quizmaster in a mathematical game. International Journal of Human-Computer Studies, 62, 231–245.
Rao, A. S., & Georgeff, M. (1991). Modeling rational agents within a BDI-architecture. In J. Allen, R. Fikes, & E. Sandewall (Eds.), Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR) (pp. 473–484). San Mateo: Morgan Kaufmann.
Sadek, D. (1991). Attitudes mentales et interaction rationnelle: vers une théorie formelle de la communication [Mental attitudes and rational interaction: towards a formal theory of communication]. Doctoral dissertation, Department of Computer Science, University of Rennes I, Rennes.
Sadek, D., Bretier, P., & Panaget, F. (1997). Artimis: Natural dialogue meets rational agency. Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI 97) (pp. 1030–1035), Nagoya, Japan.
Scherer, K. (2000). Emotion. In M. Hewstone & W. Stroebe (Eds.), Introduction to Social Psychology: A European Perspective. Oxford: Blackwell Publishers.
Searle, J. (1969). Speech Acts. Cambridge, United Kingdom: Cambridge University Press.