Motiv Emot (2012) 36:371–381 DOI 10.1007/s11031-011-9257-2
Do you see what I see? Learning to detect micro expressions of emotion
Carolyn M. Hurley
Published online: 11 November 2011 © Springer Science+Business Media, LLC 2011
Abstract The ability to detect micro expressions is an important skill for understanding a person's true emotional state; however, these quick expressions are often difficult to detect. This is the first study to examine the effects of boundary factors such as training format, exposure, motivation, and reinforcement on the detection of micro expressions of emotion. A 3 (training type) by 3 (reinforcement) fixed-factor design with three control groups was conducted, in which 306 participants were trained and evaluated immediately after exposure and at 3 and 6 weeks post-training. Training improved the recognition of micro expressions, and the greatest success was found when a knowledgeable instructor facilitated the training and employed diverse training techniques such as description, practice, and feedback (d's > .30). Recommendations are offered for future training of micro expressions, which can be used in security, health, business, and intercultural contexts.

Keywords Micro expression · Facial expression · Emotion · Training
This work was submitted in partial fulfillment of a Doctor of Philosophy degree at the University at Buffalo by the author. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the Transportation Security Administration, the Department of Homeland Security, or the United States of America. The author would like to thank Drs. Mark Frank and David Matsumoto for loan of the Micro Expression Training Tool, second edition. C. M. Hurley (&) Transportation Security Administration, 601 South 12th Street, Arlington, VA 22202, USA e-mail: [email protected]
Introduction

If facial expressions of emotion were delivered uniformly each and every time an emotion was elicited, eventually all of us would be near-perfect perceivers of others. However, pressures to conceal or mask one's true feelings may result in emotional displays that are quick or fragmented (called micro momentary expressions, Haggard and Isaacs 1966; or micro expressions, Ekman and Friesen 1969). Since daily life features many pressures to conceal or mask one's emotions, as a function of status, culture, context, politeness, and so forth (Ekman 1972), the ability to accurately perceive and interpret these quick expressions would improve our interpersonal skills, allowing us to better understand individuals' true emotional states. The ability to "read" others is advantageous for the average person, but in particular for clinicians and security practitioners, where the ability to understand others can result in more informed judgments regarding threats to oneself and others. Practitioners are already utilizing web-based micro expression (ME) training in security (e.g., Department of State, Department of Homeland Security, Department of Defense) and health contexts, although testing of these efforts has been largely limited to clinical populations (e.g., Marsh et al. 2010; Russell et al. 2006, 2008). Identifying effective training methods is imperative, especially in these critical situations where a superior understanding of emotion can significantly improve our national security and quality of life. The best available research in concealment of emotion suggests that these masked emotional signals, particularly MEs, are very difficult to detect (Ekman and Friesen 1969, 1974a; Etcoff et al. 2000; Porter and ten Brinke 2008). Recent research has found that it is possible to train these skills in a short period (Matsumoto and Hwang, in press),
yet few boundary factors that may affect training success have been explored. This manuscript examines the trainability of MEs of emotion, the optimal method of training, the role of motivating factors, the effect of reinforcement, and the retention of training materials over a 6-week period. This will help identify more effective training methods, which can be used to train individuals—such as those in national security contexts—who may encounter concealed emotions like MEs.
Background

Micro expressions of emotion

Emotions can be defined as "short-lived psychological-physiological phenomena that represent efficient modes of adaptation to changing environmental demands" (Levenson 1994, p. 123). Emotions are automatic responses that are triggered—aroused in a fraction of a second—by environmental stimuli that alter our attention and organize biological responses, preparing us to react. Emotions are complex and involve a number of bodily response systems such as expression, muscular tonus, voice, and autonomic nervous system activity (Levenson 1994). Besides unique internal signals, emotions also generate external signals—such as facial expressions—that provide clues to these internal changes. A significant body of literature has examined the basic emotions of anger, contempt, disgust, fear, happiness, sadness, and surprise, revealing that each appears to have a characteristic expression that is universal across cultures (e.g., Ekman 2003; Elfenbein and Ambady 2002). The universal production of these facial signals suggests that these emotional expressions are genetically determined and that biology is largely responsible for establishing which facial movements are associated with certain emotions (DeJong 1979; DeMyer 1980). A ME is a special case of the basic emotional expression, first discovered by Haggard and Isaacs (1966) while studying clinical interviews. They believed MEs were caused by an unconscious repression of conflict and that those expressions occurred too quickly to be seen in real time. Ekman and Friesen (1969, 1974b) undertook a more rigorous program of study that fully articulated the nature of MEs. After examining recorded psychiatric interviews frame by frame, they found that MEs were emotional expressions that "leaked" out when individuals attempted to inhibit or manage their facial displays (Ekman 2003).
They concluded that these quick expressions represented signs of concealed emotion, as uninhibited or naturally occurring emotional expressions generally last several seconds or more (Hess and Kleck 1990).
The existence of MEs has been verified in studies of concealment (Porter and ten Brinke 2008) and is relevant to high-stakes contexts like law enforcement and national security. For example, if someone is transiting a security checkpoint and is in possession of illegal drugs, he may have a fear of discovery. He will in all likelihood try to hide these feelings, so any emotional clues he produces may be more subtle than in a context where he is not trying to manage his behavior. Research has shown that the ability to detect MEs is related to skill at detecting deception in high-stakes scenarios (Ekman and O'Sullivan 1991, 2006; Ekman et al. 1999), likely because it is easier to judge veracity when an observer is able to accurately understand how the target is feeling. This research emphasizes the importance of ME recognition skills for any individual whose profession requires interpersonal interaction or deception detection.

Facial and micro expression training

Scientists have long endeavored to train people to better recognize facial expressions. As early as the 1920s, researchers had students study pictorial or verbal descriptions of facial expressions (Allport 1924; Guilford 1929; Jarden and Fernberger 1926; Jenness 1932). However, the absence of clear stimulus materials (drawings versus photographs) and clear identification of expressions limited this training research. After researchers began to systematically study and define the muscle movements inherent in emotional expressions, they were able to create detailed facial coding systems (e.g., Ekman and Friesen 1978; Izard 1979). This allowed researchers to create standardized sets of valid emotion training and testing materials (e.g., BART, Ekman and Friesen 1974b; PoFA, Ekman and Friesen 1975; JACFEE, Matsumoto and Ekman 1988; JACBART, Matsumoto et al. 2000).
The Japanese and Caucasian Brief Affect Recognition Test (JACBART) was the first published test of micro expression recognition accuracy (MERA) that was rigorously evaluated (Matsumoto et al. 2000). The JACBART created the appearance of more dynamic expressions, as each poser's neutral face was imposed before and after the emotional expression face, reducing the aftereffects of the stimuli. All expression images were scored with the Facial Action Coding System (FACS; Ekman and Friesen 1978) to ensure the same muscle actions occurred for each emotion and were consistent with universally recognized expressions (Ekman 2003). Additionally, these images were tested with an international audience to ensure cross-cultural agreement (Biehl et al. 1997). Matsumoto and colleagues provided evidence of internal and temporal reliability and convergent and concurrent validity for this test across five studies and found similar accuracy patterns
even with differences in presentation speed and judgment task (Matsumoto et al. 2000). This ME testing procedure evolved into a self-instructional training tool, originally called the Micro Expression Training Tool (METT; now available as the METT Advanced at face.paulekman.com and the Microexpression Recognition Tool [MiX] at www.humintell.com). The METT is presented as a stand-alone training tool; it offers a pre-test, a training section, practice examples with feedback, a review section, and a post-test. The stimuli used in these training tools are laboratory produced, which provides the necessary consistency and reliability of expression, poser, intensity, angle, and so forth for a scientific test of MERA. However, this type of material limits the ability to generalize to naturally occurring spontaneous expressions, which have more dynamic features (Naab and Russell 2007). Researchers have used versions of the METT to train department store employees and trial consultants (Matsumoto and Hwang, in press) and individuals with schizophrenia (Marsh et al. 2010; Russell et al. 2006, 2008) to detect MEs. A 2-h instructor-led session using the MiX not only significantly improved Korean department store employees' ability to identify MEs (N = 81, 18% increase), but also led to higher social and communication skills scores (Matsumoto and Hwang, in press). A similar experiment using a small group of trial consultants also showed improvements in accuracy (N = 25, 18% increase). Further analyses revealed no skill decay over a 2-week period for either group (Matsumoto and Hwang, in press). The METT has also been used to train clinical patients with emotion recognition deficiencies to more accurately recognize emotion (Marsh et al. 2010; Russell et al. 2006, 2008). Training individuals with schizophrenia to read facial expressions using the METT resulted in a significant improvement in ME recognition at the post-test (9% increase, Russell et al.
2006; 18% increase, Russell et al. 2008), illustrating the tool's robustness across different populations. These studies support a meaningful training-accuracy relationship for identifying MEs, as well as highlight some possible social benefits. Researchers have used other materials to teach others about facial expressions. Stickle and Pellegreno (1982) and Elfenbein (2006) used the Pictures of Facial Affect (PoFA, Ekman and Friesen 1975) to train American students to recognize emotional expressions (Elfenbein also used a subset of Chinese posers' facial expressions; Wang and Markham 1999). Although both studies reported success for training, the authors did not report either the pre- and post-accuracy scores and within-subjects change (Stickle and Pellegreno 1982) or the baseline recognition accuracy
(Elfenbein 2006). Those limitations inhibit interpretation of these data. These studies also did not examine the ability to detect quick expressions—such as MEs—further limiting the ability to compare these methods to standardized tools such as the METT or MiX.

Boundary factors to training

While research demonstrates the validity of using commercial ME training tools to train recognition skills (Matsumoto and Hwang, in press; Russell et al. 2006, 2008), little research has analyzed the underlying factors associated with these skill improvements. Training formats such as simple feedback (Elfenbein 2006), lecture and practice (Stickle and Pellegreno 1982), and the METT/MiX (Matsumoto and Hwang, in press) have all improved expression recognition, but it is unknown which methods have produced the greatest improvements or had the greatest retention, due to differences in both testing materials and measures of effectiveness. It is also unknown which format and materials are optimal for training individuals to detect MEs. These studies revealed that individuals can be trained to recognize laboratory produced MEs fairly quickly and effectively; however, retention has been examined in only one study and only at 2 weeks (Matsumoto and Hwang, in press). Although training with the METT can improve individuals' recognition in as little as a few hours, how long these gains persist beyond the post-test is unknown. Skill decay is an important variable to examine, as many military or government employees may only be able to receive ME training once a year or once in a career. Another factor to consider is that understanding emotional expressions is a skill that may improve with practice.
People who have repeated exposure to individuals who try to conceal their emotions or who scrutinize nonverbal behavior for their jobs—such as law enforcement officers, judges, clinical psychologists, and secret service personnel—are often more accurate judges of how others are feeling (Ekman and O'Sullivan 1991; Ekman et al. 1999). Studies that have repeatedly tested the same participants have found that they improved without training (Matsumoto et al. 2000). This suggests that repeated exposure to the task or stimuli may serve a training function as well and should be examined. Motivation can also influence a person's ability to learn material. Even though micro expression training may improve MERA for all individuals, those who are more motivated may learn and retain more material. Motivation to learn is positively related to skill acquisition (Colquitt et al. 2000), making it an important area for investigation. It is important to examine individuals' motivation to learn
both at the start and completion of each testing phase, as motivation may be affected by external factors such as the quality or content of the training or assignment to the training or control group. Any differences must be controlled for to ensure that any gains made post-training can be properly attributed to the training. Overall, the previously published studies raise questions regarding the optimal method of training, the role of exposure and motivating factors, and the persistence of training effects over time. It is important to examine these boundary factors that may reduce skill loss so that researchers can identify more effective training techniques. The METT is an ideal instructional tool for testing these differences. This training can be self-administered or administered by an instructor in a group setting and provides enough stimulus materials to examine skill retention. This will allow us to assess these factors in an existing and well-used training. Based on the above literature review, which found significant improvements in MERA with different iterations of the METT training (Matsumoto and Hwang, in press; Russell et al. 2006, 2008), the following set of specific hypotheses is proposed:

H1 ME training will significantly improve participants' MERA and result in greater skill retention, as opposed to the control conditions, which will experience no change in MERA.

Although training by feedback alone has significantly improved expression recognition skills (Elfenbein 2006), ME recognition is an advanced skill which requires understanding of subtle differences among expressions. Thus,

H2 An instructor-led, multi-faceted ME training condition will produce the greatest increases in MERA, as opposed to ME training conditions that are self-led or only provide feedback to participants.

Any increased exposure to training material should also provide an advantage to the exposed group. Thus,

H3 Reinforcement will significantly improve retention of MERA.
Previous studies have assumed that a comparison group assigned to do nothing during the training time serves as an adequate control for examining training effects. Factors such as mere exposure to stimuli or motivation to learn could affect ME post-test scores or moderate the effectiveness of training. Thus, three control groups will also be examined to answer the following research question:

RQ1 What is the effect of motivation and simple exposure on MERA?
Method

Participants

Three hundred thirty-four (334) participants were recruited from large introductory communication courses. An in-class announcement advertised the study as "an evaluation of students' nonverbal communication skills," and interested students signed up for three 1-h appointments through an online sign-up system. Participants who completed the study received 3 h of research credit in partial fulfillment of their 5-h departmental requirement.

Design

The study employed a 3 (training type: instructor feedback; instructor feedback plus description; or self-led) by 3 (reinforcement: none; at time 2 only; or at time 3 only) fixed-factor design with three control groups (traditional control; control with additional exposure to items; or control with a motivating lecture). The four times at which participants' accuracy at judging MEs was assessed (pre-training, immediately after training, 3 weeks later, and 6 weeks later) were treated as a within-subjects independent variable. The dependent variable was participants' accuracy on the various ME tests. Participants were randomly assigned to each condition.

Conditions

Participants in the control conditions received no training and served as comparison groups for the training manipulations. Participants in the "traditional" control condition occupied themselves for the length of the manipulation and were not exposed to any other emotional expression items. Participants in the "exposure" control condition were exposed to the same stimulus items (photographs of facial expressions) as the training conditions during the manipulation period, but received no feedback or other information to facilitate their judgment. Participants in the "motivating lecture" control condition were provided with a lecture on the importance of accurately perceiving and interpreting human emotion based on the work of Ekman (2001, 2003), but were not exposed to any other facial expression material.
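The cell structure of this design can be enumerated in a short sketch. The condition labels below paraphrase the names used in the text (reinforcement was assigned only among trained participants, and at Time 1 only the six training/control conditions were run); Python is used purely for illustration.

```python
from itertools import product

# Condition labels paraphrased from the design description.
training = ["instructor feedback", "instructor feedback plus description", "self-led"]
reinforcement = ["none", "refresher at time 2", "refresher at time 3"]
controls = ["traditional", "exposure", "motivating lecture"]

# 3 (training type) x 3 (reinforcement) = 9 trained cells.
training_cells = list(product(training, reinforcement))

# The three control groups receive no reinforcement manipulation.
all_groups = training_cells + [(c, "n/a") for c in controls]

print(len(training_cells))  # 9
print(len(all_groups))      # 12
```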
Training techniques previously published were combined to allow for a fair comparison and evaluation of these different training methods. Participants in both the ‘‘feedback only’’ training and ‘‘full instruction’’ training conditions received the METT training led by an instructor highly knowledgeable in the area of facial expressions of emotion (the author). The difference was that the feedback only training manipulation consisted solely of feedback
regarding the MEs of emotion (available in the practice section of the METT), whereas in the full instruction training manipulation the instructor also discussed subtle differences among expressions (according to the "training" and "review" sections of the METT) and answered questions raised by participants. Participants in the "self-led" training condition also received training via the METT. These participants led themselves through the training, feedback, and review sections of the METT on a personal computer (monitored by the instructor). The self-led training group was exposed to the same materials as the full instruction training group, except that the instructor was not allowed to answer questions or discuss subtle differences, to mirror a true self-led training environment. The length of time was standardized (25 min) for all six manipulations (both training and control). Reinforcement was manipulated by randomly assigning the training participants to either receive or not receive re-training at their second and third appointments. Refreshers were identical in format to participants' original training conditions (i.e., feedback only, full instruction, or self-led), although the instruction time was reduced to 15 min. Trained participants were randomly assigned to one of three refresher conditions: approximately one-third received no refresher training, one-third received refresher training at time 2, and one-third received refresher training at time 3.

Stimulus materials

The second version of the METT was used for the testing and training of MERA. The laboratory-produced METT expression items involve full-face flash displays that show a subject's neutral expression, a quick expression flash (1/15th of a second), and then a return to the subject's neutral face.
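The one-third/one-third/one-third refresher assignment described above can be sketched as follows. This is a simplified, hypothetical sketch: the participant roster is invented, assignment here is per individual for brevity, whereas the study assigned intact session groups.

```python
import random
from collections import Counter

def assign_refreshers(trained_ids, seed=42):
    """Randomly split trained participants into three refresher
    conditions of roughly equal size (sketch only; the study
    assigned intact session groups, not individuals)."""
    rng = random.Random(seed)
    ids = list(trained_ids)
    rng.shuffle(ids)  # randomize order before dealing into thirds
    conditions = ("no refresher", "refresher at time 2", "refresher at time 3")
    return {pid: conditions[i % 3] for i, pid in enumerate(ids)}

# Hypothetical roster of 213 trained participants.
assignment = assign_refreshers(range(213))
print(Counter(assignment.values()))
```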
The METT training is divided into five sections: (1) a 14-item pre-test (anger, contempt, disgust, fear, happiness, sadness, and surprise, each shown twice), (2) a training section in which each of the universal expressions is introduced and described, (3) a 42-item practice section, (4) a review section, and (5) a 28-item post-test (the same seven emotions shown four times). Elements of this training program were manipulated to form the stimulus materials used to assess MERA and also functioned as the training materials in the training manipulations. To enable three post-training assessment periods, a pilot test was conducted to evaluate the difficulty of the expression items so they could be grouped into equivalent post-tests. (The third post-test was also used to assess MERA at the pre-test period.) The 42 ME items taken from the pre-test and post-test sections of the METT were shown separately to 12 communication undergraduates, who judged each of these items at the speed of 1/15th of a second. These 42 items were
then divided into three sets to create three MERA post-tests, each having two examples of each emotion. Paired-samples t-tests revealed no significant differences in difficulty among the three tests. The mean difficulty for each of these tests based on the pilot data was 0.63 (post-test 1), 0.66 (post-test 2), and 0.62 (post-test 3).

Procedure

Time 1

This study was conducted over an 8-week period and was approved by the University's Institutional Review Board. Participants were scheduled in small groups for hour-long sessions at three points in the semester. Participants were randomly assigned to one of the six conditions, and each condition was run separately. One instructor (the author) led all sessions. After arrival, participants completed an informed consent document and then a demographic questionnaire and personality indexes. Then the instructor provided an overview of the experiment and explained the ME test procedure to the group. The format and procedure of each ME test was identical. At this point in the experiment the pre-test was administered according to the procedure described below. Before each test, participants were asked to indicate their confidence in their ability to perform well, as well as their motivation to correctly identify the ME items. Confidence was measured using a 1 (Very poor) to 7 (Very good) rating in response to the question: How well do you think you will do at recognizing the upcoming facial expressions of emotion? Motivation was assessed using a 1 (Not Motivated) to 7 (Very Motivated) rating in response to the question: How motivated are you to recognize people's emotional expressions? Next, participants viewed the fourteen-item ME test. Each item was projected on a blank wall in the research room at the speed of 1/15th of a second.
Participants were given approximately 10 s to judge each expression by circling the appropriate response on the provided answer sheet (choices included anger, contempt, disgust, fear, happiness, sadness, surprise, and none of the above). After all items had been judged, participants indicated their confidence in their judgments. Post-confidence was measured using a 1 (Very poor) to 7 (Very good) rating to the question: How well do you think you did at recognizing these facial expressions of emotion? After the pre-test was completed, the next 25 min served as the manipulation period for the experiment. Control participants received no training, and training participants received ME training in one of three styles described previously. After the manipulation, all participants completed the fourteen-item ME post-test (1) according to the
procedure described above. After the post-test participants were reminded of their next research appointment, and dismissed from the research space.
Time 2

Exactly 3 weeks after the first session, participants returned to the research space. At this time participants in the training conditions were randomly assigned as a group to one of the three reinforcement conditions: none, refresher at time 2, or refresher at time 3. Participants assigned to a refresher at time 2 received 15 min of training based on their original training condition. After the manipulation, participants completed the fourteen-item ME post-test (2) according to the procedure described previously. After all participants completed the post-test, they were reminded of their next research appointment and dismissed from the research space.

Time 3

Exactly 6 weeks after the original training, participants returned to the research space. Participants assigned to a refresher at time 3 received 15 min of training based on their original training condition. After the manipulation, participants completed the fourteen-item ME post-test (3) according to the procedure described previously. After the post-test, all participants completed a questionnaire exploring how this study had impacted their lives. Last, participants were debriefed regarding the purpose of the study, provided research credit, and dismissed from the research space.

Results

Participants

A total of 334 students participated at Time 1, with a 92% completion rate (N = 306). Analyses were conducted to determine if there were any differences in the demographic makeup (age, gender, and ethnicity) of the 306 final subject sample and the 28 participants who did not complete the study. These analyses revealed no significant demographic differences between the group who completed the study and the group that dropped out. Henceforth, only the final sample (N = 306) is discussed. The participants were 174 female (57%) and 132 male (43%) undergraduates with an average age of 20.13 (SD = 3.08) years. Participants were mostly Caucasian (70.9%), but there were also participants who identified themselves as Asian or Pacific Islander (11.1%), African or Caribbean (8.8%), Hispanic (6.2%), Middle Eastern (1.6%), or another ethnic background (1.4%). Participants were mostly sophomores (38.2%) and juniors (32.7%), although some seniors (15.0%) and freshmen (13.1%) also participated (1.0% did not list class year).
Motivation

In this study, participants were asked to rate how motivated they were on a one-item scale (1 = Not Motivated to 7 = Very Motivated). Independent-samples t tests were conducted to examine motivation differences between untrained participants and trained participants. One significant difference was uncovered for the pre-test, t(304) = -2.133, p = .034, d = -.24, suggesting that trained participants (M = 5.51, SD = 1.07) were more motivated to succeed than the controls (M = 5.23, SD = 1.04) before the manipulation. At this point in the experiment participants had not received any information regarding the training manipulation, so the cause of the greater motivation level is unknown. There were no significant differences in motivation between control (M = 5.18, SD = 1.06) and training (M = 5.40, SD = 1.19) participants after the manipulation was introduced. A one-way ANOVA was conducted to examine change in motivation at Time 1. No significant differences were uncovered, suggesting that assignment to a training group did not significantly increase motivation to perform well in this paradigm. Pearson correlations were computed to examine the relationship between motivation and accuracy. Motivation was not significantly related to accuracy at any test for control participants. For trained participants, motivation was significantly positively related to accuracy at post-test 1, r(212) = .191, p = .005, and post-test 3, r(212) = .157, p = .021, revealing that trained participants who were more motivated to succeed were more accurate on these tests. Since motivation was not significantly related to accuracy at the pre-test—the only test in which groups differed in motivation—it was dropped as a potential covariate in ensuing analyses.

Confidence

In the current study confidence in judgment was measured on a one-item scale (1 = Very poor to 7 = Very good) both before and after each ME test.
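The effect size reported above for the pre-test motivation difference can be recovered from the t statistic alone, using the common approximation d = 2t/√df for independent samples with roughly equal group sizes. A minimal sketch using the values reported in the text:

```python
import math

def cohens_d_from_t(t, df):
    """Approximate Cohen's d for an independent-samples t-test with
    roughly equal group sizes: d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)

# Reported above: t(304) = -2.133 for trained vs. control motivation.
d = cohens_d_from_t(-2.133, 304)
print(round(d, 2))  # -0.24, matching the reported d = -.24
```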
Pearson correlations were computed to examine the relationship between confidence and accuracy for both trained and control participants. All but two relationships were significant (Table 1). The only negative relationship occurred for trained participants at the pre-test; all other relationships were positive. This suggests that people's perceptions regarding their MERA were not so different from their objective ability after MEs had been defined.

Table 1 Relationship between confidence and accuracy, by condition (reported correlations include -.010, .404***, -.177**, and .388***). * p < .05; ** p < .01; *** p < .001

Training effects

H1 predicted a significant main effect for training, such that trained participants would improve in accuracy post-manipulation at Time 1 and retain this improvement, whereas controls would experience no change in accuracy. A mixed-model ANOVA was conducted to examine the differences in accuracy across time within each of the six conditions. Mauchly's test indicated that the assumption of sphericity had been violated, χ²(5) = 14.849, p = .011; therefore degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .994). There was a significant main effect for time, F(2.983, 894.954) = 104.967, p < .001, η² = .259. Pairwise comparisons uncovered that accuracy was significantly different (p < .001) for all tests except between post-test 2 and post-test 3, showing that accuracy improved from pre-test to post-test 1 and post-test 2. There was also a significant main effect for condition, F(5, 300) = 4.994, p < .001, η² = .077. Pairwise comparisons indicated that the full instruction training condition was significantly more accurate than the traditional control and motivating lecture control conditions, but was not significantly different from the exposure control, feedback only training, or self-led training conditions. A significant interaction was revealed for time by condition, F(14.916, 894.954) = 5.421, p < .001, η² = .083. To further explore this interaction, one-way ANOVAs were conducted at each test (pre-test, post-test 1, post-test 2, and post-test 3) to examine between-subjects differences. There were no significant differences at the pre-test, revealing that all groups began at approximately the same skill level. A significant difference was revealed at post-test 1, F(5, 300) = 7.561, p < .001.
Bonferroni post hoc tests revealed that the three control conditions were significantly less accurate than the two instructor-led training conditions. There were no significant differences between the
control conditions and the self-led training condition. There was a significant main effect at post-test 2, F (5, 300) = 3.388, p = .005. Bonferroni post hoc tests revealed that the full instruction training participants were significantly more accurate than the traditional control participants. At post-test 3 there was a significant main effect for condition, F (5, 300) = 8.328, p < .001. Bonferroni post hoc tests revealed that full instruction training participants were significantly more accurate than the traditional control, motivating lecture control, feedback only training, and self-led training participants. There was no significant difference between the full instruction training condition and the exposure control condition at post-test 3.

To further explore the within-subjects differences, paired samples t tests were conducted for each of the six conditions to examine accuracy change over time. A total of three comparisons (pre-test vs. post-test 1, post-test 1 vs. post-test 2, and post-test 2 vs. post-test 3) were conducted for each condition. The significant differences are outlined in Table 2. Between the pre-test and the post-test at Time 1, all three training conditions significantly increased in accuracy (feedback only: +14.19%; full instruction: +19.52%; and self-led: +10.66%), and two of the control conditions experienced no significant change (traditional: +1.43%; and exposure: -0.76%), revealing support for H1. Surprisingly, one of the control conditions (motivating lecture: +6.15%) also significantly increased from the pre-test to post-test 1, t (28) = 2.143, p = .041, d = .40. Between post-test 1 and post-test 2, all of the conditions significantly improved in accuracy, suggesting a possible exposure or practice effect for the stimuli, or that the material shown in post-test 2 was easier than that in the other tests, although pilot testing suggested that all three tests were equivalent in difficulty. Between post-test 2 and post-test 3, both the traditional control condition (-5.95%) and the self-led training condition (-6.91%) significantly decreased. H1 was partially supported, as the combined training participants outperformed control participants and, more specifically, the full instruction training participants outperformed most controls on all tests. However, the motivating lecture control group also significantly improved after the manipulation, suggesting an effect for the motivating lecture. Additionally, all control groups significantly improved from post-test 1 to post-test 2. This pattern of results is illustrated in Fig. 1.

[Table 2: Within subjects comparisons for accuracy from test to test. Columns: Condition; pre-test versus post-test 1, t; post-test 1 versus post-test 2, t; post-test 2 versus post-test 3, t. * p < .05; ** p < .01; *** p < .001. Table values were not recovered in this extraction.]

[Fig. 1: Accuracy across time and conditions]

Type of training

H2 predicted that the full instruction training would result in greater improvement in accuracy compared to the feedback only and self-led trainings. Independent samples t tests (one-tailed) were conducted to examine the differences in improvement from the pre-test to post-test 1 for the three training types. Tests revealed that the full instruction condition (+19.52%) improved significantly more than both the self-led condition (+10.66%), t (138) = 3.021, p < .005, d = .51, and the feedback only condition (+14.19%), t (143) = 1.811, p < .05, d = .30, supporting H2. There were no significant differences between the accuracy change of the feedback only and self-led conditions.

The role of refreshers

H3 predicted that refreshers would aid retention of the training material. A mixed model ANOVA showed a significant main effect for time, F (2, 410) = 23.338, p < .001, η² = .102 (Table 3). Pairwise comparisons revealed that accuracy was significantly higher at post-tests 2 and 3 than at post-test 1 (p < .001). There was also a significant main effect for condition, F (2, 205) = 4.017, p = .019, η² = .038. Pairwise comparisons indicated that the full instruction condition was significantly more accurate than the self-led condition. There was no main effect for refresher type. There was a significant interaction for time by condition, F (4, 410) = 4.300, p = .002, η² = .040. This interaction was explored and reported above, and revealed that the full instruction condition improved significantly more than the self-led and feedback only conditions. The two-way interactions for time by refresher and condition by refresher, and the three-way interaction for time by refresher by condition, were not significant.

[Table 3: Within subjects differences for ME accuracy over time. Columns: Training type; accuracy (%) at post-test 1, post-test 2, and post-test 3, broken out by refresher timing (at time 2 vs. at time 3). Table values were not recovered in this extraction.]

Paired samples t tests (post-test 1 vs. post-test 2, post-test 2 vs. post-test 3, and post-test 1 vs. post-test 3) were conducted to evaluate MERA retention for each of the three refresher manipulations within the three training groups. In the feedback only condition, the time 2 refresher group significantly improved in accuracy after the refresher, t (21) = 3.346, p = .003, d = .71 (post-test 1 to post-test 2). T tests also revealed that accuracy significantly decreased from post-test 2 to post-test 3, t (21) = -3.186, p = .004, d = -.68. No other significant differences, including improvements for the time 3 refresher group, were uncovered.

In the full instruction condition, the time 2 refresher group significantly increased in accuracy from post-test 1 to post-test 2, t (24) = 2.402, p = .024, d = .48, and from post-test 1 to post-test 3, t (24) = 3.578, p = .002, d = .72. For the time 3 refresher group, paired samples t tests revealed that accuracy significantly increased from post-test 1 to post-test 3, t (21) = 3.241, p = .004, d = .69. No other significant differences were uncovered.

In the self-led condition, the no refresher group significantly increased in accuracy from post-test 1 to post-test 2, t (24) = 3.496, p = .002, d = .70. Accuracy at post-test 3 was significantly lower than at post-test 2, t (24) = -4.064, p < .001, d = -.81. This was the only non-refresher group that performed significantly differently on one of the post-tests. For the time 2 refresher group, paired samples t tests revealed that the refresher succeeded in significantly increasing accuracy from post-test 1 to post-test 2, t (19) = 2.323, p = .031, d = .52. For the time 3 refresher group, accuracy significantly increased from post-test 1 to post-test 2, t (23) = 5.175, p < .001, d = 1.06, and from post-test 1 to post-test 3, t (23) = 3.646, p = .011, d = .74, revealing an increase prior to the refresher. No other significant differences were uncovered.

These inconsistent results do not support H3. The time 2 refresher groups did experience significant increases in accuracy after their refresher, but this result did not outlast the time period. In fact, all groups increased at Time 2 (although not all changes were significant), suggesting that the increases may have been caused not by the refreshers but by some property of the ME test.
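Two of the quantities reported in this results section can be reproduced by simple arithmetic. The Huynh–Feldt correction rescales both ANOVA degrees of freedom by ε, and the paired-samples effect sizes are consistent with the common conversion d = t / √n, where n is the number of pairs (df + 1). The sketch below is an illustrative check using only values reported in the text, not the study's analysis code:

```python
import math

# Huynh-Feldt correction: both ANOVA dfs are multiplied by epsilon.
# Reported values: epsilon = .994, k = 4 test occasions, N = 306, g = 6 groups.
epsilon, k, N, g = 0.994, 4, 306, 6
df_time = k - 1                # uncorrected numerator df = 3
df_error = (k - 1) * (N - g)   # uncorrected denominator df = 900
print(epsilon * df_time, epsilon * df_error)  # ~2.98 and ~894.6, close to
# the reported F(2.983, 894.954)

# Paired-samples effect size: d = t / sqrt(n), with n = df + 1 pairs.
def paired_d(t, df):
    return t / math.sqrt(df + 1)

print(round(paired_d(3.346, 21), 2))  # 0.71, as reported (feedback only)
print(round(paired_d(2.402, 24), 2))  # 0.48, as reported (full instruction)
print(round(paired_d(5.175, 23), 2))  # 1.06, as reported (self-led)
```

These conversions match the reported effect sizes to two decimals, which suggests the paper used the t-to-d conversion for its within-subjects comparisons; that inference is ours, not a statement from the source.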
Discussion

This study provides the first data comparing different methods of training micro expression recognition, the effects of motivation and exposure on recognition, and skill retention over three points in time. As predicted, the training was successful. At the pre-test, there were no significant differences in accuracy based on condition, but after the manipulation trained participants performed better on post-tests 1, 2, and 3 (76, 84, and 82% respectively) than controls (64, 75, and 73% respectively). Further, having an expert present to guide participants through the subtle differences among these expressions and answer questions was an advantage over the other tested training methods. The full instruction training condition produced consistent results: it was the only training condition that was significantly more accurate than one or more control conditions on all three post-tests.
These results revealed that the best method for using the METT in a short session was to fully explore all sections of the program, including the training and review; have a knowledgeable instructor describe subtle differences between the expressions; and practice identifying the ME items while providing feedback to trainees. This suggests that feedback paired with additional training techniques may produce a more effective training manipulation. Although the METT was designed as a self-instructional tool to train emotion recognition, this method of training was considerably less effective than an instructor-led training. Two explanations for this finding may be that the full instruction training provided both more material than the self-led training (the instructor answered questions) and enthusiasm for the topic. This study does not definitively show which factor provided significant benefits. However, the surprising finding that the motivating lecture control group also significantly increased after the manipulation (+6%) suggests that the instructor's enthusiasm may have motivated participants to concentrate or attend more closely to the post-test. Further research should examine both the role of content and the role of the instructor in training ME skills, as there may be some ideal combination of content and charisma that produces the greatest effects. In this experiment, the same instructor was used for all trials in an attempt to keep the presentation consistent and eliminate the possibility of attention or motivation biases caused by the instructor's appearance or presentation style. However, this is also a limitation of the study, since the instructor was aware of the hypotheses and experimental design, which could have unintentionally affected portrayals within the experiment.

Research examining confidence of judgment has generally not found a relationship between one's confidence in a judgment and the accuracy of that judgment (DePaulo et al. 1997; Patterson et al. 2001). The current study revealed a clear relationship between confidence and MERA: before individuals were introduced to the concept of MEs, their confidence and accuracy were not calibrated, but after individuals had seen MEs, they became calibrated such that accurate judges were more confident and less accurate judges were less confident. Previous studies examining emotion recognition have not found this strong a link. In this study, participants' confidence in interpreting the MEs was significantly positively correlated with their accuracy throughout the post-testing period, even though they had never received feedback on their performance. This may suggest a very parsimonious means to determine whether trainees understand the material: the trainer merely has to ask. The novelty of this finding suggests that this relationship should be verified in subsequent research. Of particular importance
would be to replicate this finding with naturally occurring MEs, which may be more difficult to spot.

In this study a possible repeated exposure effect was found, as untrained individuals improved without training, most markedly in the repeated exposure control. If practice, or mere exposure, improves performance, this suggests additional training time would be beneficial. One limitation of the current study was the limited time spent training (25 min) and refreshing (15 min) ME skills; perhaps this is why the reinforcement sessions were ineffective. These data suggest that exposure is an important element of learning, and future studies should explore manipulations of increased exposure and training time. Another limitation was that all conditions performed significantly better on post-test 2 (compared to the pre-test and post-test 1), which suggests that the stimuli used in post-test 2 may have been easier than those in the other tests. Pilot tests were conducted to divide the ME items into equally difficult post-tests, and while the difference was not significant, the pattern of means suggests that post-test 2 was slightly easier. Although subject pool availability prevented it here, future research should counterbalance the order of these tests to ensure that accuracy is due to the manipulation, not the ease of any particular test. A further limitation was the nature of the MEs used to both test and train recognition. These MEs were full-face but very quick expressions of emotion that were embedded within a poser's neutral expression. Research has shown that spontaneous expressions are more difficult to interpret than posed expressions, as naturally occurring expressions often blend with other emotions or expressions (Naab and Russell 2007).
Naturally occurring MEs may not engage every action unit, do not occur at the same speed or intensity every time, may be masked or covered by other expressions, and recognition accuracy may be affected by lighting, angle, attention, and the observer's cognitive load (Porter and ten Brinke 2008). Length of expression itself is an additional variable that was not examined in this study; it is possible that good ME perceivers could identify MEs lasting the varying lengths of time at which they naturally occur. While this study presented a necessary first step toward understanding boundary factors, these results should be replicated using more ecologically valid stimuli. It is important to explore this training with samples of real world practitioners, such as law enforcement, behavior detection officers, physicians, clinical groups, and negotiators. However, this training is only useful to real world practitioners if it relates to a real world skill. One study under review (Frank et al. 2011) found that METT training resulted in significantly better ability to catch "real life micros" for Coast Guard officers. Although these officers almost doubled their ability to accurately read
these MEs, their average post-training accuracy (38%) was much lower than ME accuracy for the posed photographs seen both in their study (78%) and in the current study (81% across all post-tests). The posed faces selected for METT training are not representative of all facial expressions encountered daily, and therefore future research is required to determine whether these are the best materials for training spontaneous expression recognition.

There is a need to correctly recognize and interpret a person's true feelings in any number of interpersonal, health, business, legal, and social contexts. The detection of concealed or masked emotions is invaluable in law enforcement and national security settings, medical contexts, and the corporate world, where better understanding of our suspects, patients, or partners can allow us to make more informed decisions about a person's true feelings and intent. In any context, the ability to recognize emotional displays can make us more effective perceivers of others, which can enhance the quality of our interpersonal relationships and reduce the potential for misunderstanding. This study was among the first to evaluate the specific features and use of the METT, a facial expression training program currently in use in security and health contexts. These findings validate use of the METT for improving MERA, suggest that the training effect persists at 6 weeks, and identify the optimal way to deploy that training. Previous work suggests that this type of training will translate to real-time, spontaneously expressed MEs, but this particular tool requires further testing to conclusively demonstrate its utility across the wide variety of situations seen in daily life.
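As a closing technical aside on the confidence-accuracy finding discussed above: the calibration described (accurate judges more confident, inaccurate judges less confident) is an ordinary Pearson correlation between confidence ratings and accuracy scores. The sketch below illustrates that computation with entirely invented numbers; it does not reproduce the study's data or its exact statistic:

```python
import math

# Plain Pearson correlation: covariance divided by the product of the
# standard deviations (computed here via sums of squared deviations).
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical post-training data: 1-5 confidence ratings and % accuracy.
confidence = [3, 5, 4, 2, 5, 1, 4, 2]
accuracy = [60, 85, 75, 55, 90, 50, 80, 58]
print(round(pearson_r(confidence, accuracy), 2))  # a strong positive r
```

A calibrated group yields a large positive r, as in this invented sample; the uncalibrated pre-training pattern described in the text would correspond to an r near zero.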
References

Allport, F. H. (1924). Social psychology. Boston: Houghton Mifflin.
Biehl, M., Matsumoto, D., Ekman, P., Hearn, V., Heider, K., Kudoh, T., et al. (1997). Matsumoto and Ekman's Japanese and Caucasian facial expressions of emotion (JACFEE): Reliability data and cross-national differences. Journal of Nonverbal Behavior, 21, 3–21. doi:10.1023/A:1024902500935.
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85, 678–707.
DeJong, R. N. (1979). The neurologic examination. Hagerstown, MD: Harper & Row.
DeMyer, W. (1980). Technique of the neurologic examination. New York: McGraw-Hill.
DePaulo, B. M., Charlton, K., Cooper, H., Lindsay, J. J., & Muhlenbruck, L. (1997). The accuracy–confidence correlation in the detection of deception. Personality and Social Psychology Review, 1, 346–357. doi:10.1207/s15327957pspr0104_5.
Ekman, P. (1972). Universals and cultural differences in facial expression of emotion. In J. Cole (Ed.), Nebraska symposium on motivation (Vol. 19, pp. 207–283). Lincoln: University of Nebraska Press.
Ekman, P. (2001). Telling lies: Clues to deceit in the marketplace, politics, and marriage. New York: W. W. Norton & Co.
Ekman, P. (2003). Emotions revealed: Recognizing faces and feelings to improve communication and emotional life. New York: Henry Holt & Co.
Ekman, P., & Friesen, W. V. (1969). Nonverbal leakage and cues to deception. Psychiatry, 32, 88–106.
Ekman, P., & Friesen, W. V. (1974a). Detecting deception from the body or the face. Journal of Personality and Social Psychology, 29, 124–129. doi:10.1037/h0036006.
Ekman, P., & Friesen, W. V. (1974b). Nonverbal behavior and psychopathy. In R. J. Friedman & M. Katz (Eds.), The psychology of depression: Contemporary theory and research (pp. 3–31). Washington, D.C.: Winston and Sons.
Ekman, P., & Friesen, W. V. (1975). Pictures of facial affect instrument. Palo Alto, CA: Consulting Psychologists Press.
Ekman, P., & Friesen, W. V. (1978). Facial action coding system. Palo Alto, CA: Consulting Psychologists Press.
Ekman, P., & O'Sullivan, M. (1991). Who can catch a liar? American Psychologist, 46, 189–204. doi:10.1037/0003-066X.46.9.913.
Ekman, P., & O'Sullivan, M. (2006). From flawed self-assessment to blatant whoppers: The utility of voluntary and involuntary behavior in detecting deception. Behavioral Sciences & The Law, 24, 673–686. doi:10.1002/bsl.729.
Ekman, P., O'Sullivan, M., & Frank, M. G. (1999). A few can catch a liar. Psychological Science, 10, 263–266. doi:10.1111/1467-9280.00147.
Elfenbein, H. A. (2006). Learning in emotion judgments: Training and the cross-cultural understanding of facial expressions. Journal of Nonverbal Behavior, 30, 21–36. doi:10.1007/s10919-005-0002-y.
Elfenbein, H. A., & Ambady, N. (2002). On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin, 128, 203–235. doi:10.1037/0033-2909.128.2.203.
Etcoff, N. L., Ekman, P., Magee, J. J., & Frank, M. G. (2000). Lie detection and language comprehension. Nature, 405, 139. doi:10.1038/35012129.
Frank, M. G., Matsumoto, D. M., Ekman, P., Kang, S., & Kurylo, A. (2011). Improving the ability to recognize micro-expressions of emotion. Manuscript under review.
Guilford, J. P. (1929). An experiment in learning to read facial expression. The Journal of Abnormal and Social Psychology, 24, 191–202. doi:10.1037/h0069973.
Haggard, E. A., & Isaacs, K. S. (1966). Micromomentary facial expressions as indicators of ego mechanisms in psychotherapy. In L. A. Gottschalk & A. H. Auerbach (Eds.), Methods of research in psychotherapy (pp. 154–165). New York: Appleton-Century-Crofts.
Hess, U., & Kleck, R. E. (1990). Differentiating emotion elicited and deliberate emotional facial expressions. European Journal of Social Psychology, 20, 369–385. doi:10.1002/ejsp.2420200502.
Izard, C. E. (1979). The maximally discriminative facial movement coding system. Unpublished manuscript, Instructional Resources Center, University of Delaware.
Jarden, E., & Fernberger, S. (1926). The effects of suggestion on the judgment of facial expressions of emotion. American Journal of Psychology, 37, 565–570. doi:10.2307/1414917.
Jenness, A. (1932). The effects of coaching subjects in the recognition of facial expressions. Journal of General Psychology, 7, 163–178.
Levenson, R. W. (1994). Human emotion: A functional view. In P. Ekman & R. J. Davidson (Eds.), The nature of emotion: Fundamental questions (pp. 123–126). New York: Oxford University Press.
Marsh, P. J., Green, M. J., Russell, T. A., McGuire, J., Harris, A., & Coltheart, M. (2010). Remediation of facial emotion recognition in schizophrenia: Functional predictors, generalizability, and durability. American Journal of Psychiatric Rehabilitation, 13, 143–170. doi:10.1080/15487761003757066.
Matsumoto, D., & Ekman, P. (1988). Japanese and Caucasian facial expressions of emotion (JACFEE) [Slides]. San Francisco, CA: Intercultural and Emotion Research Laboratory, Department of Psychology, San Francisco State University.
Matsumoto, D., & Hwang, H. S. (in press). Evidence for training the ability to read microexpressions of emotion. Motivation and Emotion. Retrieved from http://www.humintell.com/wp-content/uploads/2011/05/Uncorrected-Proof2.pdf.
Matsumoto, D., LeRoux, J., Wilson-Cohn, C., Raroque, J., Kooken, K., Ekman, P., et al. (2000). A new test to measure emotion recognition ability: Matsumoto and Ekman's Japanese and Caucasian brief affect recognition test (JACBART). Journal of Nonverbal Behavior, 24, 179–209. doi:10.1023/A:1006668120583.
Naab, P. J., & Russell, J. A. (2007). Judgments of emotion from spontaneous facial expressions of New Guineans. Emotion, 7, 736–744.
Patterson, M. L., Foster, J. L., & Bellmer, C. (2001). Another look at accuracy and confidence in social judgments. Journal of Nonverbal Behavior, 25, 207–219. doi:10.1023/A:1010675210696.
Porter, S., & ten Brinke, L. (2008). Reading between the lies: Identifying concealed and falsified emotions in universal facial expressions. Psychological Science, 19, 508–514. doi:10.1111/j.1467-9280.2008.02116.x.
Russell, T. A., Chu, E., & Phillips, M. L. (2006). A pilot study to investigate the effectiveness of emotion recognition in schizophrenia using the micro-expression training tool. British Journal of Clinical Psychology, 45, 579–583. doi:10.1348/014466505X90866.
Russell, T. A., Green, M. J., Simpson, I., & Coltheart, M. (2008). Remediation of facial emotion perception in schizophrenia: Concomitant changes in visual attention. Schizophrenia Research, 103, 248–256. doi:10.1016/j.schres.2008.04.033.
Stickle, F. E., & Pellegreno, D. (1982). Training individuals to label nonverbal facial cues. Psychology in the Schools, 19, 384–387. doi:10.1002/1520-6807(198207)19:3<384::AID-PITS2310190321>3.0.CO;2-A.
Wang, L., & Markham, R. (1999). The development of a series of photographs of Chinese facial expressions of emotion. Journal of Cross-Cultural Psychology, 30, 397–410. doi:10.1177/0022022199030004001.