Relating Perceived Web Page Complexity to Emotional Valence and Eye Movement Metrics

Joseph H. Goldberg
Oracle America, Inc.

Initial impression of visual complexity has major significance for both consumer and enterprise web page designs. Research is still needed, however, before complexity assessment methods can become part of the usability tool arsenal. In this regard, a study was conducted to compare subjective ratings, eye tracking, JPEG-compressed file size, and emotional valence measures. Professional enterprise users conducted search tasks, then judged the complexity of web pages. Multivariate factor analysis was followed by ordinal logistic regressions on subjective ratings. Subjective ratings of page complexity were driven in part by self-perception of search difficulty, and in part by page density. Fixation durations increased and search area decreased with lower complexity ratings. Aggregated emotional valence, from facial analysis, also increased with higher ratings of page clarity. Overall, both pre-attentive eye tracking and emotional valence measures were related to conscious subjective judgments of complexity. Further research is recommended to be able to ascribe complexity-inducing features to measurable qualities.

INTRODUCTION

Impression of Complexity

Initial impression of complexity is an important determinant of both consumer and enterprise web page design success. An overly complex page can drive consumers away, interrupt transactional flows, and negatively impact brand marketing (Michailidou, et al., 2008; Lindgaard, et al., 2006).

Complexity is elusive and difficult to define. It increases with: the number of elements on a page, the dissimilarity between elements, the inverse of the degree to which elements can be considered members of a larger unit (Berlyne, 1960), perceived unfamiliarity (Forsythe, et al., 2008), preexisting expectations, lack of task context, task abstraction, and age (Donderi, 2006).

Many elements of web pages can impact judgments of visual complexity. Harper, et al. (2009) found that pages perceived as visually simple contain information concentrated on one subject, have few links, and use small images. Adding tables, boxes, and small lists can make a page increasingly complex. Visually complex pages are long, with large amounts of text, links, images, tables, and/or menus. Several contexts may be present in these pages, allowing the completion of many types of tasks, such as reading, searching, and buying products.

Impressions of complexity and visual appeal are rapidly formed, according to the rules of Gestalt psychology (Forsythe, 2009). In early perceptual stages, we see larger shapes, or forms, then fill these in with visual details over time. Of the many Gestalt rules, the most relevant for web page design include Figure/ground, Proximity, Closure, Similarity, and Continuation (Graham, 2008). By having participants rate and rank pages based upon dimensions such as complexity, interest, layout, and use of color, Lindgaard, et al. (2006) noted that visual appeal judgments are formed within 50 msec of a page's presentation. Geissler, et al. (2001) found that the number of links, number of graphic elements, large text blocks, page length, and animation were most influential on perceived complexity.

Automated analysis of page complexity relies upon segmenting pages and modeling their elements in order to form relevant metrics. Segmentation breaks pages into text or image 'chunks' with varying information. Using Gestalt principles to compute the complexity of visually segmented pages, Song (2007) found that the number of segments was unrelated to perceived complexity, but that a composite visual information metric was related to it. Dong, et al. (2007) found that perceived complexity was unrelated to exposure durations, but did depend on clarity of segmentation, page density, layout, and color; image complexity was due to both the quantity and variety of page objects. Visual complexity perception may also be correlated with the JPEG-compressed file size of page images (at 70% compression), due to similar bottom-up sensory processes in the retina (Donderi, 2006; Stickel, et al., 2010).
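The JPEG-compression proxy can be sketched in a few lines. The use of Pillow, and the function name, are implementation assumptions and are not drawn from the cited studies.

```python
# Sketch of the JPEG file-size complexity proxy (Donderi, 2006; Stickel,
# et al., 2010): re-save a page screenshot as JPEG at ~70% quality and
# treat the resulting byte count as a bottom-up complexity estimate.
# Pillow and the helper name are implementation assumptions.
import io

from PIL import Image


def jpeg_complexity(image, quality=70):
    """Return the byte size of `image` re-encoded as JPEG at `quality`."""
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.tell()
```

At a fixed resolution, a visually busy screenshot should yield a substantially larger byte count than a flat, uniform one.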

Emotional Valence

Emotion is a transient, emergent behavior that is both difficult to define and elusive to measure. Emotions carry both valence (positive to negative) and arousal (resting to excited) dimensions. Post-task assessment by overt methods (e.g., scales or surveys) may not capture short-lived feelings; conversely, assessment by physiological methods (e.g., EEG or GSR) suffers from potential validity issues (Hazlett and Benedek, 2007). Patterns in eye movements, blinks, and pupil changes may be used to infer emotional valence and arousal (deLemos, et al., 2008). Observers look earlier and longer at emotionally charged images than at neutral images, perhaps to prepare for rapid defensive responses (Calvo and Lang, 2004).

Automated analysis of facial features may also provide cues about emotional valence (EV). Subtle changes in facial gestures can be interpreted using machine learning methods coupled with Facial Action Coding (Ekman, et al., 1978; Essa and Pentland, 1997). Software examples include the Affectiva Affdex and Q Sensor products (http://www.affectiva.com/), and Noldus FaceReader (http://www.noldus.com/).

Objective

There is a dearth of research relating the assessment of visual complexity by subjective rating to assessment by pre-attentive physiological methods. To help provide this foundational research, a study was conducted to empirically relate impressions of web page complexity/clarity, task completion time, eye tracking metrics, EV from facial analysis, and JPEG-compressed file size. Results from this study will guide the development of further usability methods and tools for comparing design alternatives.

METHOD

Participants

Twenty-two professional employees of Oracle (8 F, 14 M) were recruited from three internal organizations: IT, Facilities, and Sales. None had any prior exposure to the software screens that were presented. All were college-educated, and had substantial background viewing and using transactional web pages to complete administrative tasks. Each participant was individually tested within a quiet usability lab, over a 30-minute period.

Measures and Tools

Eye tracking. Eye movements were recorded using a Tobii T60 eye tracker, running Tobii Studio ver. 2.3 (Tobii Technology, www.tobii.com), with a 5-point calibration. Fixations were defined by an I-VT filter, with a 30°/sec velocity threshold and a 60 msec minimum fixation duration (Salvucci and Goldberg, 2000). The eye tracking measures used in the present study were:

• Time to First Fixation (TFF, in msec): The elapsed time until a fixation was first made within the Area of Interest (AOI) associated with the intended visual search target.
• Fixation Duration (msec): The average duration of fixations within a specified time interval or AOI.
• Search Area (in pixels²): The product of the horizontal and vertical search extents, each computed independently as the 25th-to-75th percentile spatial range of fixations along that dimension.

Facial analysis. A webcam captured face video, which was analyzed by Noldus FaceReader (ver. 3.0) to record emotional valence (EV). Within each video frame, the software first finds the user's face, then assigns 491 facial feature locations using an underlying anatomical face model (Figure 1). These locations are input to a previously trained 3-layer neural network, whose outputs are values on each of seven dimensions: happy, sad, angry, surprised, scared, disgusted, and 'neutral'. EV is computed as a quantitative, linear combination of these values, ranging from positive (+1) to negative (-1) valence. The software continuously calibrates, and operates at a real-time video frame rate.

Figure 1. Noldus FaceReader real-time facial mesh of 491 locations. (Figure courtesy Oracle Applications User Experience.)

Complexity Scale. A subjective scale was constructed to assess some component of perceived complexity. The scale was not previously validated, and was not intended to differentiate between visual and cognitive complexity; rather, it served as a qualitative measure for comparison with the eye tracking and EV metrics. The scale anchors were intended to access people's internal feelings about page complexity: (1) Extremely Overwhelming; Frustrating, (2) Very Overwhelming, (3) Somewhat Overwhelming, (4) Neutral, (5) Somewhat Clear, (6) Very Clear, and (7) Extremely Clear; Enjoyable. The anchor terms Frustrating and Overwhelming were drawn from Geissler, et al.'s (2001) complexity scale items, which in turn were drawn from Berlyne's (1960) study. Participants discussed and practiced with this scale prior to the start of data trials. Comments were also recorded for later transcription and classification.
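Two of the derived measures can be sketched in Python. The Search Area computation follows the definition given under Measures and Tools; the valence combination is only an assumed illustration, since the paper does not give FaceReader's actual weights.

```python
# Sketch of two derived measures. search_area() follows the paper's
# definition (25th-75th percentile fixation extents); emotional_valence()
# is an ASSUMED linear combination -- FaceReader's actual weighting is
# not given in the paper.
import numpy as np


def search_area(fix_x, fix_y):
    """Horizontal x vertical interquartile fixation extent, in pixels^2."""
    x_lo, x_hi = np.percentile(fix_x, [25, 75])
    y_lo, y_hi = np.percentile(fix_y, [25, 75])
    return float((x_hi - x_lo) * (y_hi - y_lo))


def emotional_valence(expr):
    """Map expression intensities (0..1) to a valence in [-1, +1]:
    happy intensity minus the strongest negative intensity (assumed form)."""
    negatives = ("sad", "angry", "scared", "disgusted")
    return expr.get("happy", 0.0) - max(expr.get(k, 0.0) for k in negatives)
```

Using the interquartile range rather than the full fixation extent makes the area measure robust to a few stray fixations at the screen edges.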

Procedure

Following an explanation of the study, participants were calibrated to the eye tracker and positioned relative to the face video camera. A practice trial was completed, followed by a full, 25-trial set. Each trial started with a task slide stating a question to be answered on the following screen; e.g., "Please determine the current job grade of Klaus Beckenbauer." The screen then appeared, and the participant scanned it and verbally completed the task. A screen with the complexity scale next appeared, and a verbal rating was provided. Trials continued until the entire set was completed.
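The I-VT fixation classification described under Measures and Tools (30°/sec velocity threshold, 60 msec minimum duration) can be sketched as a minimal pass over gaze samples. The sample format, with time in seconds and gaze in degrees of visual angle, is an assumption for illustration.

```python
# Minimal I-VT sketch: consecutive samples below the velocity threshold
# are grouped into fixation candidates, and runs shorter than the minimum
# duration are discarded. Uniformly timed samples in degrees of visual
# angle are simplifying assumptions.
import math


def ivt_fixations(samples, vel_thresh=30.0, min_dur=0.060):
    """samples: list of (t_sec, x_deg, y_deg). Returns (start, end) times."""
    fixations, start = [], None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        vel = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else float("inf")
        if vel < vel_thresh:
            start = t0 if start is None else start
        else:
            if start is not None and t0 - start >= min_dur:
                fixations.append((start, t0))
            start = None
    if start is not None and samples[-1][0] - start >= min_dur:
        fixations.append((start, samples[-1][0]))
    return fixations
```

A production filter (e.g., Tobii Studio's) additionally handles noise, gap fill-in, and adjacent-fixation merging; this sketch shows only the core thresholding logic.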

Stimuli and Design

Each participant viewed 25 different variations of Human Capital Management pages, representing Overview Dashboards with organizational charts (9 pages), Personal Information (4 pages), Goals Work Area (3 pages; Figure 2), Promotion transaction (3 pages), Goals Editing transaction (3 pages), and Person Search (3 pages). Pages were presented in the same order to each participant; subsequent analyses modeled trial order to control for learning effects. Screen variations were created by manipulating background Gradient (Gradient, Flat), Font (Tahoma, Calibri), and Font Size (Smaller, Larger). Gradient differences were somewhat subtle, with a light blue color presented in the Flat condition. The example screen shown in Figure 2 has a flat background, Tahoma font, and the smaller font size. The present paper considers these designed screen differences only as a way to create a range of screen complexity ratings, and does not further consider their individual factor influences.
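The screen variations cross three two-level design factors. Whether the study used the full crossing across its 25 pages is not stated; the sketch below simply enumerates the eight possible combinations.

```python
# Enumerate the 2 x 2 x 2 = 8 combinations of the three design factors
# (Gradient, Font, Font Size) described in Stimuli and Design.
from itertools import product

GRADIENTS = ("Gradient", "Flat")
FONTS = ("Tahoma", "Calibri")
FONT_SIZES = ("Smaller", "Larger")

variants = list(product(GRADIENTS, FONTS, FONT_SIZES))
```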

Subjective Ratings were analyzed via ordinal logistic regression (OLR), with the remaining measures entered as quantitative factors or covariates. OLR does not require assumptions such as normality and homoscedasticity in the levels of the dependent variable, and independent variables can be either ordinal or categorical (Hosmer and Lemeshow, 2000). Based upon computed odds ratios, TFF significantly influenced Ratings (Z = 8.3, p < .001).

A multivariate factor analysis grouped the collected measures. The resulting factors, with their variable loadings (> |0.5|), were:

• Time-sensitive (Ratings, .86; TFF, -.81), indicating a close relationship between Ratings and completion time.
• Space-sensitive (Search Area, .64; JPEG File Size, .89), confirming that larger compressed file sizes were associated with larger search areas.
• Cognitive (Fixation Duration, .98)
• Emotional (EV, -.99)
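The cumulative-logit (proportional-odds) form that underlies OLR, and the odds-ratio interpretation of its coefficients, can be sketched directly. All numeric values here are hypothetical, not the study's estimates.

```python
# Sketch of the proportional-odds (cumulative logit) model behind OLR:
#   P(Rating <= j | x) = logistic(theta_j - beta * x)
# and a predictor's odds ratio is exp(beta), the multiplicative change in
# the odds per unit increase in the predictor. Values are hypothetical.
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(thresholds, beta, x):
    """P(rating <= j) at each ordered threshold theta_j."""
    return [logistic(t - beta * x) for t in thresholds]


def odds_ratio(beta):
    """Multiplicative change in the odds per unit increase in the predictor."""
    return math.exp(beta)
```

With a positive beta, increasing the predictor shifts probability mass toward higher rating categories, which is exactly what an odds ratio above 1.0 summarizes.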

Completion Time, TFF, and Ratings

The relationship between completion time and TFF was highly significant (R² = .39).
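For a simple two-variable relationship like completion time versus TFF, R² equals the squared Pearson correlation; a minimal sketch with illustrative data, not the study's:

```python
# R^2 for a simple linear relationship, computed as the squared Pearson
# correlation between the two series. Data used here are illustrative.
import numpy as np


def r_squared(x, y):
    return float(np.corrcoef(x, y)[0, 1] ** 2)
```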