Designing and Evaluating Multimodal Interaction for Mobile Contexts

Saija Lemmelä, Akos Vetek, Kaj Mäkelä, Dari Trendafilov
Nokia Research Center, P.O. Box 407, FIN-00045 Nokia Group, Helsinki, Finland

[email protected], [email protected], [email protected], [email protected]

ABSTRACT
In this paper we report on our experience with the design and evaluation of multimodal user interfaces in various contexts. We introduce a novel combination of existing design and evaluation methods in the form of a five-step iterative process and show the feasibility of this method, along with some of the lessons learned, through the design of a messaging application for two contexts (in-car and walking). The iterative design process we employed includes the following five basic steps: 1) identification of the limitations affecting the usage of different modalities in various contexts (contextual observations and context analysis), 2) identifying and selecting suitable interaction concepts and creating a general design for the multimodal application (storyboarding, use cases, interaction concepts, task breakdown, application UI and interaction design), 3) creating modality-specific UI designs, 4) rapid prototyping, and 5) evaluating the prototype in naturalistic situations to find key issues to be taken into account in the next iteration. We not only found clear indications that context affects users' preferences in the usage of modalities and interaction strategies but also identified some of these effects. For instance, while speech interaction was preferred in the car environment, users did not consider it useful when they were walking. 2D (finger stroke) and especially 3D (tilt) gestures were preferred by walking users.


Categories and Subject Descriptors
H.5 Information Interfaces and Presentation (I.7); H.5.2 User Interfaces: User-centered design


General Terms
Design, Experimentation, Human Factors, Theory

1. INTRODUCTION
Even though the computational power of mobile devices is advancing at a rapid pace, their user interfaces continue to resemble those of desktop personal computers. Yet the typical contexts of use of mobile devices are far more varied and potentially more compromised than those of personal computers. As a result, mobile user interfaces have recently received much attention from HCI researchers. Some of the important issues that need to be considered in the case of mobiles include: the limited size of the graphical displays, the type of input that can be performed on the move, the amount of attention a user can devote to the interface, and the social acceptance of devices and interaction techniques [6].

Users' tasks, goals, social situations and interaction enablers differ based on the user's context. Mobile users in particular are usually engaged in multiple activities simultaneously, which naturally gives rise to multimodal interfaces [24]. The selection of modalities can be based on the identified usage situation: it has been suggested [24] that context-sensitive selection of modalities could be achieved by recognizing prototypical "modes" of mobility, such as walking, waiting, hurrying, or navigating, and choosing interaction channels according to the resources that are typically free in that mode. According to Obrenovic et al. [13], sensory limitations caused by constraining environmental characteristics can be described in the same way as user preferences and features. In this sense, external factors such as noise or lighting can lead to design solutions similar to those used for people with hearing or vision impairments [13], [29]. In addition, Oviatt et al. [14] have shown that users can self-manage limitations of working memory, when task complexity increases, by distributing communicative information across multiple modalities.

In this paper we put forward our design process for mobile multimodal applications, including our approach for identifying the issues affecting the usefulness of interaction methods and modalities in varied contexts of daily life. The process we used during our design work includes 1) identifying the limitations affecting the usage of different modalities in different contexts, 2) creating usage scenarios, selecting interaction concepts and creating a general design for the multimodal application, 3) creating modality-specific designs, 4) rapid prototyping, and 5) evaluating the prototype in naturalistic situations to find key issues to be taken into account in subsequent design iterations. In addition, we have collected information on the key issues affecting the selection of optimal modalities for various situations. For design and prototyping purposes we selected a messaging application and evaluated it in a simulated car driving setup and while walking.

The remaining sections of the paper are organized as follows. After motivating our work with background information and related work in Section 2, we detail the design method we devised in Sections 3.1-3.4. This is followed by a description of the implementation in Section 3.5 and the evaluation of the multimodal messaging application we built in Section 3.6. Finally, in Section 4 we discuss our findings and lessons learned, and we conclude in Section 5.


Figure 1. Overview of our iterative design process for multimodal applications


2. BACKGROUND
The number of studies concentrating on the design and usage of mobile multimodal applications in different contexts is still limited, though some research has been done and applications have been created to support multimodal interaction in visually, cognitively or otherwise challenging situations [4], [15], [16], [30]. Furthermore, potential solutions utilizing different interaction methods and modalities to compensate for the limitations of mobile devices, such as limited screen and button sizes, have been addressed by several researchers [5]. Some studies have also been conducted on context-aware mobile devices that adapt their modality configuration profile automatically based on context changes detected with an array of sensors [18], [20]. However, in these studies the focus has been on context recognition, not on the multimodal configuration as such.

Recently there have been some theoretical considerations regarding context-dependent interaction limitations and the usage of different modalities to overcome these limitations. The framework described by Obrenovic et al. [13] is based on modalities, effects of modalities, and constraints reducing or eliminating these effects. Jameson et al. [8] created a task analysis for voice- and eye-based dialing, concentrating on the usage of different human capabilities (hands, eyes, voice, working memory, and ears) in different situations during the dialing task. Observations and survey results showed that users' decisions and behavior are strongly influenced by factors not covered by the task analysis, such as previous experience and beliefs about social acceptability. To better understand social situations and how they affect interaction, we concentrated on them already in the observation phase during the first step of our design process.

Baillie et al. [2] introduced an approach that combines laboratory and field studies for the design and evaluation of multimodal mobile applications. The technique presented in their paper comprised five design phases. They conducted observations in realistic environments to discover current practices and found the results useful when building scenarios. After that they facilitated discussions and used questionnaires to collect data, created action scenarios to demonstrate situations to users, and allowed users to sketch the user interface. They also evaluated their multimodal prototypes both in the field and in the laboratory and found it very useful to evaluate the usage of multimodal features in different naturalistic settings. However, during the participatory design process they realized that the requirements-gathering techniques they were using did not collect enough data about voice interaction, so they created a special technique to gather this type of information from users. Envisioning novel interaction solutions seems to be one of the challenges when co-operatively designing multimodal interaction solutions with users. Users typically consider only interaction methods already familiar to them, such as graphical user interfaces, and tend not to consider new methods, even though they might find them useful and natural when trying them in real life for the first time. In this sense, we believe that the design of new modalities such as speech or gestures should mainly be done by experienced interaction designers, and that design solutions should be discussed and evaluated with users at as early a phase of the design process as possible.

3. METHOD
Our iterative design process (see Figure 1) includes five basic steps: 1) identification of the limitations affecting the usage of different modalities in various contexts (contextual observations and context analysis), 2) identifying and selecting suitable interaction concepts and creating a general design for the multimodal application (storyboarding, use cases, interaction concepts, task breakdown, application UI and interaction design), 3) creating modality-specific UI designs, 4) rapid prototyping, and 5) evaluating the prototype in naturalistic situations to find key issues to be taken into account in the next iteration.

3.1 Identifying interaction limitations of mobile situations
To better understand everyday interaction and the challenges of mobile contexts, we observed three persons in their daily activities. To get an extensive overview of their varying activities, we selected both working and non-working people and both working and non-working days. We observed a 27-year-old biology researcher at work and after her working hours, a banker (age 40) during his day off, and a retired lady (age 60) during her normal day. During the study the participants' activities, environment, communication and device usage were recorded. The study confirmed our expectation that different contexts, tasks, environments and social situations set varying restrictions and requirements for interaction.

We selected a set of representative contexts, from the ones we encountered in the observations, that posed significant limitations on interaction, and analyzed them in more detail. The selected main contexts include: walking in a public place (e.g. in a street), standing at a bus or tram stop, traveling in a bus or tram, driving a car, having a lunch break in a cafeteria, shopping in a grocery store, being at work, being at home, and being in a gym. Some of the selected contexts and their general limitations are summarized in Table 1, and a more detailed description of one of these contexts is presented in Table 2. Based on the observation data we selected and further analyzed the car driving and walking contexts, which clearly set different and challenging requirements for interaction.

3.2 Storyboarding, creation of interaction concepts and general UI and interaction design
One of the main goals of our design work was to find innovative and useful interaction solutions for common everyday situations and interaction problems. Scenario-based design has previously been used to support envisioning new media while retaining ties to everyday life through observation [25].

Table 1. Some of the identified contexts and a rough estimate of a person's aural, visual, physical and cognitive load in them on a four-point scale (the darker the color the higher the load)
Columns: Aural, Visual, Physical, Cognitive, Social
Rows (contexts): Walking (public place), Driving a car, Being in a gym, Shopping groceries, Lunch in a cafeteria, Traveling by bus

Based on the data collected during the observation phase we identified a set of use cases and sketched scenarios illustrating the basic tasks and contexts of use. For both contexts (walking and driving) we created a storyboard (see Figure 2) to support our understanding of the situation and the task. The storyboards also visualized, and created a common understanding of, the aspects and contextual variables affecting interaction and the usage of multimodal features within our design and prototyping teams. The contextual variables we identified included the objects causing noise, the person's social situation, the availability of hands, clothing, lighting, temperature, and so on. These variables identified aspects that would make a solution work in one context but not in another; some possible changes of these variables were visualized in the storyboards. This approach helped us find optimal interaction solutions for the situation at hand.
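To make the notion of contextual variables concrete, the sketch below captures the kind of variables a walking storyboard frame might cover as plain data; the variable names and example values are illustrative, not taken from the actual storyboards.

```javascript
// Illustrative only: one way to note down the contextual variables a
// storyboard frame covers for the walking context. Names and values are
// examples, not the project's actual design artifacts.
const walkingContextFrame = {
  context: "walking in a public place",
  noiseSources: ["traffic", "people talking"],   // affects audio input/output
  socialSituation: "strangers walking nearby",   // affects speech acceptability
  handsAvailable: 1,                             // one hand carries a bag
  clothing: "gloves",                            // affects touch input
  lighting: "bright sunlight",                   // affects screen readability
  temperature: "cold",
};

// A designer can then ask which interaction channels remain usable:
const speechAcceptable = walkingContextFrame.socialSituation === "alone";
const touchUsable = walkingContextFrame.handsAvailable > 0 &&
                    walkingContextFrame.clothing !== "gloves";
console.log({ speechAcceptable, touchUsable });
```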


Table 2. Some typical issues affecting the interaction when a person is walking in a street

Aural: traffic; people around (e.g. conversation, footsteps)
Visual: amount of natural light (depends on time, location, weather); lighting (e.g. street lights, car lights); following the environment (e.g. navigation, avoiding obstacles)
Physical: clothing (determined by temperature, humidity); the person is typically carrying item(s); walking surface (e.g. slippery, snowy, bumpy, avoiding obstacles)
Cognitive: social situation (e.g. people around, companion); the person's plans and schedule, pace of walk; amount and type of traffic (e.g. when crossing a road)

Figure 2. Storyboard visualizing the person's tasks, context and a set of typical contextual variables

The UI was tailored to be suitable in different contexts of use while maintaining consistency, an important design principle [19] that should also be adhered to across modalities [17]. To achieve this we created a generic view of the application by analyzing the messaging task. During the task analysis we identified generic interaction primitives occurring in different sub-tasks. An example of such a primitive is selecting an item from a larger set of items, typically implemented as a listbox widget in graphical user interfaces. After that, the possible interaction concepts were identified and a set of interaction methods and modalities applicable to each primitive was specified. Special emphasis was put on common primitives occurring at various points during the interaction to ensure that the application would behave consistently when used with different modalities in different situations.
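The paper does not show how such a primitive-to-modality mapping was recorded; the sketch below is one minimal way to express it, assuming illustrative primitive and binding names rather than the ones actually used in the design.

```javascript
// Illustrative sketch: generic interaction primitives identified in the task
// analysis, each bound to one realization per modality. Primitive and binding
// names are examples, not the project's actual design artifacts.
const primitiveBindings = {
  selectFromList: {            // e.g. picking a message or a reply template
    gui:    "listbox widget with finger-stroke scrolling",
    speech: "system reads list items; user says the item or its number",
    tilt:   "tilt up/down moves focus; tilt right confirms",
  },
  confirmAction: {             // e.g. sending a reply
    gui:    "on-screen button",
    speech: "yes/no confirmation prompt",
    tilt:   "tilt right to confirm, left to cancel",
  },
};

// Consistency check used informally during design: every primitive should be
// reachable through every modality offered in a given context.
for (const [primitive, bindings] of Object.entries(primitiveBindings)) {
  for (const modality of ["gui", "speech", "tilt"]) {
    if (!bindings[modality]) {
      console.warn(`${primitive} has no ${modality} realization`);
    }
  }
}
```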

Recent empirical studies have shown that even though the share of emerging smartphone services, such as browsing and multimedia, is on the rise in terms of the time people spend using smartphones, voice calls and Short Message Service (SMS) text messaging are still the most popular smartphone services [28]. Furthermore, these services are used on a daily basis and in various situations and places (e.g. at home, on the move, at work). Thus we selected both of these applications for prototyping, but for the sake of brevity and clarity we present the design and evaluation of only one application in this paper, the messaging application.

3.3 Selecting optimal modalities for different situations
During the design process the modalities were used and combined based on their abilities to present different types of information or on the ways they support human perception, attention and understanding in the particular situation. To support this process by fully utilizing various modalities, we collected and analyzed information regarding the characteristics, limitations, properties, and strengths of different input and output modalities. The purpose of this work was to identify the modalities and modality combinations best suited for different situations and information presentation needs.

The suitability of a particular modality for a certain purpose is affected by the social and physical environment, the devices available, the information to be presented, the interaction process, the user's long- and short-term abilities, the cognitive and emotional context, and the user's experience level and preferences [13]. The physical environment and the user's task naturally affect the user's momentary sensory and motor capabilities. This work builds on existing multimodal design taxonomies and frameworks [3], [7], [11], [22].

To create a simple classification we divided the input and output modalities into groups of visual, audio and haptic modalities. The three main modality classes each include a set of manifestations of input and output modalities. For example, manifestations of auditory output include speech, auditory icons, earcons, sonification, audification and music, and manifestations of auditory input include speech, voice and non-speech audio. Each manifestation has its own characteristics, based on which it can be identified and selected for use. A subset of the information we collected and analyzed is presented in Table 3. A more detailed description of this work is available in [10].

Table 3. Examples of information collected on some output modalities

Visual (static)
- Types: text, graphics
- Features: non-temporal, spatial
- Most suitable for: spatial qualities
- Pros: high specificity; supports privacy
- Cons: user's focus needs to be on the task and screen; challenging in very bright or dark conditions
- Example contexts: in a meeting, office, public places, transportation
- Properties: light, color, transparency, shape, texture, size, motion, spatial relations…
- Manifestations: text, image, map, diagram, graph, notation, icon…

Auditory
- Types: speech, sound (abstract/natural)
- Features: temporal, non-spatial; to be presented sequentially rather than in parallel
- Most suitable for: temporal qualities
- Pros: usable when the user's focus is not on the screen; obtrusive / draws attention
- Cons: not always usable in noisy environments, when privacy is needed, or in certain social situations; obtrusive
- Example contexts: driving, sporting, certain outdoor situations (bright sunshine, gloves on)
- Properties: volume, pitch/frequency, timbre, spatial location, intensity/amplitude, rhythm, synchrony with temporal onset/offset…
- Manifestations: synthetic/recorded speech, auditory icons, earcons, sonification, audification, music…

Haptic
- Types: tactile, kinaesthetic
- Features: temporal, spatial; with mobile devices often arbitrary
- Most suitable for: movement
- Pros: discreet; good for manipulation and exploration; usable when the user's focus is not on the screen
- Cons: often conveys only a limited amount of information (understandability); interference; perceivability (body contact needed)
- Example contexts: public (noisy) places
- Properties: frequency, amplitude, rhythm, vibration pattern, position, resistance, capacitance, temperature, roughness, stiffness…
- Manifestations: vibration (e.g. tactons), force feedback, temperature, pressure distribution…
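Characteristics like those in Table 3 were used as a reference when matching modalities to contexts. As a rough illustration of that matching step, the sketch below encodes a few characteristics as data and filters them against context constraints; the property names, values and the rule itself are simplified assumptions, not the procedure described in [10].

```javascript
// Simplified illustration of using modality characteristics (cf. Table 3) to
// shortlist output modalities for a context. All names and rules are examples.
const outputModalities = [
  { name: "visual",   needsVisualAttention: true,  audible: false, privacyRisk: false },
  { name: "auditory", needsVisualAttention: false, audible: true,  privacyRisk: true  },
  { name: "haptic",   needsVisualAttention: false, audible: false, privacyRisk: false },
];

function shortlist(context) {
  return outputModalities
    .filter(m => !(context.eyesBusy && m.needsVisualAttention))
    .filter(m => !(context.noisy && m.audible))
    .filter(m => !(context.privacySensitive && m.privacyRisk))
    .map(m => m.name);
}

// Driving: eyes are busy, the cabin is fairly quiet, privacy is not an issue.
console.log(shortlist({ eyesBusy: true, noisy: false, privacySensitive: false }));
// -> ["auditory", "haptic"]
```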

3.4 Modality-specific design
Finally, we created user interface designs for the separate modalities, including visual and auditory representations of information, tactile and auditory cues representing the status of the interaction, and speech, 3D gesture (tilt) and 2D gesture (finger stroke) based input (see Figure 3).

While the output modalities were mainly defined by the designer for the selected task and context, users were able to freely select the input modality. Based on the information we had collected on the characteristics of different modalities, we expected some modalities to be more appropriate in certain situations than others. For example, based on earlier work [8], we expected speech interaction to be preferred more in the car than in the walking context. We therefore designed and created two versions of the user interface, the first targeted at the car driving situation and the second at walking; the modality characteristics we had collected earlier guided this design work. We also used a different initiation method in each situation: for the car context we designed a system-directed speech dialog, and for the walking context we employed a mixed-initiative strategy.

Figure 3. The prototype application designed for the walking context

3.5 Rapid prototyping
The prototype was built using existing software components and off-the-shelf hardware. For rapid prototyping purposes, we chose to use web technologies where possible. The application ran on a PC, in a speech-enabled internet browser [26]. Finger stroke gesture recognition was implemented using JavaScript, while communication with the accelerometer sensor and vibra modules was carried out over Bluetooth and was accessible from the browser content.
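The paper does not include the stroke recognition code; the snippet below is a minimal sketch of the kind of finger-stroke (swipe) classification described, assuming standard pointer events and a simple displacement threshold.

```javascript
// Minimal finger-stroke (swipe) classifier: compare the pointer-down and
// pointer-up positions and report left/right/up/down if the displacement
// exceeds a threshold. Thresholds and event wiring are illustrative.
const MIN_STROKE_PX = 40;
let start = null;

function classifyStroke(dx, dy) {
  if (Math.max(Math.abs(dx), Math.abs(dy)) < MIN_STROKE_PX) return null; // a tap
  return Math.abs(dx) > Math.abs(dy)
    ? (dx > 0 ? "right" : "left")
    : (dy > 0 ? "down" : "up");
}

document.addEventListener("pointerdown", e => {
  start = { x: e.clientX, y: e.clientY };
});

document.addEventListener("pointerup", e => {
  if (!start) return;
  const stroke = classifyStroke(e.clientX - start.x, e.clientY - start.y);
  start = null;
  if (stroke) {
    // In a prototype, a stroke like this would move the focus in the UI.
    console.log("stroke:", stroke);
  }
});
```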

We created two different configurations, meant to be evaluated in the driving and the walking contexts. Both hardware configurations had a touch screen. In the car scenario a Tablet PC was mounted on the car's dashboard to provide good visibility in a visually and cognitively challenging situation. The walking scenario required a smaller, handheld device for single- or two-handed interaction, so the application was displayed on a Nokia N800 Internet Tablet. In both scenarios the users had a head-worn microphone for voice commands. In the car the audio was played through speakers, while walking through headphones.

In the car scenario the application was running on the Tablet PC. In the walking configuration we used a client-server setup where the participants used the Internet Tablet as a client displaying the prototype application over a wireless VNC connection. The prototype application ran on a PC carried in a backpack together with the audio-video capturing devices used to record information for evaluation purposes (see Figure 4). We implemented a simple stroke input mechanism on the touch screen and scaled the GUI to a size enabling accurate finger input in both scenarios. The walking scenario introduced the possibility to use the haptic modality, as the device was held in the hand. We accompanied the Internet Tablet with an additional haptic interaction device consisting of a 3-axis accelerometer, a standard rotational vibra motor and a Bluetooth module. We implemented a simple tilt recognizer for detecting the basic 3D tilting gestures left, right, up and down.
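No details of the tilt recognizer are given in the paper; the sketch below shows one simple way to detect the four tilt gestures from 3-axis accelerometer samples, assuming the Bluetooth module streams gravity-scaled x/y/z readings and using illustrative thresholds with a neutral re-arming zone.

```javascript
// Simple tilt-gesture detection from 3-axis accelerometer samples (in g).
// Assumes the device rests roughly flat in the hand; a gesture is reported
// once when a tilt axis crosses the threshold, and re-armed near level.
const TILT_ON = 0.5;   // roughly 30 degrees of tilt triggers a gesture
const TILT_OFF = 0.2;  // back near level re-arms detection
let armed = true;

function detectTilt({ x, y }) {          // x: left/right axis, y: towards/away
  if (!armed) {
    if (Math.abs(x) < TILT_OFF && Math.abs(y) < TILT_OFF) armed = true;
    return null;
  }
  let gesture = null;
  if (x > TILT_ON) gesture = "right";
  else if (x < -TILT_ON) gesture = "left";
  else if (y > TILT_ON) gesture = "down";   // tilted towards the user
  else if (y < -TILT_ON) gesture = "up";    // tilted away from the user
  if (gesture) armed = false;
  return gesture;
}

// Example: feed samples arriving from the Bluetooth accelerometer module.
console.log(detectTilt({ x: 0.7, y: 0.1 })); // -> "right"
console.log(detectTilt({ x: 0.7, y: 0.1 })); // -> null (not re-armed yet)
```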

Figure 4. Evaluation setup for the walking context

3.6 Evaluation in simulated environments
Prior to the actual user evaluation, the first version of our prototype was evaluated with an expert in order to detect the most evident, context-independent usability issues. Based on the expert evaluation findings, the two context-specific versions of the prototype were finalized and evaluations were conducted in simulated environments. The evaluations were concluded with a 10-20 minute interview focusing on problematic situations, to verify the users' conception of the system behavior and to inquire about their interaction preferences.

3.6.1 Car driving context evaluation setup
The prototype for the car context was evaluated in a simulated driving situation (see Figure 5). The evaluation was conducted in a full-size car installed in a laboratory room with a video projector. The controls of the car (the steering wheel, pedals and gears) were connected to a game console running a driving simulator projected on the screen in front of the car. The user sat in the car and was instructed to drive on the track and to avoid hitting other cars. At the same time (s)he was given tasks to perform with the multimodal messaging application, reading and responding to messages. The five tasks were given verbally by the moderator sitting next to the driver. Tasks included opening a just-received message, responding with a customizable template message, browsing messages and sending a recorded voice message. The users were encouraged to explain their actions and thoughts aloud during the evaluation session.

Figure 5. Evaluating the multimodal application in the car driving context

Three voluntary participants, two males and one female, were selected from among the office personnel for the actual evaluation. Two of the participants did not have an engineering or research background. We considered that 3-4 users would be enough for our first evaluations, because our purpose at this stage was not to find all existing usability problems, but to get a feeling for the most important issues to be taken into account when designing our application further. According to the estimate introduced by Nielsen and Landauer [12], evaluating a system with 3-4 users helps find most (~70%) of the usability problems. We expect these findings to include the most critical issues affecting the usage of the system.
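The ~70% figure follows from the problem-discovery model of [12]; a short worked example, assuming the commonly cited average per-participant discovery rate of about 31%:

```latex
% Problem-discovery model of Nielsen and Landauer [12], with an assumed average
% per-participant discovery rate \lambda \approx 0.31:
\[
  \mathrm{Found}(n) = 1 - (1 - \lambda)^{n}, \qquad
  \mathrm{Found}(3) = 1 - 0.69^{3} \approx 0.67, \quad
  \mathrm{Found}(4) = 1 - 0.69^{4} \approx 0.77
\]
```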



The users who participated in the evaluations were not familiar with the application in advance, so they were given a five-minute introduction to the user interface without being shown the actual interaction. The available modalities were explained verbally, leaving room for exploration. The evaluation sessions lasted from 45 minutes to 1 hour and 15 minutes. The participants were given a small gift for their effort. The evaluation was captured on video with one camera behind the user, covering the device and the surrounding dashboard as well as part of the projected view.



3.6.2 Walking context evaluation setup
The handheld walking context prototype was evaluated indoors, in an office environment (see Figure 6). The user walked through various open office areas, stairs and doors during the evaluation, so the tasks and social situations varied along the route. The moderator walked together with the user, giving instructions and tasks to be carried out with the application. The same five tasks were used as in the car context, and they were likewise given verbally. Two observers could follow the interaction on their own displays.

Figure 6. Evaluating the multimodal application in the walking context

Two males and two females participated in the evaluation in the walking context. The users were given a five-minute introduction to the user interface without being shown the actual interaction. The available modalities were explained verbally, leaving room for exploration. The evaluation sessions varied from 30 minutes to 1 hour. The participants were given a small gift for their effort.

The evaluation was captured on video with two small cameras, one capturing the interaction on the device screen and the other, placed on the user's shoulder, capturing the user's surroundings. The two video streams were mixed together (see Figure 6) and recorded on a device carried by the participant in a backpack.

3.6.3 Findings
In this section we summarize our findings from the two evaluations. Based on our studies it is clear that context affects users' preferences in the usage of modalities and interaction strategies. While speech interaction was preferred in the car environment, users did not consider it useful when they were walking. 2D (finger stroke) and especially 3D (tilt) gestures were preferred by walking users.

All users considered speech interaction easy and especially usable when driving a car, and stated that they would be ready to use even a speech-only system when driving. System-directed, step-by-step speech interaction was considered efficient and easy to use, and this style did not seem to irritate driving users at all. When users were walking, the situation was quite the opposite and they did not consider speech input very useful. Long help prompts and speech recognition errors seemed to disturb walking users more than driving users, even though the dialog was shortened and transformed to mixed-initiative for the walking context.

People often stick to familiar interaction methods and do not necessarily change their habits, even if they themselves consider that some other method would be more useful in the given situation [8]. In our case we could see that users neither noticed nor used gestures before they were instructed to do so by the test moderator. This happened even though the availability of 2D and 3D gestures was mentioned when the system was introduced verbally to the users at the beginning of the evaluation, and there were also subtle arrow signs on the screen acting as directional cues. This indicates that efficient cues or hints for both 2D and 3D gestures are needed. Suitable cues suggested by the users themselves were animations suggesting the availability and usage of gestures, and auditory and tactile cues reflecting the state of the user interface (e.g. when the focus changes).

After getting more familiar with the gestures, most users stated a preference for 2D and 3D gestures especially when walking, since these require little or no visual attention. 3D gestures were especially liked and considered a nice way to interact by all users. During our evaluation, left and right tilts caused less confusion than tilting the device towards or away from the user. This was mainly due to the shape of the device and the interaction metaphor used. The users had different opinions on which direction the focus should move when tilting the device towards or away from themselves. Typical metaphors users mentioned when using 2D or 3D gestures were direct manipulation, rolling, and the idea that UI items have weight and drop downwards when the device is tilted. Some users stated that tilting the screen away from themselves did not feel natural. The way the device is usually held in the hand meant that finger strokes upwards and downwards were considered more natural than strokes to the left and right.

Speech feedback reading out the current item in focus was considered very useful by all users in both contexts. Users generally mentioned speech output as the main feedback modality for 3D gestures. One of the users explained her preference for 3D gestures and speech feedback: "I liked strokes and tilting, no attention was needed when audio was telling where you are."

4. DISCUSSION
The outcome of the first iteration of our design process once again underlined the complexity and the challenges of designing usable multimodal interfaces. Mobile devices are used in varying situations that set different limitations on interaction. According to our study, people's interaction preferences vary based on the context the application is used in. Multimodal user interfaces should adapt to different contexts of use, as well as to the needs and abilities of different users [17].

One of the challenges of multimodal interaction design is to know when and how to adapt the user interface, and how to keep it consistent and easy to use. Our method proved useful for identifying the key issues affecting interaction in different contexts. It also helped us create interaction concepts and designs to overcome the interaction challenges set by the context. The evaluations enabled us to identify several aspects of and challenges in multimodal interaction in varied contexts; in particular, the need for a coherent metaphor that efficiently supports interaction in all available modalities was clear.

We utilized and experimented with a set of design and prototyping tools, including SUEDE [9], a Wizard of Oz tool for speech interaction, the CSLU Speech Toolkit [23], and CrossWeaver [21], a tool for prototyping multimodal and multi-device user interfaces. As we found out, most design tools support only a limited number of interaction modalities, which makes it challenging to create a holistic design of coherent user interfaces using the available tools. Thus we were able to use these tools only in the modality-specific design phase. The lack of integrated tools supporting the design of multimodal user interfaces leads to diverse methods being used by interaction designers.

Although our design method provided us with valuable support for the selection of suitable modalities for different situations, we also discovered that even the best modality configuration is worthless if the user is not able to discover the interaction possibilities. In the evaluation we noticed that the participants were not able to discover the 2D or 3D gestures without the moderator's help. We believe that a more effective coupling of a user's action and a product's function [27] would support users in finding and exploring new interaction methods. Our design method did not provide any support for the design of appropriate interaction cues and affordances for modalities, which should be considered when further developing the method. The ways in which users' previous knowledge and habits can be taken into account, and the learning of new interaction methods more suitable for a specific context supported, should also be considered in the future.

5. CONCLUSION
We found that all phases of our design process made an important contribution to a successful application user interface design. The observation phase enabled us to discover everyday situations that are challenging for mobile users. This helped us create the use cases and interaction concepts and supported us in including context-specific interaction requirements early in the design. By identifying the interaction requirements and limitations in different mobile situations and by analyzing modalities and their characteristics in detail, we were able to match the modalities with the interaction requirements and in this way select optimal interaction methods for different contexts. The use of rapid prototyping tools and methods enabled us to concentrate on the design process, where we invested most of our resources. After the first round of design, prototyping and implementation we were able to identify a set of key issues that affect the usage of our multimodal application in different situations and that need to be taken into account in further designs. Our user studies showed that context strongly affects users' preferences in the usage of modalities and interaction strategies, and these preferences should be incorporated in the design process when creating multimodal mobile applications.

6. REFERENCES
[1] Bachvarova, Y., van Dijk, B. and Nijholt, A. 2007. Towards a Unified Knowledge-Based Approach to Modality Choice. In Proceedings of the Workshop on Multimodal Output Generation (Aberdeen, Scotland, January 25-26, 2007). MOG 2007, CTIT Workshop Proceedings Series WP07-01, 5-15.
[2] Baillie, L. and Schatz, R. 2005. Exploring multimodality in the laboratory and the field. In Proceedings of the 7th International Conference on Multimodal Interfaces (Trento, Italy, October 4-6, 2005). ICMI '05, ACM Press, New York, NY, USA, 100-107. DOI= http://doi.acm.org/10.1145/1088463.1088482
[3] Bernsen, N.O. 1993. Modality Theory: Supporting Multimodal Interface Design. In Proceedings of the ERCIM Workshop on Multimodal Human-Computer Interaction, 13-23.
[4] Bernsen, N.O. and Dybkjær, L. 2001. Exploring Natural Interaction in the Car. In Proceedings of the International Workshop on Information Presentation and Natural Multimodal Dialogue (Verona, Italy, December 2001), 75-79.
[5] Brewster, S. 2002. Overcoming the Lack of Screen Space on Mobile Computers. Personal and Ubiquitous Computing 6, 3 (May 2002), 188-205.
[6] Costanza, E., Inverso, S.A. and Allen, R. 2005. Toward subtle intimate interfaces for mobile devices using an EMG controller. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Portland, USA, April 2-7, 2005). CHI 2005, 481-489.
[7] Frohlich, D.M. 1992. The design space of interfaces. In L. Kjelldahl (Ed.), Multimedia - Principles, Systems and Applications. Springer-Verlag, Berlin, Germany, 53-69.
[8] Jameson, A. and Klöckner, K. 2005. User Multitasking with Mobile Multimodal Systems. In Spoken Multimodal Human-Computer Dialogue in Mobile Environments. Springer, Netherlands, 349-377.
[9] Klemmer, S.R., Sinha, A.K., Chen, J., Landay, J.A., Aboobaker, N. and Wang, A. 2000. SUEDE: A Wizard of Oz Prototyping Tool for Speech User Interfaces. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (San Diego, USA, November 5-8, 2000). UIST 2000, ACM Press, New York, NY, USA, 1-10. DOI= http://doi.acm.org/10.1145/354401.354406
[10] Lemmelä, S. 2008. Selecting optimal modalities for multimodal interaction in mobile and pervasive environments. In Pervasive 2008 Workshop Proceedings (Sydney, Australia, May 18, 2008). IMUx 2008, 208-217.
[11] Nesbitt, K.V. 2004. MS-Taxonomy: a conceptual framework for designing multi-sensory displays. In Proceedings of the Eighth International Conference on Information Visualisation (London, UK, July 14-16, 2004). IV 2004, IEEE Computer Society, New York, NY, USA, 665-670. DOI= http://doi.ieeecomputersociety.org/10.1109/IV.2004.1320213
[12] Nielsen, J. and Landauer, T.K. 1993. A mathematical model of the finding of usability problems. In Proceedings of the ACM INTERCHI '93 Conference (Amsterdam, Netherlands, April 24-29, 1993). INTERCHI '93, ACM Press, New York, NY, USA, 206-213. DOI= http://doi.acm.org/10.1145/169059.169166
[13] Obrenovic, Z., Abascal, J. and Starcevic, D. 2007. Universal accessibility as a multimodal design issue. Communications of the ACM 50, 5 (May 2007), 83-88. DOI= http://doi.acm.org/10.1145/1230819.1241668
[14] Oviatt, S., Coulston, R. and Lunsford, R. 2004. When do we interact multimodally?: cognitive load and multimodal communication patterns. In Proceedings of the 6th International Conference on Multimodal Interfaces (State College, PA, USA, October 13-15, 2004). ICMI 2004, ACM Press, New York, NY, USA, 129-136. DOI= http://doi.acm.org/10.1145/1027933.1027957
[15] Pascoe, J., Ryan, N. and Morse, D. 2000. Using while moving: HCI issues in fieldwork environments. ACM Transactions on Computer-Human Interaction (TOCHI) 7, 3 (September 2000), 417-437. DOI= http://doi.acm.org/10.1145/355324.355329
[16] Pirhonen, A., Brewster, S. and Holguin, C. 2002. Gestural and audio metaphors as a means of control for mobile devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Minneapolis, MN, USA, April 20-25, 2002). CHI 2002, ACM Press, New York, NY, USA, 291-298. DOI= http://doi.acm.org/10.1145/503376.503428
[17] Reeves, L.M., Lai, J., Larson, J.A., Oviatt, S. et al. 2004. Guidelines for multimodal user interface design. Communications of the ACM 47, 1 (January 2004), 57-59. DOI= http://doi.acm.org/10.1145/962081.962106
[18] Schmidt, A., Aidoo, K.A., Takaluoma, A., Tuomela, U., Van Laerhoven, K. and Van de Velde, W. 1999. Advanced Interaction in Context. In Proceedings of the 1st International Symposium on Handheld and Ubiquitous Computing (Karlsruhe, Germany, 1999), 89-101.
[19] Shneiderman, B. 1997. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Longman Publishing Co., Inc.
[20] Siewiorek, D., Smailagic, A., Furukawa, J., Krause, A., Moraveji, N., Reiger, K., Shaffer, J. and Wong, F.L. 2003. SenSay: a context-aware mobile phone. In Proceedings of the Seventh IEEE International Symposium on Wearable Computers (2003), 248-249.
[21] Sinha, A.K. and Landay, J.A. 2001. Visually Prototyping Perceptual User Interfaces through Multimodal Storyboarding. In Proceedings of the 2001 Workshop on Perceptive User Interfaces (Orlando, USA, November 15-16, 2001). PUI 2001, ACM Press, New York, NY, USA, 1-4. DOI= http://doi.acm.org/10.1145/971478.971501
[22] Sutcliffe, A.G., Kurniawan, S. and Shin, J.E. 2006. A method and advisor tool for multimedia user interface design. International Journal of Human-Computer Studies 64, 4 (April 2006), 375-392.
[23] Sutton, S., Cole, R., Villiers, J., Schalkwyk, J., Vermeulen, P., Macon, M., Yan, Y., Kaiser, E., Rundle, B., Shobaki, K., Hosom, P., Kain, A., Wouters, J., Massaro, D. and Cohen, M. 1998. Universal Speech Tools: The CSLU Toolkit. In Proceedings of the 5th International Conference on Spoken Language Processing (Sydney, Australia, November 30 - December 4, 1998). ICSLP '98, 3221-3224.
[24] Tamminen, S., Oulasvirta, A., Toiskallio, K. and Kankainen, A. 2004. Understanding mobile contexts. Personal and Ubiquitous Computing 8, 2 (May 2004), 135-143.
[25] Tollmar, K., Junestrand, S. and Torgny, O. 2000. Virtually living together. In Proceedings of the 3rd Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques (New York, USA, August 17-19, 2000). DIS '00, ACM Press, New York, NY, USA, 83-91. DOI= http://doi.acm.org/10.1145/347642.347670
[26] Wang, K. 2002. SALT: a spoken language interface for web-based multimodal dialog systems. In Proceedings of the International Conference on Spoken Language Processing. ICSLP 2002, 2241-2244.
[27] Wensveen, S.A.G., Djajadiningrat, J.P. and Overbeeke, C.J. 2004. Interaction frogger: a design framework to couple action and function through feedback and feedforward. In Proceedings of the Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques (Cambridge, MA, USA, August 1-4, 2004). DIS '04, ACM Press, New York, NY, USA, 177-184. DOI= http://doi.acm.org/10.1145/1013115.1013140
[28] Verkasalo, H. 2007. A Cross-Country Comparison of Mobile Service and Handset Usage. Licentiate's thesis, Helsinki University of Technology, Networking Laboratory, Finland.
[29] Wobbrock, J.O. 2006. The Future of Mobile Device Research in HCI. In Proceedings of the Workshop on 'What is the Next Generation of Human-Computer Interaction?' at CHI 2006 (Montreal, Canada, April 23, 2006), 118-121.
[30] Zhao, S., Dragicevic, P., Chignell, M., Balakrishnan, R. and Baudisch, P. 2007. Earpod: eyes-free menu selection using touch input and reactive audio feedback. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, CA, USA, April 28 - May 3, 2007). CHI 2007, ACM Press, New York, NY, USA, 1395-1404. DOI= http://doi.acm.org/10.1145/1240624.1240836