Device independent mobile multimodal user interfaces with the MONA Multimodal Presentation Server

Georg Niklfeld (1), Hermann Anegg (1), Alexander Gassner (2), Michael Jank (3), Günther Pospischil (4), Michael Pucher (1), Raimund Schatz (1), Rainer Simon (1), Florian Wegscheider (1)

(1) Telecommunications Research Center Vienna (ftw.); Donau-City-Straße 1, 1220 Wien, Austria; {niklfeld, anegg, pucher, schatz, simon, wegscheider}@ftw.at
(2) Siemens Österreich AG; Gudrunstraße 11; A-1100 Vienna, Austria; [email protected]
(3) Kapsch CarrierCom AG; Am Europlatz 5; A-1120 Vienna, Austria; [email protected]
(4) Mobilkom Austria AG & Co KG; Obere Donaustraße 29; A-1020 Wien, Austria; [email protected]

Abstract

In the collaborative research project MONA we have developed a technology that enables device independent authoring of user interfaces for mobile applications. The key innovation is the capability of the MONA Multimodal Presentation Server not only to adapt graphical user interfaces, but also to support speech input and output that combine with the visual interface on a range of different mobile terminals, thus delivering multimodal applications. This increases the versatility of mobile applications. With MONA technology, users can now continue an application session through intermittent mobile, hands-busy or eyes-busy situations.

1. Introduction

At the core of the MONA technology (mona.ftw.at) is the concept of multimodality. A modality is a way to convey information between a user and the interface of an application, using a single distinct human capability to process information. The MONA approach supports the modalities of speech, text, graphics, and non-speech audio.

Most mobile devices support more than just one modality. In fact, almost all current mobile phones or PDAs come with a microphone, speaker, and graphical display. Yet there exist few applications on mobile terminals that can exploit several modalities in a flexible manner, even though the generally small screen and the frequent change of context suggest that a multimodal user interface should be of great benefit to the user. This potential is linked to the desire of users to continue to use mobile applications also in short intervals during which they are highly mobile, such as when travelling in (or moving between) vehicles.

There are several reasons for the dearth of multimodal applications. First of all, today’s 2G mobile networks and devices encounter problems when transmitting data and voice simultaneously. Second, it is difficult to create good user interfaces due to the multitude of devices and their different capabilities, and finally creating a multimodal interface is more difficult than designing for voice or graphics alone. The first problem is relieved by GPRS and will be fully solved in 3G systems. This work deals with the other two problems by relieving the application from having to adapt the user interface to specific devices and modalities.

In section 2 of this paper we describe the MONA Multimodal Presentation Server. Section 3 describes the MONA Editor and the MONA authoring process, section 4 introduces two demo applications we implemented, and section 5 presents an overview of related work. In section 6 we discuss the relevance of this project in terms of deployment potential in the current technical and business environment.

2. MONA Multimodal Presentation Server

The MONA Multimodal Presentation Server supports the deployment of device independent applications that combine a graphical user interface (GUI) with speech input and output. Figure 1 gives a high-level overview of the MONA architecture concept.

Figure 1 MONA Server and Clients

Client devices access the server over a wireless network. The MONA system supports 2.5G and 3G mobile networks as well as wireless LANs. In order to allow for a flexible and yet uniform application concept across all different device categories, we chose a browser-based approach. Client devices interact with applications using a browser and a circuit- or packet-switched voice connection. This way MONA applications scale gracefully from low-end WAP phones to high-end Symbian-based smartphones and powerful (X)HTML-enabled handheld computers. The main innovative features supported by the MONA presentation server are:

§ Adaptive rendering of a single, abstract user interface description to a concrete target format suitable for a specific client device (a minimal illustrative sketch of this idea follows this list).
§ Support for asymmetrically multimodal interaction between users. The MONA presentation server can convert messages exchanged between users in such a way that they are suitable for the current modality settings of each individual user.
§ Broadcasting of user interfaces to several clients without application involvement.
§ Application-initiated push of new user interfaces. MONA applications are not restricted to the standard browser request/response interaction metaphor. The client browser also allows pushing of user interfaces. On low-end phones that do not support a client browser, WAP push can be used for pushing pages to the user.

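To make the first feature listed above more tangible, here is a minimal Java sketch of the idea behind adaptive rendering: a single abstract selection widget is turned either into an HTML drop-down list or into a VoiceXML menu, depending on the target device. The class names, widget vocabulary, and markup details are our own illustrative assumptions and do not reflect the actual MONA adaptation engine.

```java
import java.util.List;

/**
 * Illustrative sketch only: renders one abstract "choice" widget to two
 * concrete target formats, roughly in the spirit of the adaptation step
 * described above. Names and markup details are assumptions, not the
 * MONA implementation.
 */
public class AdaptiveRenderingSketch {

    enum Target { HTML, VOICEXML }

    record AbstractChoice(String id, String prompt, List<String> options) { }

    static String render(AbstractChoice widget, Target target) {
        StringBuilder out = new StringBuilder();
        switch (target) {
            case HTML -> {
                // Visual rendering: a labelled drop-down list.
                out.append("<label for=\"").append(widget.id()).append("\">")
                   .append(widget.prompt()).append("</label>\n")
                   .append("<select id=\"").append(widget.id()).append("\">\n");
                for (String option : widget.options()) {
                    out.append("  <option>").append(option).append("</option>\n");
                }
                out.append("</select>");
            }
            case VOICEXML -> {
                // Voice rendering: a spoken menu with one choice per option.
                out.append("<menu id=\"").append(widget.id()).append("\">\n")
                   .append("  <prompt>").append(widget.prompt()).append("</prompt>\n");
                for (String option : widget.options()) {
                    out.append("  <choice>").append(option).append("</choice>\n");
                }
                out.append("</menu>");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        AbstractChoice folder = new AbstractChoice(
                "folder", "Which folder do you want to open?",
                List.of("Inbox", "Sent", "Drafts"));
        System.out.println(render(folder, Target.HTML));
        System.out.println(render(folder, Target.VOICEXML));
    }
}
```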
2.1. Implementation / Server Architecture

The three-tier system architecture is shown in Figure 2.

Figure 2 MONA system architecture

A more detailed architecture diagram of the MONA Presentation Server is shown in Figure 3.

Figure 3 MONA presentation server architecture

The main module of the MONA presentation server contains the request handler, the core server component taking care of user session management and broadcasting. An adaptation engine performs the transformation from the abstract user interface to the concrete target mark-up language. Currently implemented target languages are HTML, WML, VoiceXML and two proprietary multimodal formats combining VoiceXML with HTML or WML, respectively. Persistent data management handles user, session, and application data stored in the MONA database.

Input integration and synchronization issues for multimodal client devices are resolved by the Kirusa [11] platforms, which communicate with the MONA server via HTTP. High-end devices like PDAs or Symbian smartphones can access the MONA server via the Kirusa platform for concurrent multimodality, which offers a full multimodal user experience with synchronized voice and visual user interfaces. Low-end WAP phones can connect to the MONA server via the Kirusa sequential platform, which only offers sequential multimodality, i.e. the user has to switch between voice UI and GUI. External components connected to the Kirusa platform include a speech recognition engine by Nuance [15] and a Text-to-Speech module by SVOX [19].

MONA applications are implemented as web services that interact with the server via a SOAP interface. The payload of the SOAP messages consists of user interface descriptions in a MONA-specific UIML format. As an alternative, the user interface descriptions can also be communicated via a basic HTTP/XML interface.

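To illustrate the application-to-server interaction just described, the following Java sketch shows how an application might submit a UIML-style user interface description over the basic HTTP/XML interface rather than SOAP. The endpoint URL, the element names in the payload, and the session identifier are hypothetical placeholders, not the actual MONA wire format.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Illustrative sketch only: posts a UIML-like user interface description to a
 * presentation server over a plain HTTP/XML interface. The endpoint and the
 * markup vocabulary are hypothetical placeholders.
 */
public class SubmitUiSketch {

    // A toy abstract UI: one question with two answer triggers.
    private static final String UI_DESCRIPTION = """
            <interface session="quiz-4711">
              <group id="question">
                <text id="q1">Which river flows through Vienna?</text>
                <trigger id="a1" label="Danube"/>
                <trigger id="a2" label="Rhine"/>
              </group>
            </interface>
            """;

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://presentation-server.example/mona/ui")) // placeholder URL
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(UI_DESCRIPTION))
                .build();

        // The server would adapt the description per device; here we only log the status.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Server answered: " + response.statusCode());
    }
}
```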
3. The MONA authoring process and the MONA Editor

One goal of the MONA project was that, despite platform independence, the traditional workflow of designers should be preserved as much as possible. Based on the root interface, the designer should be able to intuitively build the related abstract presentation model. The abstract presentation model should contain enough semantic information to allow adaptation to a sufficiently broad spectrum of potential target devices and modalities. Additionally, the designer should have the option to exercise detailed control over particular device-specific presentations, i.e. should be able to treat some devices differently, if desired.

Nevertheless, a persistent problem of any single-authoring method is a loss of predictability. Despite designer-centric authoring based on a concrete root interface, the designer can never fully anticipate every possible result on all different devices and in all different modalities. We believe that this problem is best addressed by providing tool support – the MONA Editor. The MONA Editor is a rapid prototyping environment for user interfaces based on the mark-up language we have developed. It is implemented in Java, based on the Eclipse platform [8]. The main work area consists of three views (see Figure 4):

Figure 4 MONA Editor with real-time GUI previews

§ Tree view depicting the hierarchical structure of user interface components and grouping elements.
§ Attribute/behaviour table showing all attributes associated with the currently selected user interface component or grouping element.
§ A mark-up source code view.

All views are synchronized, i.e. selecting a user interface component in the tree view will automatically highlight and scroll to the corresponding code section in the source code view and show the corresponding attributes/behaviours in the attribute/behaviour table. Navigating through the source code will also automatically select the corresponding element in the tree and update the attribute/behaviour table accordingly. The construction of a new user interface from scratch is mainly done in the tree view, using a right-click context menu or the component toolbar at the top of the application window. Editing of each component’s attributes and behaviours can be done in the attribute/behaviour table or directly in the source code view.

Different device emulators (including a Symbian UIQ smartphone and two different WML phones) offer a real-time preview of the GUI. Furthermore, a voice-enabled PDA emulator allows a multimodal preview. A simple visual representation of the voice dialog is also available.

4. Applications

We implemented two sample applications for MONA: a unified messaging client and a multimodal quiz game. In order to cover a broad range of aspects, we conceived the messaging client (MONA@work) as a single-user, dialog-based application and the quiz (MONA@play) as a multi-user real-time game.

4.1. MONA@work application

The MONA@work application represents a mobile multimodal unified messaging client. It enables the user to manage emails, SMS, MMS and voice messages through an interface that is especially designed to assist operation by a mobile user. Frequently needed functions may be performed visually, with the help of voice commands, or via a voice-only dialogue interface. The underlying scenario is a business user who wants to quickly get an overview of new messages while walking in the city, listen to the terminal reading the content of a new email, and reply instantly with a voice message.

“Inbox contains 4 new of total 17 messages. Current new email message is 1 of 17, sender …”

Figure 5 Messaging client for PDA and WAP

In contrast to the quiz application, the messaging client integrates other services beyond the MONA presentation server: an IMAP/SMTP server for email and voice messages and a ParlayX gateway for SMS and MMS communication. The push mechanism of the server allows the application to inform the user when a new message has arrived. The user may read the content of text messages or listen to it via TTS. Preferences and user information are stored in the application and not in the terminal. Thus the user may switch devices, e.g. borrow the mobile from a colleague, log on to the MONA server and operate MONA@work without limitations. This is an advantage over a built-in email client that stores access data in the terminal. An address book assists the user and provides convenience when the user selects a contact by voice. Messages are grouped as in familiar messaging clients.

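As a rough illustration of the push mechanism described above (not the actual MONA@work code), the sketch below shows how the application could react to a newly arrived message by pushing a small inbox-update interface to the user's current device. MonaPushClient is a hypothetical stand-in for the server's application-initiated push interface, and the markup fragment is invented for the example.

```java
/**
 * Illustrative sketch only: when the messaging back end signals a new message,
 * the application pushes a fresh inbox summary through the presentation
 * server, which adapts it to whatever device the user is currently using.
 */
public class NewMessagePushSketch {

    /** Hypothetical wrapper around the server's application-initiated push. */
    interface MonaPushClient {
        void pushUserInterface(String userId, String uiDescription);
    }

    static void onNewMessage(MonaPushClient push, String userId,
                             int newCount, int totalCount, String sender) {
        // Build a small abstract UI fragment announcing the new message.
        String ui = "<group id=\"inbox-update\">"
                + "<text>Inbox contains " + newCount + " new of "
                + totalCount + " messages. Latest sender: " + sender + ".</text>"
                + "<trigger id=\"open\" label=\"Open inbox\"/>"
                + "</group>";
        push.pushUserInterface(userId, ui);
    }

    public static void main(String[] args) {
        // Print instead of contacting a real server.
        MonaPushClient dryRun = (user, ui) ->
                System.out.println("push to " + user + ": " + ui);
        onNewMessage(dryRun, "alice", 4, 17, "Bob");
    }
}
```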
4.2. MONA@play application

The second demo application is a multimodal quiz game in the style of “Who wants to be a millionaire?”. This application was chosen because playing a quiz game is usually a social activity that people enjoy with others. We designed the quiz as a multi-player game, and it offers multimodal chat for user-to-user communication, which complements the competitive game play with rich social interaction between the players.

Figure 6 MONA@play: Question & Answer (PDA); the annotated screen elements are the top bar, chat ticker, avatars, question, timer, answering options, and chat button

Since the MONA project features user access with heterogeneous communication devices and modality preferences, the chat has to support asymmetric multimodality (e.g. user A utilizes a different modality than user B): spoken chat messages are automatically translated to text messages for GUI-only receivers by the presentation server’s conversational ASR (automatic speech recognition) subsystem. Vice versa, text messages are translated to voice by TTS (text-to-speech).

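The following Java sketch illustrates this conversion logic in a deliberately simplified form: a chat message is converted depending on the sender's and the receiver's modalities. The SpeechRecognizer and SpeechSynthesizer interfaces are hypothetical stand-ins for the external ASR and TTS engines; in the real system this work is delegated to the Kirusa platform and its connected Nuance and SVOX components.

```java
/**
 * Illustrative sketch only: routes a chat message to a receiver according to
 * that receiver's current modality, converting it where necessary.
 */
public class AsymmetricChatSketch {

    enum Modality { VOICE, GUI }

    record ChatMessage(Modality origin, String text, byte[] audio) { }

    interface SpeechRecognizer  { String transcribe(byte[] audio); }
    interface SpeechSynthesizer { byte[] synthesize(String text); }

    static Object convertForReceiver(ChatMessage msg, Modality receiver,
                                     SpeechRecognizer asr, SpeechSynthesizer tts) {
        if (msg.origin() == Modality.VOICE && receiver == Modality.GUI) {
            // Spoken message, visual receiver: show the recognition result as text.
            return asr.transcribe(msg.audio());
        }
        if (msg.origin() == Modality.GUI && receiver == Modality.VOICE) {
            // Typed message, voice receiver: read it out via text-to-speech.
            return tts.synthesize(msg.text());
        }
        // Same modality on both ends: pass the message through unchanged.
        return msg.origin() == Modality.VOICE ? msg.audio() : msg.text();
    }

    public static void main(String[] args) {
        // Dummy engines standing in for the external ASR and TTS components.
        SpeechRecognizer asr = audio -> "good luck everybody";
        SpeechSynthesizer tts = text -> text.getBytes();
        ChatMessage spoken = new ChatMessage(Modality.VOICE, null, new byte[0]);
        System.out.println(convertForReceiver(spoken, Modality.GUI, asr, tts));
    }
}
```

In a real deployment the decision would of course be driven by each user's modality settings kept in the server's session state, and the conversion would operate on streamed audio rather than on complete messages.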
5. Related Work

5.1. Single authoring for device independence

Multiple approaches exist in the scientific literature for representing interfaces for multiple devices and modalities in a single source-code format. Common to all approaches is the need for an adaptation solution somewhere in the delivery path that translates the single-source user interface implementation into a device-specific format before it is presented to the user. IBM’s Abstract User Interface Mark-up Language AUIML [3] or generic vocabularies [2][16] defined for the User Interface Mark-up Language UIML [1] – an abstract XML meta-language for the canonical representation of user interfaces – describe user interfaces by means of generic widgets. The adaptation solution transforms the widgets into a suitable concrete representation for each target device. Comparable solutions include Mozilla’s XML User Interface Language XUL [24] or Microsoft’s eXtensible Application Markup Language XAML [23].

An alternative approach is model-based user interface development. Model-based methods describe the interaction between user and applications at different levels of abstraction. Examples of model-based specification languages are the eXtensible Interface Mark-up Language XIML [17] and the User Interface eXtensible Mark-up Language UsiXML [20]. TERESA [4] is a model-based design environment that generates concrete user interfaces for different platforms from a task model in an automatic or designer-guided process; GrafiXML is a graphical editor for UsiXML. ReversiXML and Vaquita [5] are tools for reverse-engineering existing HTML pages into an abstract representation.

In addition to the above approaches, there are also activities to extend established web markup languages such as XHTML and CSS with features enabling advanced device- and modality-independence. Backward compatibility and a large base of established practitioners who are familiar with these standards are definitely the main advantages of these evolutionary approaches. An example is the Renderer-Independent Mark-up Language RIML [7], a custom extension to XHTML and XForms which adds features such as pagination and device independent layout mechanisms. Other commercially available solutions employed in the domain of mobile network operator portals ([14][21]) rely, to our knowledge, on similar techniques.

5.2. Mobile multimodal applications

Basic-research projects in the area of multimodal systems have often focussed on elaborate system architectures for advanced forms of multi-modality: the DARPA Communicator [13], various projects at Stanford, e.g. WITAS [12], projects at the Oregon Graduate Institute such as QuickSet [6], or the German Smartkom project [22]. These large and often multi-year projects were important because they provided architectural foundations for sophisticated types of multi-modal integration. On the other hand, for resource-limited mobile devices, downward scalability from the often generous resource assumptions in the demo applications of these basic-research projects is sometimes problematic.

There have been several research projects focussing on telecom applications. The Eurescom project MUST (Multimodal, multilingual information Services for small mobile Terminals) [9] demonstrated a concurrent multimodal user interface on a PDA with touch screen. The SMADA project [18] investigated multi-modal interaction for automatic directory assistance.

A number of companies are already active in this area. Scansoft Inc. offers solutions to enable multimodal interfaces on mobile devices and in the automotive environment. Kirusa, Inc., a technology partner of project MONA, is working on technologies that facilitate application development on a number of terminal platforms, ranging from GSM+SMS, through WAP-Push based services, to PDAs. Microsoft Corporation developed the MIPAD prototype of a mobile multi-modal device in 2000 [10] and facilitates SALT-based multi-modal solutions on its PocketPC and smartphone platforms.

6. Discussion

Both major topics of this work – device-independent authoring and mobile multimodal user interfaces – have received considerable attention from the research community and from companies in the area of telecommunications. The combination of both aspects is an innovative contribution of the MONA project. However, the most important question with respect to the impact of this research is whether and when the technical and commercial conditions in the mobile communications industry will make the commercial deployment of this technology feasible. Below we present our current view on this issue.

The case for multimodality in the mobile user interface is strong: face-to-face communication is multimodal, and humans are well adapted to communicating simultaneously on several channels. In mobile situations, such as when walking, driving, or even jogging, conventional ways of dealing with linguistic information such as touch-screen or keypad are inconceivable. Therefore the ‘detour’ to speech as a control and communication channel is not only natural, but also necessary to achieve truly ‘mobile’ communications, in particular for data services.

The reasons for the lack of commercial implementation up to this point are twofold. First and foremost, the performance of mobile devices and cellular data networks has been insufficient to enable satisfactory multimodal user interfaces. Poor mobile CPU speed, limited mobile web browsers, and high network latencies have led to poor responsiveness of the multimodal interface in most implemented systems. In the MONA project, we too have observed unacceptable latencies for a highly interactive application such as MONA@play in GPRS networks. However, the same applications performed with satisfactory responsiveness in WLAN deployments, and there is room for optimism that the UMTS phones reaching the market now will finally provide enough resources and sufficiently low latency in data communication.

The second reason for the observed lack of deployments concerns the performance of automatic speech recognition (ASR). The most attractive use cases for multimodality put high requirements on ASR performance, often beyond what is currently achievable. There are three ways to tackle this issue: one is to invest heavily in ASR basic research, as is for example the case with the current Integrated Projects in the EU FP6 research programme within its thematic priority on multimodal interaction. Another approach is to roll out applications that can work with limited vocabularies and other factors that make ASR work more reliably. A third approach is to hide erroneous ASR results from the users. We believe that in many applications, especially in the entertainment area, it is not necessary to display full transcriptions of user speech. For example, in the multimodal chat in MONA@play, ASR hypotheses can be mapped to templatic approximations of their information content that are entertaining in the domain of a particular application.

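As a hedged illustration of this third approach, the sketch below maps an ASR hypothesis onto a canned, quiz-domain chat template by simple keyword spotting and falls back to a neutral phrase when nothing matches. The keywords and templates are invented for illustration and are not taken from MONA@play.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Illustrative sketch only: hides possibly erroneous ASR transcripts behind
 * entertaining, domain-appropriate chat templates.
 */
public class TemplaticChatSketch {

    private static final Map<String, String> TEMPLATES = Map.of(
            "congratul", "%s cheers for you!",
            "good luck", "%s wishes you good luck!",
            "answer",    "%s is debating the answer...",
            "hurry",     "%s says: hurry up!");

    static String toTemplate(String speaker, String asrHypothesis) {
        String lower = asrHypothesis.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : TEMPLATES.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return String.format(entry.getValue(), speaker);
            }
        }
        // Fallback: acknowledge the utterance without exposing the transcript.
        return speaker + " said something to the group.";
    }

    public static void main(String[] args) {
        System.out.println(toTemplate("Bob", "uh congratulations on that one"));
        System.out.println(toTemplate("Eve", "hairy up please")); // misrecognition hidden by the fallback
    }
}
```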
7. Conclusion

We have presented results of the MONA project on device independent authoring of mobile multimodal user interfaces. The MONA Editor tool supports a development process that efficiently leads to flexible, device independent user interfaces that are rendered by the MONA Presentation Server. Shortcomings in the performance of mobile terminals as well as in the performance of ASR have so far prevented the widespread commercial adoption of multimodality in the mobile user interface. We are convinced, however, that owing to their usability advantages, the time of mobile multimodal user interfaces is coming; they will enable physically mobile use of communication services – truly ‘mobile’ communications.

Acknowledgments

This work was funded by the Austrian competence centre programme Kplus and the companies Kapsch CarrierCom, Mobilkom Austria, and Siemens Austria. Kirusa Inc. provided their multimodal platform, SVOX Ltd. its text-to-speech technology, and Nuance Inc. its speech recognition technology.

8. References

[1] Abrams, M., et al. “UIML: Appliance-Independent XML User Interface Language.” Proc. 8th International WWW Conference, Toronto, Canada, 1999. Elsevier Science Publishers.
[2] Ali, M. F., Pérez-Quiñones, M. A., Abrams, M., Shell, E. “Building Multi-Platform User Interfaces with UIML.” 4th International Conference on Computer-Aided Design of User Interfaces (CADUI'2002), France, 2002.
[3] Azevedo, P., Merrick, R., Roberts, D. “OVID to AUIML - User Oriented Interface Modeling.” http://math.uma.pt/tupis00/submissions/azevedoroberts/azevedoroberts.html
[4] Berti, S., Correani, F., Mori, G., Paternò, F., Santoro, C. “TERESA: A Transformation-based Environment for Designing and Developing Multi-Device Interfaces.” Conference on Human Factors in Computing Systems, CHI 2004, Vienna, Austria, April 2004.
[5] Bouillon, L., Vanderdonckt, J., Souchon, N. “Recovering Alternative Presentation Models of a Web Page with VAQUITA.” Proceedings of CADUI'02, Valenciennes, France, May 15-17, 2002.
[6] Cohen, P., Johnston, M., McGee, D., Oviatt, S., Pittman, J., Smith, I., Chen, L., Clow, J. “QuickSet: Multimodal interaction for distributed applications.” Proc. of the Fifth ACM International Multimedia Conference, pages 31-40, New York. ACM Press, 1997.
[7] Consensus project website: http://www.consensus-online.org/
[8] Eclipse project website: http://www.eclipse.org
[9] MUST project webpage: http://www.eurescom.de/public/projects/P1100-series/p1104
[10] Huang, X., et al. “MIPAD: A next generation PDA prototype.” Proc. of the Int. Conf. on Spoken Language Processing (ICSLP), Beijing, 2000.
[11] Kirusa, Inc. website: http://www.kirusa.com/
[12] Lemon, O., Bracy, A., Gruenstein, A., Peters, S. “The WITAS multi-modal dialogue system I.” Proc. of Eurospeech01, Aalborg, DK, 2001.
[13] MITRE, DARPA Communicator. http://fofoca.mitre.org
[14] MobileAware company website: http://www.mobileaware.com/
[15] Nuance, Inc. website: http://www.nuance.com/
[16] Plomp, C. J., Mayora-Ibarra, O. “A Generic Widget Vocabulary for the Generation of Graphical and Speech-Driven User Interfaces.” International Journal of Speech Technology, Vol. 5, Issue 1, Kluwer Academic Publishers, January 2002.
[17] Puerta, A., Eisenstein, J. “XIML: A Common Representation for Interaction Data.” Proceedings of IUI 2002, Int. Conf. on Intelligent User Interfaces, San Francisco, CA. ACM Press.
[18] Roessler, H., et al. “Multimodal interaction for mobile environments.” Proc. of the International Workshop on Information Presentation and Natural Multimodal Dialogue, Verona, IT, 2001.
[19] SVOX Ltd. website: http://www.svox.com/
[20] Vanderdonckt, J., et al. “USIXML: a User Interface Description Language for Specifying Multimodal User Interfaces.” W3C Workshop on Multimodal Interaction, Sophia Antipolis, 2004.
[21] Volantis company website: http://www.volantis.com/
[22] Wahlster, W., Reithinger, N., Blocher, A. “Smartkom: Multimodal communication with a life-like character.” Proc. of Eurospeech01, Aalborg, DK, 2001.
[23] XAML.NET, a guide to XAML: http://www.xaml.net/
[24] XML User Interface Language (XUL) Project: http://www.mozilla.org/projects/xul/