Gesture-based Interaction for a Magic Crystal Ball

Li-Wei Chan, Yi-Fan Chuang, Meng-Chieh Yu, Yi-Liu Chao, Ming-Sui Lee, Yi-Ping Hung and Jane Hsu

Graduate Institute of Networking and Multimedia
Department of Computer Science and Information Engineering
National Taiwan University

Abstract

Crystal balls are generally considered media for divination and fortune-telling. These impressions come mainly from fantasy films and fiction, in which an augur sees into the past, the present, or the future through a crystal ball. With such distinct impressions, the crystal ball presents itself as a perfect interface for users to access and manipulate visual media in an intuitive, imaginative, and playful manner. We developed an interactive visual display system named Magic Crystal Ball (MaC Ball). MaC Ball is a spherical display system that allows users to see a virtual object or scene appearing inside a transparent sphere and to manipulate the displayed content with bare-handed interactions. Interacting with MaC Ball makes users feel as if they wield magic power. With MaC Ball, the user manipulates the display through touch and hover interactions: for instance, waving hands above the ball causes clouds to blow up from the bottom of the ball, and sliding fingers on the ball rotates the displayed object. In addition, the user can press with a single finger to select an object or push a button. MaC Ball takes advantage of the popular impressions of crystal balls, allowing users to act on visual media following their imaginations. MaC Ball has high potential to be used for advertising and demonstration in museums, product launches, and other venues.

CR Categories: I.3 [Computer Graphics]; H.5 [Information Interfaces and Presentation] (e.g., HCI)

Keywords: 3D interaction, haptics, entertainment

1 Introduction

In ancient times, many people believed that crystal balls held incredible power. Using a crystal ball to divine first appeared in the Middle Ages; the practice, called "crystallomancy", consists of gazing into a crystal ball to find indications about the future. As time went by, many fantasy films and fiction expanded the lore of the crystal ball. In the magic world, the witch holds a crystal ball on the table, which reflects dim lights and shines a mysterious radiance on her face. She begins to mumble an incantation that nobody understands while waving her hands over the crystal ball. Allegedly, the augur can see into the past, the present, or the future through the crystal ball, and the witch can have a distant wonderland or precious treasures revealed inside it. With these distinct impressions, the crystal ball has revealed itself as a perfect interface for users to access and manipulate visual media with intuition and fantasy.

In this work, we developed an interactive visual display system named Magic Crystal Ball (MaC Ball). MaC Ball is a spherical display system that allows users to see a virtual object or scene appearing inside a transparent sphere and to manipulate the displayed content with bare hands. The display module of MaC Ball is based on the optical system of i-ball2 [Ushida et al. 2003]. We redesigned the interface of i-ball2 so that users feel they have magic power while playing with MaC Ball. MaC Ball lets users perform gestures with their bare hands: the user can wave hands above the ball, upon which computer-generated clouds blow from the bottom of the ball and quickly surround the displayed content, and when the user slides fingers on the ball, the viewing direction of the displayed content changes accordingly. In addition, MaC Ball provides a pointing gesture with which the user uses a single finger to select an object or press a button. The motivation for MaC Ball is to realize the general public's imaginations of crystal balls; its goal is to transform the impressions from movies and fiction into a medium for accessing multimedia in an intuitive, imaginative, and playful manner.

Figure 1: The user is browsing relics using MaC Ball.

This paper starts by examining the nature of the crystal ball that makes it a perfect medium for accessing virtual content in an intuitive, playful, and entertaining manner. We then provide pointers to related research in Section 2. Section 3 discusses the design principles MaC Ball must meet. Section 4 describes the configuration and implementation of MaC Ball, covering both hardware and software. Section 5 presents a relics-exhibition application together with a discussion of an opening presentation of the system, followed by the conclusion and future work in Section 6.

Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail [email protected]. VRST 2007, Newport Beach, California, November 5–7, 2007. © 2007 ACM 978-1-59593-863-3/07/0011 $5.00


2 Related Work

Display technology: Display technology plays an important role in changing our lives. Different display techniques have been developed and have made considerable progress in recent years; some of them render imagery on unusual media for special purposes. Fogscreen1 produces a thin curtain of dry fog that serves as a translucent projection screen, displaying images that literally float in the air. The Jeep waterfall display2 organizes falling droplets into visible logos and other messages, which appear as the droplets descend into a trough. Instead of pursuing larger or higher-quality displays, these techniques offer viewers different ways to experience digital media in a fun and entertaining manner. In the following, we focus on related work that constructs spherical displays and interactions for such displays.

Volumetric displays generate true volumetric 3D images by actually illuminating points in 3D space. The first kind is the swept-volume display, such as the display developed by Actuality Systems, Inc. [Favalora et al. 2002], which generates a spherical, 360-degree-viewable 3D volumetric image by sweeping a semi-transparent 2D image plane around the vertical axis. A series of works that let one directly interact with and manipulate 3D data on a volumetric display has been proposed [Grossman et al. 2005][Grossman and Balakrishnan 2006]; one of the major concerns in these works is to provide selection mechanisms. The other kind is the static-volume display. SOLID FELIX [Langhans et al. 2003], a static-volume 3D laser display, is doped with optically active rare-earth ions; the ions are excited in two steps by two intersecting IR laser beams of different wavelengths and afterwards emit visible photons.

Rendering images in a transparent ball, i-ball [Ikeda et al. 2001] applies another solution. With a special optical system, i-ball makes images displayed on an LCD monitor appear in the air. Note that the display given by the i-ball system is quite different from a volumetric display: its display quality inherits all the benefits of an LCD monitor, which makes it well suited for presenting high-quality imagery to the viewers. However, the optical system provides only a single viewpoint and is therefore for single-person use. i-ball2 [Ushida et al. 2003] is the successor to i-ball. The two systems share the same optical system but differ in interaction means. i-ball has a motor that automatically rotates the ball according to the motion of the user's hands captured by a camera; as the motor rotates the ball, the displayed content changes accordingly. Since only hand motion is detected, i-ball cannot tell whether the user is touching the ball. i-ball2 instead provides an interaction mode in which the user physically rotates the ball, with an optical sensor installed under the ball detecting the physical rotation for interacting with the displayed content. Displaying images on the ball surface, Globe4D3 projects the Earth's surface onto a physical sphere. Globe4D is a representative interactive four-dimensional globe: the sphere can be freely rotated along all axes and viewed from any angle, and it lets the user control time as the fourth dimension.

3 Design Principles

3.1 Obtaining photo-realistic display quality

We have searched for display techniques capable of bringing photo-realistic imagery into a transparent ball. This objective must be met by software and hardware together.

In the software, the displayed content must achieve photo-realistic quality. For the presentation of MaC Ball, we construct a virtual museum by combining techniques of image-based rendering, virtual reality, and augmented reality. Both image-based and model-based techniques can be used to build a virtual exhibition environment and provide viewers a realistic, interactive tour. The image-based approach generates photo-realistic scenes, but makes it hard for the user to view a scene from arbitrary viewing directions. The model-based approach constructs 3D models; by rendering them, it allows users to interactively view the virtual world from arbitrary viewing directions, but the models usually have to be created manually, and the generated virtual world is usually not very realistic. In our virtual museum application, the artifacts are presented as object movies (an image-based approach) in the virtual exhibition: the image of the artifact shown in the ball is selected according to the viewing direction of the user, so artifacts can be rotated and moved in 3D. The exhibition rooms are presented as panoramas. By integrating the two image-based techniques [Hung et al. 2002], the virtual museum presents photo-realistic quality to the viewers.

As the contents from image-based techniques are photo-realistic, the hardware must provide a display mechanism that brings these images into the transparent ball. The i-ball2 system, with its special optical system, fits our expectations well: it can render high-quality images inside the transparent ball. As noted in the related work, a volumetric display is another possible solution that displays inside the space of the ball, yet it is designed for volume data, not images.
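To make the object-movie mechanism concrete, the sketch below maps a requested viewing direction to the grid index of the nearest stored frame. This is a minimal illustration under assumed conventions (a regular capture grid, the function name, and the step parameters are all assumptions), not the authors' implementation.

```python
def nearest_view(pan, tilt, pan_step=10, tilt_step=10, tilt_range=(-90, 90)):
    """Map a requested viewing direction to the indices of the stored
    object-movie frame captured nearest to it.

    Assumes frames were shot on a regular grid: every pan_step degrees
    around the object and every tilt_step degrees of elevation.
    Returns (pan_index, tilt_index) into the frame grid.
    """
    pan = pan % 360.0                      # wrap pan into [0, 360)
    lo, hi = tilt_range
    tilt = max(lo, min(hi, tilt))          # clamp tilt to the captured range
    pan_idx = round(pan / pan_step) % (360 // pan_step)
    tilt_idx = round((tilt - lo) / tilt_step)
    return pan_idx, tilt_idx
```

As the user slides fingers on the ball, the accumulated pan/tilt offsets would be fed through such a lookup to pick the frame to display.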

3.2 Seeing like a crystal ball

The transparent ball designed for i-ball2 is rotatable, which allows the user to manipulate the displayed content by physically rotating the ball. In MaC Ball, the transparent ball is fixed; viewers manipulate the content by waving hands and pointing a single finger above and on the ball. Three reasons for fixing the ball are summarized as follows:

For the imagination reason: An augur's hands hover over the ball while scrying, rather than rotating the crystal ball physically. Physically rotating the ball gives the user a better sense of control, but less fantasy. The development of MaC Ball should match popular imaginations of crystal balls so as to give the sensation of playing with a real magic crystal ball.

For the practical reason: It is better to retain as few movable components as possible in order to minimize hardware maintenance. MaC Ball has the ball fixed, so the whole body is rigid. In addition, the interaction means of MaC Ball are provided by sensors installed underneath the glass body, so the system will not be damaged by careless users. A further consideration is security: a movable ball might easily be taken away by guests during an official exhibition.

For the cognitive reason: Physically rotating the ball enhances the sense that the ball is strongly connected to the displayed content: rotation of the ball maps directly to the rotation (viewing direction) of the content, so the viewer regards the ball and the displayed content as one unit. A tiny delay or mismatch between the ball movement and the rotation of the displayed content may then lead to an undesired separation effect, and avoiding it requires an accurate and responsive sensor for measuring the rotation of the ball. In comparison, the ball of MaC Ball stays stationary while the user interacts with it; instead of controlling the ball, the user directly manipulates the displayed content, so MaC Ball does not suffer any separation effect.

1 Fogscreen: http://www.FogScreen.com
2 Jeep waterfall: http://www.pevnickdesign.com/index1.html
3 Globe4D: http://www.globe4d.com/

3.3 Interacting like an augur/witch

Many systems provide gesture-based interactions that require users to learn pre-defined gestures. Though such gestures are in principle designed to be meaningful, users still have to memorize them; moreover, users may feel restricted, since they must force their hands into specific shapes to issue certain functions. In designing MaC Ball, we aimed for a minimal learning curve. One way to achieve this is to let users mimic the motions of an augur during fortune-telling; we studied such motions and distilled them into MaC Ball's user-interface design. Two types of gestures are defined for MaC Ball, the waving gesture and the pointing gesture, both recognized by image-processing techniques. With the waving gesture, users wave their hands at will to indicate a direction to MaC Ball. The gesture is natural to use, since recognition is achieved by detecting the motion of the user's palms and imposes no constraint on hand posture. With the pointing gesture, the user uses a single finger to drive a visual cursor for selection. Switching between the two gestures is detected automatically, so users can perform most interactions at will. In addition, pressure sensors are installed as an auxiliary means of separating these gestures into touch and hover interactions.

Figure 2: The touch and hover zone.

Recall the impression of an augur who mumbles an incantation while waving hands in a spiral above the ball; after a while, clouds glow within the ball and mysterious images are revealed. To realize this, the interaction space of MaC Ball comprises a touch surface and a hover zone, as shown in Figure 2. Waving hands or pointing a single finger in the hover zone issues hover interactions. By design, hover interactions are weak interactions that generate assistant visual effects such as computer-generated flashes and clouds, leaving the major content untouched. Repeated weak interactions, however, accumulate into a strong interaction. For example, a hovering waving gesture generates clouds spiraling around the major content; if the user keeps waving for a while, the clouds gather and finally cover the whole content. Once the waving stops, the clouds disappear and the content has changed. This is how weak interactions grow into a strong interaction that influences the major content, and it makes users feel they are performing magic. In contrast, touch interactions are strong interactions that directly influence the major content: the user can rotate the content by sliding multiple fingers on the ball, or press a button with a single finger.

4 System Implementation

The construction of MaC Ball follows the principles of the previous section so as to give users the sensation of using a magic crystal ball. The system implementation, including the hardware configuration and the software implementation, is described in detail as follows.

4.1 Hardware Configuration

Figure 3: The architecture of the MaC Ball system.

The architecture of MaC Ball is shown in Figure 3. It consists of two modules, the display module and the detection module. For the display module, the optical system of i-ball2 is adopted, since it gives the user the perception of a real crystal ball. The image displayed on the LCD (15 inch, XGA) is reflected by a mirror and then passes through the Fresnel lens, so the user sees the image appearing inside the glass ball. The i-ball2 optics bring a minor bonus: the image shown in the ball is slightly distorted by the lenses, which gives users the experience of viewing real 3D objects in the glass ball.

The detection module consists of one infrared camera and three pressure sensors. The infrared camera, coupled with an infrared illuminator, is placed underneath the ball to recognize the users' hand gestures. Since the content in the ball is best viewed under dim, indirect ambient lighting, the infrared camera works well with our detection algorithms. Three pressure sensors placed on the corbelings of the glass ball determine whether the users touch the ball. The pressure sensors, called FlexiForce sensors, are commercial products from Tekscan. A FlexiForce sensor is an ultra-thin (0.008") flexible printed circuit that can detect and measure a relative change in force or applied load. The sensor output exhibits a high degree of linearity, low hysteresis, and minimal drift. FlexiForce sensors are available with different maximum forces; in our implementation, we use the sensor rated for 1 lb (about 0.45 kg) maximum force. The sensors produce an analog signal whose resolution depends on the electronics; we utilize a 16-bit A/D converter, which yields a resolution of approximately 0.007 g (0.45 kg divided by 65536), easily fine enough to register slight changes of pressure on the ball. An attachment that functions like a shock absorber bridges the pressure sensor and the ball surface, as shown in Figure 3. When the user touches the ball, the pressure increment is transferred to the pressure sensor for further computation. As a practical consideration, the attachment is adhered to the ball surface so that the user cannot take the ball away.

4.2 Software Implementation

The detection module comprises one infrared camera and three pressure sensors. The infrared camera, coupled with an infrared illuminator, is placed underneath the ball, observing hand motions and fingertip positions above the ball. Three pressure sensors on the corbelings of the glass ball determine whether users touch the ball. In the following, we describe the detection means of MaC Ball, which consist of (1) hand motion detection, (2) fingertip finding, and (3) pressure sensing.

4.2.1 Motion Detection for Waving Gesture

The waving gesture lets the user deliver direction indications to MaC Ball. We use a standard motion-detection technique, optical flow, to implement it: the optical flows extracted from consecutive image frames determine the major direction of the waving hands. This approach does not rely on hand shapes, so the user can perform the waving gesture in free form. The details of the implementation are as follows. The Lucas-Kanade method [Bouguet 2000] is applied to extract optical flow from two consecutive images. The method starts by building image pyramids and extracting feature points of the two images at different scale levels. By iteratively maximizing a correlation measure over a small window, the displacement vectors between the feature points of the two images are found from the coarsest level up to the finest one; these displacement vectors are taken as the motions in the image. Though motions of the waving gesture can be successfully extracted, some false motions do exist. We remove them with a two-step filtering that provides a robust estimate of the waving direction. In the first step, large motions are dropped, since they probably result from mismatches of local features at coarser pyramid levels. In the second step, we build a motion histogram, H(j), over the angles of the motions; in the implementation, the bin size of the histogram is 20 degrees. The bin with the maximal count is then selected, and the motions in that bin are averaged to give the major direction for the image frame.
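The two-step filtering can be sketched as follows. This is a minimal illustration that assumes the flow vectors have already been extracted (for example, by a pyramidal Lucas-Kanade routine from a vision library); the function name, the `max_len` cutoff, and the return convention are assumptions.

```python
import math

def waving_direction(flows, max_len=40.0, bin_deg=20):
    """Estimate the major waving direction from optical-flow vectors.

    flows: list of (dx, dy) displacement vectors for one frame pair.
    max_len: vectors longer than this are dropped as likely mismatches
             from coarse pyramid levels (step 1 of the filtering).
    bin_deg: histogram bin size in degrees (the paper uses 20).
    Returns the mean angle (degrees) of the fullest bin, or None.
    """
    # Step 1: drop implausibly large motions.
    kept = [(dx, dy) for dx, dy in flows
            if (dx, dy) != (0.0, 0.0) and math.hypot(dx, dy) <= max_len]
    if not kept:
        return None
    # Step 2: histogram the motion angles in bin_deg-degree bins.
    bins = {}
    for dx, dy in kept:
        ang = math.degrees(math.atan2(dy, dx)) % 360.0
        bins.setdefault(int(ang // bin_deg), []).append(ang)
    # Average the angles in the fullest bin to get the major direction.
    major = max(bins.values(), key=len)
    return sum(major) / len(major)
```

In use, the returned angle would drive the hover-zone effects, such as the direction in which the clouds spiral.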

4.2.2 Fingertip Finding for Pointing Gesture

Figure 4: The images produced during and after processing. Icon (a) is the fingertip template used in the process. Icons (b), (c), and (d) are three cases of gestures. Produced images for each case are arranged in the corresponding row; the first three columns collect intermediate results during processing, and the last column shows the final results.

The pointing gesture is achieved by applying a fingertip-finding algorithm. We use the method proposed in [Chan et al. 2007], modified to suit our case. The algorithm is capable of finding multiple fingertips in an image; when it reports exactly one fingertip, the pointing gesture is issued. We describe each step of the algorithm below.

(1) Background subtraction: We first extract the region of interest by applying background subtraction, as shown in the second column of Figure 4.

(2) Separating finger parts from the observed image by a morphological opening: After background subtraction, we extract the finger parts by a morphological opening with a structuring element larger than a normal finger and smaller than a palm. Specifically, we define a normal fingertip pattern with r as the radius of a circular fingertip (Figure 4a) and set the size of the structuring element to twice r. In the implementation, we use a template of 25x25 pixels containing a circle of radius r = 7 pixels; this size is determined by the distance between the camera and the observed hand. Finally, the finger parts are binarized into finger regions and background. Identifying the finger regions greatly reduces the area where fingertips might be located; the third column in Figure 4 shows the finger regions successfully extracted in all cases.

(3) Calculating the principal axis of each finger region: In this step, we further reduce the potential area to a principal line using principal component analysis. In each finger region, positions around the two ends of the principal line are selected as fingertip candidates and form a group. Candidates in each group are scored in the next step, and the surviving candidate with the best matching score in the group is selected as the fingertip. The principal lines of the finger regions are overlaid on the potential areas, as shown in the third column of Figure 4. This step reduces the search space from a region to a handful of points.

(4) Rejecting fake fingertips by pattern matching and false-match removal: After the previous steps, only a few fingertip candidates remain. We verify them using fingertip matching and false-match removal, two heuristics borrowed from [Koike and Kobayashi 2001] and modified to suit our case, applied to the background-subtracted image (the first column in Figure 4). In fingertip matching, for each candidate, a template-sized region located at the candidate's position in the background-subtracted image is copied; we refer to it as the fingertip patch. We binarize the patch with a threshold set to the average of the maximum and minimum intensities in the patch, then compute the sum of absolute differences between the patch and the fingertip template. Candidates with low scores are discarded. In false-match removal, if foreground pixels in diagonal directions on the boundary of the fingertip patch coexist, the candidate is not considered a fingertip and is removed. Final results are shown in the last column of Figure 4.

After fingertip detection, the detected fingertip positions are multiplied by a homography matrix that transforms them from the camera coordinates to the display coordinates. The homography is computed in advance during a manual calibration phase.
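The two verification heuristics of step (4) can be sketched on small binary patches as follows. This is a simplified reading, not the authors' code: the patch representation, the boundary-based interpretation of "diagonal coexistence", and all names are assumptions.

```python
def sad_score(patch, template):
    """Sum of absolute differences between a binarized fingertip patch
    and the fingertip template (both 2-D lists of 0/1, same size).
    Lower means a better match; high-scoring candidates are discarded."""
    return sum(abs(p - t)
               for prow, trow in zip(patch, template)
               for p, t in zip(prow, trow))

def is_false_match(patch):
    """False-match removal: a real fingertip enters the patch from one
    side only, so foreground must not appear on diagonally opposite
    parts of the patch boundary (one reading of the heuristic)."""
    top = any(patch[0])
    bottom = any(patch[-1])
    left = any(row[0] for row in patch)
    right = any(row[-1] for row in patch)
    # Foreground on two opposite boundary sides => not a fingertip.
    return (top and bottom) or (left and right)
```

A candidate survives only if its SAD score is low and `is_false_match` is false; a finger crossing the whole patch (e.g., the side of a hand) is rejected by the second test.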

4.2.3 Pressure Sensing for Separating Touch and Hover Interactions

As mentioned in the system implementation, the pressure sensors are accurate, responsive (approximately 200 Hz), and capable of measuring applied force at sub-gram scales. In addition, the sensors are stable under a fixed applied force; that is, they report the measured force with only small variations. Intuitively, with these characteristics, we could discern touch events with a simple threshold method that reports a touch event when the increment of force on one pressure sensor exceeds a predefined value. In our implementation, however, it is more stable to detect touch events by thresholding the absolute difference between the currently measured force and a background force; the reason is described below. Specifically, the steps are as follows. First, we record the background force for each of the three sensors when no external force, other than the base force of the glass ball, is applied to the pressure sensors. The background forces are denoted B(i), where i = 1, 2, 3. A touch event is then detected as

T(i) = True if |P(i) - B(i)| > Th, and False otherwise,  (1)

Figure 5: A plot showing pressure data collected from two sensors (P1 and P2) when the user manipulates MaC Ball lightly and forcefully. The blue circles indicate light manipulation; the red crosses indicate forceful manipulation.

where P(i) is the current measured force from sensor i, Th is a predefined threshold, and T(i) is a boolean function indicating whether sensor i reports a touch event. The system reports a touch event when any of the T(i) is true. Intuitively, if no other force is applied to the crystal ball, the pressure sensors receive only the force of the ball itself, whereas if some force is applied, for example when a user places a palm on the ball, the sensors receive greater force. However, this is not always the case when the user is playing with MaC Ball. Figure 5 plots pressure data collected from the sensors when the user manipulates MaC Ball lightly and forcefully. To visualize the data in a 2D feature space, we project it onto the P1-P2 plane (P1 and P2 are two of the three pressure sensors, as indicated in Figure 5): the x-axis is the force received by P1, and the y-axis the force received by P2. The values range from 0 to -3000, where larger values (closer to 0) correspond to greater measured force. The yellow diamond marks the background force; the blue circles are data collected while the user manipulates the ball lightly, and the red crosses while the user manipulates it forcefully. If no external force is applied to the crystal ball, the pressure data stays around the diamond. If the user places a palm on the ball, lightly or forcefully, as indicated in Figure 5a, the data moves toward the origin. However, if the user slides a palm or fingers from one side of the ball to the other, as indicated in Figures 5b and 5c, one side of the ball is lifted: one sensor then receives greater force and the other smaller force compared with the background. Based on this observation, formula (1), which relies on the absolute difference, covers all of these cases.
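Formula (1) translates directly into code. A minimal sketch, with the threshold value and names as placeholder assumptions:

```python
def touch_event(current, background, threshold=15.0):
    """Implements formula (1): report a touch when any sensor's reading
    deviates from its background force by more than the threshold.

    current, background: sequences of readings for the three sensors.
    Using the absolute difference also catches the case where sliding a
    palm lifts one side of the ball, *decreasing* that sensor's force.
    """
    return any(abs(p - b) > threshold
               for p, b in zip(current, background))
```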

Note, however, that slight changes to the system, or the user's ongoing operation, may shift the background force imperceptibly. An update mechanism that corrects the background force is required to provide sensitive touch sensing while preserving a low false-alarm rate. In particular, we designed two rules to update the background force so that MaC Ball is self-correcting in touch sensing.

(1) Update when no palm is in the camera view: The first rule is simple and effective. If the camera detects no foreground, the pressure background is updated. This rule ensures that the pressure background is reset between two users playing MaC Ball. Moreover, during one user's session, the background is updated in the intervals between operations, for example while the user is appreciating a virtual treasure with the hands at rest.

(2) Update with moving palms and stable pressure readings: Although the first rule works well in most cases, the pressure background might still drift imperceptibly if the user's hands are seen by the camera all the time. The second rule handles this situation: if the camera detects a moving palm and the pressure data shows only small variation for a short time, the pressure background is updated. In this case, the user is probably waving hands above the ball. This rule takes advantage of the sensitivity of the pressure sensing.
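The two rules can be sketched as a small stateful updater. The window length, stability range, and class name are illustrative assumptions, not the paper's parameters:

```python
from collections import deque

class BackgroundUpdater:
    """Keeps the per-sensor background force fresh using the two rules:
    (1) update when the camera sees no foreground, and
    (2) update when a palm is moving but the pressure readings have
        stayed nearly constant for a short window."""

    def __init__(self, window=50, stable_range=5.0):
        self.history = deque(maxlen=window)   # recent pressure frames
        self.stable_range = stable_range      # max spread to call "stable"
        self.background = None

    def _stable(self):
        # Stable only once a full window of low-spread readings exists.
        if len(self.history) < self.history.maxlen:
            return False
        per_sensor = zip(*self.history)
        return all(max(s) - min(s) <= self.stable_range for s in per_sensor)

    def update(self, pressures, foreground_seen, palm_moving):
        self.history.append(tuple(pressures))
        if self.background is None or not foreground_seen:      # rule (1)
            self.background = tuple(pressures)
        elif palm_moving and self._stable():                    # rule (2)
            self.background = tuple(pressures)
        return self.background
```

The returned background would be fed into the touch test of formula (1) on every frame.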

Since we use a threshold method to determine touch events, the predefined threshold should be small enough to give sensitive touch sensing. Because the pressure sensors are accurate and responsive, however, slight changes to the system may shift the background force and thus cause false alarms, and the background force may also shift while the user operates MaC Ball; this is why the background-update mechanism described above is needed.

5 Virtual Exhibition

161

5.1 5.1.1

Content Production

rections. The panorama recorded in a cylinder would be de-warped to an image plane when being watched, as shown in Figure 8. A panorama is recorded as one single 2D image and an object movie is composed of a set of 2D images taken from different perspectives around a 3D object. The goal is to augment a panorama with object movies in a visually 3D consistent way. Based on the method proposed by Hung et al [Hung et al. 2002], a system for authoring and for browsing augmented panorama is implemented in this work.

Acquisition of Object Movies

Figure 6: Some of the artifacts displayed in MaC Ball. We choose artifacts in the National Palace Museum to be displayed in our virtual museums. The artifacts chosen are valuable for academic research since they are the representatives with complete records of excavation. Some of the artifacts are shown in Figure 6. In order to render high quality and photo-realistic 3D artifacts in the virtual exhibition, we use image-based technique (Object Movie). An object movie is a set of images taken from different views around a 3D object. When the images are played sequentially, the object seems to rotate around itself. This technique was first proposed in Apple QuickTime VR[Chen 1995] and its advantage of being photo-realistic is suitable for delicate artifacts. Furthermore, each image is associated with the angles of the viewing direction. Thus some particular images can be chosen and shown in the ball according to the hand motion of the user. More specifically, when the user slides fingertips on the ball, the motion is computed and translates into changes in viewing direction. Then the object movie is changed accordingly. In this way, the user can interactively rotate the virtual artifacts and enjoy what he/she cannot see or feel in general static exhibitions. For capturing object movies, we use Texnais autoQTVR standard edition, which provides accurate angular control and automatic camera control. As Figure 7 shows, the entire system is controllable with traditional PC. After setup process, pictures will be automatically taken under some commands.

Figure 8: (a) The cylinder used to record the panorama. (b) The panorama (in part) of an exhibition room. (c) A de-warped image from the area within the red rectangle of (b).

5.2 Application I: Virtual Museum

The virtual museum application includes two operation modes, the scene mode and the artifact mode. At the beginning of a tour, the user navigates the exhibition room in the scene mode, in which the user can change the viewing direction of the panorama to browse through different object movies. Once the user selects an artifact, the application shows a close-up view of it and switches to the artifact mode. The user is then allowed to rotate the artifact and appreciate it from every angle. Figure 9 shows a shot of the virtual museum application in MaC Ball. For interacting with virtual exhibitions, we need to support a variety of basic operations such as rotation and selection.
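The two modes can be sketched as a minimal state machine. The gesture names "point" and "wave" are illustrative stand-ins for the recognized gestures, not identifiers from the paper:

```python
# Minimal sketch of the two browsing modes (scene vs. artifact).
# Event names and fields are illustrative assumptions.
class VirtualMuseum:
    def __init__(self):
        self.mode = "scene"   # start by navigating the exhibition room
        self.view = 0         # current panorama / object-movie view index

    def on_gesture(self, gesture):
        if gesture == "point" and self.mode == "scene":
            self.mode = "artifact"   # close-up view of the selected artifact
        elif gesture == "wave":
            self.view += 1           # rotate the panorama or the artifact
```

In both modes the waving gesture drives rotation; the pointing gesture is only meaningful in the scene mode, where it selects an artifact.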

Figure 9: A shot of the virtual exhibition in MaC Ball. The left image shows a view of the exhibition room; the right one shows a particular view of an artifact.

Figure 7: Acquisition of object movies with autoQTVR.

Rotation In interactions with the virtual museum, rotation is the most basic operation. It is achieved by recognizing the waving gesture: to rotate, the user simply slides fingers on the ball. Since the waving gesture is based on hand motion, the user can perform it freely while appreciating a delicate artifact. Figure 10 shows a user sliding fingers on the ball to appreciate an artifact.
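The translation of fingertip motion into a change of viewing direction can be sketched as below. The gain and tilt limit are illustrative assumptions; the paper does not specify its mapping.

```python
# Sketch: average tracked fingertip displacements and apply them as a
# change of viewing direction. Gains and limits are assumed values.
def update_view(pan, tilt, tracks, gain=0.5, tilt_limit=30.0):
    """tracks: list of (dx, dy) fingertip displacements in pixels per frame.
    Returns the new (pan, tilt) viewing direction in degrees."""
    if not tracks:
        return pan, tilt
    dx = sum(t[0] for t in tracks) / len(tracks)   # mean horizontal motion
    dy = sum(t[1] for t in tracks) / len(tracks)   # mean vertical motion
    pan = (pan + gain * dx) % 360.0                # pan wraps around
    tilt = max(-tilt_limit, min(tilt_limit, tilt + gain * dy))
    return pan, tilt
```

The resulting (pan, tilt) pair is then used to select the matching object-movie frame, so the artifact appears to follow the sliding fingers.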

5.1.2 Augmentation of Panoramas with Object Movies

The panorama is the most popular image-based approach, providing an omni-directional view. In this system, the panorama is a cylindrical view stitched from images acquired by rotating a camera about a fixed point. In this approach, users can only see the contents of the panorama from specific viewing directions.
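The de-warping from the cylinder to a planar view amounts to a per-pixel mapping. The formulation below is a standard plane-to-cylinder mapping given as an assumption; the paper does not state its exact equations.

```python
import math

# Sketch of the plane-to-cylinder mapping used when de-warping a
# cylindrical panorama into a planar view. f is the focal length in
# pixels; (x, y) are coordinates relative to the view center.
def plane_to_cylinder(x, y, f):
    """For a pixel (x, y) of the de-warped planar view, return the (u, v)
    coordinates on the cylindrical panorama to sample from."""
    u = f * math.atan2(x, f)        # arc length along the cylinder
    v = f * y / math.hypot(x, f)    # height, foreshortened with distance
    return u, v
```

Sampling the panorama at (u, v) for every pixel of the output view yields the de-warped image of Figure 8(c).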


Figure 10: The user is sliding fingers on the ball to rotate a virtual artifact.

Figure 12: The user forms a circle with the fingers on the ball to bring up the virtual magnifier.

Selection While browsing in a virtual exhibition room, the selection operation allows the user to activate an artifact of interest. Selection is achieved by the pointing gesture: the user presses a single finger on the ball to choose an artifact, and the system switches to the artifact mode. Note that the pointing gesture is only activated when a single finger is recognized by the fingertip-finding algorithm. Since a rotation operation is usually performed with multiple fingers visible to the camera, the implementation separates the rotation and selection operations in a natural way.
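The separation by fingertip count can be sketched as a small dispatcher. The motion threshold is an illustrative assumption; giving pointing the higher priority follows the behavior described in the discussion below.

```python
# Sketch of gesture dispatch by fingertip count: exactly one detected
# fingertip is treated as pointing (higher priority); several fingertips
# with enough motion are treated as waving. The threshold is assumed.
def classify_gesture(fingertips, motion_magnitude, motion_threshold=2.0):
    """fingertips: list of (x, y) detections from the fingertip finder.
    motion_magnitude: overall hand motion estimated between frames."""
    if len(fingertips) == 1:
        return "point"                       # pointing wins when ambiguous
    if fingertips and motion_magnitude > motion_threshold:
        return "wave"
    return "none"
```

Because pointing requires exactly one fingertip while waving normally shows several, the two gestures rarely collide in practice.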

5.4 Discussion

MaC Ball has been demonstrated at an opening presentation, during which more than twenty people joined the demonstration. We did not show the participants how MaC Ball works at the beginning; instead, they tried their own ways to interact with the transparent ball. One staff member was assigned to guide the participants in case they had any problem with the system, and two more staff members observed the participants' behaviors and their reactions. In the following, we list some lessons learned from an analysis of these observations.

Constrained and unconstrained interactions Interactions carried out by different means suffer different constraints. In general, a less constrained interaction leads to fluent operation and usually makes the users confident with the system; however, unconstrained means can limit the richness of interaction available to the system. In contrast, constrained interactions, like those based on hand-shape analysis, can provide many kinds of interactions, but the users are easily frustrated if they cannot meet the given constraints. In this work, we find that a good combination of constrained and unconstrained interactions can work well by exploiting the users' intention.

The waving gesture, based on motion detection, imposes almost no constraint on the user beyond moving or waving the palm. Since the detection is robust and responsive, the user quickly feels confident with the interaction. This is very important, because an interactive system can easily frustrate users, especially on their first try. However, when an unconstrained gesture like waving is used, other gesture-based interactions can hardly be added to the system, since it is difficult to separate them from the waving gesture.
To add other interactions to MaC Ball, we chose the pointing gesture, because users rarely wave in a pointing posture (a single-finger hand shape). In the demonstration, all users quickly became familiar with the waving gesture. However, when they were told to use the pointing gesture, many had to practice several times before realizing that the single-finger hand shape is the basic requirement for the gesture. Note that the pointing gesture is more constrained than the waving gesture: when users issue a button, they are assumed to be intentionally using the pointing gesture, because they have to strictly hold a pointing hand shape. Therefore, when the two gestures occur simultaneously, they are separated by the users' intention; specifically, MaC Ball gives a higher priority to the pointing gesture.

Hover zone: an extended interaction space The field of human-computer interaction seeks ways to build more humane user interfaces. With this goal, researchers have explored a variety of means, including but not limited to voice, gesture, and bio-signal recognition. Yet a common problem exists for interactive installations: people feel uneasy before they discover the correct way to communicate with the installation.

5.3 Application II: Relic Browsing

The relic browsing application is designed for the users to focus on the beauty of the artifacts. Instead of browsing through artifacts in a virtual exhibition room, the user directly sees close-up views of artifacts in MaC Ball. One more operation is needed to switch among artifacts; here, the application generates clouds while the switch is issued. In addition, a virtual magnifier is provided for the user to see the details of an artifact.

Hovering MaC Ball also provides a hover interaction, in which the user hovers the palm above the ball. Hovering is a weak interaction dedicated to activating supportive visual effects, such as the spiraling clouds in our case. In the relic browsing application, hovering is used to switch among artifacts. With the palm hovering above the ball, the user sees computer-generated clouds rising from the bottom of the ball. If the user keeps hovering for a short time, the clouds gather quickly, as shown in Figure 11. Once the clouds cover the present artifact, the artifact is switched, and a new artifact is immediately revealed as the clouds disperse.
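The "hover for a short time" behavior is essentially a dwell timer, which can be sketched as follows. The 0.8-second dwell time is an illustrative assumption, not a value from the paper.

```python
# Sketch of dwell-based hover activation: the palm must stay in the hover
# zone for a dwell time before the artifact switch fires. Values assumed.
class HoverSwitch:
    def __init__(self, dwell=0.8):
        self.dwell = dwell          # seconds of continuous hovering required
        self.enter_time = None      # when the palm entered the hover zone

    def update(self, palm_present, now):
        """Call once per frame. Returns True exactly when the dwell time
        has just been reached, i.e. when the artifact should switch."""
        if not palm_present:
            self.enter_time = None  # palm left: reset the timer
            return False
        if self.enter_time is None:
            self.enter_time = now   # palm just entered the hover zone
        if now - self.enter_time >= self.dwell:
            self.enter_time = None  # re-arm for the next switch
            return True
        return False
```

While the timer runs, the cloud effect can grow in proportion to the elapsed dwell fraction, giving the user continuous feedback before the switch.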

Figure 11: The user hovers the palms above the ball; the clouds are generated, and the artifact switches after a short time.

Other Interactions In the relic browsing application, a virtual magnifier is provided for the user to see the details of the artifact. To bring up the magnifier, the user forms a circle with the fingers, as shown in Figure 12. The circle acts as a telescope through which the user sees an enlarged view of the artifact. The interaction is implemented by finding a large connected component in the camera view.
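The connected-component test can be sketched with a simple flood fill over the binarized camera image. The area threshold is an illustrative assumption; a real implementation would work on the full-resolution fingertip-detection mask.

```python
# Sketch of the magnifier trigger: fingers forming a closed circle show up
# as one large connected component in the binarized camera image.
def has_large_component(mask, min_area=6):
    """mask: 2D list of 0/1 pixels. Return True if any 4-connected
    component of 1-pixels contains at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                stack, area = [(sy, sx)], 0
                seen[sy][sx] = True
                while stack:                      # iterative flood fill
                    y, x = stack.pop()
                    area += 1
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if area >= min_area:
                    return True
    return False
```

When the component area exceeds the threshold, the magnifier is shown; otherwise the input is treated as ordinary fingertip contact.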


In this case, an interaction that responds to the user's curiosity at an early stage can ease the user's tenseness as soon as possible. In this work, we provide hover and touch interactions, which occupy separate interaction spaces of the ball: the touch surface and the hover zone. The hover zone is regarded as an extended interaction space that can attract potential users at an early stage. In the demonstration, a beginner was first allowed to explore the ball by himself. At this stage, the user appeared uncomfortable, since no specific instruction was given by our staff. However, a curious user soon found the ball responsive: a cautious user might merely stretch out a hand into the hover zone, and some clouds would then spiral in the ball interactively. This response obviously encouraged the user to further touch the ball or to keep waving the hand in the zone. If the user simply did not know what to do, he/she was given hints such as "touch the ball" or "imagine that you are a witch trying to do scrying", after which the correct way to interact with MaC Ball was quickly found. In MaC Ball, the hover zone provides an extended space in which the users' curiosity can be discovered and amplified by the responsive visual effects.

6 Conclusion and Future Work

In this work, we developed an interactive visual display system named Magic Crystal Ball (MaC Ball). MaC Ball is a spherical display system that allows the users to see a virtual object or scene appearing inside a transparent sphere, and to manipulate the displayed content with barehanded interactions. MaC Ball transforms impressions from movies and fiction into a medium for the users to access multimedia in an intuitive, imaginative and playful manner. Many suggestions from the opening presentation will be incorporated in future work. We will explore other gesture interactions; for example, the positions of detected fingertips can be fed back to produce more sophisticated visual effects, in which not only clouds but also flashes and lightning can be integrated. In addition, other input such as the volume and frequency of sounds from the user can be blended into the effect design of MaC Ball. Beyond computer-generated effects, MaC Ball can also generate responses in the external world; for example, a computer-controllable ambient light installed behind the transparent ball, or a fog generator installed around the ball, could heighten the magical sensations.

Acknowledgements

This work was partially supported by grants NSC 95-2422-H-002-020 and NSC 95-2752-E-002-007-PAE.

References

Bouguet, J.-Y. 2000. Pyramidal implementation of the Lucas-Kanade feature tracker. OpenCV Documents.

Chan, L.-W., Chuang, Y.-F., Chia, Y.-W., Hung, Y.-P., and Hsu, J. 2007. A new method for multi-finger detection using a regular diffuser. In International Conference on Human-Computer Interaction.

Chen, S. E. 1995. QuickTime VR: An image-based approach to virtual environment navigation. In SIGGRAPH '95: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, New York, NY, USA, 29–38.

Favalora, G. E., Napoli, J., Hall, D. M., Dorval, R. K., Giovinco, M., Richmond, M. J., and Chun, W. S. 2002. 100-million-voxel volumetric display. In SPIE, D. G. Hopper, Ed., vol. 4712, 300–312.

Grossman, T., and Balakrishnan, R. 2006. The design and evaluation of selection techniques for 3D volumetric displays. In UIST '06: Proceedings of the 19th Annual ACM Symposium on User Interface Software and Technology, ACM Press, New York, NY, USA, 3–12.

Grossman, T., Wigdor, D., and Balakrishnan, R. 2005. Multi-finger gestural interaction with 3D volumetric displays. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, ACM Press, New York, NY, USA, 931–931.

Hung, Y.-P., Chen, C.-S., Tsai, Y.-P., and Lin, S.-W. 2002. Augmenting panoramas with object movies by generating novel views with disparity-based view morphing. Journal of Visualization and Computer Animation, Special Issue on Hallucinating the Real World from Real Images 13 (September), 237–247.

Ikeda, H., Naemura, T., Harashima, H., and Ishikawa, J. 2001. i-ball: Interactive information display like a crystal ball. In Conference Abstracts and Applications of SIGGRAPH, 122.

Koike, H., and Kobayashi, Y. 2001. Integrating paper and digital information on EnhancedDesk: A method for realtime finger tracking on an augmented desk system. ACM Transactions on Computer-Human Interaction 8, 4, 307–322.

Langhans, K., Guill, C., Rieper, E., Oltmann, K., and Bahr, D. 2003. Solid FELIX: A static volume 3D-laser display. In SPIE, A. J. Woods, M. T. Bolas, J. O. Merritt, and S. A. Benton, Eds., vol. 5006, 161–174.

Ushida, K., Harashima, H., and Ishikawa, J. 2003. i-ball 2: An interaction platform with a crystal-ball-like display for multiple users. In International Conference on Artificial Reality and Telexistence.