Joshua A. Gomer,* Coleman H. Dash, Kristin S. Moore, and Christopher C. Pagano
Department of Psychology, Clemson University, Clemson, SC 29634-1355

*Correspondence to [email protected]

Presence, Vol. 18, No. 4, August 2009, 304–320
© 2009 by the Massachusetts Institute of Technology

Using Radial Outflow to Provide Depth Information During Teleoperation

Abstract

Practical experience has shown that teleoperators have difficulty perceiving aspects of remotely operated robots and their environments (e.g., Casper & Murphy, 2003; Smets, 1995; Tittle, Roesler, & Woods, 2002). Previous research has shown that head motions can provide effective information about depth (Bingham & Pagano, 1998; Pagano & Bingham, 1998). In three experiments, a method for improving depth perception was investigated in which participants viewed remote targets with a moving camera. The camera was mounted on a teleoperated robotic arm that oscillated toward and away from white squares viewed against a black background, producing expansion and contraction of the targets on a video monitor. Participants viewed this expansion and contraction and then reported the distance between the remote camera and the targets. Under different experimental conditions, motions of the remote camera arm were coupled with the participants' head movements, were controlled by a joystick, or followed a set of preprogrammed oscillatory motions. Under each of these conditions, participants' distance judgments varied systematically with actual target distances. In addition, the third experiment demonstrated that using familiar objects and providing feedback can be a successful method of training, and this training transferred to a condition in which distance feedback was not provided and unfamiliar targets were used. The results indicate that the use of radial outflow produced by active or passive front-to-back camera motions, together with training with familiar objects, may be an effective strategy for improving depth perception in teleoperation.

1 Introduction

Visual depth perception allows us to see how far away objects are relative to ourselves and allows us to seamlessly interact with objects in our view. In remote perception during teleoperation, one is typically looking at a flat screen that displays a two-dimensional or compressed view of the three-dimensional world. The operator must manipulate a robot in a three-dimensional environment while viewing a diminished representation of that environment. In this instance, some of the information about depth is no longer related to the actual depths in the remote environment, such as oculomotor information provided by accommodation and convergence of the eyes. Practical experience has


shown that users of teleoperated systems have difficulty perceiving the depths and sizes of elements in remote environments on the basis of two-dimensional video images (e.g., Smets, 1995; Tittle, Roesler, & Woods, 2002; Woods, Tittle, Feil, & Roesler, 2004; Welch, 2002). During search and rescue operations at the World Trade Center (WTC) in the days following the terror attacks on September 11, 2001, for example, the users of teleoperated robots experienced difficulty identifying key objects and judging whether a robot could pass through openings or over obstacles (Casper & Murphy, 2003). The usual challenges associated with perception in a remote environment were greatly exacerbated by the highly deconstructed environment, in which much of the usual information specifying depth and size was missing. This led Murphy (2004, p. 57) to conclude that "one of the biggest difficulties encountered while using the video cameras was the lack of depth perception."

Modern teleoperators often have the option of using stereo displays, in which the video feed is a combination of two individual camera feeds. While stereo systems offer perceptual advantages for reproducing depth, such higher fidelity camera systems have numerous drawbacks. First and foremost, they can be expensive and time-consuming to set up and maintain. Stereo displays also often require the operator to wear special headgear, such as a head-mounted display (HMD), and several researchers have pointed to the shortcomings of HMDs. For example, during urban search and rescue (USAR) missions, HMDs are undesirable due to their added weight and because the operator must also remain aware of his or her immediate surroundings (Casper & Murphy, 2003). Such switching between stereoscopic displays and normal viewing has been shown to adversely affect depth perception (Reinhardt-Rutland, 1996). During long periods of use, stereoscopic displays can result in higher levels of user fatigue, discomfort, and difficulties with acclimatization (e.g., Agah & Tanie, 1999). Such fatigue is caused by neck strain and the need for the eyes to accommodate to screens that are very close to them. The HMD also limits the operator's field of view, and use in remote perception requires that camera tilt and rotation

each translate in sync with the angle of rotation of the head and eyes of the teleoperator (Agah & Tanie, 1999). HMDs may also contribute to virtual simulator sickness, caused in part by the conflict between virtual input and self-generated head movements (e.g., Howarth & Finch, 1999). Further, the depth information provided by accommodation and convergence typically conflicts with the presented binocular information. A binocular view also requires two high-resolution cameras, necessitating a large amount of bandwidth. Even with high-resolution cameras, the performance of the human binocular system still suffers if the resulting images are degraded by conditions such as low illumination, glare, smoke, dust, fog, clouded water, and so on. Finally, around 15% of the population do not have binocular vision or have impaired stereo acuity (Borish, 1970; van Ee & Richards, 2002), a higher rate of impairment than is found with color deficiencies.

Rather than using a stereo feed, it may be advantageous to enhance depth perception through a single feed from one camera to one monitor. This solution would reduce the required bandwidth and solve several of the problems with stereo feeds discussed previously. The focus of this research is to test a low-cost method of enhancing depth perception during remote operation that capitalizes on motion information provided by a monocular camera. Optical motion is processed by a neurological system that is more robust than the binocular pathways and supports perception of lower resolution images; it is therefore less likely to be adversely affected by camera views that are degraded by smoke, dust, and so on (Smets, 1995; Stassen & Smets, 1997). In addition, there are benefits of reduced user fatigue, discomfort, and acclimatization difficulties. While this research investigates this method as an alternative to binocular displays, it can also easily be used in conjunction with them. For example, if the binocular cameras are moveable in the manner proposed below, then the two methods can be used simultaneously, as is the case with natural viewing. Alternatively, if cost were not an issue, many robots could maintain multiple camera systems in the event that one may be superior in a particular situation or environment.


Depth information can be provided on a two-dimensional screen through the use of a moving camera. When a point of observation moves, the projections of various elements in the environment onto a projection surface, such as the retina or a camera plate, move on that surface in a systematic manner. The lawful transformations of the visual scene that accompany observer movement are referred to as motion perspective or optic flow (Gibson, 1950, 1958, 1979; Lee, 1974; Warren, 1998). Since the relative motions of the projections provided by various elements in the environment are determined by their relative distances from the point of observation, optic flow provides information about the three-dimensional layout of the environment. Witmer and Kline (1998) provide a useful summary of the types of information available in optic flow, as well as a demonstration of the ability of this information to support distance perception in virtual displays.

The use of optic flow from head movements to aid distance perception occurs naturally in both animals and humans. A number of animals use head movements to obtain distance information prior to prey strikes, jumps, and other forms of locomotion (Bruce, Green, & Georgeson, 2003; Bruckstein, Holt, Katsman, & Rivlin, 2005; Goodale, Ellard, & Booth, 1990; Kral, 2003). The locust moves its body and head from side to side before initiating jumps, producing larger side-to-side movements when a target is farther and smaller side-to-side movements when a target is closer, thus generating head amplitudes that are related to distance (Collett, 1978). The distance jumped by the locust can actually be manipulated by altering the amount of object motion on the retina that results from a given head movement (Sobel, 1990; Wallace, 1959). Gerbils also move their heads up and down before jumping over gaps, and gerbils denied the use of one eye show an increase in these movements (Ellard, Goodale, & Timney, 1984). In humans, participants with a patch over one eye, as well as participants who have lost an eye due to illness or injury, spontaneously move their heads to obtain depth information (Marotta, Perrot, Nicolle, Servos, & Goodale, 1995). It has also been found that the amount of head movement produced by the enucleated

participants increased with the amount of time that had passed since the loss of their eye (Marotta et al.).

With sideways movements, the projections of objects farther from the point of observation move across the retina at a slower rate than the projections of objects that are closer (Ferris, 1972; Foley & Held, 1972; Gogel & Tietz, 1973). This property of optic flow is referred to as motion parallax. It has been shown to be useful in virtual reality systems for both normal and low stereo acuity participants (e.g., Hale & Stanney, 2006), and in laboratory settings it has been implemented for teleoperation and laparoscopy (Smets, Overbeeke, & Stratmann, 1987; Smets, 1995; Voorhorst, Overbeeke, & Smets, 2003). In a now classic experiment, Rogers and Graham (1979) used an artificial display to investigate the role of side-to-side motion parallax in human depth perception. They found that motion parallax alone can provide information about the shape and depth of three-dimensional surfaces viewed on a two-dimensional display. In addition, more depth was perceived when the optic flow was the result of self-produced head movements than when it was viewed passively, even though the optic flow itself was identical in both conditions.

Bingham and colleagues generalized side-to-side motion parallax to the use of forward and backward head movements (Bingham & Pagano, 1998; Bingham & Stassen, 1994; Pagano & Bingham, 1998). Such movements generate radial outflow, or looming, when a participant moves toward a target and radial inflow when the participant moves away from it. As a perceiver moves forward and backward, the optical projections of objects in the environment increase and decrease in size, respectively. For a given rate of forward movement, objects closer to an observer expand on the projection surface at a faster rate than objects farther away. Radial outflow produced by movement toward a target contains information about the observer's time-to-contact with the target (Hoyle, 1957; Lee, 1974, 1976; Regan & Hamstra, 1993; Todd, 1981). Optical time-to-contact is specified by the relative rate of expansion of a target's image,

$$\tau = \theta / \dot{\theta}, \tag{1}$$


where $\tau$ is optical time-to-contact, $\theta$ is the size of the image, and $\dot{\theta}$ is its rate of expansion. Bingham and Stassen (1994) demonstrated mathematically that $\tau$ can provide information about distance if it is assumed that the head moves toward and away from the target with a consistent sinusoidal velocity profile:

$$D = \tau \, \frac{2\pi A}{P}. \tag{2}$$

In this equation, $\tau$ is the time-to-contact at peak velocity, A is the amplitude of the head movement, P is the period of the head movement, and D is the distance to the target. Thus depth information is conveyed by radial outflow. (For other mathematical derivations of distance from radial outflow that do not rely on $\tau$ or a consistent sinusoidal velocity profile, see Bingham & Pagano, 1998; Bruckstein, Holt, Katsman, & Rivlin, 2005.)

Bingham and Pagano (1998; Pagano & Bingham, 1998) demonstrated that the information provided by radial outflow can be utilized by human observers to successfully guide blind reaches to targets at various distances. In their experiments, participants viewed targets through a helmet-mounted camera and display that isolated the optic flow caused by the participants' voluntary head movements toward and away from the target. The targets appeared as white disks on a black background in such a way that all information about depth other than radial optic flow was removed or rendered ineffective. In this way, they showed that radial outflow alone can convey distance information on an artificial visual display, and that this information can be used by human observers to guide reaches to targets placed at various distances. Others have replicated these results and extended them to distances beyond maximum reach (Peh, Panerai, Droulez, Cornilleau-Pérès, & Cheong, 2002). Additional research revealed that reaches were more accurate when the optical motions were generated by forward and backward head movements than when the head movements were from side to side, even though the participants' head movements were smaller on average when front to back (Wickelgren, Daniel, & Bingham, 2000). In fact, reaching performance with the front-to-back movements was as accurate with the head-mounted display as with direct monocular viewing of the targets.
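As a concrete illustration of Equations 1 and 2, the following minimal Python sketch computes τ from a target's optical size and expansion rate and then recovers distance under the sinusoidal-motion assumption. The oscillation amplitude and period match the apparatus described below, but the image measurements are hypothetical values chosen for illustration rather than data from the experiments.

```python
import math

def time_to_contact(theta, theta_dot):
    """Equation 1: optical time-to-contact tau = theta / theta_dot, where
    theta is the current angular size of the target's image and theta_dot
    is its current rate of expansion."""
    return theta / theta_dot

def distance_from_radial_outflow(tau, amplitude, period):
    """Equation 2: D = tau * 2*pi*A / P, assuming the camera (or head)
    oscillates toward and away from the target with a sinusoidal velocity
    profile of amplitude A (cm) and period P (s); tau is taken at peak velocity."""
    return tau * 2.0 * math.pi * amplitude / period

# Hypothetical example values, roughly matching the apparatus described below:
# a 17.4 cm oscillation with a ~3 s period, and an image expanding at peak velocity.
A, P = 17.4, 3.0                 # cm, s
theta, theta_dot = 0.12, 0.035   # rad, rad/s (illustrative, not measured values)
tau = time_to_contact(theta, theta_dot)
print(f"tau = {tau:.2f} s, estimated distance D = "
      f"{distance_from_radial_outflow(tau, A, P):.1f} cm")
```

With these illustrative numbers the recovered distance falls near the middle of the 75 to 175 cm range of target distances used in the experiments below.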

In the following three experiments, we further applied this theory and investigated the ability of participants to use radial outflow generated by active and passive camera motions to perceive egocentric depth in a remote environment. In the first experiment, participants viewed remote targets on a monitor fed by a moving camera. During each experimental trial, their task was to report the distance from the camera to the targets. The targets consisted of white squares viewed against a black background. As in the work of Bingham and Pagano (1998; Pagano & Bingham, 1998), all information regarding depth was eliminated or rendered ineffective except for radial outflow (i.e., the expansion and contraction of the target on the monitor as the camera moved toward and away from the target). The participants' ability to use radial outflow to perceive distance was evidenced by a significant relationship between actual target distances and the distances reproduced by the participants.

2 Experiment 1

In the first experiment, participants' ability to use radial outflow produced by camera motions was investigated. Given the success of previous head motion research (Bingham & Pagano, 1998; Pagano & Bingham, 1998), the next logical step was to determine whether similar performance could be attained with a mechanical, moving camera. The passive nature of the displays typically used for teleoperation makes the perception of remote environments difficult because the normal coupling between optical stimulation and action is broken (Smets, 1995; Tittle et al., 2002). In everyday perception, the relation between optic flow and one's movements is lawful, and as Equation 2 demonstrates, the information for perception is often defined by both the optical stimulation and the actions that produced it (see also Gibson, 1958, 1979; Wexler & van Boxtel, 2005). We hypothesized that participants would be successful in indicating distance, and we expected superior performance when the camera motions were coupled with head movements.


2.1 Method

2.1.1 Participants. Six Clemson University graduate students (three males, three females) participated in the first experiment after informed consent was obtained. Each participant received payment for his or her participation and self-reported normal or corrected-to-normal visual acuity, as well as full use of the neck, arms, and hands.

2.1.2 Materials and Procedure. Participants viewed remote targets on a monitor fed by a Panasonic CCTV video camera that was mounted on a Puma 560 industrial robotic arm. The camera had a 22° field of view and fed into a Panasonic 19 in. CRT video monitor at a control station where the participants were seated. The camera arm moved forward and backward along a straight line toward and away from the targets, and both the camera and the targets were occluded from the participants' view throughout the study.

Targets consisted of 15 white foam board squares cut to produce three optical sizes of 5°, 11°, and 14°, measured from the initial position of the camera lens. The targets appeared on the monitor as 7.60, 16.50, and 21.30 cm squares. For each of the optical sizes, there were five target distances of 75, 100, 125, 150, and 175 cm from the initial position of the camera, creating a total of 15 randomized image size by distance combinations. Each size by distance combination was presented to each participant twice per experimental condition, for a total of 30 trials per condition. Targets were positioned so that the center of each target was located at the center of the monitor. A black curtain was positioned 200 cm from the Puma arm to provide a consistent background. Monitor brightness and contrast were adjusted so that participants viewed only a flat white target against a black background.

To indicate the perceived distance between the camera and the target, participants moved a marker along a 200 cm string-and-pulley reporting device. This reporting device was located on a table in front of the participants, in their direct line of sight. Alternating trials began with the marker set at either the near end of the device, corresponding to a distance of 0 cm, or the far end,

corresponding to a distance of 200 cm, to counterbalance any bias toward underestimation or overestimation. The experimenter recorded estimations in centimeters using a tape measure affixed to the reporting device that was not visible to participants.

Prior to data collection for each camera condition, each participant received three training trials in which he or she viewed a target on the monitor while that target was placed next to the reporting device at its respective distance. Thus, the participant would view a target on the monitor and concurrently view that specific target's actual size and distance on the reporting device. Following this, the participant was allowed to move the camera arm and become familiar with the expansion and contraction of that particular target. This process was repeated two more times with different size targets at different distances. The three target size by distance combinations viewed during training were chosen so that they produced the same image size on the monitor. This allowed participants to view three different target sizes at three different distances that all produced the same image size on the monitor, reinforcing that image size did not relate to distance. Participants were then asked to observe the relationship between the rate of target expansion and contraction and the distance of the target from the camera, with targets closer to the camera expanding and contracting more quickly than targets farther from the camera.

The camera was controlled under three conditions: passively with preprogrammed motions, coupled with the participant's head movements, and with a participant-controlled joystick. Under the passive condition, the camera moved forward and backward with a consistent sinusoidal velocity profile. The amplitude of the passive camera motion was 17.40 cm, with a period of forward and backward motion equal to approximately 3 s, similar to Bingham and Pagano (1998). These preprogrammed camera motions were initiated and subsequently terminated by a keyboard command from the participant. Under head-coupled viewing, participants wore a visor with a lightweight electronic sensor that tracked head movements (Flock of Birds, Ascension Technologies Corporation, Burlington, VT). The front-to-back components of participants' head movements were coupled


to the Puma arm with a delay of approximately 200 ms between movements of the head and motions of the camera (this delay was reduced in Experiment 2). Any side-to-side movements of the head were ignored by the tracking device. Under the third condition, participants controlled the camera movements with a joystick. Front-to-back components of the participants' joystick movements were fed to the camera arm, while any side-to-side movements were ignored.

For each viewing condition, once a trial was complete, the monitor was turned off and the camera returned to the starting position while the participant made final adjustments to the distance marker on the reporting device. Participants received the conditions in a randomized order determined by a Latin square design. The three camera conditions were administered to the participants on separate days. During the experiment, participants received feedback on their performance following each trial. When participants finished making their distance judgments, the experimenter provided feedback by pointing to the actual distance of the target on the reporting device for comparison. Participants were then instructed to move the camera again and view the expansion and contraction of the target on the monitor, at the appropriate distance, before the reporting device was reset.

The design of this study was such that image size did not reveal information about distance, and this was demonstrated to participants prior to each viewing condition. Under direct line of sight viewing, however, image size typically does provide some information about distance, because the image size produced by a given object increases as that object is placed closer to the point of observation. Image size is nevertheless an unreliable source of information about the actual sizes and distances of unfamiliar objects located in novel or degraded remote environments, such as those encountered in USAR. Therefore, in the present experiment, optimal performance would be characterized by indicated distance values that correspond to the actual target distances specified by expansion rate, which do not correspond to target image sizes on the monitor.
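To make the decoupling of image size and distance concrete, the short sketch below uses simple viewing geometry to compute the physical target size that subtends a given optical angle at a given camera distance. The printed sizes and distances are derived from the stated angles purely for illustration; they are not the specific foam-board dimensions used by the authors.

```python
import math

def visual_angle_deg(target_size_cm, distance_cm):
    """Full optical angle subtended by a flat target of the given physical
    size viewed from the given distance (degrees)."""
    return math.degrees(2.0 * math.atan((target_size_cm / 2.0) / distance_cm))

def size_for_angle(angle_deg, distance_cm):
    """Physical target size needed to subtend a desired optical angle at a
    given distance -- how squares could be cut so that image size is
    uncoupled from distance (derived values, not the authors' dimensions)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Three hypothetical targets at different distances that all subtend 11 deg:
for d in (75.0, 125.0, 175.0):
    s = size_for_angle(11.0, d)
    print(f"{s:5.1f} cm square at {d:5.1f} cm -> {visual_angle_deg(s, d):4.1f} deg")
# All three produce the same image on the monitor, so only the expansion
# rate produced by the camera motion distinguishes their distances.
```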

2.2 Results and Discussion

Regression graphs of indicated target distances as a function of actual target distances from the first experiment are presented in Figure 1. To reduce visual clutter and account for judgment variability, each point in Figure 1 represents the mean of the six responses made by each of the participants to each of the given target distances. Simple regressions predicting indicated target distances from actual target distances were also performed for each of the three conditions using all 30 data points from each of the six participants (for a total n = 180). The regressions resulted in r² = .44, F(1, 178) = 141.90, p < .001 for the joystick condition, r² = .39, F(1, 178) = 112.70, p < .001 for the head-coupled condition, and r² = .41, F(1, 178) = 122.00, p < .001 for the passive condition. The slopes of the resulting regression lines were 0.58, 0.59, and 0.61 for the joystick, head-coupled, and passive conditions, respectively, with intercepts of 47.60, 41.80, and 42.40 cm.

Figure 1. Mean indicated distance predicted by actual distance in the (a) joystick, (b) head-coupled, and (c) passive conditions of Experiment 1.

For comparison with the overall data, simple regressions were repeated separately for the 30 data points from each participant. This resulted in significant r² values for all six participants under the joystick condition, for five out of the six participants under the head-coupled condition, and for five out of the six participants under the passive condition (all p < .01, except two nonsignificant regressions from different participants, each p > .05). Mean r² values produced by the individual participants were 0.47, 0.38, and 0.48 for the joystick, head-coupled, and passive conditions, respectively, with mean slopes of 0.56, 0.56, and 0.61, and mean intercepts of 47.60, 44.90, and 42.30 cm. Performance was therefore similar under all three viewing conditions, as revealed by similar r² values, slopes, and intercepts in the individual and overall regressions. These results indicate that participants were able to use radial outflow provided by camera motions to judge the distances of the targets in a remote environment.

To test for practice effects, simple regressions were conducted predicting indicated target distance from actual distance with the data grouped according to the order of presentation rather than in terms of viewing condition. The data for each participant's first condition, for example, were placed in a simple regression regardless of which viewing condition those data came from. The resulting mean r² values for the conditions performed first, second, and third were 0.33, 0.43, and 0.57, respectively, with slopes of 0.45, 0.58, and 0.71, and intercepts of 56.90, 46.40, and 31.60 cm. Thus the ability of the participants to perceive target

distance from the moving displays improved over the course of the experiment, indicating a practice effect.

Simple regressions were also conducted to predict indicated target distance from image size for the combined data of the six participants. While the r² values for these regressions were statistically significant, the overall amount of variance accounted for by image size was very small, with r² = .07, F(1, 178) = 14.20, p < .001 for the joystick condition, r² = .10, F(1, 178) = 19.90, p < .001 for the head-coupled condition, and r² = .04, F(1, 178) = 6.91, p < .01 for the passive condition (all n = 180). Thus, while it was possible for participants to base their distance indications on image size, they were relying to a much larger extent on actual target distance, as specified by the expansion and contraction of the target on the screen.

The results from the first experiment confirm that participants were able to use the optical information provided by radial outflow to perceive the distances of remote targets. Further, this experiment revealed that participants were able to use radial outflow presented on a video monitor when the motion was created by a camera coupled to a joystick, coupled to head movements, or viewed passively. Contrary to expectations, participants did not exhibit superior performance under the head-coupled condition, which mimicked the natural coupling between head movement and optical motion. A possible reason for this was the inherent delay between movements of the head and the resulting camera motions. Also, in the passive condition the camera motions followed a consistent sinusoidal velocity profile. These regular motions may have allowed the participants to become more easily attuned to the differences in expansion rate that occurred from trial to trial in correspondence with the changes in target distance (Peh et al., 2002). Practice was found to improve performance, even with the conditions taking place on separate days.
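The analyses above are ordinary simple regressions of indicated distance on actual distance. The sketch below reproduces that computation on synthetic stand-in data; the slope, intercept, and noise level are invented for illustration and are not the experimental data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one condition: 6 participants x 30 trials, with
# actual target distances of 75-175 cm and noisy indicated distances.
actual = np.tile(np.repeat([75, 100, 125, 150, 175], 6), 6).astype(float)
indicated = 45.0 + 0.6 * actual + rng.normal(0.0, 20.0, actual.size)

# Simple regression predicting indicated distance from actual distance,
# reporting the slope, intercept, and r^2 as in the text above.
slope, intercept = np.polyfit(actual, indicated, 1)
r_squared = np.corrcoef(actual, indicated)[0, 1] ** 2

print(f"n = {actual.size}, slope = {slope:.2f}, "
      f"intercept = {intercept:.1f} cm, r^2 = {r_squared:.2f}")
```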

3 Experiment 2

In the second experiment, we tested an additional viewing condition that combined the passive and head-coupled conditions of Experiment 1. In the active condition of Experiment 2, the camera motions were produced passively as in Experiment 1, and therefore occurred in a consistent fashion with uniform motion from trial to trial. The participants were asked to synchronize their head movements with the camera by moving their heads forward and backward with the expanding and contracting image on the monitor. In this way, participants effectively coupled their head movements with the camera motions without the need for the computer and supporting systems to establish an actual linkage. A large body of work has confirmed that such rhythmic entrainment to visual stimuli is natural and easily achieved (Buekers, Bogaerts, Swinnen, & Helsen, 2000; Schmidt, Carello, & Turvey, 1990; Schmidt & Turvey, 1994; Turvey, 1990; Wimmers, Beek, & van Wieringen, 1992). It was hypothesized that the active condition would result in superior performance, as it provided participants with the regularity of the passive camera motions, a coupling with their active head movements, and an image that changed along with their head movements without a camera delay.

3.1 Method

3.1.1 Participants. Eight Clemson University graduate students (four males, four females) participated in the second experiment after informed consent was obtained. Each participant received payment for his or her participation and self-reported normal or corrected-to-normal visual acuity, as well as full use of the neck, arms, and hands.

3.1.2 Materials and Procedure. The materials and procedures for the second experiment were similar to those of the first, with the exception that the joystick condition was replaced by the active condition. Under the active condition, an apparatus restricted participant head movements to a range of 17.40 cm, matching the camera motion of the first experiment. The apparatus consisted of two plastic tubes attached perpendicular to an adjustable rail, which in turn was mounted on a tripod. Before the experiment, the height of the tubes was set to be even with the participant's forehead, and the separation was set to be equal to the length of the head, from front to back, plus 17.40 cm. During the active condition, participants moved their heads forward and backward, synchronizing their head movements to the target movements, with their head movements restricted by the range of the apparatus. Thus, the participants


moved their heads along with the camera motions, using the same amplitude and frequency of movement. The passive and head-coupled conditions remained the same as in Experiment 1, but due to software improvements the delay between head movements and camera motions in the head-coupled condition was reduced from 200 ms to 70 ms. Participants were tested on three different days, one condition per day.
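The preprogrammed oscillation used in the passive condition can be sketched as a simple position profile streamed to a position-controlled arm. The command rate and interface below are hypothetical, and whether the reported 17.40 cm refers to the half- or full excursion is an interpretation made for this sketch.

```python
import math

AMPLITUDE_CM = 17.4   # camera motion amplitude reported for the passive condition
PERIOD_S = 3.0        # approximate period of one forward-and-back cycle
RATE_HZ = 50.0        # hypothetical command rate for the arm controller

def camera_offset(t):
    """Camera displacement (cm) along the line of sight at time t (s).
    A pure sinusoid gives the consistent velocity profile assumed by
    Equation 2, with peak speed 2*pi*A/P."""
    return AMPLITUDE_CM * math.sin(2.0 * math.pi * t / PERIOD_S)

# Example: a few of the position commands that could be streamed to a
# position-controlled arm (the actual Puma interface is not shown here).
for step in range(0, int(RATE_HZ * PERIOD_S) + 1, 25):
    t = step / RATE_HZ
    print(f"t = {t:4.2f} s  offset = {camera_offset(t):6.2f} cm")
```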

3.2 Results and Discussion

Simple regressions predicting indicated target distances from actual target distances were performed separately for the 30 data points from each individual participant in each of the three conditions. For seven of the eight participants, these regressions resulted in significant r² values in all three viewing conditions (all p < .001, except one participant at p < .05 in one condition). The remaining participant produced a significant r² value in only one of the three conditions and thus was dropped from the remaining analyses. Because operators would normally be screened for a minimal level of proficiency in actual applications, we felt justified in removing this participant. Mean r² values produced by the seven individual participants were 0.63, 0.51, and 0.57 for the active, head-coupled, and passive conditions, respectively, with mean slopes of 0.72, 0.64, and 0.72.

Simple regressions predicting indicated target distances from actual target distances were also performed for each of the three conditions using all 30 data points from the seven participants combined (for a total n = 210). These regressions resulted in r² = .59, F(1, 208) = 294.50, p < .001 for the active condition, r² = .47, F(1, 208) = 181.50, p < .001 for the head-coupled condition, and r² = .55, F(1, 208) = 249.70, p < .001 for the passive condition. The slopes of the resulting regression lines were 0.72, 0.64, and 0.72 for the active, head-coupled, and passive conditions, respectively, with intercepts of 27.18, 39.52, and 29.34 cm. Figure 2 presents indicated target distance as a function of actual target distance for the second experiment. As in Figure 1, to reduce visual clutter and judgment variability, each point in Figure 2 represents the

mean of the six responses made by each of the seven participants to each of the given target distances.

Figure 2. Mean indicated distance predicted by actual distance in the (a) active, (b) head-coupled, and (c) passive conditions of Experiment 2.

Simple regressions for practice effects in the second experiment resulted in mean r² values for the first, second, and third conditions equal to 0.48, 0.44, and 0.68, respectively, with slopes of 0.64, 0.60, and 0.84, and intercepts of 36.28, 44.38, and 15.39 cm. While the second condition was marginally worse than the first, overall, practice continued to improve performance.

Results from the second experiment provided further confirmation that participants can use radial outflow generated by camera motions to perceive the distance of a target in a remote environment. However, the active and passive conditions produced nearly identical results. Both conditions produced slopes of 0.72 and similar mean r² values of 0.63 and 0.57, respectively. Thus, no additional benefit was obtained by coupling the radial outflow to head movements, whether through an actual coupling or through synchronization with a passively moved camera. One shortcoming of both the head-coupled and active conditions was that while the participants moved their heads forward and backward and the image of the target on the monitor expanded and contracted by a corresponding amount, the participants' heads were also moving toward and away from the monitor. Thus the image of the monitor on the retina expanded and contracted while the image of the target on the monitor was simultaneously expanding and contracting. Some participants in Experiments 1 and 2 reported that this made it difficult for them to utilize the expansion information in the head-coupled condition, and similar comments were made by the participants of Experiment 2 with regard to the active condition.
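The compounding of the two expansions can be illustrated with rough viewing geometry. In the sketch below, the eye-to-monitor distance, the starting image size on the monitor, and the assumption that the monitor image scales roughly inversely with camera-to-target distance are all illustrative simplifications, not measurements from the experiment.

```python
import math

def angle_at_eye_deg(image_size_cm, eye_to_monitor_cm):
    """Full visual angle of the monitor image at the observer's eye."""
    return math.degrees(2.0 * math.atan(image_size_cm / (2.0 * eye_to_monitor_cm)))

# Hypothetical numbers: as the head moves 17.4 cm toward the monitor, the
# camera simultaneously moves 17.4 cm toward a target 125 cm away, so the
# image on the monitor grows while the monitor itself looms at the eye.
eye_far, eye_near = 60.0, 60.0 - 17.4            # eye-to-monitor distance (cm), assumed
target_far, target_near = 125.0, 125.0 - 17.4    # camera-to-target distance (cm)
image_far = 16.5                                 # image size on the monitor at the far point (cm)
image_near = image_far * target_far / target_near  # small-angle approximation of image growth

print("target image only :", round(angle_at_eye_deg(image_far, eye_far), 1), "->",
      round(angle_at_eye_deg(image_near, eye_far), 1), "deg")
print("with head motion  :", round(angle_at_eye_deg(image_far, eye_far), 1), "->",
      round(angle_at_eye_deg(image_near, eye_near), 1), "deg")
```

Under these assumed values, the head's approach to the monitor noticeably inflates the expansion seen at the eye beyond what the camera motion alone produces, which is consistent with the participants' reports that the two sources of expansion were difficult to disentangle.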

4 Experiment 3

Past research has shown that for perception to be metrically accurate, feedback is necessary to calibrate the perceptual system (e.g., Bingham & Pagano, 1998; Bingham, Zaal, Robin, & Shull, 2000; Pagano & Bingham, 1998; Wickelgren et al., 2000; Withagen & Michaels, 2005). However, during field exercises or USAR applications, feedback is not always available, especially when the view of the remote environment is compromised by smoke, dust, and so on. Optimally, teleoperators would have the opportunity to become trained in the use of their equipment before being deployed to the field. This experiment illustrates that such training should include feedback regarding the distances of targets observed in a remote environment. While feedback has been shown to be effective for training, it was unknown whether participants would be able to accomplish distance

perception without feedback. This was addressed in the third experiment. Participants were trained to use radial outflow as distance information by participating in a block of trials involving familiar objects. Objects were placed at different distances from a remote camera in an order that illustrated the relationship between each distance and the resulting degree of radial outflow produced by camera motions at that particular distance. A tennis ball, for example, was placed near the camera in one trial,


followed by a compact disc placed farther from the camera in the next trial, resulting in the same image size on the monitor. This further demonstrated the lack of relationship between image size and distance.

4.1 Method

4.1.1 Participants. Eight Clemson University students (three males, five females) participated in the third experiment after informed consent was obtained. Each participant received payment for his or her participation and self-reported normal or corrected-to-normal visual acuity, as well as full use of the neck, arms, and hands.

4.1.2 Materials and Procedure. Materials were the same as in the first experiment, with the exception of six familiar objects used during the initial training session: a 12 oz. soda can, a 2 L plastic soda bottle, a compact disc, a tennis ball, a standard playing card, and a US dollar bill. The compact disc and the clear portions of the soda bottle were painted white to increase visual contrast. As with the white squares, the familiar targets were placed at various distances to create three different visual angles of 5°, 8°, and 11°. The resulting distances ranged from 45.3 to 156.9 cm. The soda bottle, soda can, and playing card were oriented parallel to the lens of the camera in a vertical position, while the dollar bill was rotated 90° so that its longer dimension was vertical. The target visual angles were defined to reflect each object's vertical dimension. In all three conditions, the targets were viewed passively, as in the passive condition of the first two experiments.

The first condition was a training condition. Participants were initially presented with the familiar objects in their direct line of sight. They then viewed the objects on the monitor, fed from the passively moved camera, while being shown on the reporting device the actual distance of each object from the camera. Next, participants began the experimental trials in the order given in Table 1. All participants received these trials in that order, and the 15 trials listed in Table 1 were repeated for a total of 30 trials per participant. As indicated in Table 1, participants were shown the 12 oz. soda can three times in a row, each time at a different distance.

Table 1. Objects Used in the Training Session of Experiment 3, Along with the Distance (cm) at Which Each Was Presented and the Resulting Optical Angle (°)

Trial  Object             Distance (cm)  Angle (°)
1      12 oz. soda can        62.80         11
2      12 oz. soda can        86.80          8
3      12 oz. soda can       116.10          6
4      Tennis ball            61.80          6
5      Tennis ball            45.30          8
6      Compact disc           85.40          8
7      Compact disc          114.20          6
8      Compact disc           61.70         11
9      2 L soda bottle       156.90         11
10     Playing card           45.30         11
11     Playing card           83.70          6
12     Playing card           62.60          8
13     US dollar bill        111.70          8
14     US dollar bill         80.80         11
15     2 L soda bottle       156.90         11

In this way, they observed that expansion rate and image size (expressed as optical angle) varied as a function of distance for a given object. In trial 4, they were shown the tennis ball presented at a distance that created the same image size as the presentation of the can in the previous trial. This conveyed that image size did not relate to distance, as two different objects presented at different distances can produce the same image size. Each time there was a switch from one object to another, the distances were such that image size did not vary between those two trials. This demonstrated that expansion rate varied with distance, and this was further reinforced each time the same object was presented in succession at a different distance. Object sizes and distances were chosen not only to create the same size image but also, as indicated in Table 1, so that the distances used for the soda can in trial 1, the ball in trial 5, the compact disc in trial 8, and the card in trial 12 were similar. Other distances were repeated in the same fashion. This procedure was essential so that participants could not associate a particular image or object size with a particular distance.
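For a familiar object of known physical size, the presentation distance that yields a desired optical angle follows from the same viewing geometry used above. The sketch below assumes nominal vertical extents for the objects (e.g., roughly 12.2 cm for a 12 oz. can and 6.7 cm for a tennis ball); these heights are assumptions rather than values reported by the authors, although the distances they produce are of the same order as those in Table 1.

```python
import math

def distance_for_angle(object_height_cm, angle_deg):
    """Distance at which an object of the given vertical extent subtends
    the desired full optical angle at the camera."""
    return object_height_cm / (2.0 * math.tan(math.radians(angle_deg) / 2.0))

# Nominal (assumed) vertical extents of the familiar objects, in cm.
objects = {"12 oz. soda can": 12.2, "tennis ball": 6.7, "compact disc": 12.0,
           "playing card": 8.9, "US dollar bill (rotated)": 15.6}

for name, height in objects.items():
    for angle in (6.0, 8.0, 11.0):
        print(f"{name:25s} {angle:4.1f} deg -> "
              f"{distance_for_angle(height, angle):6.1f} cm")
```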


The remaining procedures for the experimental trials of the first session were identical to those of the passive camera condition of Experiments 1 and 2, with the participants making distance judgments and receiving feedback during each trial.

The second and third conditions used the white squares of Experiments 1 and 2 as targets. The second condition was identical to the passive camera condition of Experiments 1 and 2, with the participants receiving three training trials before the experiment and feedback throughout the experiment. The third condition was similar, except that the participants did not receive any training or feedback. The second and third conditions of Experiment 3 will be referred to as the "feedback" and "no feedback" conditions, respectively. The third condition was included to determine whether the participants would be able to complete the task without feedback, as in many real-world applications. Each participant completed all conditions in this fixed order within the same day, with a minimum break of 1 hr between conditions.

4.2 Results and Discussion

Participants' indicated target distances as a function of the actual target distances in the third experiment are presented in Figure 3 (in each condition n = 240). Means are not presented for this experiment. Simple regressions predicting indicated target distances from actual target distances for all 240 trials resulted in r² = .73, F(1, 238) = 712.50, p < .001 for the training condition, r² = .42, F(1, 238) = 175.14, p < .001 for the feedback condition, and r² = .46, F(1, 238) = 201.40, p < .001 for the no feedback condition. The slopes of the resulting regression lines were 0.94, 0.61, and 0.69 for the training, feedback, and no feedback conditions, respectively, with intercepts of 2.44, 37.82, and 29.15 cm. The design of the third experiment differed from the first two in that it was completed in a fixed order and did not allow for a similar assessment of practice effects.

Simple regressions repeated separately for the 30 data points of each participant resulted in significant r² values for all eight participants in all three conditions (all p < .001, except one participant at p < .01 in the feedback

condition and p < .05 in the no feedback condition). The mean r² values produced by the seven best performing individual participants were 0.79, 0.57, and 0.60 for the training, feedback, and no feedback conditions, respectively, with mean slopes of 0.96, 0.70, and 0.74.

Simple regressions were also conducted to predict indicated target distance from image size for the combined data of all eight participants (n = 240). These resulted in r² = .01, F(1, 238) = 2.40, p > .10 for the training condition, r² = .01, F(1, 238) = 3.00, p > .05 for the feedback condition, and r² = .06, F(1, 238) = 15.70, p < .001 for the no feedback condition. Thus, the regressions on image size were significant only for the no feedback condition. When feedback was present, and after a period of training with real objects, participants did not base their judgments on the image size of the target on the monitor. With feedback removed, participants did base their distance judgments more on image size, but image size still accounted for only approximately 6% of the variance in their judgments.

The results from the third experiment provided further confirmation that participants can use radial outflow provided by passive camera motions to perceive the distances of targets in a remote environment. After an initial period of training and feedback, participants may be able to continue to use this information when feedback is removed. This could be beneficial if teleoperators are deployed under degraded conditions similar to those encountered during USAR teleoperation. Under such conditions, there are few familiar objects, yet accurate spatial perception remains critical to mission success (e.g., Casper & Murphy, 2003; Murphy, 2004). In addition, the third experiment illustrates a potential benefit of training with familiar objects. Participants exhibited very accurate perception of distance during the training condition, with slopes of the regression lines around 0.95 and intercepts around 1.1 cm, which is close to perfect performance (a slope of 1.0 and an intercept of 0). Thus, it is possible to achieve accurate performance with a camera that is moved passively.


Figure 3. Indicated distance predicted by actual distance for the (a) training, (b) feedback, and (c) no feedback conditions of Experiment 3.

5 General Discussion

The results from these experiments indicate that teleoperators can use optical information provided by radial outflow to perceive distances of targets in a remote environment. The observed order effects also indicate that practice improved performance. Additionally, after an initial period of training and feedback with familiar objects, participants were able to use this information with novel, unfamiliar objects and no feedback. The degree to which participants based their distance judgments on image size was very small, indicating that they had learned to base distance judgments on radial outflow rather than on image size. This distinction is important, as it conveys that participants learned to base judgments on information that actually specified distance,


rather than on information that failed to relate to distance. While image size in direct line of sight viewing is related to distance and is used to perceive distance, this relation often does not exist in teleoperation, and it was removed in the present tasks. Interestingly, participants did not use image size with the familiar objects of Experiment 3, the situation in which one would expect participants to rely on image size the most (Ittelson, 1951). The ability of participants to selectively attune to a specific variable is a hallmark of perceptual learning (Withagen & Michaels, 2005). The present experiments revealed that radial outflow is a variable to which participants can become selectively attuned when perceiving objects in a remote environment.

Our experiments were directed at assessing the ability of radial outflow to support the perception of absolute distance, that is, the actual metric distance between the camera and the test object. Another type of depth perception is relative distance perception, in which the observer is only required to judge the relative distances of two different targets, such as which one is closer to the camera. Given that the perception of relative distance is typically easier than the registration of absolute distance (e.g., Bingham & Pagano, 1998; Rogers & Graham, 1979), it is possible that radial outflow would support relative depth perception without the need for training and feedback. With two targets appearing side by side on a video monitor, relative depth can be discerned by noting a difference in the rate at which the two targets expand and contract, without the need to relate the expansion rates to specific absolute distances. Future work should investigate the ability of radial outflow to support the perception of relative distance, as well as its ability to support absolute distance perception when multiple targets are present.
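A minimal sketch of how such a relative judgment could be computed from the optical variables alone is given below, under the assumption that both targets are viewed during the same camera motion; the angular values are hypothetical and serve only to illustrate the comparison.

```python
def relative_expansion_rate(theta, theta_dot):
    """Relative rate of image expansion (1/s); this is 1/tau from Equation 1.
    For a shared camera motion, the nearer of two targets has the larger value."""
    return theta_dot / theta

# Two hypothetical targets viewed during the same forward camera motion.
target_a = {"theta": 0.10, "theta_dot": 0.020}   # rad, rad/s (illustrative)
target_b = {"theta": 0.14, "theta_dot": 0.050}

rate_a = relative_expansion_rate(**target_a)
rate_b = relative_expansion_rate(**target_b)
nearer = "B" if rate_b > rate_a else "A"
print(f"relative expansion: A = {rate_a:.2f}/s, B = {rate_b:.2f}/s "
      f"-> target {nearer} is nearer")
```

Note that the ordering depends only on the relative expansion rates, not on the targets' image sizes, so no calibration to absolute distance is required.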

In our studies, no additional benefit was found from coupling radial outflow to either head movements or joystick movements. These results are similar to those of Witmer and Kline (1998), who found that optical motion improves distance perception in a virtual environment, with no additional advantage achieved by coupling the optic flow to walking on a treadmill. Applying this work to field teleoperation, there are some advantages to using passive camera displacements over head-coupled or joystick-coupled motions. One obvious advantage is that head tracking equipment is not necessary, and even the need for a coupling between a joystick and the camera is eliminated. With a passive camera, teleoperators need not move their heads in order to move the camera, and without the joystick coupling the hands are free for other tasks. In their comparison between forward-to-back and side-to-side head movements, Wickelgren et al. (2000) found that forward-to-back head movements yielded more accurate reaches to targets placed at various distances. The participants in their experiment were allowed to generate their own head movements, and the resulting side-to-side movements were nearly twice as large as the forward-to-back movements. The authors concluded that while forward-to-back movements were more useful, they were less efficient and more difficult to perform. Self-produced head movements could prove fatiguing during extended use, and this may increase instances of simulator sickness when coupled with artificial displays. The research presented here demonstrates that passive forward and backward camera movements support distance perception without the need for users to generate the motions themselves. Without the need to manually control a camera, attentional resources are available for other tasks. Consistent camera motions may also better facilitate the use of information pertaining to the rate of radial outflow.

An additional benefit of passive camera motions is that the resulting video can be stored and viewed by other users without the need to couple the video recording to the viewers' head movements during playback. During USAR, there are typically instances in which a camera view needs to be shown to domain experts, such as firefighters, structural engineers, medical personnel, and so on (Casper & Murphy, 2003). If a teleoperator comes upon something of interest, the robot could be paused and an appropriate expert called in to view the remote environment. With the camera set in motion once again, the expert can benefit from the augmented depth information provided by the optical motions. Alternatively, the video could be recorded and then played back for the expert, preserving the depth information while conserving the robot's power supply. After the use of


mobile robots at the WTC site, medical personnel were able to identify human remains in the videos months after the incident (Casper & Murphy, 2003; Murphy, 2004). It is possible that this process would have benefited from the presence of increased optical motion in the video recordings.

In many instances, multiple users may be viewing the same camera feed simultaneously. If the camera is coupled to one user's head motions or joystick inputs, then the display is only optimal for that individual observer (Stassen & Smets, 1997). It would have to be determined which of the users is tasked with providing the movements, and irregularities in the camera motions may make it difficult for others to utilize the resulting optical motions. This is not the case with recordings or live feeds generated by a passive camera. In contrast to binocular displays, the optical motion provided by a single camera is easily stored and can be played back using standard devices. The resulting video can also be displayed simultaneously to multiple viewers, both during the live feed and during later presentations.

Given the results of the present experiments, it is recommended that cameras be mounted on mobile robots in a manner that allows the camera to be moved independently of the platform, so that it can move forward and backward in a consistent manner, maintaining a sinusoidal velocity profile. It has been shown that this can provide real-time depth information to teleoperators and could potentially be utilized by experts viewing the live feed or previous recordings of it. Finally, moving an entire robot in order to produce optic flow is costly in terms of energy, increases the risk of collisions between the robot and surfaces in the environment, and can degrade the environment by stirring up dust or debris. Therefore, it is recommended that the camera be capable of making these motions while the robot itself remains stationary.

Acknowledgments

The authors thank Vilas Chitrakaran for programming and maintaining the Puma arm and Flock of Birds system. Assistance in data collection by Megan Smart and Thandi Blanding, along with editing by Kerry Gretchen, is also greatly appreciated. This work was supported by the Defense Advanced Research Projects Agency (DARPA) through contract No. N66001-03-C-8043 and by the National Science Foundation under Grant No. SES-0353698.

References

Agah, A., & Tanie, K. (1999). Multimedia human-computer interaction for presence and exploration in a telemuseum. Presence: Teleoperators and Virtual Environments, 8, 104–111.
Bingham, G. P., & Pagano, C. C. (1998). The necessity of a perception-action approach to definite distance perception: Monocular distance perception to guide reaching. Journal of Experimental Psychology: Human Perception and Performance, 24, 145–168.
Bingham, G. P., & Stassen, M. G. (1994). Monocular distance information in optic flow from head movement. Ecological Psychology, 6, 219–238.
Bingham, G. P., Zaal, F., Robin, D., & Shull, J. A. (2000). Distortions in definite distance and shape perception as measured by reaching without and with haptic feedback. Journal of Experimental Psychology: Human Perception and Performance, 26, 1051–1059.
Borish, I. (1970). Clinical refraction (3rd ed.). Chicago: Professional Press.
Bruce, V., Green, P. R., & Georgeson, M. A. (2003). Visual perception: Physiology, psychology, & ecology (4th ed.). Hove, UK: Psychology Press.
Bruckstein, A., Holt, R., Katsman, I., & Rivlin, E. (2005). Head movements for depth perception: Praying mantis vs. pigeon. Autonomous Robots, 18, 21–42.
Buekers, M. J., Bogaerts, H. P., Swinnen, S. P., & Helsen, W. F. (2000). The synchronization of human arm movements to external events. Neuroscience Letters, 290, 181–184.
Casper, J., & Murphy, R. R. (2003). Human-robot interactions during the robot-assisted urban search and rescue response at the World Trade Center. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 33, 367–385.
Collett, T. S. (1978). Peering—A locust behaviour pattern for obtaining motion parallax information. Journal of Experimental Biology, 76, 237–241.


Ee, R. van, & Richards, W. (2002). A planar and a volumetric test for stereoanomaly. Perception, 31, 51–64.
Ellard, C. G., Goodale, M. A., & Timney, B. (1984). Distance estimation in the Mongolian gerbil: The role of dynamic depth cues. Behavioral Brain Research, 14, 29–39.
Ferris, S. H. (1972). Motion parallax and absolute distance. Journal of Experimental Psychology, 95, 258–263.
Foley, J. M., & Held, R. (1972). Visually directed pointing as a function of target distance, direction, and available cues. Perception & Psychophysics, 12, 263–268.
Gibson, J. J. (1950). Perception of the visual world. Boston: Houghton Mifflin.
Gibson, J. J. (1958). Visually controlled locomotion and visual orientation in animals. British Journal of Psychology, 49, 182–194. (Reprinted in 1998, Ecological Psychology, 10, 161–176.)
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gogel, W. C., & Tietz, J. D. (1973). Absolute motion parallax and the specific distance tendency. Perception & Psychophysics, 13, 284–292.
Goodale, M. A., Ellard, C. G., & Booth, L. (1990). The role of image size and retinal motion in the computation of absolute distance by the Mongolian gerbil (Meriones unguiculatus). Vision Research, 30, 399–413.
Hale, K. S., & Stanney, K. M. (2006). Effects of low stereo acuity on performance, presence and sickness within a virtual environment. Applied Ergonomics, 37, 329–339.
Howarth, P. A., & Finch, M. (1999). The nauseogenicity of two methods of navigating within a virtual environment. Applied Ergonomics, 30, 39–45.
Hoyle, F. (1957). The black cloud. Middlesex, UK: Penguin.
Ittelson, W. H. (1951). Size as a cue to distance: Static localization. American Journal of Psychology, 64, 54–67.
Kral, K. (2003). Behavioural-analytical studies of the role of head movements in depth perception in insects, birds and mammals. Behavioural Processes, 64, 1–12.
Lee, D. N. (1974). Visual information during locomotion. In R. B. McLeod & H. L. Pick (Eds.), Studies in perception: Essays in honor of J. J. Gibson (pp. 250–267). Ithaca, NY: Cornell University Press.
Lee, D. N. (1976). A theory of visual control of braking based on information about time-to-collision. Perception, 5, 437–459.
Marotta, J. J., Perrot, T. S., Nicolle, D., & Goodale, M. A. (1995). The development of adaptive head movements following enucleation. Eye, 9, 333–336.

Marotta, J. J., Perrot, T. S., Nicolle, D., Servos, P., & Goodale, M. A. (1995). Adapting to monocular vision: Grasping with one eye. Experimental Brain Research, 104, 107–114.
Murphy, R. (2004, September). Trial by fire: Activities of the rescue robots at the World Trade Center from 11–21 September 2001. IEEE Robotics & Automation Magazine, 50–60.
Pagano, C. C., & Bingham, G. P. (1998). Comparing measures of monocular distance perception: Verbal and reaching errors are not correlated. Journal of Experimental Psychology: Human Perception and Performance, 24, 1037–1051.
Peh, C.-H., Panerai, F., Droulez, J., Cornilleau-Pérès, V., & Cheong, L.-F. (2002). Absolute distance perception during in-depth head movement: Calibrating optic flow with extraretinal information. Vision Research, 42, 1991–2003.
Regan, D., & Hamstra, S. J. (1993). Dissociation of discrimination thresholds for time to contact and for rate of angular expansion. Vision Research, 33, 447–462.
Reinhardt-Rutland, A. H. (1996). Remote operation: A selective review of research into visual depth perception. The Journal of General Psychology, 123, 237–248.
Rogers, B., & Graham, M. (1979). Motion parallax as an independent cue for depth perception. Perception, 8, 202–214.
Schmidt, R. C., Carello, C., & Turvey, M. T. (1990). Phase transitions and critical fluctuations in the visual coordination of rhythmic movements between people. Journal of Experimental Psychology: Human Perception and Performance, 16, 227–247.
Schmidt, R. C., & Turvey, M. T. (1994). Phase-entrainment dynamics of visually coupled rhythmic movements. Biological Cybernetics, 70, 369–376.
Smets, G., Overbeeke, C., & Stratmann, M. (1987). Depth on a flat screen. Perceptual and Motor Skills, 64, 1023–1034.
Smets, G. (1995). Designing for telepresence: The Delft virtual window system. Local Applications of the Ecological Approach to Human-Machine Systems, 2, 182–207.
Sobel, E. C. (1990). The locust's use of motion parallax to measure distance. Journal of Comparative Physiology A, 167, 579–588.
Stassen, H. G., & Smets, G. J. F. (1997). Telemanipulation and telepresence. Control Engineering Practice, 5, 363–374.


Tittle, J. S., Roesler, A., & Woods, D. D. (2002). The remote perception problem. Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting, 260–264.
Todd, J. T. (1981). Visual information about moving objects. Journal of Experimental Psychology: Human Perception and Performance, 7, 795–810.
Turvey, M. T. (1990). Coordination. American Psychologist, 45, 938–953.
Voorhorst, F. A., Overbeeke, C. J., & Smets, G. J. F. (2003). Implementing perception-action coupling for laparoscopy. In L. J. Hettinger & M. Haas (Eds.), Virtual and adaptive environments: Applications, implications, and human performance issues (pp. 391–411). Mahwah, NJ: Lawrence Erlbaum Associates.
Wallace, G. K. (1959). Visual scanning in the desert locust Schistocerca gregaria Forskål. Journal of Experimental Biology, 36, 512–525.
Warren, W. H. (1998). The state of flow. In T. Watanabe (Ed.), High-level motion processing: Computational, neurobiological, and psychophysical perspectives. Cambridge, MA: Bradford/MIT Press.
Welch, R. B. (2002). Adapting to virtual environments. In K. Stanney (Ed.), Handbook of virtual environment technology (pp. 619–636). Mahwah, NJ: Lawrence Erlbaum Associates.

Wexler, M., & Boxtel, J. A. van (2005). Depth perception by the active observer. Trends in Cognitive Sciences, 9, 431–438.
Wickelgren, E. A., Daniel, M. C., & Bingham, G. P. (2000). Reaching measures of monocular distance perception: Forward versus side-to-side head movements and haptic feedback. Perception & Psychophysics, 62, 1051–1059.
Wimmers, R. H., Beek, P. J., & van Wieringen, P. C. W. (1992). Phase transitions in rhythmic tracking movements: A case of unilateral coupling. Human Movement Science, 11, 217–226.
Withagen, R., & Michaels, C. F. (2005). The role of feedback information for calibration and attunement in perceiving length by dynamic touch. Journal of Experimental Psychology: Human Perception and Performance, 24, 145–168.
Witmer, B. G., & Kline, P. B. (1998). Judging distance and traversed distance in virtual environments. Presence: Teleoperators and Virtual Environments, 7, 144–167.
Woods, D. D., Tittle, J., Feil, M., & Roesler, A. (2004). Envisioning human-robot coordination in future operations. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34, 210–218.