Human stereovision without localized image features - CiteSeerX

Biol. Cybern. 72, 279-293 (1995)

Human stereovision without localized image features

Petra A. Arndt¹, Hanspeter A. Mallot², Heinrich H. Bülthoff²

¹ Institut für Neuroinformatik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
² Max-Planck-Institut für biologische Kybernetik, Spemannstrasse 38, D-72076 Tübingen, Germany

Received: 8 February 1994 / Accepted in revised form: 15 September 1994

Abstract. Many theories of human stereovision are based on feature matching and the related correspondence problem. In this paper, we present psychophysical experiments indicating that localized image features such as Laplacian zero-crossings, intensity extrema, or centroids are not necessary for binocular depth perception. Smooth one-dimensional intensity profiles were combined into stereograms with mirror-symmetric half-images such that these localized image features were either absent or did not carry stereo information. In a discrimination task, subjects were asked to distinguish between stereograms differing only by an exchange of these half-images (ortho- vs. pseudoscopic stereograms). In a depth ordering task, subjects had to judge which of the two versions appeared in front. Subjects are able to solve both tasks even in the absence of the mentioned image features. The performance is compared to various possible stereo mechanisms. We conclude that localized image features and the correspondences between them are not necessary to perceive stereoscopic depth. One mechanism accounting for our data is correlation or mean square difference.

Correspondence to: H.A. Mallot

1 Introduction

1.1 Differences between half-images of a stereogram

Stereoscopic depth perception is based on the differences between the left and right half-images of a stereogram. However, the nature of the image differences used in human depth perception is not entirely clear. Julesz (1971) has shown that stereo depth can be obtained from random dot stereograms in which all of the differences are described by the horizontal separation of corresponding dots, i.e., by the point disparity of localized features. While this proves that localized image features are sufficient for stereoscopic depth perception, the reverse question, whether localized features are necessary, is still open. A discussion of disparity can be given either in terms of image formation or stereo mechanisms. We start with the first question, i.e., which image differences are available for stereovision.

The major source of image differences is binocular perspective or parallax. As has been pointed out by Regan et al. (1990), this view of disparity can be traced back to the astronomer Johannes Kepler. It is based on geometrical considerations alone and basically ignores the surface properties of the imaged objects. This is sufficient for Lambertian surfaces where reflection is the same in all directions and therefore both eyes will receive the same irradiance from each object point. In contrast, a point on a specular surface reflects light of different intensity or color to the two eyes. As a result, disparities of specular highlights cannot be interpreted in terms of binocular perspective alone. A recent study by Blake and Bülthoff (1990, 1991) showed, however, that the human visual system can make use of disparate specular highlights for the perception of shape or surface curvature.

A further classification of geometric disparities can be based on the notion of a disparity map. Consider the case that both eyes see exactly the same set of object points, i.e., there is no monocular occlusion. If $I_l(x, y)$ and $I_r(x, y)$ denote the left and right image, we can then transform the left image into the right by the equation

$$I_r(x, y) = A\,I_l\bigl(x + d_h(x, y),\; y + d_v(x, y)\bigr) + c \qquad (1)$$

where A and c are suitable constants. The vector field $(d_h(x, y), d_v(x, y))$ is called the disparity map. The computation of simple parameterized disparity maps such as $d_h(x, y) = \mathrm{const}$; $d_v(x, y) = 0$ may correspond to the perception of global stereopsis in the sense of Julesz (1971). Locally, the disparity map can be assessed in a number of ways including:

1. Local values of the disparity map, i.e., shifts in the x- and y-directions, are the ordinary horizontal and vertical disparities. Vertical disparities occur only off the horizontal and vertical meridian; they have been shown to affect stereoscopic depth perception by Rogers and Bradshaw (1993).
2. The first two-dimensional derivative of the disparity map describes local distortions between the left and right images. These distortions can be decomposed in terms of rotation (orientation disparity), scale, and shear disparities, in the same way as proposed for optical flow fields by Koenderink and van Doorn (1976). The relevance of orientation disparity has been shown with uncorrelated noise patterns made of oriented line segments of slightly different orientation (von der Heydt 1979; Cagenello and Rogers 1993). Rogers and Cagenello (1989) have used line stereograms whose half-images differ by a shear transformation to investigate the effect of local deformations.

The notion of a disparity map can be easily extended to spatiotemporal images $I(x, y, t)$. Motion in depth results in orientation-type distortions in the spatiotemporal disparity map. Beverly and Regan (1973) measured adaptation to motion in depth and concluded that adaptable channels exist for a number of 3D directions or, equivalently, motion disparities.

If the two eyes see different sets of points, the notion of a disparity map becomes less useful. For transparent surfaces, multiple depth maps may be required, each of which applies to a set of corresponding image points. Weinshall (1991) has presented ambiguous stimuli that lead to the perception of multiple transparent planes even if valid interpretations with just two depth planes exist. Finally, the images may differ qualitatively in that whole surfaces are visible to one eye but occluded to the other (Panum's limiting case). Evidence for the utilization of monocular occlusion in stereovision has been presented by Shimojo and Nakayama (1990): If a monocular dot is presented on the appropriate side of a binocularly visible line, the dot is perceived behind the line as if it were monocularly occluded by a step edge in depth. More recently, Liu et al. (1994) presented similar results for surfaces.

Another effect related to object disparity has been described by Ninio (1981) who constructed "random curve stereograms" by imaging random walk trajectories on a 3D surface. He suggests that the curves can be matched globally without identifying individual points along the curves. The critique of the disparity map concept thus leads back to the idea of object disparities (as opposed to point disparities). While we know from Julesz' seminal work (1971) that these are not required to perceive stereoscopic depth, they may still be used if they are present.
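The image transformation of Eq. (1) can be illustrated with a minimal sketch for a single image row. The function name `warp_left_to_right`, the linear interpolation, the clamping at the image border, and the sample values are illustrative assumptions and not part of the original study; vertical disparity is taken to be zero.

```python
def warp_left_to_right(left, d_h, A=1.0, c=0.0):
    """Predict one row of the right image from the left image, following Eq. (1).

    left: intensities sampled at integer pixel positions (hypothetical data).
    d_h:  horizontal disparity per pixel, in pixels (may be fractional).
    Vertical disparity is assumed to be zero, so a single row suffices.
    """
    n = len(left)
    right = []
    for x in range(n):
        # Sampling position x + d_h(x), clamped to the image (an assumption).
        pos = min(max(x + d_h[x], 0.0), n - 1.0)
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)
        frac = pos - i0
        value = (1.0 - frac) * left[i0] + frac * left[i1]  # linear interpolation
        right.append(A * value + c)
    return right


left = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
# Constant disparity of one pixel, gain A = 0.5, offset c = 2.0.
right = warp_left_to_right(left, d_h=[1.0] * 6, A=0.5, c=2.0)
```

With a constant disparity map, every right-image value is a scaled and offset copy of the left-image value one pixel further along the row, which is the simplest parameterized case mentioned in the text.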

1.2 Stereo mechanisms

Which of the available image disparities summarized in Sect. 1.1 are actually used and what mechanisms are involved? Although the relevance of a wide variety of image disparities has been demonstrated, the major part of the psychophysical work on stereo mechanisms has focused on horizontal disparities. One major dichotomy of stereo mechanisms is that between intensity-based schemes working on continuous intensity data and feature-based approaches that take a primal sketch as their input (Marr and Poggio 1976, 1979). In computer vision, both approaches are often used successively. Sparse reliable data obtained by a feature-based scheme are refined and completed by an intensity-based second step that makes use of correlation, phase shift, differential matching techniques, or Bayesian estimation (see Cernuschi-Frias et al. 1989; Dhond and Aggarwal 1989; Jenkin et al. 1991; Haralick and Shapiro 1992; Theimer and Mallot 1994). For human psychophysics, no such two-phase distinction has been demonstrated. A common mechanism using both sources of information simultaneously seems rather more likely (Bülthoff and Yuille 1991; Yuille et al. 1991).

In any case, stereovision may be divided into three major steps: (i) monocular preprocessing, (ii) binocular interaction for the computation of local disparities, and (iii) spatiotemporal interactions between these local disparities for the reconstruction of continuous surfaces.

There can be no doubt that image features such as edges, peaks, or centroids play an important role in human stereovision. Julesz' work on random dot stereograms shows convincingly that stereovision can be based on localized dots or edges alone. An extreme view supported by his results holds that only the feature locations are signalled from the monocular stage to the binocular interaction stage. If this were true, the following predictions could be made that can be tested psychophysically:

1. Stereovision should not be possible in the absence of the appropriate image features.
2. Variation of stimulus parameters other than feature location should not affect the stereoscopic percept.

So far, the first prediction has not been tested systematically. In an earlier study (Bülthoff and Mallot 1988), we showed that images of smoothly shaded ellipsoids (such as the ones shown in Fig. 1), which do not contain Laplacian zero-crossings, can still lead to a stereoscopic perception of depth.¹ This finding has been confirmed recently by Christou and Parker (1993) in a study using images of shaded cylinders. In these realistically rendered stimuli, however, other features such as level crossings (i.e., points where the Laplacian takes some prescribed value other than zero), intensity peaks and centroids might still provide feature-based disparity information. In a similar vein, Daugman (1988) constructed images that lack zero-crossings of the Laplacian and argued that motion and texture perception are still possible, but again other features (such as intensity peaks) could account for this result.

In stimuli with multiple features, i.e., sinusoidal gratings, Legge and Gu (1989) have measured the effects of binocular contrast, interocular contrast differences, and spatial frequency on stereo acuity. They found that stereo acuity increases with the square root of contrast and suggested from signal-to-noise considerations that intensity peak matching is the most likely mechanism to account for this result. Mayhew and Frisby (1981) constructed stereograms whose zero-crossings and peaks contained somewhat different stereo information. Their demonstration shows that disparities of both types of features determine the perceived surface. Taken together, the results summarized here indicate that both edges and peaks play a role in stereopsis, but

¹ The Laplacian of the intensity function is the sum of the second partial derivatives, $\Delta I(x, y) = \frac{\partial^2}{\partial x^2} I(x, y) + \frac{\partial^2}{\partial y^2} I(x, y)$. In the "1D stimuli" used in the major part of this study, i.e., stimuli that are constant in the y-direction, it is simply the second derivative of the intensity function (Marr and Hildreth 1980).

Fig. 1. Demonstration of intensity-based stereo. Top: stereogram of a smoothly shaded, Lambertian ellipsoid illuminated from the viewing direction; Bottom: for the pseudoscopic stereogram, the left and right half-images are interchanged. The stereograms are arranged for uncrossed fusion; for crossed fusion, the pseudoscopic stereogram is on top. The two stereograms are perceived at different positions in depth. See Sect. 1.3

neither feature alone is necessary for depth perception. In order to test the above prediction (1), different features have to be assessed in a single stimulus simultaneously.

One consequence of the second prediction is that if the images are blurred or degraded, the acuities of stereoscopic depth and ordinary (directional) localization should degrade by the same amount. Westheimer and McKee (1980) have measured the effect of defocussing and low-pass filtering on both ordinary (directional) and stereoscopic (depth) acuities and found that the decrement in stereoacuity is actually more severe than that of ordinary acuity. A possible interpretation of this finding is that, in addition to edge localization, contrast also plays a role in stereo acuity. This conclusion is further supported by data from Halpern and Blake (1988) and Legge and Gu (1989), who measured disparity thresholds at various contrast levels.

Another line of evidence for the problem of intensity-based vs. feature-based stereo is related to the binocular mechanism that they employ. In feature-based stereo, the binocular stage has to solve the correspondence problem, e.g., by a number of constraints on the possible matches (Marr and Poggio 1979). The experimental evidence for constraint feature matching, however, is not unequivocal:

1. Properties or attributes of features do enter the correspondence computation. This is well known for contrast polarity (Julesz 1971). Jordan et al. (1990) have investigated the role of color in an ambiguous matching task based on the wallpaper illusion and found that color is used, at least sometimes, depending on background contrast. However, matching of differently colored targets can occur and usually results in binocular color mixing (cf. Metzger 1975). Bülthoff et al. (1991) measured the decrease of perceived depth for high disparity gradients. They found that the decrease is less pronounced if the involved targets are distinguishable by form (dots vs. asterisks).
2. The uniqueness constraint is not generally satisfied. While Krol and van de Grind (1980) argued that uniqueness is obtained for simple double-nail displays, stimuli with multiple double-nail displays lead to the perception of transparent planes at "ghost" positions (Weinshall 1991). The visual system seems to use an ordering constraint of "small disparities". McKee and Mitchison (1988), using the wallpaper illusion, and Mallot and Bideau (1990), using the double-nail illusion, showed that the assignment of stereo correspondences is influenced by the vergence position of the eyes: small disparities are preferred.

In addition to this critique of matching mechanisms, a number of studies have presented direct evidence favoring a correlation-type mechanism in human stereopsis. Cormack et al. (1991), for example, have used random dot stereograms with superimposed uncorrelated noise to show that stereoacuity depends on the interocular correlation of the image intensity distributions. In stereograms showing a large number of double-nail displays at random positions, Weinshall (1991) found that "ghost" planes are perceived at disparities corresponding to peaks in the correlation function. Since transparent ghost planes occur for sufficiently large dot numbers only, this finding is related to global stereopsis. In another "global" task of stereovision, i.e., disparity-evoked vergence, Mallot and Arndt (1992) showed that for short presentation times, vergence approaches the 3D center of gravity of the spatial dot distribution used as a stimulus. The center of gravity computation is again weighted by signal energy or autocorrelation. In summary, there is considerable evidence for a correlation-type mechanism in stereovision similar to the motion mechanism described first by Hassenstein and Reichardt (1956) for insect vision.

1.3 Demonstration of intensity-based stereo

Figure 1A shows a computer-generated stereogram of a smoothly shaded ellipsoid of revolution with a 2 : 1 elongation perpendicular to the paper plane. Illumination was simulated from behind the observer. Surface reflectance was modelled by the Lambertian cosine law. Figure 1B is identical to Fig. 1A except that the two half-images have been exchanged. From stereo geometry, one would expect that the orthoscopic stereogram (A) and the pseudogram (B) are reversed in depth. However, in terms of edge-based stereo, no differences should be visible at all since there are no disparate edges. When viewing the stereograms of Fig. 1 it is quite clear that there is a difference between the ortho- and pseudoscopic versions.

We asked 18 naive subjects to verbally describe the differences between the orthoscopic and the pseudoscopic stereograms of a smoothly shaded ellipsoid displayed on a computer screen. The orthoscopic stereogram was seen as a convex ellipsoid, consistent with shape-from-shading as well as stereo information. The elongation of the ellipsoid was usually underestimated. The pseudoscopic stereogram of the same ellipsoid did not lead to the perception of a concave surface (i.e., bowl), as might have been expected from simple binocular perspective. Rather, a convex surface (ellipsoid) was seen which was further away from the observer and appeared flatter than the one evoked by the orthoscopic stereogram. All subjects reported a difference between the stereoscopic and the pseudoscopic image. This demonstration is a strong hint that intensity-based disparities can be used to perceive binocular depth.

It should be made clear that by intensity-based stereopsis, we do not mean a correspondence process matching points of equal grey-levels. Grey-level matching could easily be fooled by placing ordinary sunglasses in front of just one eye. The reader can easily verify that this does not affect intensity-based stereo. More likely mechanisms for intensity-based stereo include correlation of normalized images as well as phase shift and Bayesian methods (see above).

When texture was added to the surface of the ellipsoid (Fig. 2), the orthoscopic stereogram was seen as an ellipsoid, while most subjects saw the pseudoscopic stereogram as a bowl. Two subjects did not perceive a bowl, but rather a transparent ellipsoid with the blobs inside. This is similar to the "subjective 3D surfaces" described by Bülthoff and Mallot (1988) and by Carman and Welch (1992). Some subjects were able to see the smoothly shaded ellipsoid as a bowl once they had seen the textured bowl with support from edge-based stereopsis. This perception was not stable, however.
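The sunglasses argument can be made concrete with a small sketch that contrasts pointwise grey-level matching with correlation of normalized profiles. Everything in it is an illustrative assumption rather than the mechanism tested in this paper: the sample profile, the attenuation factor of 0.25, the use of circular shifts, and the function names are all hypothetical.

```python
import math


def circular_shift(profile, s):
    """Shift a profile by s samples to the right, wrapping around."""
    n = len(profile)
    return [profile[(x - s) % n] for x in range(n)]


def normalized_correlation(a, b):
    """Correlation coefficient of two equally long profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den


def correlation_disparity(left, right):
    """Shift of the left profile that correlates best with the right one."""
    return max(range(len(left)),
               key=lambda s: normalized_correlation(circular_shift(left, s), right))


def grey_level_disparity(left, right, x):
    """Disparity obtained by matching right[x] to the first left sample
    of (nearly) equal grey-level."""
    match = min(range(len(left)), key=lambda i: abs(left[i] - right[x]))
    return x - match


left = [0.0, 1.0, 4.0, 9.0, 16.0, 9.0, 4.0, 1.0, 0.0, 0.0]
right_plain = circular_shift(left, 2)          # true disparity: 2 samples
right_dim = [0.25 * v for v in right_plain]    # "sunglasses" on one eye

best_shift = correlation_disparity(left, right_dim)
```

In this toy case the normalized correlation still peaks at the true shift of two samples after attenuation, whereas the grey-level match at the peak (sample 6) changes from a disparity of 2 to a disparity of 4 once one profile is dimmed.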

2 Profiles without localized features

2.1 Image features

It is apparent from Fig. 1 that there are no salient points on a smoothly shaded ellipsoid. For a quantitative analysis of the role of various image features in stereovision, however, a mathematical definition of these features is needed. In this section, we discuss three different feature types for the measurement of horizontal disparity in one-dimensional intensity profiles $I_a(x)$, $I_b(x)$ comprising the two half-images of a stereogram. The first two of them are feature-based while the third is area-based.

1. Laplacian zero-crossings are defined as positions $x_z$ where the second derivative of the profile crosses zero, $I''(x_z) = 0$ (and $I'''(x_z) \neq 0$), i.e., inflection points of the intensity profile. In the theory of edge-detection developed by Marr and Hildreth (1980), inflection points correspond to zero-crossings at spatial scale zero. Using stimulus profiles without inflection points therefore guarantees that no zero-crossings will be present at any spatial scale, since increases in spatial scale can remove zero-crossings but cannot induce new ones (Yuille and Poggio 1986).
2. Intensity extrema (peak and trough) are defined as positions $x_e$ satisfying $I'(x_e) = 0$ and $I''(x_e) \neq 0$.
3. The center of gravity or centroid of a profile is given by

$$x_c := \frac{\int x\,I(x)\,dx}{\int I(x)\,dx} \qquad (2)$$

where the integral is taken over the entire image, in our notation the interval [-1, 1]. This is equivalent to the definition given by Watt and Morgan (1985) who proposed to take the integral over intervals bounded by zero-crossings, since the image margins are the only zero-crossings present in our stimuli. The center of gravity is area-based. In our case, it is just one "feature" for the entire image. With suitable window functions, local centroids can be defined (Watt and Morgan 1985).

2.2 Intensity profiles

In this section, we introduce the intensity functions used in our experiments. In all cases, a simple peak- or trough-shaped function $I_a(x)$, $x \in [-1, 1]$ is defined; in the experiments, it is combined with its mirror-image $I_b(x) := I_a(-x)$ into a stereogram. If profile $I_a$ is presented to the left eye and profile $I_b$ to the right, we call the resulting stereogram $S_{ab}$; the reverse combination is called $S_{ba}$. Average intensity and contrast were controlled by a baseline M and an amplitude A characterizing the difference between the darkest and the brightest point in the profile ($0 \le M \le 1$; $0 \le M + A \le 1$). Positive amplitudes correspond to the "peak" condition, i.e., intensity is highest in the center of the stimulus and decreases towards the margins. Conversely, negative amplitudes result in "trough"-type profiles. The profiles $I_a$ are arranged so that they take one extreme value at the left margin, $I_a(-1) = M$. The other extreme with the intensity M + A occurs inside the interval [-1, 1].

Fig. 2. Comparison of intensity- and edge-based stereo. Same arrangement as in Fig. 1. Texture has been added to the surface by a solid texturing method described in Bülthoff and Mallot (1990). In this case, the pseudoscopic stereogram is perceived veridically, i.e., as a bowl or concavity. See Sect. 1.3

2.2.1 Parabolic intensity profiles: no Laplacian zero-crossings

The simplest stimulus void of Laplacian zero- and level-crossings is a parabolic intensity profile (Fig. 3A, B), the second derivative of which is constant. We used parabolic luminance profiles to prove the existence and examine the characteristics of intensity-based stereovision. The extremum was located at position d/2, $d \ge 0$, such that in the case of peak- or trough-matching, stereogram $S_{ab}$ has disparity d while stereogram $S_{ba}$ has disparity -d. The corresponding equation reads:

$$I_a(x) := M + A\,\frac{(1 + d) + d\,x - x^2}{(1 + d/2)^2}, \qquad \frac{d}{2} \in [0, 1] \qquad (3)$$

The profile has no zero-crossing. The locations of extrema and centroid are listed in Table 1.

2.2.2 Cubic intensity profiles: no zero-crossings or disparate extrema

Intensity extrema cannot be removed from the stimulus altogether without rendering the profile trivial. In order to test for the relevance of extrema in stereovision, we used a stimulus with an extremum in the center of the interval [-1, 1] such that extrema matching would result in zero disparity for both the stereogram $S_{ab}$ and $S_{ba}$. Stimuli without Laplacian zero-crossings and disparate intensity peaks were designed from the skewed extremum of third-order polynomials (Fig. 3C, D). The arc cut out of the cubic function extends from the extremum to the unique inflection point [x = 1 in (4)] and to the same distance in the other direction. The cubic profile has no free parameter except the baseline M and the amplitude A.
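The constraints stated for the cubic profile determine a third-order polynomial uniquely, which the following sketch works out. It is a reconstruction under assumptions, not a transcription of the paper's Eq. (4): it assumes the extremum at x = 0 with value M + A, the inflection point at x = 1, and $I_a(-1) = M$. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed constraints on the unit-amplitude shape g(x) = 1 + c2*x**2 + c3*x**3:
#   g'(0) = 0 and g(0) = 1   (extremum in the centre, already built in)
#   g''(1) = 0               (inflection point on the right margin)
#   g(-1) = 0                (baseline M at the left margin)
# These give c2 = -3/4 and c3 = 1/4.


def cubic_shape(x):
    x = Fraction(x)
    return 1 - Fraction(3, 4) * x**2 + Fraction(1, 4) * x**3


def first_derivative(x):
    x = Fraction(x)
    return -Fraction(3, 2) * x + Fraction(3, 4) * x**2


def second_derivative(x):
    x = Fraction(x)
    return -Fraction(3, 2) + Fraction(3, 2) * x


def cubic_profile(x, M, A):
    """Assumed cubic stimulus profile M + A * g(x) on [-1, 1]."""
    return Fraction(M) + Fraction(A) * cubic_shape(x)


def cubic_centroid(M, A):
    """Centre of gravity of the assumed profile on [-1, 1], per Eq. (2).

    The integral of x*g(x) is 1/10 and the integral of g(x) is 3/2.
    """
    M, A = Fraction(M), Fraction(A)
    return (A * Fraction(1, 10)) / (2 * M + A * Fraction(3, 2))
```

Under these assumptions the extremum sits at x = 0 for both half-images, so extrema matching predicts zero disparity, while the centroid is displaced from the centre and therefore still differs between $I_a$ and its mirror image.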

2.2.3 Rational intensity profiles: no zero-crossings, disparate extrema, or disparate centroids

One localized feature in the cubic intensity profiles (4) that might be used for stereomatching is the center of gravity or centroid at position $x_c$ (cf. Table 1). We therefore designed a stimulus without zero-crossings where both the intensity peak and the centroid coincide at position x = 0, i.e., at zero disparity (Fig. 3E, F). It is derived from a rational function² with a pole at x = -1 and the slanted asymptote y = -x. The asymmetry can be controlled by a form parameter a; for a = 1, the pole would be included in the profile, while small a results in almost symmetric profiles. The centroid is balanced by an additional cubic term (Fig. 3E, F):

$$f(x) := p(a)\,x^3 - x - \frac{1}{1 + x}$$

where p is chosen to satisfy the condition $\int_{-a}^{a} x\,f(x)\,dx = 0$. Stretching the interval [-a, a] to [-1, 1] and normalizing the intensity values yields the stimulus profile (7).

² A rational function is the fraction of two polynomials. By analogy, we will also use the terms "rational profile" and "rational stimulus".
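The balancing condition on the centroid can be checked numerically with a short sketch. The closed form used for p(a) is derived here by integrating $x f(x)$ term by term for $f(x) = p\,x^3 - x - 1/(1+x)$; it is an assumption of this sketch and is not quoted from the paper. The function names, the midpoint-rule integration, and the sampling density are likewise illustrative choices.

```python
import math


def p_of_a(a):
    """Cubic coefficient that makes the integral of x*f(x) over [-a, a] vanish.

    Derived from: p * 2a^5/5 = 2a^3/3 + 2a - ln((1 + a) / (1 - a)).
    """
    return 5.0 / (2.0 * a**5) * (
        2.0 * a + 2.0 * a**3 / 3.0 - math.log((1.0 + a) / (1.0 - a)))


def f(x, a):
    return p_of_a(a) * x**3 - x - 1.0 / (1.0 + x)


def f_second_derivative(x, a):
    return 6.0 * p_of_a(a) * x - 2.0 / (1.0 + x) ** 3


def first_moment(a, steps=20000):
    """Midpoint-rule estimate of the integral of x*f(x) over [-a, a]."""
    h = 2.0 * a / steps
    total = 0.0
    for k in range(steps):
        x = -a + (k + 0.5) * h
        total += x * f(x, a)
    return total * h


def has_inflection(a, steps=2000):
    """True if the sampled second derivative changes sign inside [-a, a]."""
    values = [f_second_derivative(-a + 2.0 * a * k / steps, a)
              for k in range(steps + 1)]
    return any(v1 * v2 <= 0.0 for v1, v2 in zip(values, values[1:]))
```

For a = 0.8, the value used in the measurements, the sketch finds a vanishing first moment and no sign change of the second derivative; for a clearly larger skewness such as a = 0.95 an inflection point appears, in line with the limit on a stated in the text.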

Table 1. Locations of zero-crossings ($x_z$), extrema ($x_e$), and centroids ($x_c$) for the stimuli used in this study. The disparities predicted for stereogram $S_{ab}$ by matching each feature are $2x_z$, $2x_e$, and $2x_c$, respectively. M, intensity baseline; A, amplitude

Stimulus type | Parameter | $x_z$ | $x_e$ | $x_c$
Parabolic (Eq. 3, Fig. 3A, B) | Shift d/2 ∈ [0, 1] | None | d/2 | d / (2 + 3d + 3(M/A)(1 + d/2)²)
Cubic (Eq. 4, Fig. 3C, D) | None | None ᵃ | 0 | [not legible]
Rational (Eq. 7, Fig. 3E, F) | Skewness a ∈ [0, 1) | None ᵇ | 0 | 0

ᵃ $x_z = 1$, i.e., on the margin of the display interval
ᵇ while a < 0.922


Fig. 3. [...] Left column (A, C, E): peak condition (A > 0); right column (B, D, F): trough condition (A < 0). In the experiments, the x-interval [-1, 1] corresponds to 6 deg of visual angle

The profile is free of zero-crossings for a < 0.922. For the reported measurements we chose a = 0.8. Table 1 summarizes the relevant properties of the three stimulus profiles: Matching of Laplacian zero-crossings is impossible in all three of them. For the parabolic arc, peak and centroid matching would result in stereo discrimination. For the cubic arc, only the centroid contains useful information, since the extremum has no disparity. Finally, for the rational arc, none of the feature locations considered would lead to a stereoscopic perception.
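The feature locations listed for the parabolic profile can be cross-checked numerically. The sketch below evaluates Eq. (3), estimates the centroid of Eq. (2) with a midpoint rule, and compares it with the closed-form entry of Table 1. The values of M and A are hypothetical; d = 1/3 corresponds to a peak disparity of 1 deg if the interval [-1, 1] spans 6 deg. Function names are illustrative.

```python
def parabolic_profile(x, d, M, A):
    """Parabolic profile of Eq. (3) on [-1, 1] with its extremum at x = d/2."""
    return M + A * (1.0 + d + d * x - x * x) / (1.0 + d / 2.0) ** 2


def centroid(profile, steps=20000):
    """Midpoint-rule centre of gravity of a profile on [-1, 1], per Eq. (2)."""
    h = 2.0 / steps
    xs = [-1.0 + (k + 0.5) * h for k in range(steps)]
    weights = [profile(x) for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


def centroid_table1(d, M, A):
    """Closed-form centroid of the parabolic profile as listed in Table 1."""
    return d / (2.0 + 3.0 * d + 3.0 * (M / A) * (1.0 + d / 2.0) ** 2)


d, M, A = 1.0 / 3.0, 0.2, 0.6   # hypothetical baseline and amplitude
x_c_a = centroid(lambda x: parabolic_profile(x, d, M, A))    # half-image I_a
x_c_b = centroid(lambda x: parabolic_profile(-x, d, M, A))   # mirror image I_b

peak_disparity = d                    # 2 * x_e with x_e = d / 2
centroid_disparity = x_c_a - x_c_b    # equals 2 * x_c
```

Both feature types predict a nonzero disparity for the parabolic stereogram, with the centroid disparity smaller than the peak disparity because the baseline M pulls the centre of gravity towards the middle of the interval.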

Stereograms were presented on a color display monitor (Mitsubishi Color Display Model No HL6905 STGR) in interlaced mode, each half-image with a frequency of 60 Hz. The presentation of the half-images to the left and right eye was controlled by liquid crystal shutter glasses (Stereographics). The average transmittance of the glasses was 13%; peak transmittance in the open phase was 30%. The luminances produced by the monitor for 52 of its 256 color map slots were measured with a UDT optometer. The resulting calibration curve reflects the built-in γ-correction of the display (cf. Foley et al. 1990). For any desired luminance, the required color map slot of the monitor (≤ 255) was determined by inversely evaluating the calibration curve. Stimulus contrast c was defined as the Michelson contrast of maximum ($I_{\max}$) and minimum ($I_{\min}$) image luminances:

$$c := \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}} = \frac{|A|}{2M + A} \qquad (8)$$

where M and A are offset and amplitude as defined in (3, 4, 7). In the experiments, contrast (8) and average intensity $\bar{I} = \frac{1}{2}\int_{-1}^{1} I(x)\,dx$ were specified, and the appropriate values for M and A were calculated from the above equations. The luminances and contrast values for the different experiments are summarized in Table 2. In the experiments with a dark background condition (see below), contrast was kept constant. For the rational stimulus function, absolute intensity had to be reduced in order to obtain a higher resolution of the upper intensity values. In the experiments studying the effect of background intensity, average intensity and contrast were kept constant.
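How M and A follow from a specified contrast and average intensity can be sketched for the peak condition (A > 0), where $I_{\min} = M$ and $I_{\max} = M + A$. The sketch assumes a profile of the form M + A g(x) with g between 0 and 1, and uses the parabolic shape of Eq. (3) as an example. The function names and the numerical settings (contrast 0.5, average intensity 0.4) are hypothetical and are not taken from Table 2.

```python
def michelson_contrast(i_max, i_min):
    """Michelson contrast of maximum and minimum luminance."""
    return (i_max - i_min) / (i_max + i_min)


def parabolic_shape_mean(d):
    """Average over [-1, 1] of the unit-amplitude parabola of Eq. (3)."""
    return (2.0 / 3.0 + d) / (1.0 + d / 2.0) ** 2


def baseline_and_amplitude(contrast, mean_intensity, shape_mean):
    """Solve for M and A in the peak condition (A > 0).

    contrast = A / (2M + A)             gives  A / M = 2c / (1 - c)
    mean     = M + A * shape_mean       then fixes M.
    """
    ratio = 2.0 * contrast / (1.0 - contrast)
    M = mean_intensity / (1.0 + ratio * shape_mean)
    A = ratio * M
    return M, A


k = parabolic_shape_mean(1.0 / 3.0)
M, A = baseline_and_amplitude(contrast=0.5, mean_intensity=0.4, shape_mean=k)
```

The two specified quantities give two linear conditions on M and A, so the solution is unique as long as the contrast is below 1; the trough condition would need the roles of $I_{\min}$ and $I_{\max}$ exchanged.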

Table 2. Luminance and contrast settings for the stimuli. $I_{\min}$, $I_{\max}$, and $\bar{I}$ are minimum, maximum and average intensity. The luminances are the ones presented on the display; they are attenuated to about 30% by the stereo shutter glasses

Fig. 4a-d. Procedure. Hatched squares, half-image $I_a$; open squares, half-image $I_b$. a, b Both stereograms displayed are the same, either $S_{ab}$ or $S_{ba}$. c, d The two stereograms are different, one $S_{ab}$ and the other $S_{ba}$. In the discrimination task, subjects had to judge whether the stereograms were equal or different. In the depth ordering task, only cases c and d were used, and subjects had to indicate which stereogram appeared in front

The half-images $I_a$ and $I_b$ of the smooth intensity profiles (Sect. 2) were combined into two different stereograms: $S_{ab}$ ($I_a$ is shown to the left eye and $I_b$ to the right eye) and $S_{ba}$ ($I_b$ is shown to the left eye and $I_a$ to the right eye). Two stereograms subtending 6 x 6 deg of visual angle were presented side by side with a separation of 2 deg at eye level. Exactly between the stereograms a fixation target with a diameter of 5 min of arc was displayed. The two stereograms shown simultaneously were either identical or pseudoscopic versions of each other. The four resulting stimulus situations are depicted in Fig. 4.

3.3 Parameter settings and stimulus conditions

The shift parameter d of the parabolic profiles, i.e., their peak disparity, was 1 deg of visual angle. This large value was chosen in order to obtain sufficient differences between $I_a$ and $I_b$. The stimuli can still be fused since the spatial frequency is on the order of 0.08 cycles per degree. (Schor et al. 1984 report a fusional limit of ±100 min of arc for a stimulus spatial frequency below 0.1 cpd.) The asymmetry a of the rational luminance profiles was 0.8 [see (7)]. All three profiles (parabolic, cubic, rational) were presented both in a peak (A > 0) and a trough (A < 0) condition. In Experiment 1, background luminance was equal to minimal stimulus luminance, i.e., M in the peak condition and M + A in the trough condition ("dark" background). In Experiment 2, "bright" backgrounds with luminances M + A for the peak condition and M for the trough condition were also used. In controls, an additional "average" background condition was used where the background luminance was set to the average stimulus intensity, $\frac{1}{2}\int_{-1}^{1} I_a(x)\,dx$.

3.4 Procedure

All experiments were performed under darkroom conditions. The viewing distance was 115 cm, and the head of the subject was fixed by a forehead and chin rest. Two different tasks were employed.

1. Discrimination task: All four stimulus situations (Fig. 4) for a pair of stereograms ($S_{ab}$ and $S_{ba}$) were used. In a two-alternative forced-choice paradigm (2AFC) subjects had to judge whether the two simultaneously presented stereograms were equal or different. In addition, if subjects saw any difference at all, they verbally described the appearance of the two images and the kind of difference they had perceived. The purpose of this procedure was to investigate whether depth differences were perceived spontaneously without subjects being instructed to judge stimulus depth.
2. Depth ordering task: Only the pseudoscopic versions of a stereogram were shown simultaneously (situations c and d in Fig. 4). Subjects had to indicate which stereogram appeared nearer to them.

In both tasks, the two or four stimulus situations for a pair of stereograms ($S_{ab}$, $S_{ba}$) were presented in pseudorandom fashion, each equally often in blocks of 20 trials. In the first session, 100 trials were run as training. Subjects themselves decided when to begin the measurements. The presentation time for a stimulus situation was not limited. When subjects indicated their decision by pressing a mouse button, a new stimulus situation was presented after a 2 s break. Subjects did not receive feedback concerning the correctness of their decisions. During the 2 s break only the fixation target was visible, and subjects were instructed to fixate the target during this time, to maintain eye positions similar to the starting point for each trial. During stimulus presentation, subjects were allowed to move their eyes, and eye movements were not recorded.

After each set of 20 trials, the stimulus condition was changed. The task (discrimination or depth ordering) was kept constant during each experimental session. Up to eight blocks of 20 trials were performed in one session. The session ended after 45 min or when the subject reported fatigue or lack of concentration.

3.5 Data evaluation

Results from both tasks were represented as 2 x 2 contingency tables. In the discrimination task, the columns of the contingency table were "=", no difference perceived and "$', difference perceived; the rows were "=", pair (Sab, S u b ) or ( S b a , S b a ) presented and "$', pair (Sab, Sba) Or Sub) presented. In the depth ordering task, the columns were If, left stereogram perceived in front and rf, right stereogram perceived in front; the rows were Sabl,stereogram S,b presented on left side and Sabrrstereogram Sabpresented on right side. The significance of the statistical dependence between subject response and the presented pair of stereograms was tested with Fisher's exact test. In the depth ordering experiment, the coefficient of association a was calculated:

where n11 and n22 are the diagonal elements of the contingency table and n12 and n21 the off-diagonal elements. The coefficient α takes the value 1 if stereogram Sab is perceived in front and -1 if Sba is in front. Differences between the percentage-correct numbers (discrimination task) for different stimulus conditions were tested with distribution-free statistics, using the Wilcoxon matched-pairs signed-rank test in control experiment 4.3.1 and the Wilcoxon rank-sum test for multiple comparisons in control experiment 4.3.2.
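The evaluation of a single 2 × 2 contingency table might be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code. It assumes that α is Yule's coefficient of association, (n11·n22 − n12·n21)/(n11·n22 + n12·n21), and that Fisher's exact test is two-sided, with the p-value summing all tables no more probable than the observed one. The function names and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher exact test p-value for a 2 x 2 table with fixed margins."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the upper-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(n11)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-9))


def association_coefficient(n11, n12, n21, n22):
    """Yule's coefficient of association (assumed form of alpha)."""
    diag, off = n11 * n22, n12 * n21
    if diag + off == 0:
        raise ValueError("coefficient undefined for this table")
    return (diag - off) / (diag + off)


# Hypothetical depth ordering counts: rows Sab on left / right, columns lf / rf.
table = (18, 2, 3, 17)
p_value = fisher_exact_two_sided(*table)
alpha = association_coefficient(*table)
```

With this form, α reaches 1 as soon as one off-diagonal cell is empty and -1 as soon as one diagonal cell is empty, which matches the limiting values stated in the text.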

4 Results

4.1 Discrimination task

The results of the discrimination experiment are summarized in Table 3. Subjects were able to detect the exchange of the two half-images in all stimulus types and in both the peak and the trough conditions, with two exceptions for the cubic profile. Due to shape-from-shading effects, the stimuli with an intensity peak were usually perceived as cylinders, while trough stimuli were interpreted either as a slit between two cylinders or as a dark cloud in front of a bright background. The perceived curvature was lowest in the rational stimulus profile, which was described as nearly flat. Subjects reported differences between the stereograms either as depth differences or as differences in perceived size. The stereograms of parabolic luminance profiles contain constant crossed or uncrossed disparities over the entire stereogram. When depth differences were perceived, the stimulus with crossed disparity appeared closer to the subject than the other one. When size differences were seen, stimuli with uncrossed disparity appeared larger than stimuli with crossed disparity. This effect is a result of size constancy (cf. Sedgwick 1986) in depth perception. Since both stereograms subtended the same visual angle, the one perceived as more distant appeared larger when interpreted as an object in space. Depth discrimination was easiest for the parabolic and the rational stimulus profiles, while the cubic profile was harder to judge. The increase in error rates for the cubic profile with respect to the parabolic one was significant at the 1% level.

Fig. 5. Effect of stimulus and background condition on perceived depth order for four subjects. Panels: cubic trough, cubic peak, rational peak, rational trough, each with bright and dark background. Columns represent the coefficient of association α defined in (9). The dependences in the underlying contingency tables were significant at the 1% level except in the cases marked ns. The data from Table 4, where different contrasts had been used, are not included in this figure.

4.2 Depth ordering task

4.2.1 Dark background

Results from the depth ordering task for the three stimulus types and the "dark" background condition are summarized in Table 4. Depth perception of parabolic stimuli was equal for the peak and trough conditions: stereogram Sab appeared in front. This percept is consistent with the disparities based on the shift of the half-images. In contrast, results for cubic and rational stimuli showed an inversion of depth perception with the change from the peak to the trough condition. For cubic stimuli this inversion was consistent with the disparity given by the position of the centroid. However, the same inversion occurred for rational stimuli as well, where the centroid has disparity zero.

4.2.2 Effect of background luminance

The effect of depth inversion was studied in more detail with cubic and rational stimuli under two background conditions, "bright" and "dark" (see Sect. 3). Mean luminance

Table 3. Results from the discrimination experiment (Sect. 4.1). "=" and "≠" denote the "equal" and "unequal" situations and judgments. Correct decisions are indicated by solid black bars arranged in accordance with the correct decision counts in the contingency tables. The significances P are calculated from Fisher's exact test for dependency in contingency tables.

Subject

20 0 20 2 24

Parabolic profile Perception in peak condition ≠ 0 20 0 18 6

11

79

Stim. = =

DM

1

KK

-

≠ =

≠ =

LW

Perception in trough condition

MA Total

Subject

=

I

Stim.

≠ KK

=

≠ SH

=

≠ Total

=

≠

Subject

Cubic profile Perception in peak condition

11

3 27 1