Combinations of Range Data and Panoramic Images
– New Opportunities in 3D Scene Modeling –

Reinhard Klette
CITR and Computer Science Department, The University of Auckland, New Zealand
[email protected]

Karsten Scheibe
Air and Space Institute (DLR), Berlin, Germany
[email protected]

Abstract: The paper reports on rotating line cameras (which capture images of several hundred megapixels), their use for creating (stereo) panoramas, and how they can be used for texturing clouds of 3D points representing range data captured with a laser range finder (with distance errors of less than 5 cm at about 50 m distance from the viewpoint). Problems occur at the geometric and the photometric level, and there are interesting challenges in algorithm design.



1. Introduction

Panoramic imaging (see, e.g., [1]) is of importance for visualization and navigation. One particular way of capturing panoramic images is by rotating a line sensor around a fixed axis at off-axis distance R with viewing angle ω (see Figure 1). This model of a panoramic camera has been studied at CITR; see [3, 7] for two related PhD theses (which are briefly reviewed in the first three sections of this paper).

Cameras of this architecture have been designed and built at DLR Berlin since the end of the 1990s, originating from earlier airborne sensors; see [5]. Today a rotating line typically has about H = 10k color pixels, and a full 360◦ panorama consists of about W = 25k line images; this results in one H × W = 250 megapixel image. (Stereo analysis or stereo visualization requires at least one pair of such images.)

Figure 1. Camera with one rotating CCD line.

Figure 2. Camera with two symmetric lines.

Basically, a camera with one or more rotating lines is defined by a rotation centre O incident with a rotation axis, which defines the surface of a straight cylinder (of height H, defined by the number of pixels on the rotating line) at off-axis distance R; panoramic images (one for each rotating line) are captured on this 2D manifold. A rotating line has a main point C at each of its W rotational positions, and those main points are on a base circle in the base plane. The base plane intersects a panoramic image at the principal row (i.e., those pixels which are next to the optical axis of the rotating line camera). The distance between the principal row and the base circle is equal to the (effective) focal length f of the camera.
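To make the camera model above concrete, the following minimal sketch maps a pixel (x, y) of a cylindrical panorama to the main point of its column and to an (unnormalized) viewing ray in 3D. The sign and orientation conventions, and the use of the cell size µ for the vertical offset, are illustrative assumptions and not taken from the paper.

```python
import numpy as np

# Minimal sketch of the rotating line camera model (illustrative conventions):
# the rotation axis is the z-axis, the main point C of column x lies on the
# base circle of radius R, and the optical axis is tilted by omega (within the
# base plane) relative to the outward radial direction.
def pixel_to_ray(x, y, R, omega, f, W, y_principal, mu):
    phi = 2.0 * np.pi * x / W                        # rotation angle of column x
    radial = np.array([np.cos(phi), np.sin(phi), 0.0])
    tangent = np.array([-np.sin(phi), np.cos(phi), 0.0])
    C = R * radial                                   # main point of this column
    axis_dir = np.cos(omega) * radial + np.sin(omega) * tangent
    up = np.array([0.0, 0.0, 1.0])
    # vertical offset of row y relative to the principal row, in focal-plane units
    d = f * axis_dir + (y - y_principal) * mu * up
    return C, d                                      # ray: C + t * d, for t > 0
```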

The first laser range finder (using ruby lasers and based on measuring the time-of-flight) was demonstrated in 1960 at Hughes, less than a year after the laser's discovery. Today's phase-difference lasers often promise errors of less than 5 mm for measured distances of less than 50 m. Our experiments have shown that such figures assume ideal surface reflectance and surfaces that are approximately orthogonal to the viewing direction. Nevertheless, laser range finders do provide accurate (compared to binocular stereo) and dense distance measurements; the resolution of the laser range finder used in our project was about one quarter of the resolution of the panoramic camera. The forthcoming PhD thesis [6] reports on combinations of range and panoramic image data, and the final sections of this paper report on this recent project at DLR (undertaken in collaboration with CITR).

In applications we capture cylindrical panoramic images or range data at different points of view, defined by time, location (e.g., provided by GPS), and viewing direction (e.g., provided by an IMU, an Inertial Measurement Unit). There is always a difference between the viewpoints of the laser range finder and of the rotating line camera. Data collected at multiple viewpoints can be merged into one consistent 3D model of complex scenes, based on sensor calibrations (or time, GPS, or IMU data), and applying graphics and vision software for (e.g.) 3D surface triangulation, rendering, and visualization.

In general, polycentric panoramas are a set of cylindrical panoramic images captured within one 3D scene modeling project. Special cases of such sets are parallel-axis panoramas (in particular leveled panoramas, where all base planes coincide), co-axis panoramas with an identical rotation axis (with the special case of concentric panoramas, which also share the rotation centre), or pairs of symmetric panoramas captured with identical parameters (as listed above) apart from a change of the viewing angle ω into −ω (e.g., captured by one rotating line camera having two symmetric CCD lines on its focal plate). Such classifications can be used for structuring related research. As a default we assume that panoramas are captured for an angle of 360◦.

2. Stereo Imaging with Rotating Line Cameras

A stereo image pair is defined by two images E1, E2, captured on a pair of Jordan surfaces. (Here we deal with cylindrical surfaces; omnidirectional viewing deals with parabolic surfaces.) Starting at one pixel position p = (x1, y1) in E1, potential positions of corresponding pixels q = (x2, y2) in E2 are defined by parametrized epipolar curves (i.e., Jordan curves).

Consider the special case of leveled panoramas (i.e., both viewpoints differ by one translation t = (tx, 0, tz), where ty = 0 corresponds to a constant elevation). We have Ri, ωi, Wi, and fi at viewpoints i = 1, 2. As abbreviations, let (for i = 1, 2)

   αi = 2πxi/Wi + ωi ,   β1 = α2 − α1 − ω2 ,   and   β2 = α2 − 2πx1/W1

Then the epipolar curve in E2 is given as follows:

   y2 = (y1 f2 / f1) · (R2 sin ω2 − R1 sin β2 − tx cos α2 + tz sin α2) / (−R1 sin ω1 − R2 sin β1 − tx cos β1 + tz sin β1)
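The following short sketch evaluates this epipolar curve numerically; it is a direct transcription of the two formulas above, with all numeric parameter values chosen purely for illustration (they are not calibration results from the paper).

```python
import numpy as np

def epipolar_y2(x1, y1, x2, R, omega, W, f, t):
    """y2 on the epipolar curve in E2 for pixel (x1, y1) of E1, evaluated at
    column(s) x2 of E2. R, omega, W, f are pairs (image 1, image 2); t = (tx, tz)
    is the translation of viewpoint 2 relative to viewpoint 1 in the base plane."""
    (R1, R2), (w1, w2), (W1, W2), (f1, f2), (tx, tz) = R, omega, W, f, t
    a1 = 2 * np.pi * x1 / W1 + w1
    a2 = 2 * np.pi * np.asarray(x2) / W2 + w2
    b1 = a2 - a1 - w2
    b2 = a2 - 2 * np.pi * x1 / W1
    num = R2 * np.sin(w2) - R1 * np.sin(b2) - tx * np.cos(a2) + tz * np.sin(a2)
    den = -R1 * np.sin(w1) - R2 * np.sin(b1) - tx * np.cos(b1) + tz * np.sin(b1)
    return (y1 * f2 / f1) * num / den

# Trace the curve over all columns of E2 (illustrative parameter values only).
columns = np.arange(25000)
curve = epipolar_y2(1200, 350.0, columns,
                    R=(0.2, 0.2), omega=(0.3, -0.3),
                    W=(25000, 25000), f=(0.0217, 0.0217), t=(0.5, 0.1))
```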

For the general case of polycentric panoramas, see [2]. See Figure 3 for an example; a meeting room at DLR Berlin was captured by a pair of leveled panoramas, and 30 points in E1 are shown together with their epipolar lines in E2 .

Figure 3. Epipolar lines in a leveled panorama.

Figure 4. Point P is an example of a sample in 3D space.

Figure 5. Anaglyph of the “Thronsaal” of castle Neuschwanstein.

Stereo image analysis is not identical to stereoscopic viewing; the existence of parametrized epipolar curves is sufficient to enable stereo image analysis. A pair of images is stereoscopically viewable (e.g., using anaglyphs, see Figure 5) if it possesses standard epipolar geometry, where epipolar curves are parallel lines in the image's manifold. Consider a pair of symmetric 360◦ panoramas of size H × W, with parameters R, ω, and −ω. Epipolar lines are defined by image rows (i.e., this pair of images is stereoscopically viewable). Corresponding pixels have coordinates p = (x, y) in E1 and q = (x, y + d) in E2, where d is the disparity and θ = 2πd/W is the angular disparity. The depth (i.e., the distance between a projected point P = (X, Y, Z) in the scene and the point p = (x, y) on the camera's cylindrical surface) is then equal to

   D = R sin ω / sin(ω − dπ/W) = R sin ω / sin(ω − θ/2)

where θ is between 0◦ and 180◦.
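For reference, this depth formula in code form; the parameter values in the example call are illustrative assumptions only.

```python
import numpy as np

# Depth from (angular) disparity for a pair of symmetric panoramas,
# transcribing the formula above: D = R*sin(omega) / sin(omega - theta/2).
def depth_from_disparity(d, R, omega, W):
    theta = 2.0 * np.pi * d / W                 # angular disparity
    return R * np.sin(omega) / np.sin(omega - theta / 2.0)

# Example with illustrative values: R = 0.2 m, omega = 70 degrees, W = 25000
# columns, disparity d = 9000 columns; gives a depth of roughly 2 m.
print(depth_from_disparity(9000, R=0.2, omega=np.deg2rad(70.0), W=25000))
```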

Point P in the scene is at an intersection of two rays (one for each image). All these intersection points of all rays define the set of potential samples in 3D space. Figure 4 illustrates on the left the set of all rays, and on the right the set of all intersection points, which lie on concentric cylindrical surfaces Γ1, Γ2, . . . around the rotation axis (counting outward, starting at the surface closest to the panoramic image). On Γk these samples form a grid, with a vertical sample distance Vk and a horizontal sample distance Hk between adjacent samples. Let Dk be the distance between Γk and Γk+1. It then follows that Vk and Hk increase linearly with the distance to the rotation axis, but Dk increases exponentially with this distance. The total number of potential samples of a symmetric panorama is equal to

   (2W − 1) × H × ⌊ωW/π⌋

and is independent of the off-axis distance R. The total number of potential samples increases with ω, for possible values 0◦ < ω < 180◦. Angles ω ≤ 90◦ define the outward case of symmetric panoramas, and ω > 90◦ defines the inward case. When capturing a 3D scene by a pair of symmetric panoramas, we intend to maximize stereo acuity (i.e., the number of depth levels in the scene) by maximizing the total number of potential samples for the region of interest. Increasing stereo acuity helps to avoid “cardboard effects” in stereo viewing.

3. Optimized Camera Parameters

Some camera parameters need to be calibrated (such as the viewpoint parameters, f, the principal row, R, or ω) or optimized (e.g., R and ω for optimum stereoscopic viewing). Calibration will be considered in the next section. Regarding optimization, we discuss a scene model and a viewer model, and the resulting optimization for optimum stereoscopic viewing.

At the time of capturing a panoramic image, we estimate the closest (D1) and furthest (D2) distance between the viewpoint and the objects of interest in the scene. For example, an indoor scene may have D1 between 1 m and 4 m, and D2 between 3 m and 10 m, always satisfying D1 < D2 (see Figure 6). Assume that we have to ensure a defined sampling density at distance D1, given by the distance d1 between the intersection points of adjacent rays (in the base plane) with the cylinder of radius D1. Then we have to ensure a width W of at least

   2πD1/d1 = 2πf D1/(µH1)

where µ is the size of one CCD cell (a short numeric check of this bound follows after this paragraph). The angle ω defines the distance H1 between the main point C and the cylinder of radius D1 (measured along the projection ray). Figure 6 shows two examples, one inward case and one outward case. The values D1, D2, and H1 specify our model of the region of interest.
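The following short check evaluates the width bound just stated; d1 = µH1/f is implied by the equality above, and all numeric values are illustrative assumptions (f and µ as in the example given further below).

```python
import math

# Minimum panorama width from the bound above: W >= 2*pi*D1/d1 = 2*pi*f*D1/(mu*H1).
f = 0.0217      # focal length in metres (21.7 mm)
mu = 7.0e-6     # CCD cell size in metres (0.007 mm)
D1 = 2.0        # closest distance of interest, in metres (assumed)
H1 = 2.5        # distance from main point C to the cylinder of radius D1 (assumed)

d1 = mu * H1 / f                                  # sampling distance at D1
W_min = math.ceil(2 * math.pi * f * D1 / (mu * H1))
print(d1, W_min)
```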

Figure 6. Scene model.

Figure 7. Example of optimum values.

Parameter H1 defines the height of visible objects at distance D1 (together with the focal length f and the height H of the panoramic image). Now assume that a captured pair of symmetric panoramas needs to be optimized for stereoscopic viewing. To avoid diplopia, a human viewer has an upper disparity limit of about 0.03 times the viewing distance. The upper disparity limit applies to the “closest objects” on the cylinder of radius D1. The computational problem is as follows: calculate R and ω such that a symmetric pair of panoramas stays below the upper disparity limit θW (for objects at distance D1), and such that the total number of potential samples is maximized in the region of interest defined by D1, D2, and H1. This problem has a unique solution. As an abbreviation, let

   X = 2 D1 H1 · (D1 − D2 cos(θW/2)) / sqrt(D1² + D2² − 2 D1 D2 cos(θW/2))

Then we have the following unique solution to this optimization problem:

   R = sqrt((D1² + H1² + X) / 2)

   ω = arccos((D1² − H1² − R²) / (2 H1 R))
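The closed-form solution translates directly into code; the sketch below is a plain transcription of the three formulas above, and the input values in the example call are illustrative assumptions rather than results from the paper.

```python
import math

# Optimum R and omega for a symmetric panorama pair (closed-form solution above).
def optimum_R_omega(D1, D2, H1, theta_W):
    c = math.cos(theta_W / 2.0)
    X = 2.0 * D1 * H1 * (D1 - D2 * c) / math.sqrt(D1**2 + D2**2 - 2.0 * D1 * D2 * c)
    R = math.sqrt((D1**2 + H1**2 + X) / 2.0)
    omega = math.acos((D1**2 - H1**2 - R**2) / (2.0 * H1 * R))
    return R, omega

# Illustrative region of interest: D1 = 2 m, D2 = 8 m, H1 = 1.98 m, and an
# upper angular disparity limit of 4 degrees.
R, omega = optimum_R_omega(2.0, 8.0, 1.98, math.radians(4.0))
print(R, math.degrees(omega))
```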

Note that we calculate R first, and then we also obtain the optimum ω. For example, viewing a 17″ screen (1024 × 768 pixels) frontally from a distance of about 40 cm leads to an upper disparity limit of 70 pixels (i.e., 70 columns in our cylindrical panoramic image). Assume also f = 21.7 mm, H = 5184 pixels, and µ = 0.007 mm. Then we obtain optimum R and ω values as shown in Figure 7. Case (4a) is for the case that R is not limited, and case (4b) considers the additional constraint that R can be at most 1 m.

4. Calibration of Sensors

We describe a least-squares approach for determining the parameters of interior and exterior orientation of a rotating line camera or of a rotating laser range finder. At first we assume ω = 0.

Figure 8. Two scan angles.

In this section, coordinates of captured data are denoted by i (column) and j (row). In the case of the camera, we have 1 ≤ i ≤ W and 1 ≤ j ≤ H; i and j represent grid points in the cylindrical manifold, defined by i · µ and j · µ (e.g., with µ = 0.007 mm as above). In the case of a laser range finder (or camera), i and j may specify angles i · ∆ϕ and j · δ. See Figure 8 for an illustration of the two scan angles. The angular increments ∆ϕ and δ specify the geometric resolution of the sensor, as µ does for the camera.

Figure 9 illustrates the x′y′z′-coordinate system of a rotating line camera (with R = 0) within a world xyz-coordinate system. The line is actually composed of multiple CCD lines (3 in the case of color images). We assume an ideal line which is parallel to the z′-axis (the rotation axis), and where each pixel on this line (identifying a viewing direction or projection ray) is assumed to be one (ideal) point, defined by the vector rd = (0, f, j · δ)^T. Because of R = 0, the rotation axis of the camera is incident with the main point of the optics.

Figure 9. Coordinate systems.

The focal plane is located at focal length f (measured along the positive y′-axis). Scans typically begin at a horizontal angle of 100 gon (the unit gon is defined by 360◦ = 400 gon). A scanned surface point is identified by pixel number j and by a camera rotation Aϕ, defined by the rotation angle ϕ = π/2 − ∆ϕ · i. The reference vector r0 (in world coordinates; see Figure 9) and the rotation matrix A define the affine transform as follows:

   r = r0 + A · Aϕ · λ · rd

or, for the off-axis case (i.e., R > 0, but we still assume ω = 0), as follows:

   r = r0 + A · Aϕ · (λ · rd + R)

where R = (Rx, Ry, 0)^T denotes the off-axis offset. Therefore we can determine the viewing direction as follows:

   rd = (1/λ) · (Aϕ⁻¹ · A⁻¹ · (r − r0) − R)

We have to calibrate the translation r0 and the rotation A; λ is an unknown scaling factor of the camera coordinate system. We also model the following deviations from an ideal case:

• The CCD line is tilted by three angles AI with respect to the main point.

• The CCD line has an offset vector ∆ with respect to the main point.

• The optical axis is rotated by AO with respect to the rotation axis.

The difference between the parameters AO and AI is that AO describes a rotation of the whole camera head (optical axis), while AI describes a rotation of the CCD line without moving the lens. These deviations are described in the following equation:

   r = r0 + λ · A Aϕ AO AI · ( (0, 0, j · δ)^T + (∆x, f + ∆y, ∆z)^T )

The parameter ∆ is the intersection point of the optical axis with the focal plane. In the off-axis case, this equation is expanded by R to the following general equation:

   r = r0 + λ · A Aϕ AO AI · ( (0, 0, j · δ)^T + (∆x, f + ∆y, ∆z)^T + (1/λ) · (Rx, Ry, 0)^T )
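The general equation above can be read as a small forward model that maps a pixel row j at rotation position i to a world point for a given scale λ. In the sketch below, the rotation matrices A, AO, and AI are passed in as generic 3 × 3 rotations, and all values in the example call are placeholders for illustration, not calibration results.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Forward model of the general equation:
# r = r0 + lam * A*Aphi*AO*AI * ((0,0,j*delta) + (dx, f+dy, dz) + (1/lam)*(Rx,Ry,0)).
def forward(i, j, lam, r0, A, AO, AI, delta_phi, delta, f, dx, dy, dz, Rx, Ry):
    phi = np.pi / 2.0 - delta_phi * i            # rotation angle of column i
    Aphi = rot_z(phi)
    v = (np.array([0.0, 0.0, j * delta])
         + np.array([dx, f + dy, dz])
         + np.array([Rx, Ry, 0.0]) / lam)
    return r0 + lam * (A @ Aphi @ AO @ AI @ v)

# Placeholder call: ideal interior orientation (AO = AI = I), small off-axis offset.
r = forward(i=100, j=2000, lam=3.0, r0=np.zeros(3), A=np.eye(3), AO=np.eye(3),
            AI=np.eye(3), delta_phi=2 * np.pi / 25000, delta=1e-4, f=0.0217,
            dx=0.0, dy=0.0, dz=0.0, Rx=0.05, Ry=0.0)
```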

An IMU (inertial measurement unit) was used for identifying the rotation angle ϕ (with an accuracy of 1/1000 of a degree). We use the general equation above (also allowing the off-axis case). For simplification we assumed that the CCD line is only tilted about the y′-axis (AI). In case the CCD line is tilted about the x′-axis, we compensate this by changing the parameters AO and ∆ (this is possible for small angles; practically, the tilt should be less than 1◦). The following equations result:

   (r − r0) · A⁻¹ · Aϕ⁻¹ · AO⁻¹ − R = λ · rd

with

   rd = (rdx, rdy, rdz)^T = (∆xj, f, j · δ + ∆zj)^T

Its components ∆xj, f, and j · δ + ∆yj are equal to

   (1/λ) · (a11 (rx − rx0) + a21 (ry − ry0) + a31 (rz − rz0) − Rx)

   (1/λ) · (a12 (rx − rx0) + a22 (ry − ry0) + a32 (rz − rz0) − Ry)

   (1/λ) · (a13 (rx − rx0) + a23 (ry − ry0) + a33 (rz − rz0))

respectively. The reals a11, . . . , a33 are elements of the rotation matrices A and Aϕ. Therefore, the collinearity equations are defined as follows (we use X = rx − rx0, Y = ry − ry0, and Z = rz − rz0):

   ∆xj = f · (a11 X + a21 Y + a31 Z − Rx) / (a12 X + a22 Y + a32 Z − Ry)

and

   j · δ + ∆yj = f · (a13 X + a23 Y + a33 Z) / (a12 X + a22 Y + a32 Z − Ry)

The unknown parameters are functions of these collinearity equations and of the focal length f:

   ∆xj = Fx · f

   j · δ + ∆yj = Fz · f

By linearization of these equations it is possible to estimate the unknown parameters iteratively; we use

   X = (∂Fx/∂rx0) · ∆rx0^k ,   Y = (∂Fx/∂ry0) · ∆ry0^k ,   Z = (∂Fx/∂rz0) · ∆rz0^k

   Ω = (∂Fx/∂ω) · ∆ω^k ,   Φ = (∂Fx/∂φ) · ∆φ^k ,   K = (∂Fx/∂κ) · ∆κ^k

   A = (∂Fx/∂Rx) · ∆Rx^k ,   B = (∂Fx/∂Ry) · ∆Ry^k

and obtain

   ∆xj = ∆xj^k + f · (X + Y + Z + Ω + Φ + K + A + B)

   j · δ + ∆yj = (j · δ + ∆yj)^k + f · (X + Y + Z + Ω + Φ + K + A + B)

Based on the matrix equation l = A · x, the solution is x = A⁻¹ · l. We have v = A x̂ − l for n > u observations. By applying the method of least-squares error minimization, the minimum error is defined as follows:

   min = v^T v = (A x̂ − l)^T (A x̂ − l) = x̂^T A^T A x̂ − 2 l^T A x̂ + l^T l

We obtain

   ∂(v^T v)/∂x̂ = 2 x̂^T A^T A − 2 l^T A = 0

and this leads to the solution

   x̂ = (A^T A)⁻¹ A^T l
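A compact numerical sketch of this normal-equation step is given below. The design matrix and observation vector are random placeholders; in practice this update is iterated with relinearized collinearity equations, as described above.

```python
import numpy as np

# One least-squares step x_hat = (A^T A)^{-1} A^T l, written with a linear solver
# instead of an explicit inverse. A and l are placeholder data standing in for
# the linearized collinearity equations (n = 200 observations, u = 8 unknowns).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 8))
l = rng.normal(size=200)

x_hat = np.linalg.solve(A.T @ A, A.T @ l)
v = A @ x_hat - l                                # residual vector
print(x_hat, float(v @ v))                       # estimate and squared error v^T v
```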

After calibration for ω = 0, we can change this camera parameter and can now attempt to calibrate it (together with further internal camera parameters). For methods for calibrating R, ω, f, and the principal row of rotating line cameras, see [3, 7]. The focal length and the principal row can be calibrated within an independent process. [3] specifies a least-squares error approach using at least five pairs of calibration points. (This follows known linear techniques for pinhole cameras, but uses only one image column at a time.) For the calibration of R and ω, [3] discusses three alternative methods. The inherent nonlinearity of this calibration problem (basically due to the cylindrical image manifold) creates problems for translating methods known for pinhole cameras into methods for rotating line cameras. However, two methods have been identified, based on either a set of at least three parallel line segments in the 3D scene (also parallel to the rotation axis of the rotating line camera), or a set of three parallel line segments defining an orthogonal edge in the scene. These methods allow accurate calibration of R and ω.

5. Unification of Multiple Scans

The fusion of range data and pictures is a relatively new approach for 3D scene rendering; see, for example, [8] for combining range data with images acquired by a video camera. Combinations of panoramic images [1] and LRF data provide a new technology for high-resolution 3D documentation and visualization. For our project we had a visual laser scanner at our disposal which is based on the phase measurement principle [9]. The LRF scans point by point, uniformly in two dimensions: vertically by a rotating deflecting mirror, and horizontally by rotating the whole measuring system. In our case, the vertical scan range was 310◦ (which leaves 50◦ uncovered), and the horizontal scan range was 360◦. Actually, a horizontal scan range of 180◦ is sufficient to measure each visible 3D point once, because the LRF scans overhead. However, we always scan the full 360◦ horizontally, and therefore we capture each visible 3D point twice. (This redundancy supports calibration.) Figure 10 shows a raw data set without redundancy (i.e., 310◦ times 180◦), captured by the LRF.

3D points, calculated at a single viewpoint of an LRF, are modeled by assuming an error ε > 0 describing a ball Uε(p) around a measured 3D point p; the correct 3D point lies in this ball. Practically, we can assume that ε is composed of several components, such as a distance error (caused by the LRF measurement unit), the eccentricity of the scan center, the incidence angle (the angle between the laser ray and the surface), different surface material properties (such as of wood or metal), a collimation axis error (error of the optical axis), vertical and horizontal axis errors, a trunnion axis error (i.e., oscillation around axes), and a scale factor.

We transform all LRF data at one viewpoint into a polar coordinate system with a horizontal range of 360◦ and a vertical range of 180◦. At this step, all LRF calibration data are available and required. Each 3D point obtained with the LRF is described either in polar coordinates by the triple (R, ϑ, ϕ), or in Cartesian coordinates as a vector p; these are related to one another as follows:

   px = R · sin ϑ · cos ϕ
   py = R · sin ϑ · sin ϕ
   pz = R · cos ϑ

The orientation and position with respect to a reference vector r in the world coordinate system are defined by one rotation matrix A and a translation vector r0 (as calibrated in the previous section):

   r = r0 + A · p
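The two transformations above combine into a few lines of code. The sketch below converts a batch of LRF measurements (R, ϑ, ϕ) into Cartesian coordinates and then into the world frame; the pose (A, r0) and the measurement values are placeholders for illustration.

```python
import numpy as np

def lrf_to_world(R, theta, phi, A, r0):
    """Convert LRF polar measurements (range R, angles theta, phi) to world
    coordinates using the scan pose (rotation A, translation r0)."""
    p = np.stack([R * np.sin(theta) * np.cos(phi),
                  R * np.sin(theta) * np.sin(phi),
                  R * np.cos(theta)], axis=-1)    # Cartesian points, scanner frame
    return r0 + p @ A.T                           # r = r0 + A * p, applied per point

# Placeholder pose and two synthetic measurements (illustrative only).
A = np.eye(3)
r0 = np.array([10.0, 2.0, 1.5])
pts = lrf_to_world(np.array([5.0, 7.2]), np.array([1.2, 1.6]),
                   np.array([0.3, 2.0]), A, r0)
```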

Figure 11 shows a 3D model rendered by central projection, based on the data shown in Figure 10 (and on range and image data captured at further viewpoints). The figure shows measured 3D points with measured (LRF) gray levels (i.e., no camera data are used in this case).

Figure 10. Raw LRF data.

Figure 11. Central projection of the same hall as shown in Figure 10.

Figure 12. Panoramic image data have been fused in a subwindow near the center of the shown range image. (The figure shows the Thronsaal of castle Neuschwanstein.)

6. Textured Surfaces

The fusion of camera and LRF data sets (from multiple viewpoints) starts with transforming the coordinate systems (i.e., those of the LRF and camera viewpoints) into one world coordinate system. For this step, the position and orientation of these systems need to be known (see Section 4).

The LRF generates 3D object points r, which are triangulated, and these triangles are then color-textured based on the available panoramic images. The panoramic images are preprocessed using calibration data (Section 4). As a result, all image coordinates are in rectified cylindrical coordinates. The requested viewing direction rd of the panoramic camera for the normal case (without off-axis R > 0) is described by the following equation:

   (r − r0) · A⁻¹ · Aϕ⁻¹ = λ · rd

We apply the calculated exterior orientation A⁻¹ to the camera location. This allows us to specify the pixel column i in the panoramic image:

   (rx − rx0)′ = −sin(i · ∆ϕ) · λ · f

   (ry − ry0)′ = cos(i · ∆ϕ) · λ · f

   i · ∆ϕ = −arctan( (rx − rx0)′ / (ry − ry0)′ )

By substituting equations (now with known parameters of the exterior orientation), and taking into account that the rotation of the CCD line corresponds to index i, the pixel row j can be determined as follows:

   (ry − ry0)″ = λ · f

   (rz − rz0)″ = λ · j · δ

   j · δ = ( (rz − rz0)″ / (ry − ry0)″ ) · f

This allows us to map the color texture at pixel (i, j) onto the corresponding triangle of the triangulated LRF data.
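Put together, the column and row computations above amount to a small texture-lookup routine. The sketch below assumes the normal case (R = 0), interprets the primed and double-primed quantities as the point difference expressed after applying A⁻¹ and Aϕ⁻¹ respectively, and uses placeholder pose and resolution values.

```python
import numpy as np

def world_to_pixel(r, A, r0, delta_phi, delta, f):
    """Map a world point r to panoramic pixel indices (i, j) for the normal case
    R = 0; q1 and q2 correspond to the primed and double-primed quantities."""
    q1 = A.T @ (r - r0)                # exterior orientation: A is a rotation, A^{-1} = A^T
    angle = np.arctan2(-q1[0], q1[1])  # i * delta_phi, from the sine/cosine relations above
    i = angle / delta_phi              # in practice, wrap i into [0, W)
    c, s = np.cos(angle), np.sin(angle)
    Aphi_inv = np.array([[  c,   s, 0.0],
                         [ -s,   c, 0.0],
                         [0.0, 0.0, 1.0]])   # rotation by -angle about the z-axis
    q2 = Aphi_inv @ q1
    j = (q2[2] / q2[1]) * f / delta    # from j * delta = ((rz - rz0)'' / (ry - ry0)'') * f
    return i, j

# Placeholder pose and resolutions (illustrative assumptions only).
i, j = world_to_pixel(np.array([4.0, 6.0, 1.2]), np.eye(3),
                      np.array([0.0, 0.0, 1.5]), 2 * np.pi / 25000, 1e-4, 0.0217)
```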

7. Conclusions

The reported projects illustrate the general development towards multi-sensor applications as characterized in [4]. The first part reviewed basics of polycentric panoramas. The second part reported on calibration (with ω = 0 for the camera) and introduced an algorithm for fusing laser scanning data with images of a rotating line camera. The coordinate systems of both sensors and the transformation of both data sets into one common reference system (the world coordinate system) have been described. Compared to earlier publications, the given approach also utilizes an improved method for calculating the spatial (geometric) correspondence between the laser diode of the laser range finder and the main point of the rotating CCD line camera.

For fusing data at a single viewpoint (i.e., camera and LRF at about the same position), we assumed that the main points of the LRF and of the panoramic camera are identical. This allows a simple spatial assignment of camera and LRF data. However, our experience is that larger or complex 3D scenes require multiple viewpoints, and multiple scans do not allow the same (simple) approach as for a single viewpoint. The common approach here is to first calculate a 3D surface model by using the different laser scans, and then to map color information onto it for texturing. Triangulation and meshing of these huge unorganized clouds of points is an issue which we have not discussed in this paper; there are already many publications on these triangulation subjects.

Inaccuracies of the LRF data lead to future subjects of filtering, geometric understanding, or interpretation, also using the available color images of the scene. For example, surface edges, specularities, or windows contribute to inaccurate LRF data. We briefly illustrated texture mapping with panoramic data. There are radiometric problems (not discussed here) which require future research: homogeneous lighting for different camera positions during data acquisition is very difficult to achieve, and shadows (e.g., caused by daylight) need to be localized and must be modified radiometrically or masked out.

Acknowledgments: The authors thank R. Reulke for ongoing collaboration on the subjects of this paper, F. Huang and S.-K. Wei for years of stimulating joint work, B. Strackenburg for support in experimental work, and many more at DLR Berlin who contributed to these projects (which we could only discuss here in very limited space).

References

[1] R. Benosman, S. B. Kang, and O. Faugeras (editors). Panoramic Vision. Springer, New York, 2001.

[2] F. Huang, S.-K. Wei, and R. Klette. Geometrical fundamentals of polycentric panoramas. In Proc. Int. Conf. Computer Vision, pages 560–565, 2001.

[3] F. Huang. Epipolar geometry and camera calibration of cylindrical panoramas. PhD thesis, CITR, The University of Auckland, 2002.

[4] R. Klette and R. Reulke. Modeling 3D scenes: paradigm shifts in photogrammetry, remote sensing and computer vision. In Proc. Int. IEEE Conf. ICSS 2005 (on CD, keynote, 8 pages), Taiwan, 2005.

[5] K. Scheibe, H. Korsitzky, R. Reulke, M. Scheele, and M. Solbrig. EYESCAN – a high resolution digital panoramic camera. In Proc. Robot Vision (R. Klette, S. Peleg, and G. Sommer, editors), pages 77–83, LNCS 1998, Springer, Berlin, 2001.

[6] K. Scheibe. Design and test of algorithms for the evaluation of modern sensors in close-range photogrammetry. Draft of PhD thesis, Göttingen University, 2005.

[7] S.-K. Wei. Analysis, design, and control of stereoscopic panoramic imaging. PhD thesis, CITR, The University of Auckland, 2002.

[8] F. Kern. Supplementing laserscanner geometric data with photogrammetric images for modelling. In Int. Symposium CIPA (J. Albertz, editor), pages 454–461, 2001.

[9] F. Haertl, I. Heinz, and C. Fröhlich. Semi-automatic 3D CAD model generation of as-built conditions of real environments using a visual laser radar. In Proc. IEEE Int. Workshop Robot-Human Interactive Communication, pages 400–406, 2001.