Proceedings of the 1997 IEEE International Conference on Robotics and Automation, Albuquerque, New Mexico - April 1997

DEPTH ESTIMATION FROM A SEQUENCE OF IMAGES USING SPHERICAL PROJECTION

M. Hanmandlu
Dept. of Electrical Engg.
I.I.T. Delhi
New Delhi - 110 016, India.
E-mail: [email protected]

V. Shantaram
Dept. of Computer Engg.
Delhi College of Engineering
Kashmere Gate
Delhi - 110 006, India.

K. Sudheer
Dept. of Computer Sc & Engg.
I.I.T. Delhi
New Delhi - 110 016, India.

Abstract

A recursive estimation of depth from a sequence of images is proposed. Using spherical projection, a simple equation is derived that relates image motion to object motion. This equation is reformulated into a dynamical state-space model to which a Kalman filter can be applied directly to yield an estimate of depth. Point correspondences are used to obtain feature points, and the motion parameters are assumed to be known. The results are illustrated on a real object.
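Before the detailed development, the following is a minimal, illustrative sketch of the kind of recursive Kalman update the abstract refers to. The scalar random-walk state model, the function name, and the noise levels are assumptions for illustration only; they are not the state-space model derived later in the paper.

```python
import numpy as np

def kalman_depth(measurements, q_proc=1e-4, r_meas=1e-2):
    """Recursively estimate a scalar depth from noisy measurements.

    Illustrative only: a random-walk state model x_k = x_{k-1} + w_k
    with direct observation z_k = x_k + v_k, not the paper's model,
    which couples depth to image motion (Sections 2-4).
    """
    x, p = measurements[0], 1.0        # initial state estimate and covariance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q_proc                 # predict: covariance grows by process noise
        k = p / (p + r_meas)           # Kalman gain
        x = x + k * (z - x)            # update with the measurement innovation
        p = (1.0 - k) * p              # posterior covariance
        estimates.append(x)
    return estimates

# Example: noisy observations of a true depth of 5.0 units
rng = np.random.default_rng(0)
z = 5.0 + 0.1 * rng.standard_normal(50)
print(kalman_depth(z)[-1])             # converges toward 5.0
```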

1. INTRODUCTION

The problem of analyzing a sequence of monocular images (intensity or range) or stereo images to extract three-dimensional motion and structure is an active area of research in computer vision. In analyzing monocular images, reliable tokens (features) such as points, lines, corners and curves are detected from the spatial variation of image intensities, assuming that they correspond to markings on 3D objects. Next, these tokens are tracked over time to recover the depth and 3D velocities of the corresponding 3D tokens. A lot of work has been reported using stereo images to reconstruct depth [Zhang et al., 1988]. The main problem in these methods is to establish correspondence between the images and to construct a dense depth map. A review of all types of feature correspondences in monocular and stereo image sequences for motion and structure can be found in [Huang and Netravali, 1994]. Significant progress has been made in theory and algorithms for estimating motion and depth/structure from a sequence of monocular images under perspective projection ever since the paper of Longuet-Higgins [1981]. A critical analysis of methods for structure from motion is given in Jerian and Jain [1991]. Spherical projection, where points on the image plane are represented by their central projections on the unit


sphere, is proposed in Yen and Huang [1983] for determining the 3D motion and structure of a rigid body from an image sequence. Based on the simple geometry of points on the unit sphere corresponding to points on the image plane, methods are presented to determine the structure of the object for the pure translation case. Matthies et al. [1989] have used a Kalman filter formulation for the estimation of depth, assuming translational motion with a feature-based scene representation. Broida and Chellappa [1991] have proposed a Kalman filter based recursive algorithm for the estimation of motion and structure of 3D objects under more general kinematic models. They have used an Iterated Extended Kalman Filter (IEKF) to implement their recursive algorithm for 3D motion and structure estimation effectively. Tsai et al. [1993] have compared two statistical approaches for 3D reconstruction from an image sequence: asymptotic Bayesian surface reconstruction and Kalman filter based depth estimation. Both techniques are recursive in nature. Some of the recent work on structure from motion is briefly reviewed in the following. Taylor and Kriegman [1995] have estimated the structure of a scene composed of straight line segments by minimizing a nonlinear objective function that gives the disparity between the observed line segments and the predicted lines. The minimization is done with respect to both the line parameters and the camera positions. Wu et al. [1995] have presented a robust approach for estimating the kinematics of the camera and the structure of the objects from noisy monocular image sequences. The motion is represented by rectilinear motion parameters, whereas the structure parameters are the 3D coordinates of the salient feature points. The incremental motion and structure are then estimated by both the iterated extended Kalman filter and the nonlinear least squares method. A formulation for recursive recovery of motion,

pointwise structure and focal length from feature correspondences tracked through an image sequence is presented in [Azarbayejani and Pentland, 1995]. A stable and accurate framework (EKF), which applies uniformly to both true perspective and orthographic projections, is the result of several representational improvements over earlier structure from motion formulations. They also estimate the focal length by adding it to the state vector. A dynamic solution to the nonlinear reconstruction of the 3D structure and motion of a planar facet moving with arbitrary but constant motion relative to a camera is presented in [Murray and Shapiro, 1996] using an EKF. In order to disambiguate the two possible values of rotational motion, the algorithm integrates the visual motion over time and restores the coupling between the scene structure and the rotational motion.

In this work, we choose a patterned surface to provide feature points in several frames where correspondence can be easily established. Next, we use these feature points to obtain depth recursively by putting the motion equation in a form suitable for applying Kalman filtering. Although extensive use of the Kalman filter has been made for the estimation of motion and structure/depth, all these formulations assume perspective projection. Matthies et al. derived a linearized motion equation from a perspective projection for use in Kalman filtering. We instead use spherical projection, which leads to a simplified motion equation.

The organization of this paper is as follows: The imaging model and the motion equation are discussed in section 2. The equation for depth is derived in section 3. Section 4 gives the recursive estimation of depth using Kalman filtering. Point correspondence is discussed in section 5. A case study is presented in section 6, followed by conclusions in section 7.

2. IMAGING MODEL AND THE EQUATION OF IMAGE MOTION

The orientation of any ray can be determined by a monocular observer when it is projected on its imaging surface. However, the observer cannot determine the distance to the object feature along the ray. By choosing the direction of an incoming ray to represent a unit vector, determination of the ray's direction is equivalent to considering the imaging device as a spherical pin-hole camera of unit radius. For the spherical projection, let the direction of the ray to a world point P with position vector r(s,t) be a unit vector Q(s,t) on the image sphere, defined at time 't' by [Cipolla, 1991]

r(s,t) = v(t) + λ(s,t) Q(s,t)    (2.1)

where λ(s,t) is the distance along the ray to the viewed point P and v(t) is the viewer's position, as shown in Fig. 1.
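As a concrete reading of (2.1), the following minimal sketch recovers the unit ray Q and the depth λ for a known world point and viewer position. The function name and inputs are assumptions for illustration, not notation from the paper.

```python
import numpy as np

def spherical_project(r, v):
    """Project a world point r onto the unit image sphere centred at v.

    Returns (Q, lam) such that r = v + lam * Q, i.e. equation (2.1):
    Q is the unit direction of the incoming ray, lam the depth along it.
    """
    ray = np.asarray(r, float) - np.asarray(v, float)
    lam = np.linalg.norm(ray)          # distance along the ray (the depth)
    return ray / lam, lam              # unit vector on the image sphere

# A world point seen from a viewer at the origin
Q, lam = spherical_project(r=[1.0, 2.0, 2.0], v=[0.0, 0.0, 0.0])
print(Q, lam)                          # Q has unit length, lam = 3.0
```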


For a given vantage position t₀, the apparent contour Q(s,t₀) is a continuous family of rays emanating from the camera's optical center which touch the surface, forming the contour generator r(s,t₀), so that

Q · n = 0    (2.2)

where n is the surface normal. The tangent to the contour generator is also perpendicular to the normal:

r_s · n = 0    (2.3)

As a result, the moving observer at position v(t) sees a two-parameter family of apparent contours Q(s,t). Note that Q is the direction of the light ray in the fixed reference frame R³. It is determined by a spherical image position vector Q' (the direction of the light ray in the camera/viewer coordinate system) and the orientation of the camera coordinate system relative to the reference frame. For a moving observer, the viewer coordinate system moves with respect to the reference frame. We can, therefore, express the relationship between Q and Q' in terms of a rotation operator R(t):

Q = R(t) Q'    (2.4)

At t = 0, Q and Q' coincide; at any other t the relative translational and rotational velocities U and Ω are, respectively,

U = v_t    (2.5)

Ω × Q' = R_t Q'    (2.6)

where the subscript 't' denotes differentiation with respect to t. The relation between temporal derivatives of measurements made in the camera coordinate system and those made in the reference coordinate system is obtained by differentiating Q with respect to time, denoted Q_t:

Q_t = R(t) Q'_t + Ω × Q'    (2.7)

As the viewer moves, a family of apparent contours Q'(s,t) is swept out on the image sphere. However, the spatio-temporal parameterization of this family is not unique: the mapping between contour generators, and hence between apparent contours at successive instants, is underdetermined. To circumvent this problem, use is made of the epipolar parameterization, defined by

r_t × Q = 0    (2.8)

such that r_t, the tangent to the 't' parameter curve, is in the direction of the ray Q.
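A small numerical check of the frame relations (2.4) and (2.6), evaluated at t = 0 where R(0) = I and Q coincides with Q'. The z-axis rotation, the rate, and the finite-difference step are assumptions for illustration.

```python
import numpy as np

def Rz(theta):
    """Rotation operator R(t) for a camera spinning about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

omega_rate = 0.3                          # assumed angular speed about z
Omega = np.array([0.0, 0.0, omega_rate])  # rotational velocity vector
Qp = np.array([1.0, 2.0, 2.0]) / 3.0      # camera-frame unit ray Q'

# (2.4): Q = R(t) Q'; at t = 0, R(0) = I so Q = Q'.
# (2.6): Omega x Q' = R_t Q', with R_t taken by finite difference at t = 0.
dt = 1e-6
Rt = (Rz(omega_rate * dt) - Rz(0.0)) / dt
print(np.allclose(np.cross(Omega, Qp), Rt @ Qp, atol=1e-5))   # True
```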


Next, a natural correspondence can be set up between points on successive snapshots of the apparent contour. For this, we differentiate (2.1) with respect to 't' and enforce the epipolar constraint (2.8) to obtain
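Carrying this step through, as a hedged reconstruction from the stated definitions (the paper's own numbered form of the result may differ):

```latex
% Differentiate (2.1) with respect to t, writing U = v_t:
%   r_t = U + \lambda_t Q + \lambda Q_t
% Cross with Q and apply the epipolar constraint (2.8); the
% \lambda_t Q term vanishes because Q \times Q = 0:
\[
  r_t \times Q \;=\; U \times Q \;+\; \lambda\,(Q_t \times Q) \;=\; 0
  \quad\Longrightarrow\quad
  U \times Q \;=\; \lambda\,(Q \times Q_t).
\]
```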
