Structure from Motion & Camera Self-Calibration

Kassem Al-ISMAEIL

Supervised by Prof. David Fofi

Dr. Adlane Habed

Laboratoire Electronique, Informatique et Image (Le2i) UMR CNRS 5158 Université de Bourgogne

A Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Vision and Robotics (Erasmus-Mundus VIBOT) · 2011 ·

Abstract

Structure from motion (SfM) is the process of recovering the three-dimensional (3D) structure of a scene and the associated camera poses from image sequences. Critical for the success of many applications in a wide range of domains, research on SfM-related problems has long received significant attention. In particular, with the surge in digital photography, the past two decades have witnessed a tremendous impetus resulting in a substantial growth of the literature on the topic. In this respect, numerous SfM algorithms with variable levels of accuracy, speed and practicality have been proposed, making the selection of the appropriate algorithm dependent upon the targeted application and the data at hand. In addition, the amount and relevance of the 3D information that can be extracted from the reconstructed scene depends on whether or not the camera is calibrated. The goal of this thesis is twofold. First, we provide a detailed classification of SfM methods, allowing both comparison and clarification of the assumptions under which they have been designed. In particular, we classify SfM techniques into two large categories depending on whether or not prior camera calibration is required; related algorithms are further grouped under each category so as to bring their similarities forth. Then, we address the problem of camera self-calibration and propose a novel stratified approach that is based on a new set of quartic polynomial equations. The assessment of our new method is carried out through extensive experiments on simulated data and the results are compared with those of an existing method.


Contents

Acknowledgments
1 Introduction
2 State of the art
  2.1 Classification of SfM methods
  2.2 Calibrated SfM
    2.2.1 Generic SfM
    2.2.2 Factorization
    2.2.3 Object space cost function
    2.2.4 Second order cone programming
    2.2.5 Extended Kalman filter (EKF)
  2.3 Uncalibrated SfM
    2.3.1 Factorization
    2.3.2 Bilinear programming
    2.3.3 DMS (Direct metric structure) & BA
    2.3.4 Extended Kalman filter (EKF)
    2.3.5 Line-based & orthonormal representation
    2.3.6 Explicit reconstruction method
3 Stratified Camera Self-calibration
  3.1 Background
    3.1.1 The pinhole camera model
    3.1.2 Epipolar geometry
    3.1.3 3D reconstruction strata
  3.2 Projective reconstruction
    3.2.1 Projective reconstruction from two views
    3.2.2 Estimating the third camera
  3.3 Affine upgrade through the modulus constraint
  3.4 Metric upgrade
    3.4.1 Metric geometry and the absolute conic
    3.4.2 Refining calibration
  3.5 Stratified self-calibration algorithm
4 More Trivariate Quartics for Stratified Camera Self-Calibration
  4.1 Motivational arguments
  4.2 A unified parameterization of homography matrices
  4.3 A new set of quartic polynomial equations
  4.4 Experiments
5 Conclusion and future work
Bibliography

List of Figures

2.1 Pipeline of the generic SfM approach with performance analysis, adapted from [32].
2.2 Object space error e_i^k, adapted from [18].
2.3 General camera model, adapted from [18].
2.4 Camera set-up for mapping point correspondences between constrained cameras, adapted from [19].
2.5 Four most different points after removal of mismatches, adapted from [26].
2.6 EKF and SfM algorithm for fusing feature correspondences and inertial data, adapted from [31].
3.1 Pinhole camera model, adapted from [49].
3.2 Projective reconstruction. (a) Original image pair. (b) Two views of a 3D projective reconstruction, adapted from [12].
3.3 Affine reconstruction. (a) Original images. (b) Two views of a 3D affine reconstruction, adapted from [12].
3.4 Metric reconstruction. (a) Original images. (b) Two views of a 3D metric reconstruction, adapted from [12].
3.5 Shapes which are equivalent to a cube under the different geometric ambiguities.
3.6 The absolute conic and its image.
3.7 A line tangent to the absolute conic.
4.1 Experiments with 3 images and various levels of noise using our new quartic polynomials: (a) 3D reconstruction RMS error using two objective functions; (b) number of sequences in which only one solution has been found and whose RMS error < 1.
4.2 Experiments with 4 images and various levels of noise using our new quartic polynomials: (a) 3D reconstruction RMS error using two objective functions; (b) number of sequences in which only one solution has been found and whose RMS error < 1.
4.3 Experiments with 4 images and various levels of noise using our new quartic polynomials and those due to the modulus constraints: (a) 3D reconstruction RMS error using our new polynomials and our new objective function versus errors obtained using Pollefeys' method; (b) number of sequences in which only one solution has been found and whose RMS error < 1.
4.4 Experiments with 1 pixel noise and a varying number of images using our new quartic polynomials and those due to the modulus constraints: (a) 3D reconstruction RMS error using our new polynomials and our new objective function versus errors obtained using Pollefeys' method; (b) number of sequences in which only one solution has been found and whose RMS error < 1.
4.5 3D affine reconstruction using estimated affine parameters: (a) a view of the original 3D cube; (b) a view of a 3D affine reconstruction of the cube.

List of Tables

2.1 Classification of SfM methods.

Acknowledgments

I would like to express my gratitude to my advisors, Prof. David Fofi and Dr. Adlane Habed, for their great help and guidance during this work and for welcoming me into their research group. I am especially indebted to Dr. Adlane Habed for the constructive discussions we had, which gave me the opportunity to develop new ideas. I warmly thank him for his time, patience and effort.

My gratitude also goes to Prof. Fabrice Meriaudeau and Dr. Yohan Fougerolle for giving me the chance to follow this nice program. Special thanks to Dr. Cédric Demonceaux and Dr. Désiré Sidibé for their help. To my best friends, AbdElrahman Shabayek and Souhaiel Khalfaoui: thank you for your massive help and great advice.

Thanks to all my colleagues in the MsCV and VIBOT programs and those in the Le2i lab for being so friendly and helpful during this wonderful period of my life. It was such a great experience and joy for me being here with you.

Of course, I am also indebted to my parents, my family and my friends for their great patience and support, which was very important to me.

Chapter 1

Introduction

Structure from Motion (SfM), i.e. the problem of estimating the 3D structure of a scene and the camera motion from image sequences, has been and continues to be an active research area which has attracted the attention of many researchers. The SfM problem has gained most of its interest from the existence of countless underlying and potential applications that can benefit from it in a wide range of domains (entertainment, manufacturing, terrain estimation, surgery, etc.). Of the numerous methods that have been proposed in the literature, we mainly distinguish two cases: one in which the cameras are calibrated and the recovered 3D structure is Euclidean, and the other where the cameras are uncalibrated and the 3D reconstruction is only projective. While a projective reconstruction meets the needs of certain applications, the recovered structure lacks metric measurements (distances, ratios and angles are not preserved) and looks nothing like the true scene. On the contrary, Euclidean reconstruction is dominantly and overwhelmingly required in most applications as it is, in every aspect, similar to the true scene. A projective reconstruction can be upgraded into a Euclidean one (or, more precisely, a metric one) by means of a camera calibration or self-calibration procedure. While a camera calibration procedure requires the presence of a special pattern in the scene, camera self-calibration approaches are more flexible and rely solely on point correspondences across images. Most self-calibration methods [45],[46],[47] are based on a prior projective reconstruction of the scene followed by the calculation of the camera's intrinsic parameters through solving a set of nonlinear equations in 8 or more unknowns. Stratified methods, which are known to be more efficient, proceed by breaking the problem into two simpler tasks: the projective reconstruction is first upgraded to affine (by locating the plane at infinity) through solving nonlinear trivariate quartic polynomial equations, then the latter reconstruction is upgraded into metric by solving a system of linear equations (generally using Singular Value Decomposition (SVD)). The trivariate polynomial equations which allow the affine upgrade of the scene are due to the so-called modulus constraint (see Pollefeys [44]), that is, the unimodularity of the eigenvalues of the inter-image homography of the plane at infinity between every pair of images in a sequence.

The work presented in this thesis is twofold. First, the most representative structure from motion (SfM) algorithms are surveyed. In contrast to other surveys [1],[2], we do not restrict ourselves to special types of motion, cameras, or specific parts of the SfM pipeline. For instance, Häming and Peters [1] have paid special attention to two major tasks in the SfM pipeline, namely feature detection and the filtering of good matches. Lu et al. [2] have surveyed a number of 3D reconstruction methods that exploit motion parallax. Our goal here is to present a comprehensive survey and a classification of SfM methods to offer precise, up-to-date guidelines supporting the selection of one method over another. Then, we address the problem of camera self-calibration, an essential step in recovering the calibration parameters of the camera and hence the metric structure of a scene. In particular, we propose a novel stratified approach that is based on a new set of quartic polynomial equations due to the no-skew constraint of the camera. To our knowledge, this is the first work relating the position of the plane at infinity to the absence of image skew. Our new equations can be solved along with the modulus constraints, thus generally allowing a unique solution for the plane at infinity to be isolated from shorter image sequences than when using Pollefeys' method. Our claim is supported by the results of the extensive experiments we have carried out using simulated data. These results also show that our new equations lead to a more accurate and stable 3D reconstruction in the presence of noise in the images.

This thesis is organized as follows. Our new classification of SfM methods is presented in Chapter 2, which also includes a table summarizing existing SfM techniques; in the same chapter, calibrated and uncalibrated SfM methods are presented, compared and discussed. An explanation of the main 3D reconstruction strata and a review of Pollefeys' stratified self-calibration method are given in Chapter 3. The derivation of our new equations along with our stratified self-calibration algorithm are detailed in Chapter 4, which also includes the results of extensive experiments on simulated data. Finally, a conclusion of this work and a description of future work are given in Chapter 5.

Chapter 2

State of the art

In this chapter, a comprehensive survey and a relevant classification of SfM methods are presented with the aim of allowing both comparison and clarification of the assumptions under which these methods have been designed. SfM techniques are classified into two large categories: the first category covers SfM methods that require a prior calibration of the camera, while under the second we have included only methods applicable in the uncalibrated case. Algorithms based on the same principles and/or techniques are further grouped under each category.

2.1 Classification of SfM methods

The goal of SfM is to retrieve the 3D structure of a scene and the camera motion from 2D images taken from different viewpoints. The surveyed algorithms have been divided into two large categories: "Calibrated SfM", where a prior calibration of the camera is required, and "Uncalibrated SfM". Under each category, related algorithms are further grouped according to the common techniques they share. Then, the classification is carried out in terms of common criteria which are present in most SfM algorithms. This classification is reported in Table 2.1 and is based on the following criteria:

(i) The features extracted and matched across images. In general, the image features that are extracted and matched are either points (often corresponding to corners), lines or edges. The most popular corner detectors are those based on Harris' operator [37], SIFT (Scale-Invariant Feature Transform) [33] and SURF (Speeded-Up Robust Features) [36].

(ii) The method used to compute the motion: whether least-squares optimization is used or alternative techniques.

(iii) The robustness of the method to image noise.

(iv) SfM strategy. SfM strategies can differ depending on whether all data from all views are used in the calculations at the same time (multi-view), or only the data from (generally consecutive) pairs of images are considered (pair-wise). Some of the methods start with a pair of images only and gradually incorporate data from new images (incremental).

(v) Working with missing data. It is not unusual that the same features are not visible throughout the sequence of images. This makes some SfM methods fail, while others are specifically tailored to handle such a situation. We will refer to this as "working with missing data".

Table 2.1: Classification of SfM methods.

2.2 Calibrated SfM

In the following, we discuss SfM methods in the case where the cameras are calibrated. Such methods lead to a 3D structure of the scene expressed in a Euclidean frame and in which measurements such as lengths and angles correspond to those of the true scene.

2.2.1 Generic SfM

A variety of vision systems, including pinhole cameras, stereo pairs, catadioptric, omnidirectional and non-central cameras, are commonly the object of study in Computer Vision. Each system comes with its own advantages and drawbacks with regard to 3D reconstruction and motion estimation. For example, omnidirectional cameras provide more stable ego-motion estimation and a larger field of view than pinhole cameras; with such cameras, fewer images and lower spatial resolution are required. Pinhole cameras provide more useful texture maps, while non-central cameras eliminate the scale ambiguity in motion estimation. Ramalingam et al. [32] use a generic camera calibration technique and a generic SfM algorithm (see Fig. 2.1) which can be applied to all types of cameras with no distinction whatsoever. The main drawback of this work is that correspondences have to be provided manually, as it is reported that SIFT [33] fails. Mouragnon et al. [34] present a generic real-time SfM method based on incremental 3D reconstruction and local generic bundle adjustment in which the angular error is used as the minimization objective. The algorithm is incremental and, for each new frame: (i) points are detected and matched with those of the last key frame, (ii) the camera pose is robustly estimated through angular error minimization, and (iii) in case the new frame is selected as a key frame, new 3D points are estimated and a local bundle adjustment [12],[13] is carried out.

Figure 2.1: Pipeline of the generic SfM approach with performance analysis, adapted from [32].

2.2.2 Factorization

The reconstruction and motion estimation of a rigid object from a sequence of images is one of the most popular problems in Computer Vision, and factorization methods are undoubtedly among the most popular and effective algorithms to solve the SfM problem. Although various factorization techniques exist, see for instance [9],[10] and [11], they all linearize the camera observation model and provide fast results. These methods do not require an initial guess for the solution. Instead, their result is generally used as an initialization to other algorithms (such as bundle adjustment) that provide an optimal and definitive solution to the problem. Tomasi and Kanade [11] represent a sequence of images as a measurement matrix W_{2F×P} stacking the horizontal and vertical image coordinates of P points tracked through F frames. They prove that, under orthography, the rank of the measurement matrix is 3. Hence the measurement matrix can be factored into the product of two matrices R_{2F×3} and S_{3×P}, where R encodes the camera rotations and S is the shape in a coordinate system attached to the object centroid. The main drawback of this method is that, as the camera moves, features appear and disappear from the images, and feature tracking methods are not always successful. This makes the shape and motion computation method of [11] impractical except when the same points are visible in all images. One way to get around this issue is to break large sequences into a number of smaller sequences.
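The rank-3 factorization itself reduces to a truncated SVD. The following is a minimal sketch (ours, in Python/NumPy, not code from [11]); it assumes the measurement matrix has already been registered to the per-frame centroids, and it leaves the metric upgrade (resolving the remaining 3 × 3 ambiguity via the orthonormality constraints on the motion rows) out for brevity:

```python
import numpy as np

def factorize_orthographic(W):
    """Rank-3 factorization of a registered 2F x P measurement matrix W
    into motion R (2F x 3) and shape S (3 x P), in the spirit of
    Tomasi-Kanade. W must have its per-frame centroid subtracted."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:3])
    R = U[:, :3] * sqrt_s           # motion factor (camera axis rows)
    S = sqrt_s[:, None] * Vt[:3]    # shape factor (centroid-relative 3D)
    # R and S are defined only up to an invertible 3x3 matrix Q:
    # W ~ (R Q)(Q^-1 S); metric constraints on R's rows determine Q.
    return R, S
```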

2.2.3 Object space cost function

The object space cost function is the distance between the interpretation line of a pixel and the reconstructed 3D point, as shown in Fig. 2.2, where X_i represents the estimated 3D point and e_i^k stands for the corresponding cost.

Figure 2.2: Object space error e_i^k, adapted from [18].

Bundle adjustment is based on least-squares optimization. The optimal solution cannot always be guaranteed because the optimization procedure may stall in a local minimum, and it is known to be expensive in terms of both time and memory consumption. Hence, it cannot be used in real time or by low-cost robotic systems. Schweighofer and Pinz [18] present another algorithm to solve the SfM problem. The main idea is to use a general camera model (Fig. 2.3), which does not constrain the algorithm to a specific camera. They use the object space error for the general model as the cost to minimize; a sketch of this error is given after this list. Using this cost function, the structure and the translation part of the motion can be estimated from the rotation part of the motion in closed form. The main steps of their algorithm are:

• Object space cost function.
• General camera model.
• Closed-form structure estimation.
• Iterative formulation.

The main drawback of [18] is its high time complexity, which makes the algorithm unsuitable for an on-line system because the process slows down significantly as more frames are added. Schweighofer et al. [16] proposed a novel refinement approach based on the absolute orientation algorithm by Horn [17]; this algorithm may be seen as an extension of the one proposed in [18]. In this extension, they address the time-complexity limitation and turn their method into one that is applicable in real-time applications. They observed that, as more frames are added, the first frames are no longer affected by the optimization procedure: if the points visible inside a frame change, the frame will change its position and orientation; otherwise, the frame remains constant. Based on this observation, they split the set of frames into two sets: (i) frames which stay constant, F_c := {1, 2, ..., n_k − f_0}, and (ii) those which still remain to be optimized, F_o := {n_k − f_0 + 1, ..., n_k}, where n_k is the number of general cameras and f_0 = |F_o|. The authors showed that their algorithm is 8 times faster than the bundle adjustment method [12],[13]. A good initial estimate of the structure and camera pose allows fast convergence; however, Schweighofer et al. argue that the initial estimate can essentially be chosen randomly and that their algorithm should be capable of converging globally from any initial estimate.
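For concreteness, the object-space error of Fig. 2.2 can be computed as the residual of the orthogonal projection of the estimated 3D point onto the interpretation line of the observed pixel. The sketch below is our own illustration (in Python/NumPy, not code from [18]); the ray origin and direction are assumed to come from the camera model:

```python
import numpy as np

def object_space_error(X, ray_origin, ray_dir):
    """Distance between an estimated 3D point X and the interpretation
    line of a pixel, given by its origin (camera center) and direction."""
    v = ray_dir / np.linalg.norm(ray_dir)   # unit line-of-sight vector
    d = X - ray_origin
    e = d - np.dot(d, v) * v                # residual orthogonal to the ray
    return np.linalg.norm(e)                # e_i^k in Fig. 2.2
```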

Figure 2.3: General camera model, adapted from [18].

One drawback of Schweighofer et al.'s work [16] lies in the fact that the implementation causes the algorithm to generate n_c × n_i 3D points given a set of n_c constrained cameras with n_i uniquely imaged points. To cope with this problem, Mitchell et al. [19] have proposed an extension of Schweighofer et al.'s method that supports mapping between corresponding points. For each constrained camera rig (e.g. a stereo rig) they identify one general camera, camera k, under which they identify one point set q for every constrained camera (see for instance Fig. 2.4). This modification reduces the size of the matrix that needs to be solved by a factor proportional to the number of cameras in each set. In the case of stereo cameras, this speeds up the algorithm of [16] by a factor of 8.

2.2.4 Second order cone programming

Most SfM problems, such as triangulation, homography and epipolar geometry estimation, etc., can be recast as quasi-convex optimization problems. These problems can be solved efficiently using Second Order Cone Programming (SOCP) [23], which is a standard technique in convex optimization. Such formulations minimize the infinity norm (L∞-norm) instead of the common sum-of-squares cost function (L2-norm).

Figure 2.4: Camera set-up for mapping point correspondences between constrained cameras, adapted from [19].

Most methods, such as the factorization-based methods [7],[8] and [9], require the points to be visible in all images. Other methods [24] need points to be visible in at least three images. Martinec and Pajdla [25] have proposed a new algorithm that copes with missing data, i.e. the visibility of the points in three or more images is not necessary. They proceed in three steps: (i) a linear estimate of a consistent rotation (see [25] for more details), (ii) refinement of the estimate using BA, and (iii) camera translation and point recovery using SOCP [23]. The main idea is to make the rotations consistent using a linear formulation, which makes the method applicable even in extreme cases of missing data. On the other hand, making the rotations consistent increases the reprojection errors; hence, refinement using bundle adjustment is desired for each partial reconstruction. SOCP is used only as an emergency solution when some of the points fall behind the camera during the BA, as it is time consuming compared to BA. Martinec and Pajdla [26] present robust techniques for the global estimation of pair-wise camera rotations and translations. Robustness is achieved using a subset of points instead of all the points: a single mismatch may cause a complete failure of the translation registration when minimizing the maximum reprojection error with SOCP. Therefore, they proposed to choose, in a special way, only four points. The idea comes from projective factorization and the rescaled measurement matrix λx = PX, where P represents the projection matrices and X represents all points reconstructed in the partial reconstruction. They mainly work with the projected points PX instead of the rescaled image points. Firstly, they clear PX from mismatches, then they choose the four "most different" points (see Fig. 2.5).

Figure 2.5: Four most different points after removal of mismatches, adapted from [26].

The algorithm proceeds as follows: (i) rotation registration, using quaternions and approximate rotations, (ii) in each pairwise reconstruction a Gaussian is fitted in the rescaled image space to remove possible mismatches, and (iii) four points are carefully chosen among the remaining points, thus boosting speed and saving memory. Unlike in [25], where a BA keeping the rotations registered is needed, the work in [26] does not require such a step, because the precision of the rotation registration using the four carefully chosen points is quite satisfactory. SOCP is applied only once, on the data from all the partial reconstructions.
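To illustrate how such problems map onto SOCP, the sketch below triangulates a single point under the L∞ reprojection error by bisecting on the error bound t, each feasibility test being a small second-order cone program. This is our own illustration, not the implementation of [25] or [26]; it uses the cvxpy modeling library (our choice, not one used by the cited works) and assumes the point lies in front of every camera:

```python
import numpy as np
import cvxpy as cp

def linf_triangulate(Ps, xs, t_max=10.0, tol=1e-3):
    """Bisection on the max reprojection error t. Ps: list of 3x4
    projection matrices; xs: list of 2-vectors (pixel observations).
    Each feasibility test ||A_i X_h|| <= t * depth_i is an SOCP."""
    X = cp.Variable(3)
    Xh = cp.hstack([X, 1])
    lo, hi, best = 0.0, t_max, None
    while hi - lo > tol:
        t = 0.5 * (lo + hi)
        cons = []
        for P, x in zip(Ps, xs):
            depth = P[2] @ Xh                          # projective depth
            resid = (P[:2] - np.outer(x, P[2])) @ Xh   # image error * depth
            cons += [cp.norm(resid) <= t * depth, depth >= 1e-6]
        prob = cp.Problem(cp.Minimize(0), cons)
        prob.solve()
        if prob.status in ("optimal", "optimal_inaccurate"):
            hi, best = t, X.value      # feasible: tighten the error bound
        else:
            lo = t                     # infeasible: relax the error bound
    return best
```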

2.2.5 Extended Kalman filter (EKF)

The algorithms for matrix factorization can be divided into three main categories: recursive, batch [27] and hierarchical. In recursive methods, convergence to the global minimum is not guaranteed because the optimization may stall in a local minimum. Batch methods usually minimize an approximation to the reprojection error to simplify the optimization and avoid the local minima problem. These methods can be used to provide initial solutions to be fed into iterative algorithms (such as the extended Kalman filter (EKF)) in order to reduce the computational cost. Broida et al. [28] use a recursive procedure in which a batch method is applied to an initial set of data (like the first few frames); the result is then used to initialize the recursive method, and the recursive algorithm is used to track the object and its motion. The main drawback of this method is the possibility of filter divergence, which may be overcome with a suitable initialization and robust numerical techniques. Qian et al. [31] use inertial data as additional measurements which are fed into an EKF along with sparse feature correspondences in order to estimate the camera motion and scene structure. They have shown that the inertial data play an important role in improving robustness to noise and reducing inherent ambiguities; using these inertial data also allows the number of feature points to be reduced while still providing robust motion estimation. In their approach, the exploited inertial data are the noisy rotational angular rates of the camera, and it is assumed that the calibration of the inertial sensors has been carried out off-line. The measurement equations for the inertial data can be written as:

\[ \tilde{\omega}_x = \omega_x + n_x, \quad \tilde{\omega}_y = \omega_y + n_y, \quad \tilde{\omega}_z = \omega_z + n_z, \qquad (2.1) \]

where Ω = (ω_x, ω_y, ω_z)^T is the camera rotational rate vector and n_Ω = (n_x, n_y, n_z)^T is the measurement noise. In various Kalman-filter-based algorithms, e.g. [28],[29] and [30], the previous structure estimates are fused with the current feature correspondences to refine the structure estimates; their contribution to the new estimates is characterized and weighted by the corresponding covariance matrices. As the rate data directly measure the rotation dynamics of the camera, they are treated as another set of measurements for estimating the camera motion. Hence the inertial data can be used in existing Kalman-filter-based algorithms as illustrated in Fig. 2.6.

Figure 2.6: EKF and SfM algorithm for fusing feature correspondences and inertial data, adapted from [31].

Qian et al.'s method [31] has the advantage of working with mixed-domain sequences, i.e. sequences containing both small and large camera translations, while the previous works [28],[29] and [30] are reported to fail in the mixed-domain case.
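The fusion step itself is an ordinary EKF measurement update in which the gyro rates act as direct, linear observations of the angular-rate states, per (2.1). The sketch below is a generic illustration (ours, not the implementation of [31]); the state layout given by rate_idx is a hypothetical example:

```python
import numpy as np

def ekf_rate_update(x, P, omega_meas, R_gyro, rate_idx):
    """EKF update fusing gyro rates omega_meas (3,) into state x with
    covariance P. rate_idx: positions of (wx, wy, wz) inside x."""
    H = np.zeros((3, x.size))
    H[np.arange(3), rate_idx] = 1.0        # linear measurement model (2.1)
    S = H @ P @ H.T + R_gyro               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (omega_meas - H @ x)       # corrected state
    P = (np.eye(x.size) - K @ H) @ P       # corrected covariance
    return x, P
```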

2.3 Uncalibrated SfM

In this section we review projective SfM methods. In these methods, the image sequence is assumed to be uncalibrated and only a projective reconstruction can be obtained. Note that a projective reconstruction does not preserve measurements such as lengths, angles and parallelism that are present in the true scene. The calibration of the camera is a prerequisite for upgrading such a reconstruction to either a Euclidean or a metric one.

2.3.1 Factorization

Sturm and Triggs [7] have proposed one of the most popular factorization-based methods for estimating the 3D structure and camera matrices from point correspondences across images. Unlike [11], the factorization part of the algorithm requires an initial estimate of the projective depths. A 3F × P rescaled measurement matrix W is formed using the estimated projective depths, where F is the number of frames and P the number of feature points detected in each image of the sequence. With the projective depths, the rescaled measurement matrix W has rank 4; hence, if the depths are properly recovered, an SVD factorization technique similar to the one used in [11], applied to W, suffices to recover the 3D structure and camera motion. Several iterative methods have been proposed in [8] in order to improve the result of [7]. For instance, the authors suggest initializing the projective depths to 1, estimating the structure and camera matrices, and using this estimate to recompute the projective depths; the new depths are then used to recompute the structure and cameras until convergence is achieved. One common use of this method is to initialize bundle adjustment [12],[13]: most researchers do not run the iterative algorithms [7],[8] until convergence but rather provide the first result as an initial estimate for BA. In fact, the issue of convergence is important. Oliensis and Hartley [14] show that [7],[8] are unstable because of the convergence problem, even when run for only a few iterations as an initialization technique for BA, and they proposed a new iterative extension of [7] and [8] to solve this problem. In [7],[8] and [14], the projective depths are initialized to unity and later iteratively refined. This works well only if the true depths of all feature points remain approximately unchanged throughout the image sequence; unfortunately, this is rarely the case in many important applications.
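A minimal sketch of this iterative scheme (ours, in Python/NumPy; the published methods additionally rebalance the depth matrix, which we omit for brevity):

```python
import numpy as np

def projective_factorization(x, n_iter=10):
    """x: (F, P, 3) homogeneous image points. Depths start at 1, then
    rank-4 factorization and depth re-estimation alternate ([7],[8])."""
    F, P, _ = x.shape
    lam = np.ones((F, P))
    for _ in range(n_iter):
        W = (lam[:, :, None] * x).transpose(0, 2, 1).reshape(3 * F, P)
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        cams = (U[:, :4] * s[:4]).reshape(F, 3, 4)    # camera matrices
        X = Vt[:4]                                    # homogeneous points
        proj = np.einsum('fij,jp->fpi', cams, X)      # reprojections
        lam = proj[..., 2] / x[..., 2]                # new projective depths
    return cams, X
```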

2.3.2 Bilinear programming

Ramachandran et al. [15] proposed another algorithm to solve the projective depth estimation problem raised in [7],[8] and [14]. They studied the benefits of having additional information such as the vertical direction (gravity) and the height of the camera, which can be measured using inertial sensors, together with a monocular video sequence for 3D urban modeling. In particular, they show that the SfM problem can be simplified into a bilinear form and solved using a fast, scalable iterative procedure. The main contributions of [15] can be summarized as follows:

• They propose a robust and scalable SfM algorithm that uses additional measurements.
• They describe simulation results demonstrating that the proposed algorithm leads to solutions with lower error than sparse bundle adjustment and takes less time to converge.

While BA remains the best known algorithm to improve SfM when the initial solution is estimated correctly, Ramachandran et al.'s algorithm may be used as an alternative when the initial solution is likely to be bad. The proposed algorithm could also be used along with BA in a hybrid algorithm that performs better than both of them separately when starting from poor initial solutions.

2.3.3 DMS (Direct metric structure) & BA

In contrast to other approaches [12],[21], which start the SfM process with a projective reconstruction and then upgrade and refine it into a metric one, Brown and Lowe [20] solve directly for the metric structure and camera parameters without the initial projective reconstruction. They start with the best matching pair and add the cameras one by one; as the best matching image is found, its rotation and translation are used to initialize those of the new images. They show that this works well even if the images have different rotations and scales, and the BA algorithm does not suffer from any convergence issues. Snavely et al. [22] propose a new approach (Photo Tourism) whose purpose is to benefit from the huge amount of images available on the internet. This imagery is almost untapped and unexploited by Computer Vision researchers because it is not in a form amenable to processing, especially by traditional methods: the images are uncalibrated, unorganized, and of uncontrolled illumination, resolution and quality. The SfM approach used in this work is similar to [20], with several modifications that improve robustness on various types of datasets. They estimate the parameters of an initial single pair of cameras, which must exhibit a large number of matches and a wide baseline; these parameters are estimated using Nistér's five-point algorithm. This is followed by the triangulation of the points visible in the two images. Finally, they apply a two-frame bundle adjustment starting from this initialization. When a new camera is added to the optimization, they select the camera that observes the largest number of points whose 3D locations have already been estimated. The new camera's extrinsic parameters are estimated using the direct linear transform (DLT) technique inside a RANSAC procedure; the DLT technique also returns an upper-triangular matrix K with an estimate of the camera's intrinsics. A bundle adjustment step is applied using this initial set of parameters, allowing only the new camera and the points it observes to change while the rest of the model is held fixed. Finally, they add the points observed by the new camera into the optimization procedure. Once the new points are added, a global bundle adjustment is applied to refine the entire model.

2.3.4 Extended Kalman filter (EKF)

In contrast to traditional methods, which require prior knowledge of the camera or do not attempt to recover metric geometry, Azarbayejani and Pentland [29] recover the focal length, which provides the advantage of working with an uncalibrated camera while still recovering metric geometry. Their work can be considered an extension of [28] in the basic computational procedure, with one extra step for focal length estimation. The main difference from [28] is the way they treat the translation, structure, and scale. The translational motion is represented as the 3D location of the scene's reference frame relative to the current camera reference frame using the vector t = (t_x, t_y, t_z). A three-parameter incremental inter-frame rotation estimation, similar to that used in [30], can be used at each frame; this incremental rotation is computed at each frame and then composed with the external rotation to maintain an estimate of the global rotation, which is used in the linearization process at the next frame:

(inter-frame rotation) = (ω_x, ω_y, ω_z),
(global rotation) = (q_0, q_1, q_2, q_3).

The state vector consists of 7 + N parameters: six for motion, one for camera geometry, and N (the number of tracked features) for structure:

\[ X = (t_x,\; t_y,\; t_z\beta,\; \omega_x,\; \omega_y,\; \omega_z,\; \beta,\; \alpha_1, ..., \alpha_N) \qquad (2.2) \]

2.3.5 Line-based & orthonormal representation

Any motion is represented by a unique fundamental matrix (up to an arbitrary and unknown non-zero scale factor). Two constraints should be considered for deriving a minimal representation of a projective two-view motion from its fundamental matrix: the rank-2 constraint and a normalization constraint which fixes the relative scale of the fundamental matrix. These constraints are tricky to enforce directly on the fundamental matrix. To overcome this problem, instead of dealing with the fundamental matrix directly, Bartoli [3] analyzed its singular value decomposition F ∼ UΣV^T, where U and V are orthonormal matrices. The two constraints to be enforced are:

• rank-2: F is a rank-2 matrix, so Σ ∼ diag(σ_1, σ_2, 0) where σ_1 ≥ σ_2 > 0.
• normalization: the SVD can be scaled such that F ∼ U diag(1, σ, 0) V^T where σ = σ_2/σ_1.

Any projective two-view motion can thus be represented using two orthonormal matrices and a scalar. In the case of 3D lines, any 3D line can be represented by (U, W) ∈ SO(3) × SO(2), where SO(2) and SO(3) are respectively the special orthogonal groups containing the 2 × 2 and 3 × 3 rotation matrices, and U and W form the orthonormal representation of the 3D line. This representation is consistent with the fact that a 3D line has four degrees of freedom, since SO(2) has one degree of freedom and SO(3) has three. Bartoli and Sturm [4] use the conventional steps of SfM but propose an efficient optimal triangulation algorithm and the orthonormal representation of 3D lines. Their experimental results on simulated and real data show that the linear three-view algorithm [5] and the algorithm of [6] perform very badly. This is due to the fact that the Plücker constraints are not taken into account while solving the linear least-squares system, and that neither of them maximizes the individual likelihood of the reconstructed lines. The main drawback of [6] is that the random initial rotation estimate must not exceed a specific interval, otherwise the algorithm returns bad results.
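A minimal sketch of extracting and recomposing this orthonormal representation of the fundamental matrix (ours, in Python/NumPy, with σ = σ_2/σ_1 as above):

```python
import numpy as np

def orthonormal_rep(F):
    """Decompose a fundamental matrix as F ~ U diag(1, sigma, 0) V^T
    with U, V orthonormal and 0 < sigma <= 1."""
    U, s, Vt = np.linalg.svd(F)
    return U, Vt.T, s[1] / s[0]

def fundamental_from_rep(U, V, sigma):
    """Recompose F (up to scale) from the orthonormal representation."""
    return U @ np.diag([1.0, sigma, 0.0]) @ V.T
```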

2.3.6 Explicit reconstruction method

The algorithms in this family are based on an explicit reconstruction method in which a direct relationship is built between the structure computation and the use of linear camera matrices: the three-dimensional coordinates of points are computed directly using geometric arguments made within a three-dimensional frame. Rothwell et al. [38] proposed a method based on explicit reconstruction. Their method demonstrates how a three-dimensional projective reconstruction can be recovered from uncalibrated image pairs when no initial assumptions on the intrinsic or extrinsic parameters are made.


The only information needed is the fundamental matrix F between a pair of images. They estimate the 3 × 4 projective camera matrices P_i from the fundamental matrices between the image pairs. Due to the freedom in the choice of the projective frame, they set the first camera matrix to P_1 = [I|0]. Given the epipolar geometry between the first two images, represented by F_12, the second camera matrix is of the form

\[
P_2 = \begin{bmatrix} [e_{21}]_\times F_{12} \mid e_{21} \end{bmatrix}
\begin{bmatrix} I & 0 \\ \alpha^T & \alpha_4 \end{bmatrix} \qquad (2.3)
\]

where α is an arbitrary 3 × 1 vector and α_4 is a non-zero scalar. For the sake of improving the estimation of the 3D structure, additional cameras can be used. The fundamental matrices F_ij between the first two cameras and the additional one cease to be independent. Given the fundamental matrix F_1i between the first camera and the additional one, they deduce that

\[
P_i = \begin{bmatrix} [e_{i1}]_\times F_{1i} \mid e_{i1} \end{bmatrix}
\begin{bmatrix} I & 0 \\ \beta^T & \beta_4 \end{bmatrix} \qquad (2.4)
\]

where β is a 3-vector and β_4 is a scalar calculated such that the 3D reconstruction obtained by all pairs of cameras is expressed in the same frame. This method gives better results for generic viewpoint configurations than the two-camera case, because integrating more information over a larger number of views suppresses noise better. The details of this method are reported in Chapter 3, in which stratified self-calibration and the related background are presented. We have implemented and used Rothwell et al.'s method to recover the projective structure of the scene and cameras, on which the camera self-calibration methods are based.

Chapter 3

Stratified Camera Self-calibration

An essential step in the SfM process consists in the recovery of the calibration parameters of the camera, without which the metric structure of the scene cannot be calculated. The traditional pattern-based calibration procedure is tedious and often impractical. A more flexible and elegant approach, known as camera self-calibration, consists in recovering the sought parameters solely from point correspondences across images. All self-calibration methods rely on a prior calculation of the projective structure of the scene and cameras. Some of those methods, such as [45],[46] and [47], attempt to directly retrieve the intrinsic parameters of the camera and are generally confronted with the difficult problem of solving nonlinear equations in several unknowns. Stratified methods, such as [44] and the new method we present in Chapter 4, break the problem into two more easily tractable problems: the scene and cameras are first upgraded to affine (essentially by locating the plane at infinity) and then turned into metric.

In this chapter, we first present the basic notions of the (pinhole) camera model, two-view geometry and the properties of the main 3D reconstruction strata. Then, we describe in detail the projective reconstruction method of Rothwell et al. [38], which we have chosen and implemented to serve as a starting reconstruction. The self-calibration method of [44], based on the so-called modulus constraints, is also detailed in this chapter as it is directly related to our new method.

3.1 Background

3.1.1 The pinhole camera model

A pinhole camera follows a perspective projection model that maps 3D points into their pixel projections in a 2D image. This model is described by a set of parameters that approximate the behavior of physical sensors (Salvi [39], Faugeras [40]). The geometrical model of the pinhole consists of a retinal plane Π, in which the image is formed, and an optical center (focal point) C placed at a fixed distance from it. Physically, the light rays reflected by the object pass through the focal point and project onto the retinal plane, thus forming the perspective image of the object.

Figure 3.1: Pinhole camera model, adapted from [49]. A detailed representation of the pinhole camera model reveals that the latter is described by two kinds of parameter sets. One set describes the internal geometry of the camera and includes the so-called intrinsic parameters (generally packaged into an upper-triangular matrix K). The other set, known as the extrinsic parameters set, describes the relationship between the camera and the world coordinate system. The intrinsic parameters are (i) the focal length f which presents the distance from the optical center to the retinal plane (in mm.), (ii) the parameters (ku , kv ) which are inversely proportional to the dimension of a pixel, (iii) the projection of the principal point (u0 , v0 ) on the image plane, expressed in pixels. The extrinsic parameters set defines the relationship between the world coordinate system and the camera coordinate system that is attached to its optical center. This relationship is represented by the position (T3×1 translation vector) and the orientation (R3×3 rotation matrix) of the camera coordinate system with respect to the world coordinate system.

19

3.1 Background

Any 3D scene point with coordinates X = (X, Y, Z, 1)T (expressed in the world coordinate system) is mapped onto the image plane as follows:   x' | |

f ku

0

0

f kv

0

0 {z K

u0



1

0

0

 v0   0 0 1 } {z

1

0

0

1

0



!  R T X. 0 0 1 0 | {z }

(3.1)

4×4

}

P

P is called the perspective projection matrix and we can write (3.1): x ' PX

(3.2)

For additional generality, the camera calibration matrix K may be written in the form

\[
K = \begin{bmatrix} f k_u & \gamma & u_0 \\ 0 & f k_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (3.3)
\]

where γ refers to the skew of the camera. The skew factor γ is generally zero (in particular on recent cameras); a non-zero skew would mean that the x and y axes of the image plane are not perpendicular. The ≃ symbol denotes equality up to an unknown non-zero scale factor.
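As a minimal illustration of (3.1)-(3.2), the following sketch (ours, in Python/NumPy) projects a 3D point through a calibrated camera:

```python
import numpy as np

def project(K, R, T, X):
    """Pinhole projection of a 3D point X (3,) into pixel coordinates,
    x ~ K [R | T] X, per equations (3.1)-(3.2)."""
    P = K @ np.hstack([R, T.reshape(3, 1)])   # 3x4 projection matrix P
    x = P @ np.append(X, 1.0)                 # homogeneous image point
    return x[:2] / x[2]                       # drop the unknown scale
```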

3.1.2 Epipolar geometry

A pair of images is said to be weakly calibrated if the so-called fundamental matrix [12] relating corresponding points in the two views is known. Let x_i and x_j be two corresponding points in two images; then x_j is constrained to lie on the line F_ij x_i, where F_ij is the fundamental matrix between images i and j. The epipolar constraint between two images i and j is given by:

\[ x_j^T F_{ij}\, x_i = 0. \qquad (3.4) \]

Note that the fundamental matrix can be calculated solely from point correspondences and does not require any knowledge about the intrinsic and extrinsic parameters of the considered cameras. The special points eij in image i and eji in image j are called the epipoles. They correspond to the projection of the optical center of one camera on the image plane of the other. These special points are defined by Fij eij = 0 and Fji eji = 0 where Fji = FTij .
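Numerically, the epipoles are the null vectors of the fundamental matrix and can be read off its SVD; a minimal sketch (ours, in Python/NumPy):

```python
import numpy as np

def epipoles(F_ij):
    """Epipoles as the right/left null vectors of the fundamental matrix:
    F_ij e_ij = 0 and F_ij^T e_ji = 0 (smallest singular vectors)."""
    U, s, Vt = np.linalg.svd(F_ij)
    e_ij = Vt[-1]        # right null vector: epipole in image i
    e_ji = U[:, -1]      # left null vector: epipole in image j
    return e_ij, e_ji    # homogeneous; may lie at infinity (3rd coord ~ 0)
```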

3.1.3 3D reconstruction strata

The projective stratum: A projective reconstruction [12] requires neither information about the camera's intrinsic parameters nor knowledge of its extrinsic parameters. If a set of point correspondences in two views determines the fundamental matrix F uniquely, then the scene and cameras may be reconstructed from these correspondences alone, and any two such reconstructions are projectively equivalent. The camera matrices are retrieved from F and the 3D point coordinates are computed by triangulation [12]. Note that the true scene differs from the reconstructed scene by a projective ambiguity, which preserves neither angles (thus parallelism is lost) nor lengths. It does however preserve incidence and cross-ratios, which are important in projective geometry (see Fig. 3.2).
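The triangulation just mentioned is typically carried out linearly; a minimal DLT sketch (ours, in Python/NumPy), which stacks two equations per view from x × (P X) = 0 and solves by SVD:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen at x1, x2 (pixel
    coordinates) under cameras P1, P2 (3x4). Returns homogeneous X."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]      # valid in the same (projective) frame as P1, P2
```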

Figure 3.2: Projective reconstruction. (a) Original image pair. (b) Two views of a 3D projective reconstruction, adapted from [12].

The affine stratum: An affine reconstruction is a specialization of the general projective one. When a scene is reconstructed in the affine space, points that were at infinity in the true scene remain so in this space. The plane at infinity can be viewed as a plane intersecting pencils of parallel lines; since a point that is at infinity in the true scene remains at infinity in the affine space, parallel lines can only remain parallel. In addition to parallelism, other invariants in the affine space are ratios of lengths along lines pointing in the same direction and ratios of surface areas (see Fig. 3.3).

Figure 3.3: Affine reconstruction. (a) Original images. (b) Two views of a 3D affine reconstruction, adapted from [12].

The metric stratum: A metric reconstruction is an affine reconstruction that gets even closer to the true scene. In addition to parallelism, such a reconstruction exhibits the same angles and length ratios (in all directions) as the true scene. Typically, it is a rescaled, possibly rotated and translated, version of the true 3D structure (Fig. 3.4). Similar to the affine case, these new invariant properties are related to an invariant geometric entity. If the projection matrices differ by a projective ambiguity from the real (Euclidean) ones, then the recovered reconstruction is projective; if the ambiguity is affine, then the reconstruction is affine; and a metric ambiguity leads to a metric reconstruction (Fig. 3.5).

Figure 3.4: Metric reconstruction. (a) Original images. (b) Two views of a 3D metric reconstruction, adapted from [12].

3.2 Projective reconstruction

Figure 3.5: Shapes which are equivalent to a cube under the different geometric ambiguities.

The three-dimensional projective structure of a scene can be recovered from uncalibrated images through point correspondences. The fundamental matrix F is an essential ingredient for the calculation of such structure, as it provides very strong constraints on the three-dimensional geometry of the camera and point set configuration. We present here Rothwell et al.'s projective reconstruction method [38] from two and more views of a rigid scene. This method, which we have implemented to serve as a basis for camera self-calibration, requires a prior calculation of the epipolar geometry of each pair of cameras in the sequence. In our implementation, the fundamental matrices have been calculated using the eight-point algorithm.
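A minimal sketch of the eight-point algorithm (ours, in Python/NumPy; Hartley's coordinate normalization, advisable in practice, is omitted for brevity):

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from N >= 8 correspondences x1 <-> x2 ((N, 2) pixel
    coordinates) such that x2_h^T F x1_h = 0, then enforce rank 2."""
    N = x1.shape[0]
    A = np.empty((N, 9))
    for k in range(N):
        u, v = x1[k]
        up, vp = x2[k]
        A[k] = [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                     # least-squares solution
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt   # nearest rank-2 matrix
```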

3.2.1 Projective reconstruction from two views

The reconstruction approach employs the usual projection equation (3.2). Let x_i, i ∈ {1, 2}, be corresponding observations in two images of the same scene point X and P_i the associated camera matrices. It is both customary and useful to decompose the projection matrices into two components as follows:

\[ P_i = [\, H_{1i} \mid q_i^1 \,] \qquad (3.5) \]

where H_1i is a 3 × 3 homography matrix relating images 1 and i, and q_i^1 is a 3-vector corresponding to the projection of the optical center of the first camera on the image plane of the i-th camera. This form enables absolute freedom in the positioning of the cameras in the projective world. Let X = (X_3, X_4) be the coordinate vector of X expressed in the world coordinate system, where X_3 is a 3-vector and X_4 a scalar. Projecting X on the image plane using (3.2), we have:

\[ \lambda_i x_i = H_{1i} X_3 + q_i^1 X_4. \qquad (3.6) \]

3.2 Projective reconstruction

Given two images i ∈ {1, 2}, X3 can be eliminated and the components of the projection matrices can be related to the epipolar geometry of the two images: η eij = qi1 − H1i H−1 1j qj

(3.7)

µ Fij = [eji ]× H1j H−1 1i .

Assuming the reference frame is attached to the first camera, the projection matrix of the latter simplifies toP1 = [I|0] where I is 3 × 3 identity matrix. This, together with the parametrization (3.7) yield: η e21 = q21

and

µ F12 = [e21 ]× H12

(3.8)

which define the projection matrix of the second camera: "   P2 = [e21 ]× F12 | e21

I

0

αT

α4

# .

(3.9)

Note that α is an arbitrary 3 × 1 vector and α4 is non-zero scalar.

3.2.2

Estimating the third camera

In the presence of more than two images, the projection matrix of the third camera must be consistent with the same 3D structure obtained by the two initial cameras. Given F13 and using the parameterizations in section 3.2.1, the third camera matrix is on the form: " 

P3 = [e31 ]× F13 | e31



I

0

βT

β4

# (3.10)

for some 3-vector β and scalar β4 to be defined. Indeed, unlike (α,α4 ), (β,β4 ) cannot be randomly chosen as they depend on the particular choice of α and α4 . Vector β can be calculated by solving the following linear system of 9 equations using singular value decomposition SVD: " µF23

  [e21 ]× F12 | e21

I αT

#

" = [e32 ]×

  [e31 ]× F13 | e31

I βT

# (3.11)

where µ is also unknown and it will be calculated along with β. The scalar β4 is then obtained by considering the following linear constraints: η e32 = β4 e31 − α4 H13 H−1 12 e21

(3.12)

Chapter 3: Stratified Camera Self-calibration

24

where " H12

  = [e21 ]× F12 | e21

I

#

αT

" and H13

  = [e31 ]× F13 | e31

I βT

# .

Note η is also unknown and it will be calculated along with β4 .

3.3

Affine upgrade through the modulus constraint

An affine 3D reconstruction is one in which the plane at infinity has remained globally invariant, i.e. lines or planes that were parallel in the original scene remain parallel in the affine reconstruction. In order to achieve this, the location of the plane at infinity in the projective reconstruction has to be identified then mapped back to its canonical position. The main ingredient here is that the homography matrix H0 1i of any plane is related to the homography matrix H1i of any other plane through the relationship H0 1i = H1i + ei1 .vT

(3.13)

for some appropriate 3-vector v. In particular, given the directional component a of the coordinates of the plane at infinity, the affine camera projection matrices - with Π∞ = (a, 1)T as reference plane- can be obtained as follows:   PiA = H1i + ei1 .aT |ei1 ' [H1i∞ |ei1 ] .

(3.14)

However, Π∞ = (a, 1)T is unknown and generally required a priori knowledge about the scene. We describe in the following how this can be achieved using the so-called modulus constraints; This method was introduced by Pollefeys in [44] and does not require any knowledge about the scene and cameras except that its intrinsic parameters must be kept unchanged throughout the sequence. The modulus constraint [44] is due to the unimodular property of the eigenvalues homography matrix of the plane at infinity between pairs of images with identical intrinsic parameters. This constraint allows to locate the plane at infinity by solving sets of quartic trivariate polynomials in the unknown directional component a of the plane at infinity. Let P1 be the projection matrix of the fist image (reference frame) and Pi those of the other frames. The H∞1i is 3 × 3 matrix of the homography induced by the plane at infinity (between images 1 and i) can be expressed by follow: λi H∞1i = H1i + ei1 aT

(3.15)

25

3.3 Affine upgrade through the modulus constraint

for some generally unknown scalar λi and the affine parameters a.

Given the homography matrices H1i and H1j induced by any plane between image 1 and any two distinct images, the one induced by the same plane and relating images i and j can easily be obtained through Hij ' H1j H−1 1i .

(3.16)

In this fashion, the infinity homography matrix H∞ij between any two cameras i and j can be defined by: H∞ij ' H∞1j H−1 ∞1i .

(3.17)

From (3.17) and (3.15 ), the infinity homography from view i to j can be written as a function of projective entities and the position of the plane at infinity in the considered projective reference frame:

λj T T −1 H∞ij = H∞1j H−1 . ∞1i = (H1j + ej1 a )(H1 i + ei1 a ) λi | {z }

(3.18)

'Projective

Furthermore, H∞ij can be expressed exactly (i.e. with a fixed scale) through a product involving the intrinsic parameter matrices Ki and Kj of the considered cameras as well as the rotation matrix Rij relating the two views. H∞ij = Ki Rij K−1 j . | {z }

(3.19)

Metric

We will assume throughout this work that all cameras have the same intrinsic parameters, i.e. Ki = K for all i = 1 . . . n, and that K is parameterized as follows: 



f

γ

u0

 K= 0

τf

0

0

 v0  . 1

(3.20)

where τ is the aspect ratio. With the above assumption, H∞ij is conjugated with a rotation matrix Rij (the two matrix have the same eigenvalues), up to a scale factor. This implies that the 3 eigenvalues of H∞ij must have the same moduli [41],[42]. This property requires the intrinsic parameters to be constant. Pollefeys [44] has derived a necessary condition on the vector a that must be satisfied by each pair of images. This condition involves the coefficients of the characteristic equation det(H1j + ej1 aT − λ(H1i + ei1 aT )) = c3ij λ3 + c2ij λ2 + c1ij λ + c0ij = 0.

(3.21)
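As a quick numerical sanity check of the equal-moduli property, the following snippet (entirely our own, not part of the method itself) measures how far the eigenvalue moduli of a candidate infinity homography are from being equal:

```python
import numpy as np

def equal_moduli_residual(H_inf):
    """Spread of the eigenvalue moduli of a candidate infinity
    homography; close to zero when H_inf is conjugate, up to scale,
    to a rotation matrix (constant intrinsics)."""
    moduli = np.abs(np.linalg.eigvals(H_inf))
    return moduli.max() - moduli.min()
```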


For the roots of (3.21) to have equal moduli, the following condition is necessary:

$$M_{ij}: \quad c_{3ij}\, c_{1ij}^3 - c_{2ij}^3\, c_{0ij} = 0. \qquad (3.22)$$

Note that $c_{3ij}$, $c_{2ij}$, $c_{1ij}$ and $c_{0ij}$ are linear functions of the three affine parameters embedded in $a$, so that equation (3.22) is a quartic polynomial equation. Three such equations can be obtained as soon as three images are available. The resulting system of polynomial equations can then be solved using a continuation method [43]. Such methods recover all possible solutions (a maximum of 64 in this case), from which complex solutions can be discarded. More than three images are generally used in order to successfully isolate a unique solution. Once a unique candidate for the plane at infinity has been obtained, a minimization step is performed using the cost function

$$C_{\text{modulus}}(a) = \sum_{ij} \left( c_{3ij}\, c_{1ij}^3 - c_{2ij}^3\, c_{0ij} \right)^2. \qquad (3.23)$$
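To make the constraint concrete, the following NumPy sketch (function name and the polynomial-fitting shortcut are ours) evaluates the modulus residual $M_{ij}$ of (3.22) for a candidate vector $a$, given the plane homographies $H_{1i}$, $H_{1j}$ from the reference image and the epipoles $e_{i1}$, $e_{j1}$. The cubic coefficients of (3.21) are recovered by sampling the determinant at four values of $\lambda$ rather than by symbolic expansion:

```python
import numpy as np

def modulus_residual(H1i, H1j, ei1, ej1, a):
    """Modulus-constraint residual M_ij (Eq. 3.22) for a candidate a.

    det(A - lam*B) is a cubic in lam for 3x3 matrices, so its
    coefficients are recovered exactly by evaluating the determinant
    at four distinct values of lam and fitting a degree-3 polynomial.
    """
    A = H1j + np.outer(ej1, a)            # candidate homography, view j
    B = H1i + np.outer(ei1, a)            # candidate homography, view i
    lams = np.array([-1.0, 0.0, 1.0, 2.0])
    dets = np.array([np.linalg.det(A - lam * B) for lam in lams])
    c3, c2, c1, c0 = np.polyfit(lams, dets, 3)   # highest degree first
    return c3 * c1**3 - c2**3 * c0
```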

3.4 Metric upgrade

The absolute conic (AC) $\Omega$ plays a central role in camera self-calibration, both in direct and stratified methods. The AC is a particular conic lying on the plane at infinity (Fig. 3.6), which can be thought of as a virtual calibration pattern visible in all images. The image of this conic and its dual depend solely upon the intrinsic parameters of the camera. When these parameters are kept constant across all frames, the AC projects onto the same conic in all images. Once the plane at infinity $\Pi_\infty$ is obtained, the image of the AC (or its dual) can be linearly calculated and the intrinsic parameters easily extracted, as detailed in 3.4.1.

3.4.1 Metric geometry and the absolute conic

Let $P_i^M$ be a projection matrix expressed in the metric frame and $X_\infty = (X_\infty, Y_\infty, Z_\infty, 0)^T$ a point on the plane at infinity $\Pi_\infty$ which projects onto $x_i$ in the image plane:

$$x_i \simeq P_i^M X_\infty. \qquad (3.24)$$

If, additionally, the point with coordinates $X_\infty$ lies on the absolute conic, it must verify

$$X_\infty^T X_\infty = 0. \qquad (3.25)$$


Figure 3.6: The Absolute Conic and its image.

In the image frame, the AC equation translates into:

$$x_i^T \omega_i x_i = 0 \quad \text{with} \quad \omega_i = K^{-T} K^{-1}. \qquad (3.26)$$

This shows that every point $X_\infty$ on $\Omega$ projects onto a point $x_i$ on a conic $\omega_i$, where $\omega_i$ is the projection of the absolute conic on the image plane (the IAC). The matrix $\omega_i$ representing this conic is symmetric positive-definite and its entries depend solely upon the intrinsic parameters of the camera onto which $\Omega$ is projected. Calculating $\omega_i$ therefore allows one to recover the intrinsic parameters of the camera.

Because $\omega_i$ is a conic of points, its dual $\omega_i^*$ is a conic of lines. If a line $l$ belongs to the image plane and is tangent to $\omega_i$ (Fig. 3.7), it satisfies:

$$l^T \omega_i^* l = 0 \quad \text{where} \quad \omega_i^* = K K^T. \qquad (3.27)$$

The matrix $\omega_i^*$, which represents the dual of the image of the absolute conic (DIAC), is also symmetric positive-definite and depends only on the intrinsic parameters of the camera. Moreover, the intrinsic parameter matrix $K$ can be found directly from $\omega_i^*$ through its Cholesky factorization. $\omega_i^*$ can be written as follows:

$$\omega_i^* = \begin{pmatrix} k_1 & k_2 & k_3 \\ k_2 & k_4 & k_5 \\ k_3 & k_5 & 1 \end{pmatrix} \qquad (3.28)$$

where the entries $k_j$, $j = 1, \ldots, 5$, of $\omega_i^*$ are known as Kruppa's coefficients. The intrinsic parameter matrix $K$ can then be recovered by Cholesky factorization as follows:

$$K = \begin{pmatrix} f_u = \sqrt{k_1 - k_3^2 - \dfrac{(k_2 - k_3 k_5)^2}{k_4 - k_5^2}} & \quad s = \dfrac{k_2 - k_3 k_5}{\sqrt{k_4 - k_5^2}} & \quad u_0 = k_3 \\[10pt] 0 & f_v = \sqrt{k_4 - k_5^2} & v_0 = k_5 \\[4pt] 0 & 0 & 1 \end{pmatrix} \qquad (3.29)$$
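Numerically, (3.29) is the Cholesky factorization of the DIAC written entry-wise. A compact alternative sketch (our own helper; it assumes $\omega^*$ is positive-definite and uses an anti-diagonal flip to obtain the required upper-triangular factor):

```python
import numpy as np

def K_from_diac(omega_star):
    """Recover K from the DIAC (Eq. 3.29), assuming omega* = K K^T.

    NumPy's Cholesky returns a *lower* triangular factor, so the
    matrix is flipped about its anti-diagonal before factoring,
    which yields the upper-triangular K instead.
    """
    w = omega_star / omega_star[2, 2]     # normalize so w[2,2] = 1
    P = np.fliplr(np.eye(3))              # anti-diagonal permutation
    L = np.linalg.cholesky(P @ w @ P)     # w = (P L P)(P L P)^T
    K = P @ L @ P                         # upper-triangular factor
    return K / K[2, 2]
```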

When two images are available, the projections $\omega_i$ and $\omega_j$ of the absolute conic onto these images verify:

$$H_{\infty ij}^T\, \omega_j\, H_{\infty ij} \simeq \omega_i. \qquad (3.30)$$

Similarly, the DIACs $\omega_i^*$ and $\omega_j^*$, duals of $\omega_i$ and $\omega_j$ respectively, verify:

$$H_{\infty ij}\, \omega_i^*\, H_{\infty ij}^T \simeq \omega_j^*. \qquad (3.31)$$

In the case of a moving camera with constant intrinsic parameters, both the IAC and the DIAC remain invariant throughout the sequence of images. Moreover, when $H_{\infty ij}$ is known, it is known exactly with the appropriate scale, since $H_{\infty ij}$ must verify $\det(H_{\infty ij}) = 1$. The constraint that the DIAC should be the same in all views can then be expressed as follows:

$$H_{\infty ij}\, \omega^*\, H_{\infty ij}^T = \omega^*. \qquad (3.32)$$

Therefore, once $H_{\infty ij}$ is known, (3.32) shows that each pair of images provides four linearly independent equations. These equations are linear in the five Kruppa coefficients of (3.28) and can be solved using singular value decomposition (SVD) as soon as three images are available.
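A sketch of this linear step follows (naming is ours; each input matrix is assumed to be an infinity homography, which the function rescales to unit determinant). The equations (3.32) are stacked into a homogeneous system in the six distinct entries of $\omega^*$ and solved through the SVD null-space vector:

```python
import numpy as np

# index pairs of the 6 distinct entries of a symmetric 3x3 matrix
SYM_IDX = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

def diac_from_inf_homographies(H_list):
    """Linear DIAC estimation from Eq. (3.32), constant intrinsics.

    Each entry of (H w H^T - w) is linear in the 6 unknown entries of
    the symmetric matrix w; the stacked homogeneous system is solved
    in the least-squares sense via SVD.
    """
    rows = []
    for H in H_list:
        H = H / np.cbrt(np.linalg.det(H))      # enforce det(H) = 1
        for r in range(3):
            for c in range(r, 3):              # upper triangle only
                row = np.zeros(6)
                for k, (p, q) in enumerate(SYM_IDX):
                    coeff = H[r, p] * H[c, q]
                    if p != q:                 # symmetric entry counted twice
                        coeff += H[r, q] * H[c, p]
                    row[k] = coeff
                row[SYM_IDX.index((r, c))] -= 1.0   # subtract w_{rc}
                rows.append(row)
    _, _, Vt = np.linalg.svd(np.vstack(rows))
    w = Vt[-1]                                 # null-space vector
    W = np.array([[w[0], w[1], w[2]],
                  [w[1], w[3], w[4]],
                  [w[2], w[4], w[5]]])
    return W / W[2, 2]
```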

3.4.2 Refining calibration

Figure 3.7: A line tangent to the Absolute Conic.

Once the plane at infinity and the affine parameters $a$ have been obtained, equation (3.32) can be used to find the absolute conic using the linear method explained in 3.4.1. These results can be refined through a non-linear minimization step. The same constraint (3.32) can be used, but the affine parameters $a$ should then appear explicitly in the infinity homographies, as follows:

$$\left( H_{1i} + e_{i1} a^T \right) \omega^* \left( H_{1i} + e_{i1} a^T \right)^T \simeq \omega^*. \qquad (3.33)$$

In this case, however, the scale factor $\lambda$ cannot easily be eliminated. In practice it is better to eliminate $\lambda$ by normalizing both sides of (3.32) to unit Frobenius norm. The resulting equations can be solved by minimizing the following cost function:

$$C(f_u, f_v, \gamma, u_0, v_0, a_1, a_2, a_3) = \sum_{ij} \left\| \frac{H_{\infty ij}\, K K^T\, H_{\infty ij}^T}{\left\| H_{\infty ij}\, K K^T\, H_{\infty ij}^T \right\|_F} - \frac{K K^T}{\left\| K K^T \right\|_F} \right\|_F \qquad (3.34)$$

Note that $f$ and $\tau f$ are denoted by $f_u$ and $f_v$ respectively.
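A minimal sketch of this cost (assuming the candidate infinity homographies have already been assembled, and with our own function name) reads:

```python
import numpy as np

def refinement_cost(H_inf_list, K):
    """Cost function (3.34): both the transferred DIAC and K K^T are
    normalized to unit Frobenius norm so that the unknown scale
    factor drops out before taking the residual norm."""
    W = K @ K.T
    Wn = W / np.linalg.norm(W)
    cost = 0.0
    for H in H_inf_list:
        T = H @ W @ H.T
        cost += np.linalg.norm(T / np.linalg.norm(T) - Wn)
    return cost
```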

3.5 Stratified self-calibration algorithm

The stratified algorithm can be summarized as follows:

Step 1: projective calibration

• step 1.1: sequentially compute the projective camera matrices for all the views.

Step 2: affine calibration

• step 2.1: formulate the modulus constraint $M_{ij}$ for all pairs of views.
• step 2.2a: find a set of initial solutions through continuation.
• step 2.2b: solve the set of equations $M_{ij}$ through minimization (for $n > 3$).
• step 2.3: compute the affine projection matrices

$$P_i^A = P_i\, T^{-1} \quad \text{with} \quad T = \begin{pmatrix} I & 0 \\ -a^T & 1 \end{pmatrix}. \qquad (3.35)$$

Step 3: metric calibration

• step 3.1: compute the dual image of the absolute conic from (3.31).
• step 3.2: find the intrinsic parameters $K$ through Cholesky factorization (3.29).
• step 3.3: refine the results through nonlinear minimization of $C$ (3.34).
• step 3.4: compute the metric projection matrices (a code sketch of steps 2.3 and 3.4 follows this list):

$$P_i^M = P_i\, T^{-1} \quad \text{with} \quad T = \begin{pmatrix} K & 0 \\ -a^T K & 1 \end{pmatrix}.$$
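The two upgrades share the same right-multiplication structure. A minimal NumPy sketch of steps 2.3 and 3.4 (the function name is ours; $a$ is the recovered directional component and $K$ the intrinsic matrix):

```python
import numpy as np

def upgrade_projections(P_list, a, K=None):
    """Affine upgrade (Eq. 3.35) of 3x4 projective cameras; when K is
    supplied, the metric upgrade of step 3.4 is applied instead."""
    T = np.eye(4)
    if K is None:
        T[3, :3] = -a                  # T = [[I, 0], [-a^T, 1]]
    else:
        T[:3, :3] = K                  # T = [[K, 0], [-a^T K, 1]]
        T[3, :3] = -(a @ K)
    T_inv = np.linalg.inv(T)
    return [P @ T_inv for P in P_list]
```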

Chapter 4

More Trivariate Quartics for Stratified Camera Self-Calibration

In this chapter we present a new set of quartic trivariate polynomial equations in the unknown parameters locating the plane at infinity. These equations are due to the zero-skew constraint imposed on the camera and explicitly enforce the constancy of the coordinates of the principal point across all images. Six such polynomials, four of which are independent, are obtained for each triplet of images. Along with the three quartic polynomials due to the modulus constraints in three views, they provide a total of nine equations, which we divide into 3 groups of 3 equations and solve group-wise by a continuation method. Our experiments show that in most cases these equations alone are sufficient to single out a unique solution, and that the performance of the stratified method is significantly improved. We also show that incorporating these new equations in the refinement of the position of the plane at infinity through nonlinear optimization improves both the accuracy of the result and the robustness to noise.

4.1 Motivational arguments

Stratified camera self-calibration based on the modulus constraints has turned out to be an efficient and promising approach for retrieving the plane at infinity, and hence the intrinsic parameters of the camera. From our own experience and the results reported by Pollefeys in [44], the method provides very satisfactory results, in particular when dealing with long image sequences. However, the method tends to perform poorly with short sequences of typically 5 images or fewer. There are several reasons for this:

- First, the modulus constraint is only a necessary condition on the position of the plane at infinity, which can be persistently verified by several planes other than the plane at infinity across several frames.

- Second, in the case of critical motion sequences, or when the motion of the camera is close to critical, more than one solution may consistently verify the modulus constraint, and more than one conic may project identically throughout the image sequence.

- Third, in the presence of more than one candidate plane at infinity, the solutions are inspected by calculating the intrinsic parameters through SVD: a numerical method whose outcome significantly depends upon the amount of noise in the images and the weight one pair of views may have over the others. As it turns out, it is not easy to establish the quasi-constancy of the parameters across frames, nor whether the camera's parameters are plausible, as this is subject to several thresholds.

One way to cope with the above-mentioned problems is to take prior knowledge about the camera, such as zero skew and a known aspect ratio, into account during the self-calibration process rather than after it. Such knowledge is easily incorporated in direct self-calibration methods, as the latter simultaneously calculate the plane at infinity and the intrinsic parameters. However, as these methods rely on nonlinear optimization, they unfortunately require a good initial estimate of both the plane at infinity and the camera's parameters in order to converge properly. Unfortunately, the modulus constraints do not allow one to incorporate any additional knowledge about the camera's parameters, since the latter are not explicitly present in the underlying polynomial equations. The work presented in this chapter allows the traditional zero-skew and constant-parameters knowledge to be enforced in the process of retrieving the plane at infinity.

4.2 A unified parameterization of homography matrices

It has already been established that the matrix $H'_{1i}$ of an inter-image homography induced by a plane between the reference image and any image $i$ can be expressed as $H'_{1i} = H_{1i} + e_{i1} v^T$, in terms of a 3-parameter vector $v$, the epipole $e_{i1}$ and the homography $H_{1i}$ of some other plane. The problem at hand is that the reference image must remain the same, and all homographies must be expressed with respect to that image. By changing the reference image, the above relationship will hold in terms of a particular vector $v'$ different from $v$, and only for the current choice of reference. In this section, we show that all inter-image homographies between any two images, including the inverse homographies, can be expressed in terms of the same three parameters, and that the expression remains linear. This new expression is crucial for deriving our new self-calibration equations.

Consider three distinct but linearly dependent planes $\Pi$, $\Pi'$ and $\Pi''$ which intersect in a line $L$ in space. Let $x_1$ and $x_2$ be the projections in two images of a space point $X$ on this line. Given the matrices $H_{12}$, $H'_{12}$ and $H''_{12}$ of the homographies induced by $\Pi$, $\Pi'$ and $\Pi''$, it is not hard to notice that

$$x_2 \simeq H_{12} x_1 \simeq H'_{12} x_1 \simeq H''_{12} x_1. \qquad (4.1)$$

Furthermore, because the homographies induced by any two planes are always related by a relationship similar to (3.15), we can easily deduce that if $H'_{12} \simeq H_{12} + e_{21} l_1^T$ then

$$H''_{12} \simeq H_{12} + \alpha'' e_{21} l_1^T \qquad (4.2)$$

for some non-zero scalar $\alpha''$ and a properly scaled vector $l_1$ verifying $x_1^T l_1 = 0$, and hence (4.1). As such, $l_1$ represents the projection of the line $L$ onto the first image. The equations in (4.2) also suggest the following: if $H_{12}$, $H'_{12}$ and $H''_{12}$ are three plane homography matrices induced by linearly dependent planes, then the matrices themselves are linearly dependent:

$$H_{12} \simeq \alpha'' H'_{12} + H''_{12}. \qquad (4.3)$$

This linear relationship between homographies will be used in the sequel to express the inverse homography $H'^{-1}_{12}$ of an arbitrary plane $\Pi'$ as an affine function of the entries of the vector $l_1$ relating $H'_{12}$ to some reference plane $\Pi$ with homography $H_{12}$. Indeed, the linear dependency (4.3) also holds if we consider the inverse homographies of these planes, i.e. those relating points in the second image to those in the first image. The relationship is then expressed through the existence of non-zero scalars $\beta'$ and $\beta''$ such that

$$H_{12}^{-1} \simeq \beta' H'^{-1}_{12} + \beta'' H''^{-1}_{12}. \qquad (4.4)$$

Furthermore, because $x_1$ lies on the epipolar line $F_{12}^T x_2$ as well as on $l_1$, we have

$$x_1 \simeq [\,l_1\,]_\times F_{12}^T x_2 \qquad (4.5)$$

from which we can deduce that $[\,l_1\,]_\times F_{12}^T$ is the homography, from image 2 to image 1, of some plane $\Phi$ passing through the optical center of the first camera and intersecting its image plane in $l_1$. As such, the latter plane also intersects $\Pi$, $\Pi'$ and $\Pi''$ in $L$, and we are free to arbitrarily choose $\Pi''$ such that it coincides with $\Phi$. With this choice, we have $H''^{-1}_{12} \simeq [\,l_1\,]_\times F_{12}^T$ and hence (4.4) becomes

$$H'^{-1}_{12} \simeq \alpha H_{12}^{-1} + \beta [\,l_1\,]_\times F_{12}^T \qquad (4.6)$$

for some non-zero scalars $\alpha$ and $\beta$. These scalars can easily be fixed as they must verify $H'^{-1}_{12} H'_{12} = I$. In particular, because (4.6) provides $H'^{-1}_{12}$ up to an additional unknown scale factor, $\alpha$ and $\beta$ must verify

$$\left( H_{12} + e_{21} l_1^T \right) \left( \alpha H_{12}^{-1} + \beta [\,l_1\,]_\times F_{12}^T \right) \simeq I. \qquad (4.7)$$

Expanding (4.7) shows that an appropriate choice of $\alpha$ is

$$\alpha = \det(H_{12}) \qquad (4.8)$$

if $\beta$ is chosen such that

$$\beta [\,l_1\,]_\times F_{12}^T = [\,l_1\,]_\times H_{12}^T [\,e_{21}\,]_\times^T. \qquad (4.9)$$

This results in an affine expression of $H'^{-1}_{12}$ in terms of $l_1$:

$$H'^{-1}_{12} \simeq H_{12}^* + [\,l_1\,]_\times H_{12}^T [\,e_{21}\,]_\times^T \quad \text{where} \quad H_{12}^* = \det(H_{12})\, H_{12}^{-1}, \qquad (4.10)$$

provided $H'_{12} \simeq H_{12} + e_{21} l_1^T$. We recall that $H_{12}^*$ as defined above is the adjoint matrix of $H_{12}$, i.e. the transpose of the matrix of cofactors, and can thus be calculated without resorting to matrix inversion. More importantly, all $H'_{1i}$ for any image $i$ and all the inverses $H'^{-1}_{1i}$ are uniquely parameterized by $l_1$, and so is any $H'_{ij} = H'_{1j} H'^{-1}_{1i}$. Equation (4.10) is the main ingredient for deriving our new constraints on the vector $a$ relating a given homography matrix $H_{12}$ to that of the plane at infinity.
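The parameterization (4.10) is straightforward to evaluate numerically. The sketch below (helper names are ours) computes the adjoint via $\det(H)\,H^{-1}$ for simplicity, although, as noted above, it can also be obtained from cofactors without any inversion:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x of a 3-vector v."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def adjugate(H):
    """Adjoint of a 3x3 matrix: det(H) * H^{-1} (as in Eq. 4.10)."""
    return np.linalg.det(H) * np.linalg.inv(H)

def inverse_plane_homography(H12, e21, l1):
    """Inverse homography of the plane with H'_12 = H12 + e21 l1^T,
    expressed affinely in l1 as in Eq. (4.10), up to scale."""
    return adjugate(H12) + skew(l1) @ H12.T @ skew(e21).T
```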

4.3 A new set of quartic polynomial equations

Since the inverse homography expression (4.10) was derived for an arbitrary plane, and given the relationship $H_{\infty 1i} \simeq H_{1i} + e_{i1} a^T$, the expression

$$H_{\infty 1i}^{-1} \simeq H_{1i}^* + [\,a\,]_\times H_{1i}^T [\,e_{i1}\,]_\times^T \qquad (4.11)$$

also applies to the inverse of the homography of the plane at infinity. While (4.11) is defined for homographies between any image $i$ and the reference camera, it can easily be shown that the expression of the inter-image homography between any two views remains affine in $a$. The homography between two views $i$ and $j$ is given by $H_{\infty ij} = H_{\infty 1j} H_{\infty 1i}^{-1}$ and its inverse by $H_{\infty ji} = H_{\infty 1i} H_{\infty 1j}^{-1}$. These yield the following expressions:

$$H_{\infty ij} \simeq H_{1j} \left( H_{1i}^* + [\,a\,]_\times H_{1i}^T [\,e_{i1}\,]_\times^T \right) + e_{j1} a^T H_{1i}^* \qquad (4.12)$$

$$H_{\infty ji} \simeq H_{1i} \left( H_{1j}^* + [\,a\,]_\times H_{1j}^T [\,e_{j1}\,]_\times^T \right) + e_{i1} a^T H_{1j}^* \qquad (4.13)$$

Furthermore, because $H_{\infty ij} \simeq K R_{ij} K^{-1}$, we have $H_{\infty ji} \simeq K R_{ij}^T K^{-1}$. As a result, denoting by $\tilde H_{\infty ij}$ and $\tilde H_{\infty ji}$ the right-hand sides of (4.12) and (4.13) respectively, there exists a scalar $\mu_{ij}$ for which

$$\mu_{ij} \tilde H_{\infty ij} - \tilde H_{\infty ji} = K [\,q_{ij}\,]_\times K^{-1} \qquad (4.14)$$

where $[\,q_{ij}\,]_\times \simeq R_{ij} - R_{ij}^T$ is obviously skew-symmetric. The trace of $K [\,q_{ij}\,]_\times K^{-1}$ being null, the scalar $\mu_{ij}$ can be eliminated as it verifies

$$\mu_{ij} = \frac{\operatorname{trace}(\tilde H_{\infty ji})}{\operatorname{trace}(\tilde H_{\infty ij})}. \qquad (4.15)$$

Moreover, assuming the skew parameter of the camera is negligible ($\gamma = 0$ in (3.20)) and denoting

$$Q_{ij} = \operatorname{trace}(\tilde H_{\infty ji})\, \tilde H_{\infty ij} - \operatorname{trace}(\tilde H_{\infty ij})\, \tilde H_{\infty ji}, \qquad (4.16)$$

equation (4.14) becomes

$$Q_{ij} = \begin{pmatrix} -\dfrac{u_0 q_2}{f} & -\dfrac{q_3}{\tau} + \dfrac{u_0 q_1}{\tau f} & f q_2 + \dfrac{u_0^2 q_2}{f} + \dfrac{v_0 q_3}{\tau} - \dfrac{u_0 v_0 q_1}{\tau f} \\[10pt] \tau q_3 - \dfrac{v_0 q_2}{f} & \dfrac{v_0 q_1}{\tau f} & -\tau f q_1 - \tau u_0 q_3 + \dfrac{u_0 v_0 q_2}{f} - \dfrac{v_0^2 q_1}{\tau f} \\[10pt] -\dfrac{q_2}{f} & \dfrac{q_1}{\tau f} & \dfrac{u_0 q_2}{f} - \dfrac{v_0 q_1}{\tau f} \end{pmatrix} \qquad (4.17)$$

where $q = (q_1, q_2, q_3)^T$. Note that the entries of $Q_{ij}$ are quadratic polynomials in $a$, and each pair of images allows one to extract the exact expressions of the coordinates $(u_0, v_0)$ of the principal point in terms of some of these entries. For instance, it can be deduced that

$$u_0 = \frac{(Q_{ij})_{11}}{(Q_{ij})_{31}} \qquad (4.18)$$


and

$$v_0 = \frac{(Q_{ij})_{22}}{(Q_{ij})_{32}} \qquad (4.19)$$

where the subscript $(Q_{ij})_{rc}$ refers to the entry at the $r$th row and $c$th column of $Q_{ij}$. Given any two pairs of images $(i, j)$ and $(k, \ell)$, we have

$$\frac{(Q_{ij})_{11}}{(Q_{ij})_{31}} = \frac{(Q_{k\ell})_{11}}{(Q_{k\ell})_{31}} \qquad (4.20)$$

and

$$\frac{(Q_{ij})_{22}}{(Q_{ij})_{32}} = \frac{(Q_{k\ell})_{22}}{(Q_{k\ell})_{32}}, \qquad (4.21)$$

thus leading to the quartic polynomials

$$(Q_{ij})_{11} (Q_{k\ell})_{31} - (Q_{k\ell})_{11} (Q_{ij})_{31} = 0 \qquad (4.22)$$

$$(Q_{ij})_{22} (Q_{k\ell})_{32} - (Q_{k\ell})_{22} (Q_{ij})_{32} = 0. \qquad (4.23)$$

Equations (4.22) and (4.23) are our new quartic polynomials (the no-skew polynomials) that can be solved in conjunction with the modulus constraints.
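A sketch of how these constraints can be evaluated from the projective data follows (function names are ours; skew() and adjugate() are the helpers from the listing in Section 4.2):

```python
import numpy as np

def Q_matrix(H1i, H1j, ei1, ej1, a):
    """Q_ij of Eq. (4.16), built from the affine-in-a expressions
    (4.11)-(4.13) of the infinity homographies between views i and j."""
    Hi_inv = adjugate(H1i) + skew(a) @ H1i.T @ skew(ei1).T   # (4.11)
    Hj_inv = adjugate(H1j) + skew(a) @ H1j.T @ skew(ej1).T
    Hij = H1j @ Hi_inv + np.outer(ej1, a) @ adjugate(H1i)    # (4.12)
    Hji = H1i @ Hj_inv + np.outer(ei1, a) @ adjugate(H1j)    # (4.13)
    return np.trace(Hji) * Hij - np.trace(Hij) * Hji         # (4.16)

def noskew_residuals(Q_ij, Q_kl):
    """No-skew residuals (4.22)-(4.23) for two image pairs."""
    r1 = Q_ij[0, 0] * Q_kl[2, 0] - Q_kl[0, 0] * Q_ij[2, 0]   # (4.22)
    r2 = Q_ij[1, 1] * Q_kl[2, 1] - Q_kl[1, 1] * Q_ij[2, 1]   # (4.23)
    return r1, r2
```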

An image triplet provides three pairs of images and therefore six such equations, four of which are independent, while the modulus constraints provide only three equations for the same number of images. For each triplet of images, the nine equations are divided into 3 groups of 3 independent equations, each solved using a continuation method. The final result of solving such a system provides the affine parameters $a$ which define the position of the plane at infinity; the affine projection matrices are then calculated by (3.35). Once the plane at infinity has been obtained, a minimization step is performed using a new objective function (4.24), which accounts for both the modulus and the no-skew constraints:


$$C_{\text{noskew}}(a) = \sum_{ij} \left( c_{3ij}\, c_{1ij}^3 - c_{2ij}^3\, c_{0ij} \right)^2 + \frac{1}{3n(n-1)} \sum_{ij} \sum_{k\ell} \left( (Q_{ij})_{11} (Q_{k\ell})_{31} - (Q_{k\ell})_{11} (Q_{ij})_{31} \right)^2 + \frac{1}{3n(n-1)} \sum_{ij} \sum_{k\ell} \left( (Q_{ij})_{22} (Q_{k\ell})_{32} - (Q_{k\ell})_{22} (Q_{ij})_{32} \right)^2. \qquad (4.24)$$

4.4 Experiments

We have conducted experiments using randomly generated scenes and cameras for each level of image noise, varying from 0 to 2 pixels (with a step of 0.25). For each noise level and sequence length, we have run 100 independent trials. Each trial consisted of 100 randomly generated points and randomly generated cameras. The points were scattered in a unit-radius sphere and the cameras were generated at an average distance of 2.5 units (with 0.25 standard deviation) from the center of the sphere. The optical axes of the generated cameras were roughly oriented towards the center of the sphere. We have repeated these experiments with 3 and 4 images under various levels of noise, before testing our method with varying sequence lengths and 1 pixel of noise. The systems of quartic polynomials (which include those due to Pollefeys' modulus constraints and our no-skew constraints) have been solved using PHCpack (through its PHClab interface) [48]. Every time a unique solution was obtained, it was refined using two different objective functions: one that minimizes the modulus constraints (3.23), and another that minimizes both the modulus and the no-skew constraints (4.24). Finally, we have calculated the affine reconstruction of the scene and aligned it with the true 3D data. Through these experiments, we demonstrate that:

1. Unlike when using the modulus polynomials alone, our no-skew polynomials allow, in most cases, the right solution to be singled out even with 3 images only (without any additional knowledge).

2. The objective function which includes the no-skew constraints provides a more accurate 3D affine reconstruction than the objective function that is based on the modulus constraints only.

In all the reported figures (4.1, 4.2, 4.3 and 4.4), the RMS error appearing in panels (a) was averaged over the number of sequences retained in panels (b). The retained sequences are those for which the equations provided only one solution; for the remaining sequences, more constraints or additional knowledge would need to be taken into account in order to single out a solution.


Figure 4.1: Experiments with 3 images and various levels of noise using our new quartic polynomials: (a) 3D reconstruction RMS error using the two objective functions, (b) number of sequences in which only one solution has been found and whose RMS error < 1.

For comparison, we also report in Fig. 4.2-(b) the number of sequences in which a unique solution has been found using Pollefeys' quartic polynomials. Note that when using 3 images only, Pollefeys' polynomials have always provided more than one real solution. We have run another batch of experiments with 4 images (also with 100 random scene/camera configurations per noise level), in order to compare the quality of the reconstruction obtained with our method against that obtained with Pollefeys'. The results of these experiments are summarized in Fig. 4.3, in which (a) shows that the quality of the reconstruction is very good with both methods, but that the 3D errors are much more stable with respect to noise using our method, while (b) clearly shows that taking the no-skew constraint into account is much more efficient in singling out the correct, unique solution. More experiments with 1 pixel of noise and a varying number of images (3, 5, 7 and 9) have been carried out. The results of these experiments, which are reported in Fig. 4.4, suggest that (a) the quality of the reconstruction using both methods is comparable starting from 5 images and (b) the new polynomial equations contribute more efficiently to finding a unique solution even for longer image sequences. Finally, for visualization purposes, we have used a simulated scene representing a cube (Fig. 4.5-(a)) projected onto four 2D images through randomly generated cameras around the simulated cube. The affine parameters are estimated using our algorithm, then the affine projection matrices are calculated by (3.35). It is clear from the affine 3D reconstruction reported in Fig. 4.5-(b) that the affine properties, parallelism and directional ratios, are preserved.

Figure 4.2: Experiments with 4 images and various levels of noise using our new quartic polynomials: (a) 3D reconstruction RMS error using the two objective functions, (b) number of sequences in which only one solution has been found and whose RMS error < 1.

Figure 4.3: Experiments with 4 images and various levels of noise using our new quartic polynomials and those due to the modulus constraints: (a) 3D reconstruction RMS error using our new polynomials and our new objective function versus the errors obtained using Pollefeys' method, (b) number of sequences in which only one solution has been found and whose RMS error < 1.

Figure 4.4: Experiments with 1 pixel of noise and a varying number of images using our new quartic polynomials and those due to the modulus constraints: (a) 3D reconstruction RMS error using our new polynomials and our new objective function versus the errors obtained using Pollefeys' method, (b) number of sequences in which only one solution has been found and whose RMS error < 1.

Figure 4.5: 3D affine reconstruction using the estimated affine parameters: (a) a view of the original 3D cube, (b) a view of the 3D affine reconstruction of the cube.

Chapter 5

Conclusion and future work

In the first part of this thesis, the most representative structure from motion (SfM) algorithms have been surveyed. The surveyed methods are explained in detail, focusing the analysis on the main contribution of each method and especially detailing their pros and cons. Calibrated SfM algorithms are used when prior information about the camera is available. In terms of computational time, factorization-based methods, which do not require any initialization, give the best performance. The main drawback of these methods is that the points must be visible in all the frames, otherwise the algorithm fails. Methods based on the Extended Kalman filter solve this problem within an acceptable running time for real-time applications; in this case, factorization is applied only on the first few frames to initialize the filter. The problem with this approach is the possibility of filter divergence. Qian's method solves this problem by using inertial data as additional measurements, improving robustness to noise and allowing the method to work with small and large camera translations alike. While the previous methods need points to be visible in three or more images, Martinec's SOCP-based method has the advantage of working with missing data; it is, however, quite expensive in terms of computational time. Generic methods offer the advantage of handling different types of cameras, but their drawback is that the correspondences have to be given manually, which is quite tedious. In addition, considering all uncalibrated SfM methods, Rothwell's explicit solution to the projective reconstruction problem has turned out to be very effective, as demonstrated by the results of our own experiments. The DMS-based method solves directly for the metric structure using bundle adjustment, which is time-consuming. In this sense, factorization-based methods are faster, but the estimated projective depths should remain approximately the same throughout the image sequence. This problem is solved in the bilinear-programming-based method using additional information.

In the second part of this thesis, we have presented a novel stratified self-calibration method based on a new set of quartic polynomial equations. These new quartic polynomials can be solved in conjunction with the modulus constraints. Experiments showed that our method is numerically stable and robust for both short and long image sequences, and that in most cases it allows the right solution to be singled out even with 3 images only, with no additional constraints or knowledge. The objective function which includes the no-skew constraints provides a more accurate 3D affine reconstruction than the objective function that is based on the modulus constraints only. As part of future work, we will upgrade the affine reconstruction to a metric one by computing the intrinsic parameters and assessing their quality. Moreover, we are planning to conduct experiments on real image datasets.

Bibliography

[1] K. Häming and G. Peters, "The structure-from-motion reconstruction pipeline: a survey with focus on short image sequences," Kybernetika, Institute of Information Theory and Automation AS CR, 46, 2010, 926-937.

[2] Y. Lu, J. Zhang, Q. Wu, and Z. Li, "A survey of motion-parallax-based 3-D reconstruction algorithms," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34, 2004, 532-548.

[3] A. Bartoli, "On the non-linear optimization of projective motion using minimal parameters," Proceedings of the 7th European Conference on Computer Vision, Part II, Springer-Verlag, 2002, 340-354.

[4] A. Bartoli and P. Sturm, "Multiple-view structure and motion from line correspondences," Proceedings of the Ninth IEEE International Conference on Computer Vision, Volume 2, IEEE Computer Society, 2003, 207.

[5] J. Weng, T. S. Huang, and N. Ahuja, "Motion and structure from line correspondences: closed-form solution, uniqueness, and optimization," IEEE Trans. Pattern Anal. Mach. Intell., 14, 1992, 318-336.

[6] C. J. Taylor and D. J. Kriegman, "Structure and motion from line segments in multiple images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 1995, 1021-1032.

[7] P. Sturm and B. Triggs, "A factorization based algorithm for multi-image projective structure and motion," Proc. European Conf. Computer Vision, Springer-Verlag, 1996, 709-720.

[8] S. Mahamud, M. Hebert, Y. Omori, and J. Ponce, "Provably-convergent iterative methods for projective structure from motion," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 1018-1025, Dec. 2001.

[9] S. Christy and R. Horaud, "Euclidean shape and motion from multiple perspective views by affine iterations," IEEE Trans. Pattern Anal. Mach. Intell., 18, 1996, 1098-1104.

[10] D. W. Jacobs, "Linear fitting with missing data for structure-from-motion," Computer Vision and Image Understanding, 82, 2001, 57-81.

[11] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: a factorization method," International Journal of Computer Vision, Springer Netherlands, 9, 1992, 137-154.

[12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2000.

[13] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, "Bundle adjustment: a modern synthesis," Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, Springer-Verlag, 2000, 298-372.

[14] J. Oliensis and R. Hartley, "Iterative extensions of the Sturm/Triggs algorithm: convergence and nonconvergence," IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 2007, 2217-2233.

[15] M. Ramachandran, A. Veeraraghavan, and R. Chellappa, "A fast bilinear structure from motion algorithm using a video sequence and inertial sensors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, January 2011.

[16] G. Schweighofer, S. Segvic, and A. Pinz, "Online/realtime structure and motion for general camera models," Proceedings of the 2008 IEEE Workshop on Applications of Computer Vision, IEEE Computer Society, 2008, 1-6.

[17] B. K. Horn, "Closed-form solution of absolute orientation using unit quaternions," Journal of the Optical Society of America A, 4, 1987, 629-642.

[18] G. Schweighofer and A. Pinz, "Fast and globally convergent structure and motion estimation for general camera models," Proc. of BMVC, 2006, 147-157.

[19] S. Mitchell, M. Warren, D. McKinnon, and B. Upcroft (Eds.), "A robust structure and motion replacement for bundle adjustment," Australasian Conference on Robotics and Automation, 2010.

[20] M. Brown and D. G. Lowe, "Unsupervised 3D object recognition and reconstruction in unordered datasets," 2005.

[21] M. Pollefeys, "3D modelling from images," Proceedings of the European Conference on Computer Vision, Dublin, June 2000.

[22] N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from Internet photo collections," International Journal of Computer Vision, 80, 2008, 189-210.

[23] F. Kahl and R. Hartley, "Multiple view geometry under the L-infinity norm," IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 2008, 1603-1617.

[24] D. Martinec and T. Pajdla, "3D reconstruction by fitting low-rank matrices with missing data," Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 1, IEEE Computer Society, 2005, 198-205.

[25] D. Martinec and T. Pajdla, "3D reconstruction by gluing pair-wise Euclidean reconstructions, or 'How to achieve a good reconstruction from bad images'," Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06), IEEE Computer Society, 2006, 25-32.

[26] D. Martinec and T. Pajdla, "Robust rotation and translation estimation in multiview reconstruction," CVPR, IEEE Computer Society, 2007.

[27] J. Tardif, A. Bartoli, M. Trudeau, N. Guilbert, and S. Roy, "Algorithms for batch matrix factorization with application to structure-from-motion," 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007, 1-8.

[28] T. Broida, S. Chandrashekhar, and R. Chellappa, "Recursive 3-D motion estimation from a monocular image sequence," IEEE Transactions on Aerospace and Electronic Systems, 26, 1990, 639-656.

[29] A. Azarbayejani and A. Pentland, "Recursive estimation of motion, structure, and focal length," IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 1995, 562-575.

[30] T. Broida and R. Chellappa, "Estimation of object motion parameters from noisy images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 90-99.

[31] G. Qian, R. Chellappa, and Q. Zheng, "Robust structure from motion estimation using inertial data," JOSA A, Optical Society of America, 18, 2001, 2982-2997.

[32] S. Ramalingam, S. Lodha, and P. Sturm, "A generic structure-from-motion framework," Computer Vision and Image Understanding, Elsevier, 103, 2006, 218-228.

[33] D. Lowe, "Object recognition from local scale-invariant features," ICCV, 1999, 1150.

[34] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd, "Generic and real time structure from motion," Proc. BMVC, 2007.

[35] D. Marr, Vision, New York, NY: W. H. Freeman, 1982.

[36] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," Computer Vision - ECCV 2006, Springer, 2006, 404-417.

[37] C. Harris and M. Stephens, "A combined corner and edge detector," Alvey Vision Conference, 15, 1988, 50.

[38] C. Rothwell, O. Faugeras, and G. Csurka, "Different paths towards projective reconstruction," 1995.

[39] J. Salvi, "An approach to coded structured light to obtain three dimensional information," PhD Thesis, Universitat de Girona, 1997.

[40] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, The MIT Press, 1993.

[41] M. Pollefeys, L. Van Gool, and M. Proesmans, "Euclidean 3D reconstruction from image sequences with variable focal lengths," Springer-Verlag, 1996, 31-42.

[42] M. Pollefeys and L. Van Gool, "A stratified approach to metric self-calibration," CVPR, 1997, 407.

[43] A. Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems, Prentice-Hall, Englewood Cliffs (N.J.), 1987.

[44] M. Pollefeys and L. Van Gool, "Stratified self-calibration with the modulus constraint," IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 1999, 707-724.

[45] O. Faugeras, Q. Luong, and S. Maybank, "Camera self-calibration: theory and experiments," Computer Vision - ECCV '92, Lecture Notes in Computer Science, Vol. 588, Springer-Verlag, pp. 321-334, 1992.

[46] C. Zeller and O. Faugeras, "Camera self-calibration from video sequences: the Kruppa equations revisited," Research Report 2793, INRIA, 1996.

[47] A. Heyden and K. Åström, "Euclidean reconstruction from constant intrinsic parameters," Proceedings of the 13th International Conference on Pattern Recognition, 1, 1996, 339-343.

[48] PHCpack: http://www.math.uic.edu/~jan/download.html

[49] M. Obeysekera, "Affine reconstruction from multiple views using singular value decomposition," 2003.