MDL Selection for Piecewise Planar Reconstruction

Konrad Schindler 1, Joachim Bauer 2, Horst Bischof 1
1 ICG, TU Graz, Austria, 2 VRVis Research Center, Graz, Austria

Abstract: We present a three-step approach for the detection of 3D planes in image sequences of complex architectural scenes. In the first step 3D features (points or lines) are extracted from the images. A large number of potential planes are then fitted to the features in a second step. In the third step the relevant planes are selected with an algorithm based on the Minimum Description Length (MDL) principle. The presented techniques are demonstrated with synthetic and real-world data.

1 Introduction

Automatic building reconstruction is a continuing goal of computer vision. Here we address piecewise planar reconstruction, an important special case, because many buildings are partly bounded by planar patches. More specifically, we address the detection and reconstruction of planar patches in a sequence of oriented images. We assume a recording setup in which each part of the object we want to reconstruct is visible in at least three images. This way of recording is common practice in close-range photogrammetry and multi-image vision, because it provides sufficient redundancy for automatic modeling algorithms.

The paper is organized as follows: Sections 2 and 3 describe the detection and reconstruction of object features and the generation of a large number of plane hypotheses. In section 4 the MDL principle is presented as the criterion for plane selection, and section 5 deals with the delimitation of the object planes. In section 6 we show examples with synthetic and real-world data.

2 Feature Reconstruction

The modeling process starts with the reconstruction of low-level 3D features, i.e. points or lines. Points have the advantage that they can be reconstructed with a dense matching algorithm and thus provide almost complete coverage of the object. A wide range of such algorithms exists, e.g. [5], [10]; we use a hierarchical algorithm described in [13]. Lines, on the other hand, can be reconstructed with higher accuracy, because more image information is used: to construct a line, the usual algorithms use the positions and gradient directions of a large number of image points. In our experiments, lines reconstructed with the Canny edge detector [3] were about 3 times more accurate than individual points. It is therefore advantageous to use lines as base features if the modeled object is sufficiently covered with lines. This is the case for many buildings, because their walls intersect in straight edges and contain polygonal structures such as windows. We also use this prior knowledge about buildings to obtain more complete line sets. As a starting point we extract an initial line set with the Canny algorithm [3]. Then vanishing points (VPs) are robustly detected for this line set with the method of Rother [9]. For typical architectural scenes one VP for vertical lines and one or two VPs for horizontal lines are extracted; the approach can be extended to a higher number of vanishing points if necessary. Given the VPs, weaker lines pointing towards a VP (i.e. lines with a lower gray-value gradient) are recovered with a sweeping technique. For details about this step we refer to [1]. The process is illustrated in figure 1.


Figure 1: Enhanced line detection: (a) Detection of vanishing points. (b), (c) Two views of the reconstructed 3D line set.
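The sweep step essentially asks whether an edge segment is oriented towards one of the detected vanishing points. Below is a minimal sketch of such a consistency test; the segment representation, the angular threshold and the function name are illustrative assumptions, not the actual procedure of [1].

```python
import numpy as np

def points_toward_vp(p0, p1, vp, max_angle_deg=2.0):
    """Check whether the 2D segment p0-p1 is oriented towards the vanishing point vp.

    p0, p1, vp: 2D image coordinates.
    max_angle_deg: allowed angular deviation (illustrative value).
    """
    p0, p1, vp = (np.asarray(x, dtype=float) for x in (p0, p1, vp))
    seg_dir = p1 - p0
    seg_dir /= np.linalg.norm(seg_dir)
    # Direction from the segment midpoint to the vanishing point.
    to_vp = vp - 0.5 * (p0 + p1)
    to_vp /= np.linalg.norm(to_vp)
    # The segment supports the VP if it is (anti-)parallel to that direction.
    cos_angle = abs(np.dot(seg_dir, to_vp))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) <= max_angle_deg

# Example: a roughly vertical segment and a vertical VP far below the image.
print(points_toward_vp((100, 50), (102, 400), (110, 10000)))   # True
```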

With the known image orientations, the line segments in the images can be robustly matched to 3D line segments as described in the work of Schmid and Zisserman [11]. For most buildings the sweeping method significantly improves the completeness of the line set: for the example presented in section 6, line extraction and matching without vanishing points yielded 134 line segments in 3D. Matching the line segments obtained with the support of 3 vanishing points yielded 296 line segments.

3 Hypothesis Generation

For plane generation we follow a RANSAC-style sampling strategy: a large number of potential planes are randomly generated from minimal information (three points or two coplanar lines), and their support by the remaining features is tested. Planes with high support are considered potential object planes. The threshold for accepting a plane is set rather low, since we want to use a better selection method than simple voting, as described in the following section. (Theoretically one could skip this step entirely and pass all generated planes to the model selection step, which would correctly discard planes with too little support; however, this is computationally infeasible, so a loose threshold is applied.) Werner and Zisserman also use vanishing points to reduce the complexity of the sampling process [12]. They assume that each pair of principal vanishing points can be used to construct the vanishing line of one principal object plane; given the vanishing line, only one point is needed to construct a plane. However, this assumption can only be made in an iterative framework, where it can be dropped after the detection of all principal planes. If we want to apply a single-pass model selection criterion, all potential planes have to be constructed before the selection step.
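The following is a minimal sketch of the point-based sampling step under the strategy described above; the distance threshold, the loose support threshold and all names are illustrative assumptions rather than the settings used for the experiments.

```python
import numpy as np

def sample_plane_hypotheses(points, n_samples=1000, dist_thresh=0.01,
                            min_support=20, rng=None):
    """RANSAC-style generation of plane hypotheses from a 3D point set.

    points:      (N, 3) array of reconstructed 3D points.
    dist_thresh: inlier distance in object units (illustrative value).
    min_support: loose acceptance threshold; the final decision is left
                 to the MDL selection step.
    Returns a list of (plane, inlier_indices), plane = (n, d) with n.x + d = 0.
    """
    rng = rng or np.random.default_rng()
    hypotheses = []
    for _ in range(n_samples):
        # Minimal sample: three non-collinear points define a plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                 # degenerate (collinear) sample
            continue
        n /= norm
        d = -np.dot(n, p0)
        # Support: all points within dist_thresh of the plane.
        dist = np.abs(points @ n + d)
        inliers = np.flatnonzero(dist < dist_thresh)
        if len(inliers) >= min_support:
            hypotheses.append(((n, d), inliers))
    return hypotheses
```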

4 Model Selection

In the following we present the MDL principle in detail as a model selection criterion for object planes. The selection does not depend on whether points, lines or other features (or a combination) have been used to generate the hypothetical planes; we will therefore use the generic word features for these base features. Using a model selection criterion is motivated by the following considerations. The most straightforward way to solve the selection problem would be to simply use a higher threshold in the sampling procedure and thus select planes with high support. However, this is bound to indiscriminately discard planes with few points, because the threshold must be chosen high enough to discard wrong planes. A refinement of this approach is to give each feature a vote inversely proportional to its fitting error, so that small planes are preserved if they fit well to the data. However, this will still miss a plane with relatively few features and average fitting errors, even if it is the only plane which explains the affected points and thus does not compete with a better hypothesis. One could try to solve this problem by iterative selection, i.e. selecting the globally best plane, removing the corresponding features and repeating the selection. However, this procedure does not allow a feature to contribute to more than one hypothesis. It also has the defect that one wrong hypothesis may corrupt all following detections and thus the whole result, because the global information over all planes is not used. A number of model selection criteria have been proposed; for an overview see [6]. The MDL principle [8] is an elegant way of selecting a model based on the trade-off between model complexity and model error. It has been successfully used in image segmentation for the fitting of parametric models [7]. Other applications of MDL in image processing include [2] and [4].

The MDL principle assumes that the best description of a dataset $D = \{l_1, l_2, \ldots, l_n\}$ is the one that minimizes its description length $L_D$. It can be derived via a communication game: let us assume that we want to transfer the data to a receiver without loss of information. The description with the help of a parametric model $P$ consists of

• the parametric model (i.e. the selected plane set) with length $L(P)$,
• an index for each inlier (i.e. for each feature in the selected plane set) with length $L(I_P)$,
• the residual error of each inlier (i.e. the fitting error of each feature) with length $L(E_P)$,
• the description of the outliers (i.e. features not assigned to any plane) with length $L(O_P)$.

The total description length of the data $D$ using model $P$ is given by

$$L_D(P) = L(P) + L(I_P) + L(E_P) + L(O_P) \qquad (1)$$

Our goal is to minimize $L_D$ by choosing the right model $P$. We thus introduce a boolean vector $\mathbf{b}$, which indicates the presence ($b_i = 1$) or absence ($b_i = 0$) of each hypothetical plane in the model. Using $\mathbf{b}$, equation 1 can be rewritten as

$$L_D(\mathbf{b}) = 4 K_4\, n_{pl}(\mathbf{b}) + K_3\, n_f(\mathbf{b}) + K_2\, \epsilon(\mathbf{b}) + K_1 \bigl( n_f^{(all)} - n_f(\mathbf{b}) \bigr) \qquad (2)$$

where $K_1, K_2, K_3, K_4$ are the number of bits needed to encode a feature, a fitting error, an index and a parameter, respectively, $n_f^{(all)}$ is the total number of lines, $n_f(\mathbf{b})$ is the number of successfully explained lines, $\epsilon(\mathbf{b})$ is the total fitting error and $n_{pl}(\mathbf{b})$ is the number of object planes in the model. Since $n_f^{(all)}$ is constant, minimizing equation 2 is equivalent to maximizing the expression

$$F(\mathbf{b}) = \bar{K}_1\, n_f(\mathbf{b}) - K_2\, \epsilon(\mathbf{b}) - \bar{K}_3\, n_{pl}(\mathbf{b}) \qquad (3)$$

with $\bar{K}_1 = K_1 - K_3$ and $\bar{K}_3 = 4 K_4$.
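To make the objective concrete, the sketch below evaluates equation 3 for a candidate selection vector $\mathbf{b}$, given per-plane inlier lists and fitting errors. The coding constants and the handling of features shared by several selected planes (each counted once, with the error of the plane it fits best) are illustrative assumptions.

```python
import numpy as np

def description_gain(b, plane_inliers, plane_errors,
                     K1_bar=1.0, K2=5.0, K3_bar=3.1):
    """Evaluate F(b) of equation 3 for a 0/1 selection vector b.

    plane_inliers: list of index arrays, the features supporting each plane.
    plane_errors:  list of arrays with the per-feature fitting errors.
    K1_bar, K2, K3_bar: coding costs (illustrative values).
    """
    best_err = {}                       # feature index -> smallest fitting error
    for sel, idx, err in zip(b, plane_inliers, plane_errors):
        if not sel:
            continue
        for i, e in zip(idx, err):
            best_err[i] = min(e, best_err.get(i, np.inf))
    n_f = len(best_err)                 # number of explained features
    eps = sum(best_err.values())        # total fitting error
    n_pl = int(np.sum(b))               # number of selected planes
    return K1_bar * n_f - K2 * eps - K3_bar * n_pl
```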

This equation states that we are searching for a model in which

• the number of features described by the planar patches is high,
• the number of unexplained features is low,
• the features' deviations from the planes are low, and
• each new plane adds to the cost (to avoid overfitting).

Equation 3 can be formulated as a quadratic, symmetric boolean optimization problem: the shortest description is given by the vector $\hat{\mathbf{b}}$ for which

$$F(\hat{\mathbf{b}}) = \hat{\mathbf{b}}^T Q\, \hat{\mathbf{b}} = \hat{\mathbf{b}}^T \begin{pmatrix} q_{11} & \cdots & q_{1M} \\ \vdots & \ddots & \vdots \\ q_{M1} & \cdots & q_{MM} \end{pmatrix} \hat{\mathbf{b}} = \max_{\mathbf{b}} F(\mathbf{b}) \qquad (4)$$

The diagonal elements of $Q$ express the trade-off between cost and benefit of a plane $p_i$, whereas the off-diagonal elements handle the interaction between planes $p_i$ and $p_j$ with common features:

$$q_{ii} = n_{lin}^{(i)} - \tilde{K}_2\, \epsilon_i - \tilde{K}_3 \;, \qquad q_{ij} = \frac{1}{2} \bigl( -\, n_{com}^{(i,j)} + \tilde{K}_2\, \max(\epsilon_{com}^i, \epsilon_{com}^j) \bigr) \qquad (5)$$

where $\epsilon_i$ is the sum of fitting errors of the features of $p_i$, $n_{com}^{(i,j)}$ is the number of lines in common, $\epsilon_{com}^i$, $\epsilon_{com}^j$ are the sums of fitting errors of the common lines in $p_i$ and $p_j$, respectively, and $\tilde{K}_2 = \frac{K_2}{\bar{K}_1}$ and $\tilde{K}_3 = \frac{\bar{K}_3}{\bar{K}_1}$ are the new constants.

Parameter $\tilde{K}_3$ is the cost ratio between a plane and a feature and determines how many features must be explained in order to justify an additional plane. If, for example, we want to add a plane only if it explains more than 3 lines, we set $\tilde{K}_3 = 3.1$. $\tilde{K}_2$ is the cost of a unit of error and depends on the definition of the fitting error. The selection is robust against too low estimates, so $\tilde{K}_2$ should be considered an upper bound: too high estimates result in overly expensive fitting errors and thus an overly complex model. For a statistical derivation of $\tilde{K}_2$ see [7].
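The selection step can be sketched as follows: the matrix $Q$ of equations 4 and 5 is assembled from the hypotheses' inlier sets, and $\hat{\mathbf{b}}$ is sought with a simple greedy bit-flip heuristic. The heuristic, the constants and all names are illustrative assumptions, not the optimizer actually used here.

```python
import numpy as np
from itertools import combinations

def build_Q(plane_inliers, plane_errors, K2_t=5.0, K3_t=3.1):
    """Assemble the symmetric Q matrix of equation 5.

    plane_inliers: list of index arrays (features supporting each plane).
    plane_errors:  list of arrays with the corresponding fitting errors.
    K2_t, K3_t:    the constants K~2 and K~3 (illustrative values).
    """
    M = len(plane_inliers)
    err = [dict(zip(idx, e)) for idx, e in zip(plane_inliers, plane_errors)]
    Q = np.zeros((M, M))
    for i in range(M):
        # Diagonal: benefit of the plane minus error and model cost.
        Q[i, i] = len(plane_inliers[i]) - K2_t * sum(err[i].values()) - K3_t
    for i, j in combinations(range(M), 2):
        common = set(err[i]) & set(err[j])
        if not common:
            continue
        eps_i = sum(err[i][k] for k in common)
        eps_j = sum(err[j][k] for k in common)
        # Off-diagonal: compensate for features counted by both planes.
        Q[i, j] = Q[j, i] = 0.5 * (-len(common) + K2_t * max(eps_i, eps_j))
    return Q

def select_planes(Q):
    """Greedy maximization of F(b) = b^T Q b by single-bit flips."""
    M = len(Q)
    b = np.zeros(M)
    F = 0.0
    while True:
        best_gain, best_i = 0.0, None
        for i in range(M):
            b_try = b.copy()
            b_try[i] = 1 - b_try[i]
            gain = b_try @ Q @ b_try - F
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return b
        b[best_i] = 1 - b[best_i]
        F += best_gain
```

Any quadratic boolean optimizer (exhaustive search for very small $M$, a greedy or tabu search otherwise) could be substituted for `select_planes`.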

5 Plane Verification and Delimitation

A dense set of object points is used to construct the borders of the object planes. Note that, if the planes were recovered from a line set, this step also helps to verify the detected planes, because points and lines were reconstructed independently. With a distance threshold the points belonging to each plane are detected and the planes are refitted. Once the planes and the corresponding point sets are found, the boundaries of the planar patches can be constructed. Each plane is subdivided into a regular grid and transformed to a binary image, where cells containing points are set to 1 and empty cells are set to 0. After applying an iterative median filter to remove quantization artefacts, edge tracing is used to find the boundaries of the planar object patches. An example is shown in Figure 3.
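A minimal sketch of the delimitation step, assuming a numpy/scipy environment; the grid resolution, the construction of the in-plane basis and the 4-neighbourhood boundary definition are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def patch_boundary_mask(points, n, cell_size=0.1, filter_passes=3):
    """Rasterize a plane's inlier points into a binary occupancy grid and
    return the filtered grid together with its boundary cells.

    points: (N, 3) inlier points of the plane with unit normal n.
    """
    # Build an orthonormal 2D basis (u, v) inside the plane.
    n = np.asarray(n, dtype=float)
    u = np.cross(n, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-6:          # plane is (nearly) horizontal
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    uv = np.column_stack((points @ u, points @ v))
    # Occupancy grid: cells containing at least one point are set to 1.
    ij = np.floor((uv - uv.min(axis=0)) / cell_size).astype(int)
    grid = np.zeros(ij.max(axis=0) + 1, dtype=np.uint8)
    grid[ij[:, 0], ij[:, 1]] = 1
    # Iterated median filter removes quantization artefacts.
    for _ in range(filter_passes):
        grid = median_filter(grid, size=3)
    # Boundary cells: occupied cells with at least one empty 4-neighbour.
    padded = np.pad(grid, 1)
    neigh_min = np.minimum.reduce([padded[:-2, 1:-1], padded[2:, 1:-1],
                                   padded[1:-1, :-2], padded[1:-1, 2:]])
    boundary = (grid == 1) & (neigh_min == 0)
    return grid, boundary
```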

6 Experiments

A synthetic dataset was used to assess the algorithm's performance in the presence of noise. The 'house' dataset consists of 5278 points lying in 11 planes. This dataset was corrupted by adding equally distributed random noise of different magnitude to the point coordinates. Then the plane reconstruction was carried out for each point cloud with the corresponding precision parameter. The results are summarized in table 1 and in figure 2. It can be seen that the algorithm performs correctly up to a noise level of 2% or 1:50. At this level the smallest plane cannot be detected anymore (its dimension is only 1/30 of the total object, so that the local noise level of the plane is 1:1.66!). Since the precision of a photogrammetric reconstruction should usually be better than 1:500, the algorithm is easily robust enough for the task.

noise level   object precision   planes found   planes missed   false positives
0.0 %         1:∞                11             0               0
0.4 %         1:250              11             0               0
2.0 %         1:50               10             1               0
4.0 %         1:25                5              6               2

Table 1: Plane detection results at different noise levels. All point coordinates were perturbed with equally distributed random noise. Noise levels are given relative to the largest object dimension. For convenience the fraction notation, which is widely used as a photogrammetric precision measure, is given in the second column.


Figure 2: Results for the synthetic 'house' dataset at different noise levels (0.0%, 0.4%, 2.0%, 4.0%). Noise levels are given relative to the largest object dimension. Color variations within one patch encode fitting residuals: darker points have higher residuals. See also table 1.

The algorithm has been successfully tested with several real-world datasets. One of these datasets, the inner court of the ’Landhaus’ in the historic center of Graz, serves as an example to demonstrate the results.


Figure 3: Results for example dataset ’Landhaus’. (a) One out of 5 images with the detected lines superimposed. (b) The camera positions and a dense set of object points. Points were assigned the gray-value of the central image to give a better 3D impression. (c) The binary map used to delimit the central wall, before and after filtering. (d) 3D-model of the reconstructed object patches. Gray-value variations within one patch encode fitting residuals: darker points have higher residuals.

A sequence of 5 images with a resolution of 2160 × 1440 pixels was used. A total of 296 3D lines were matched, of which 289 are correct. 10 planar patches in 5 planes were recovered, all of which are correct. The 2 smallest of the 7 visible building planes were missed: the side wall of the tower and the roof of the staircase, which is partially occluded by decorations. Note that the roof is not planar but cylindrical. Several patches of the back wall in the center could not be reconstructed due to the lack of reliable matches. 55968 matched points were used for verification. The results are summarized in figure 3.

7 Conclusion

We have shown how model selection with the MDL principle can be used for robust feature-based reconstruction of planar buildings. Experiments with synthetic data were presented to verify the performance in the presence of noise, and a real-world example was presented to demonstrate that the approach works even with complex geometry, where simple regression algorithms fail. Future work will concentrate on modeling and using the inhomogeneous distribution of points and their varying accuracy in photogrammetric point clouds for plane reconstruction. Another possible improvement could be to extend the model to surface primitives other than planes, especially cylinders, in order to correctly model more complex buildings.

References

[1] Joachim Bauer, Andreas Klaus, Konrad Karner, Christopher Zach, and Konrad Schindler. MetropoGIS: A feature based city modeling system. In Proc. Photogrammetric Computer Vision 2002 (PCV'02), ISPRS Commission III Symposium, 2002. To appear.
[2] H. Bischof, A. Leonardis, and A. Selb. MDL principle for robust vector quantisation. Pattern Analysis and Applications, 2:59–72, 1999.
[3] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, November 1986.
[4] G. Fernandez, R. Beichel, H. Bischof, and F. Leberl. Wavelet denoising method for CT images. In Proc. 26th Workshop of the Austrian Association for Pattern Recognition, 2002. To appear.
[5] R. Koch. Automatische Oberflächenmodellierung starrer, dreidimensionaler Objekte aus stereoskopischen Rundum-Ansichten. PhD thesis, Universität Hannover, 1996.
[6] B. Kverh. Selection of Parametric Models in Data Segmentation Framework. PhD thesis, University of Ljubljana, Faculty of Computer and Information Science, 2001.
[7] A. Leonardis, A. Gupta, and R. Bajcsy. Segmentation of range images as the search for geometric parametric models. International Journal of Computer Vision, 14(1):253–277, 1995.
[8] J. Rissanen. Universal coding, information, prediction and estimation. IEEE Transactions on Information Theory, pages 629–636, July 1984.
[9] C. Rother. A new approach for vanishing point detection in architectural environments. In Proc. 11th British Machine Vision Conference, pages 382–391, 2000.
[10] R. Sara. Sigma-delta stable matching for computational stereopsis. Technical report, Center for Machine Perception, Czech Technical University, 2001.
[11] C. Schmid and A. Zisserman. Automatic line matching across views. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 666–671, 1997.
[12] T. Werner and A. Zisserman. New techniques for automated architecture reconstruction from photographs. In Proc. 7th European Conference on Computer Vision, 2002.
[13] C. Zach, A. Klaus, J. Bauer, K. Karner, and M. Grabner. Modeling and visualizing the cultural heritage data set of Graz. In Proc. VAST 2001 Virtual Reality, Archaeology, and Cultural Heritage, 2002.