A New Adaptive Threshold Technique for Improved Matching in SIFT

Syed Jahanzeb Hussain Pirzada, Mirza Waqar Baig, Ehsan ul Haq, and Hyunchul Shin

School of Electrical and Computer Engineering, Hanyang University, ERICA Campus, Korea
{jahanzebi, waqar, ehsan3}@digital.hanyang.ac.kr, [email protected]

Abstract—Scale Invariant Feature Transform (SIFT) is widely used in vision systems for various applications such as object detection and face recognition. In SIFT, a threshold is applied to determine local extrema (keypoint selection) and global extrema (keypoint refinement). Next, descriptor matching is performed with the selected keypoints. This paper presents a new method of adaptive thresholding which improves keypoint selection in SIFT. The value of the adaptive threshold depends upon the average regional intensity of an image. Experimental results show that our method is robust for matching keypoints among images with illumination differences. Our new adaptive threshold technique for keypoint selection reduces false matches and shows significantly improved performance in experimental results.

I. INTRODUCTION

Scale Invariant Feature Transform (SIFT) was proposed by D. Lowe [1][2] to detect and describe local features in images. SIFT features are invariant to image scaling and rotation, and are robust, to a certain extent, to occlusion and changes in illumination. The SIFT technique works by first detecting a number of keypoints in the given image and then computing local image descriptors at the locations of these keypoints. Methods based on SIFT features have been used in many fields, such as panorama stitching [6], remote sensing image registration [7], and so on.

Although SIFT represents one of the state-of-the-art approaches to object detection, it produces some false matches under varying illumination conditions. To overcome this shortcoming of the original SIFT, a new SIFT-based approach for image matching is proposed in this paper. In our method, keypoints are selected using adaptive thresholds for the brighter and darker regions of an image. With this procedure, the threshold-based keypoint-removal step is optimized and the number of false matches is reduced. The suggested approach gains significant robustness to illumination variance as compared to the previous SIFT algorithm [1][2].

II. THE SCALE INVARIANT FEATURE TRANSFORM

As proposed by D. Lowe [1], SIFT consists of four stages: scale-space extrema detection, orientation and magnitude calculation, keypoint descriptor calculation, and matching.

A. Scale-space extrema detection.

In the first stage, keypoints are identified in the scale space by looking for image locations that represent maxima or minima of the difference-of-Gaussian function [8]. The scale space of an image is defined as a function L that is produced from the convolution of a variable-scale Gaussian G [3] with the input image I(x, y):

L(x, y, σ) = G(x, y, σ) * I(x, y)    (1)

G(x, y, σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))    (2)

where σ denotes the standard deviation. The difference-of-Gaussian function D can be computed from the difference of two nearby Gaussians, where nearby scales are separated by a multiplicative factor k:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y)
           = L(x, y, kσ) − L(x, y, σ)    (3)
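As a concrete illustration of Eqs. (1)–(3), the following Python sketch builds D by blurring an image at two nearby scales and subtracting. It assumes NumPy and SciPy are available (the paper itself works in MATLAB); function and variable names are illustrative.

```python
# Sketch of Eqs. (1)-(3): L = G * I, and D as the difference of two
# nearby Gaussian-blurred images separated by a factor k in scale.
# scipy.ndimage.gaussian_filter performs the Gaussian convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussian(image, sigma, k=2 ** 0.5):
    """Return D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    L_lo = gaussian_filter(image.astype(float), sigma)      # L(x, y, sigma)
    L_hi = gaussian_filter(image.astype(float), k * sigma)  # L(x, y, k*sigma)
    return L_hi - L_lo

# Example: a single bright spot gives a DoG response centred on it;
# the wider Gaussian has a lower peak, so D is negative at the centre.
img = np.zeros((32, 32))
img[16, 16] = 1.0
D = difference_of_gaussian(img, sigma=1.6)
```

The choice k = √2 is only one common convention; any fixed multiplicative scale step works with the same formula.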

Local maxima and minima of the difference-of-Gaussian function D are computed by comparing each sample point with its eight neighbors in the current image as well as the nine neighbors in each of the scales above and below (26 neighbors in total). If the pixel is a local maximum or minimum, it is selected as a candidate keypoint. The final keypoints are selected based on measures of their stability. During this stage, low-contrast points (sensitive to noise) and poorly localized points along edges (unstable) are discarded. Two criteria are used for the detection of unreliable keypoints. The first criterion evaluates the value of |D| at each candidate keypoint; if the value is below a certain threshold, meaning that the structure has low contrast, the keypoint is removed. The second criterion evaluates the ratio of principal curvatures of each candidate keypoint to search for poorly defined peaks in the difference-of-Gaussian function. Hence, to remove unstable edge keypoints based on the second criterion, the ratio of principal curvatures of each candidate keypoint is checked. If the ratio is below the threshold, the keypoint is kept; otherwise it is removed.

978-1-61284-857-0/11/$26.00 ©2011 IEEE
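The 26-neighbor comparison and the low-contrast test (the first criterion) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the fixed contrast threshold value is an assumption for the example.

```python
# Sketch of the candidate-keypoint test: a pixel is a candidate if it
# is strictly greater (or strictly smaller) than all 26 neighbours in
# its 3x3x3 DoG neighbourhood, and |D| must exceed a contrast
# threshold (criterion 1). The threshold 0.03 is illustrative only.
import numpy as np

def is_extremum(dog_below, dog_cur, dog_above, r, c, contrast_thresh=0.03):
    cube = np.stack([
        dog_below[r - 1:r + 2, c - 1:c + 2],
        dog_cur[r - 1:r + 2, c - 1:c + 2],
        dog_above[r - 1:r + 2, c - 1:c + 2],
    ])                                 # 3x3x3 = 27 values incl. the centre
    centre = dog_cur[r, c]
    if abs(centre) < contrast_thresh:  # low contrast -> discard
        return False
    others = np.delete(cube.ravel(), 13)   # the 26 neighbours
    return bool(centre > others.max() or centre < others.min())

# A clear peak in the middle scale is accepted as a candidate.
below = np.zeros((3, 3)); above = np.zeros((3, 3))
cur = np.zeros((3, 3)); cur[1, 1] = 1.0
print(is_extremum(below, cur, above, 1, 1))  # → True
```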

B. Orientation assignment.

An orientation is assigned to each keypoint by building a histogram of gradient orientations θ(x, y) weighted by the gradient magnitudes m(x, y) from the keypoint's neighborhood:

m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )    (4)

θ(x, y) = tan⁻¹( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )    (5)

where L is the Gaussian-smoothed image with the scale closest to that of the keypoint.

C. Keypoint descriptor calculation.

The keypoint descriptor is created by first computing the gradient magnitude and orientation at each image point of the 16×16 keypoint neighborhood. These samples are accumulated into orientation histograms over 4×4 subregions. Each histogram contains 8 bins; therefore each keypoint descriptor has 4×4×8 = 128 elements. The coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation to achieve orientation invariance, and the descriptor is normalized to enhance invariance to changes in illumination.

Figure 2. Keypoint descriptor calculation: (a) image gradients, (b) keypoint descriptor

D. Matching.

When using the SIFT algorithm for object recognition, each keypoint descriptor extracted from the query image is matched independently to a database of keypoint descriptors extracted from all training images. The best match for each descriptor is found by identifying its nearest neighbor [4] in the database. If the distance ratio between the closest neighbor and the second closest neighbor is below some threshold, the match is kept; otherwise the match is rejected and the keypoint is removed.

III. ADAPTIVE THRESHOLD SELECTION METHOD

Local maxima and minima of the difference-of-Gaussian function D are computed by comparing each sample point with its eight neighbors in the current image and the nine neighbors in each of the scales above and below (26 neighbors in total). If a pixel is a local maximum or minimum, it is selected as a candidate keypoint. The adaptive threshold method is applied after this keypoint extrema calculation: once all the extrema are found, the Gaussian values corresponding to the keypoints are compared with the thresholds calculated for the brighter and darker regions.

Figure 1. Keypoint extrema calculation

Then two criteria are used for the detection of unreliable keypoints. The first criterion evaluates the value of |D| at each candidate keypoint; if the value is below a certain threshold, meaning that the structure has low contrast, the keypoint is removed. The second criterion evaluates the ratio of principal curvatures of each candidate keypoint to search for poorly defined peaks in the difference-of-Gaussian function.
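The second (edge) criterion can be sketched with Lowe's standard formulation, which checks the ratio of principal curvatures through the 2×2 Hessian of D: a point is kept if Tr(H)²/Det(H) < (r+1)²/r. This sketch and the value r = 10 follow Lowe's paper [1], not an implementation detail stated here.

```python
# Sketch of the second criterion: reject edge-like keypoints whose
# ratio of principal curvatures is too large. The Hessian of D is
# approximated with second-order finite differences; keep the point
# if Tr(H)^2 / Det(H) < (r+1)^2 / r (r = 10 is a common choice).
import numpy as np

def passes_edge_test(D, r, c, ratio=10.0):
    dxx = D[r, c + 1] - 2 * D[r, c] + D[r, c - 1]
    dyy = D[r + 1, c] - 2 * D[r, c] + D[r - 1, c]
    dxy = (D[r + 1, c + 1] - D[r + 1, c - 1]
           - D[r - 1, c + 1] + D[r - 1, c - 1]) / 4.0
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures of opposite sign: reject
        return False
    return bool(tr * tr / det < (ratio + 1) ** 2 / ratio)

# An isotropic blob passes; a straight edge (one flat curvature) fails.
blob = np.array([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
edge = np.array([[0., 1., 0.], [0., 1., 0.], [0., 1., 0.]])
print(passes_edge_test(blob, 1, 1))  # → True
print(passes_edge_test(edge, 1, 1))  # → False
```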

A. Threshold Calculation

Firstly, for the selection of thresholds for the darker and brighter regions, a boundary (middle point) is selected between darker and brighter pixel values. For example, if the Gaussian images have pixel values from 0 to 1, we select the middle value of 0.5 as the boundary, so that darker pixel values vary from 0 to 0.5 and brighter pixel values vary from 0.5 to 1. The brighter threshold is selected by taking the mean Gaussian value of the brighter pixels. Similarly, the darker threshold is selected by calculating the mean Gaussian value of the darker pixels. Keypoints in the darker region of the image are compared with the darker threshold: if a keypoint value is smaller than the darker threshold, the keypoint is selected; otherwise it is rejected. Similarly, keypoints in the brighter region of the image are compared with the brighter threshold, and a keypoint is selected only if its value is greater than the brighter threshold.

B. Matching.

In the matching phase, the keypoints in the darker region of the test image are matched separately with the keypoints in the darker region of the training image. Similarly, the brighter region of the test image is matched with the brighter region of the training image, to avoid false matches due to illumination changes. If the user needs to change the brighter and darker regions to include some specific keypoints, this can be done by changing the boundary (midpoint) pixel value between the darker and brighter regions. Experimental results show that our approach works well in many cases. Most keypoints lie either in the darker or in the brighter region, and only a few keypoints lie along the boundary between the two regions. Thus, selecting two thresholds adaptively for the brighter and darker regions collects the useful keypoints in both regions and removes the effect of mismatches. For example, if a keypoint is in the brighter region of the training image but in the darker region of the test image, the original SIFT method can still declare them a correct match.
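The threshold calculation and region-wise selection described above can be sketched as follows. This is a minimal illustration under the paper's stated rules (boundary at 0.5, mean Gaussian value per region); the function and variable names are illustrative, not from the paper.

```python
# Sketch of Sec. III-A: two adaptive thresholds, computed as the mean
# Gaussian value of the darker (<= boundary) and brighter (> boundary)
# pixels, applied per region to the candidate keypoints.
import numpy as np

def adaptive_thresholds(gaussian, boundary=0.5):
    darker = gaussian[gaussian <= boundary]
    brighter = gaussian[gaussian > boundary]
    return darker.mean(), brighter.mean()

def select_keypoints(gaussian, candidates, boundary=0.5):
    """Keep darker-region keypoints below the darker threshold and
    brighter-region keypoints above the brighter threshold."""
    t_dark, t_bright = adaptive_thresholds(gaussian, boundary)
    kept = []
    for (r, c) in candidates:
        v = gaussian[r, c]
        if v <= boundary and v < t_dark:       # darker region
            kept.append((r, c))
        elif v > boundary and v > t_bright:    # brighter region
            kept.append((r, c))
    return kept

# Tiny example: only the darkest and brightest candidates survive.
g = np.array([[0.1, 0.2], [0.7, 0.9]])
print(select_keypoints(g, [(0, 0), (0, 1), (1, 0), (1, 1)]))
# → [(0, 0), (1, 1)]
```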

But in our method, the two adaptive thresholds remove such mismatch errors, and keypoints are matched with improved accuracy within their specific regions.

IV. EXPERIMENTAL RESULTS

We used a set of images with different scales, viewpoints, and illumination conditions. Experiments were carried out in MATLAB on an Intel Core 2 Duo machine with 2 GB of RAM. In this experiment, matching based on the original SIFT algorithm is compared with matching based on our new adaptive thresholding method, and the results are shown in Figures 3, 4, and 5. The images under consideration were taken from different categories: satellite images with different scales, bubble gum packet images with different viewpoints, and jar images with different scales and viewpoints. Figures 3(a), 4(a), and 5(a) show the results of SIFT matching; Figures 3(b), 4(b), and 5(b) show the results of our new adaptive threshold method. In these figures, the training image on the left was matched with the test image on the right. Table I shows the number of keypoints, matches, computation time, and number of false matches for Figures 3, 4, and 5.

In Fig. 3(a) and Fig. 3(b), two satellite images with different scales are matched using the original SIFT algorithm and our new adaptive thresholding method, respectively. The mismatches in Fig. 3(a) can be detected by closely examining the magnified view of the figure. The single threshold produces fewer keypoints along with two mismatched keypoints. By using the adaptive thresholding method, both the number of keypoints and the number of correct matches are increased, as shown in Fig. 3(b). Similarly, in Fig. 4(a) and Fig. 4(b), two images of bubble gum packets were matched with different viewpoints. There is one mismatch in Fig. 4(a), which was removed in Fig. 4(b) using adaptive thresholding. Fig. 5(a) and Fig. 5(b) use two jar images with different scales and viewpoints. There was one mismatch with the original SIFT as in Fig. 5(a), which was removed by using the adaptive threshold as in Fig. 5(b). This shows that by selecting two thresholds adaptively, true matches are increased and false matches are reduced.

Figure 3. Satellite images

Table I. SIFT matching results

S. No  Features                Figure 3        Figure 4        Figure 5
                               (a)     (b)     (a)     (b)     (a)     (b)
1      Keypoints               696     708     228     235     363     402
2      Matches                 76      102     23      30      23      29
3      Computation Time (ms)   319     480     259     300     260     430
4      False Matches           2       0       1       0       1       0

V. CONCLUSIONS

A new adaptive threshold based SIFT matching technique is proposed in this paper. Applying separate thresholds to the brighter and darker regions and matching the keypoints of these regions separately increases the number of true matches and also reduces false matches in the image. Experimental results show that the proposed method improves the number of matches by 30% on average when compared to the original SIFT implementation. This shows that our adaptive thresholding method can improve the accuracy of matching.

Figure 4. Bubble gum packets

VI. ACKNOWLEDGEMENT

This work was supported by the Ministry of Knowledge Economy (MKE) and the IDEC Platform Center (IPC).

VII. REFERENCES

[1] D.G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.

[2] D.G. Lowe, "Object recognition from local scale-invariant features," Proc. of 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, pp. 1150-1157, 1999.

[3] M.I. Jordan, "Properties of kernels and the Gaussian kernel," Topics in Learning and Decision Making, 2004.

[4] S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, A.Y. Wu, "An optimal algorithm for approximate nearest neighbor searching," Journal of the ACM, 45(6), pp. 891-923, 1998.

[5] J. Zhou, J.Y. Shi, "A robust algorithm for feature point matching," International Journal of Computers & Graphics, vol. 26, pp. 429-436, 2002.

[6] X. Li, L. Zheng, Z. Hu, "SIFT based automatic registration of remotely-sensed imagery," Journal of Remote Sensing, 10(6), pp. 885-892, 2006.

[7] H. Li, J. Niu, H.E. Guo, "Automatic seamless image mosaic method based on feature points," Computer Engineering and Design, vol. 28(9), pp. 2082-2085, 2007.

[8] T. Lindeberg, "Scale-Space Theory in Computer Vision," The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, Dordrecht, Netherlands, 1994.

Figure 5. Jars